6 March, 2025
-
Titans: Learning to Memorize at Test Time
Surprise! - RNN + Attention
-
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Producing coherent sentences (stories) with SLMs (models as small as ~1M parameters)
-
The super weight in Large Language Models
A handful of weights matter so much to a language model’s performance that removing even one can make it lose the ability to generate sensible text ~ Super Weights
-
Competitive Programming with Large Reasoning Models
o1 & o3 compete at IOI 2024, and get a gold medal!
20 February, 2025
-
Long Code Arena: a Set of Benchmarks for Long-Context Code Models
LLMs need to perform well on ML4SE tasks; if they don’t, how could they replace software engineers?
-
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
You can’t judge an LLM without benchmarking it
-
DeepSeek-R1
Open-source rival of o1 and a hard slap to “Open”AI
20 December, 2024
-
Learning representations by back-propagating errors
The paper that popularized back-propagation and changed the field of deep learning forever.
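The whole trick is the chain rule applied backward, layer by layer. A minimal sketch, assuming a toy net of our own choosing (one input, one sigmoid hidden unit, one linear output, squared-error loss; all names are illustrative, not the paper’s notation), with a finite-difference check that the hand-derived gradients are right:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """Toy net: x -> h = sigmoid(w1 * x) -> y = w2 * h."""
    h = sigmoid(w1 * x)
    return h, w2 * h

def loss(x, t, w1, w2):
    _, y = forward(x, w1, w2)
    return 0.5 * (y - t) ** 2

def backprop(x, t, w1, w2):
    """Gradients of the loss w.r.t. w1 and w2, propagated backward via the chain rule."""
    h, y = forward(x, w1, w2)
    dL_dy = y - t                  # from L = 0.5 * (y - t)^2
    dL_dw2 = dL_dy * h             # y = w2 * h
    dL_dh = dL_dy * w2
    dL_dz = dL_dh * h * (1.0 - h)  # sigmoid'(z) = h * (1 - h)
    dL_dw1 = dL_dz * x             # z = w1 * x
    return dL_dw1, dL_dw2

def numeric_grad(f, w, eps=1e-6):
    """Central finite difference, used only to check backprop."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

x, t, w1, w2 = 0.5, 1.0, 0.3, -0.8
g1, g2 = backprop(x, t, w1, w2)
n1 = numeric_grad(lambda w: loss(x, t, w, w2), w1)
n2 = numeric_grad(lambda w: loss(x, t, w1, w), w2)
print(abs(g1 - n1) < 1e-6, abs(g2 - n2) < 1e-6)  # → True True
```

A small gradient-descent step along these gradients (e.g. learning rate 0.1) lowers the loss, which is the training loop in one line.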
-
Phi-4 Technical Report
Using synthetic data in pre-training to improve reasoning & problem-solving abilities
6 December, 2024
-
OpenAI o1 System Card
o1: is giving an LLM more time to think the future? I don’t think so; it’s more like squeezing the last few percent of improvement out of a model without changing its underlying architecture or how it works.
-
GLU Variants Improve Transformer
Origin of SwiGLU ~ divine benevolence
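As I read the paper, the SwiGLU feed-forward layer is FFN_SwiGLU(x) = (Swish_1(xW) ⊙ xV) W2, with biases dropped. A minimal pure-Python sketch under that reading (list-of-lists matrices and the helper names are my own assumptions, not the paper’s code):

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def swish(z: float, beta: float = 1.0) -> float:
    """Swish_beta(z) = z * sigmoid(beta * z); beta = 1 is SiLU."""
    return z * sigmoid(beta * z)

def matvec(x, W):
    """Row vector x (length d_in) times matrix W (d_in x d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def ffn_swiglu(x, W, V, W2):
    """(Swish_1(xW) * xV) W2: a Swish gate multiplied elementwise with a linear 'value' path."""
    gate = [swish(g) for g in matvec(x, W)]
    value = matvec(x, V)
    hidden = [g * v for g, v in zip(gate, value)]
    return matvec(hidden, W2)

# Tiny usage example: d_model = 2, d_ff = 1
print(ffn_swiglu([1.0, 5.0], [[1.0], [0.0]], [[1.0], [0.0]], [[2.0]]))
```

The gate decides how much of each hidden unit’s linear value passes through, which is the difference from a plain ReLU/GELU FFN with a single up-projection.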
-
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Graph-based retrieval-augmented generation (RAG) via LLMs
27 November, 2024
-
MILU: A Multi-task Indic Language Understanding Benchmark
A new benchmark to evaluate LLMs across multiple Indic languages ~ AI4Bharat
-
Qwen2.5-Coder Technical Report
Technical report for the best open-source coding model!
27 October, 2024
-
LoRA: Low-Rank Adaptation of Large Language Models
Cheaply and efficiently fine-tune LLMs, for the GPU-poor T_T
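Why it’s cheap: the pre-trained weight W0 stays frozen and only a low-rank update B·A is trained, so the output is W0·x + (α/r)·B·A·x. A minimal sketch of one LoRA-wrapped linear layer, assuming a Gaussian init for A (the 0.02 std is my arbitrary choice) and a zero init for B so the wrapped layer starts out identical to the original:

```python
import random

def matvec(M, x):
    """Matrix M (rows x cols) times column vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen W0 (d x k) plus a trainable rank-r update B (d x r) @ A (r x k), scaled by alpha / r."""

    def __init__(self, W0, r, alpha, seed=0):
        d, k = len(W0), len(W0[0])
        rng = random.Random(seed)
        self.W0 = W0                                                            # frozen
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(k)] for _ in range(r)]  # trainable, random init
        self.B = [[0.0] * r for _ in range(d)]                                  # trainable, zero init
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W0, x)
        update = matvec(self.B, matvec(self.A, x))  # never materializes the d x k product B @ A
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """r * (d + k) parameters instead of d * k."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

W0 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(W0, r=2, alpha=4.0)
print(layer.forward([1.0, 2.0, 3.0, 4.0]), layer.trainable_params())
```

For scale: a 4096×4096 weight with r = 8 gets 8 × (4096 + 4096) = 65,536 trainable parameters instead of 16,777,216.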
-
High-Resolution Image Synthesis with Latent Diffusion Models
The OG Latent Diffusion model!!
-
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Stable Diffusion XL model for text-to-image generation
-
Attention Is All You Need
Attention! Attention! Attention!
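The core op is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A minimal single-head sketch in pure Python (no masking, no multi-head projections; the function names are my own):

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed one query row at a time."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # how much this query attends to each key
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# A zero query scores every key equally, so the output is the mean of the value rows.
print(scaled_dot_product_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

The √d_k divisor keeps the dot products from growing with dimension and pushing the softmax into near one-hot, tiny-gradient territory.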
24 July, 2024
-
SinLU: Sinu-Sigmoidal Linear Unit
A new activation function: a sigmoid-gated linear unit with a trainable sine term
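As I understand the paper’s definition, SinLU(x) = (x + a·sin(b·x))·σ(x), where a and b are trainable. A minimal scalar sketch with a = b = 1.0 fixed purely for illustration (not a claim about the paper’s initialization); setting a = 0 recovers SiLU:

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sinlu(x: float, a: float = 1.0, b: float = 1.0) -> float:
    """(x + a * sin(b * x)) * sigmoid(x); a and b would be learned in practice."""
    return (x + a * math.sin(b * x)) * sigmoid(x)

for v in (-2.0, 0.0, 2.0):
    print(v, sinlu(v), sinlu(v, a=0.0))  # second value is plain SiLU
```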
-
The Llama 3 Herd of Models
Technical report of the Llama 3 family
-
StarCoder: may the source be with you!
StarCoder: A strong coding LLM