Papers~AK

25 July, 2025

Subliminal Learning: Language Models transmit behavioral traits via hidden signals in data.

A student model fine-tuned on teacher-generated data shifts its outputs towards the teacher's behavior, even when that data (e.g. number sequences) looks unrelated to the trait being transmitted.
Spurious Rewards: Rethinking Training Signals in RLVR

We do not yet fully understand how RLVR (RL with verifiable rewards) improves performance: even random or incorrect rewards can boost some models!
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t

RL-based fine-tuning works well for small language models (SLMs)!
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Puzzles are not a good way to test the “reasoning” capabilities of an LLM, and what is the definition of “reasoning” anyway?

Tina: Tiny Reasoning Models via LoRA

Improving the reasoning of small LLMs with RL via LoRA for just $9!
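The LoRA trick that keeps such a budget small can be sketched in a few lines. This is a minimal NumPy sketch of the standard LoRA formulation (a frozen weight W0 plus a trainable low-rank product B A scaled by alpha / r), not Tina's actual training code; the dimensions, rank, and alpha are illustrative assumptions.

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha):
    """LoRA-adapted linear layer: h = W0 x + (alpha / r) * B (A x).

    W0 (d x k) stays frozen; only A (r x k) and B (d x r) would be trained.
    """
    r = A.shape[0]
    return W0 @ x + (alpha / r) * (B @ (A @ x))

# Illustrative sizes (assumptions, not Tina's configuration).
d, k, r = 64, 64, 4
rng = np.random.default_rng(0)
W0 = rng.standard_normal((d, k))          # pretrained weight, frozen
A = rng.standard_normal((r, k)) * 0.01    # small random init
B = np.zeros((d, r))                      # zero init: the adapter starts as a no-op
x = rng.standard_normal(k)

h = lora_forward(x, W0, A, B, alpha=8)
full_params = d * k          # trainable params for full fine-tuning of this layer
lora_params = r * (d + k)    # trainable params with LoRA
print(full_params, lora_params)  # 4096 512
```

Because B starts at zero, the adapted layer initially reproduces the frozen model exactly, and the trainable parameter count grows with r(d + k) instead of d·k, which is a large part of why LoRA-based training is cheap.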
Accelerating Large Language Model Decoding with Speculative Sampling

Using a smaller draft model to speed up LLM decoding via speculation!
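The heart of speculative sampling is a modified rejection-sampling step that checks the draft model's token against the target model while leaving the target's output distribution unchanged. Below is a minimal NumPy sketch of that acceptance rule for a single drafted token, using made-up toy distributions; the paper drafts several tokens and scores them with one target-model call, which is where the speedup comes from and which is not shown here.

```python
import numpy as np

def accept_or_resample(p, q, draft_token, rng):
    """Verify one token proposed by the draft model.

    p: target-model probabilities, q: draft-model probabilities (same vocab).
    Accept the draft with probability min(1, p/q); otherwise resample from
    the normalized residual max(0, p - q).
    """
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token
    residual = np.maximum(p - q, 0.0)
    return rng.choice(len(p), p=residual / residual.sum())

def output_distribution(p, q):
    """Exact distribution of the verified token when the draft is sampled from q."""
    accepted = np.minimum(p, q)               # q(x) * min(1, p(x) / q(x))
    rejected_mass = 1.0 - accepted.sum()
    residual = np.maximum(p - q, 0.0)
    return accepted + rejected_mass * residual / residual.sum()

# Toy 3-token vocabulary (illustrative numbers only).
p = np.array([0.5, 0.3, 0.2])   # target model
q = np.array([0.2, 0.5, 0.3])   # draft model
rng = np.random.default_rng(0)
draft = rng.choice(3, p=q)
token = accept_or_resample(p, q, draft, rng)
print(output_distribution(p, q))  # equals p up to float rounding
```

The accepted mass min(p, q) plus the rejected mass times the residual adds back up to p, so sampling stays exact no matter how weak the draft model is; a better draft only raises the acceptance rate.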
The Unbearable Slowness of Being: Why do we live at 10 bits/s?

Why is our information throughput a measly 10 bits/s while our senses collect data on the order of gigabits/s?
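Taking “gigabits/s” as roughly 10^9 bits/s, the gap in question spans about eight orders of magnitude:

```latex
\frac{\text{sensory input rate}}{\text{behavioral throughput}}
  \approx \frac{10^{9}\ \text{bits/s}}{10\ \text{bits/s}} = 10^{8}
```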
Reasoning with Latent Thoughts: On the Power of Looped Transformers

We don’t need more depth to improve reasoning; we need more loops!
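“More loops” means reusing the same block several times instead of stacking new layers. A minimal NumPy sketch of the difference, assuming a residual MLP as a stand-in for a full transformer block (attention and normalization omitted): both models apply six blocks, but the looped one stores the weights of only one.

```python
import numpy as np

def block(h, W1, W2):
    """Stand-in for one transformer block: a residual MLP (attention omitted)."""
    return h + np.tanh(h @ W1) @ W2

def deep_model(h, layers):
    """Standard depth: every layer has its own weights."""
    for W1, W2 in layers:
        h = block(h, W1, W2)
    return h

def looped_model(h, shared, n_loops):
    """Looped model: one block with shared weights, applied n_loops times."""
    W1, W2 = shared
    for _ in range(n_loops):
        h = block(h, W1, W2)
    return h

d, hidden, n = 16, 32, 6   # illustrative sizes
rng = np.random.default_rng(0)

def init():
    return (rng.standard_normal((d, hidden)) * 0.1,
            rng.standard_normal((hidden, d)) * 0.1)

def n_params(blocks):
    return sum(W1.size + W2.size for W1, W2 in blocks)

layers = [init() for _ in range(n)]   # 6 distinct layers
shared = init()                        # 1 layer, looped 6 times
h0 = rng.standard_normal((4, d))       # 4 token vectors
out = looped_model(h0, shared, n)
print(n_params(layers), n_params([shared]))  # 6144 1024
```

Both variants spend the same compute per forward pass; the looped one trades parameters for repeated computation, which is the trade-off the paper studies when it compares a k-layer model looped L times with a kL-layer model.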
Large Language Diffusion Models

Autoregressive models just got a new competitor: diffusion models!
(read my breakdown: Large Language Diffusion Models)

Titans: Learning to Memorize at Test Time

Surprise! RNN-like long-term memory + attention, where the memory is updated based on how “surprising” each input is
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Producing coherent stories with SLMs, using models as small as ~1M parameters
The Super Weight in Large Language Models

A handful of weights, the “super weights”, matter so much that pruning even a single one can make a language model lose the ability to generate sensible text.
Competitive Programming with Large Reasoning Models

o1 and o3 take on IOI 2024, with o3 reaching gold-medal level!

Long Code Arena: a Set of Benchmarks for Long-Context Code Models

LLMs need to perform well on ML4SE (machine learning for software engineering) tasks; if they don’t, how could they replace software engineers?
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

You can’t judge an LLM without benchmarking it.
DeepSeek-R1

Open-source rival of o1 and a hard slap to “Open”AI

Learning representations by back-propagating errors

The paper that popularized back-propagation for training multi-layer neural networks and changed the field of deep learning forever.
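Back-propagation is the chain rule applied layer by layer, reusing each layer's error signal to compute the previous one's. A minimal NumPy sketch for a two-layer network with logistic units and squared error (the setting the 1986 paper works in; the sizes and data here are arbitrary), followed by a finite-difference check that the hand-derived gradient is right:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(x, y, W1, W2):
    out = sigmoid(W2 @ sigmoid(W1 @ x))
    return 0.5 * np.sum((out - y) ** 2)

def backprop(x, y, W1, W2):
    """Gradients of the squared error w.r.t. W1 and W2 via the chain rule."""
    # Forward pass, keeping the activations the backward pass needs.
    h = sigmoid(W1 @ x)
    out = sigmoid(W2 @ h)
    # Error signal at the output units' total input...
    delta_out = (out - y) * out * (1.0 - out)
    # ...propagated back through W2 to the hidden units' total input.
    delta_h = (W2.T @ delta_out) * h * (1.0 - h)
    return np.outer(delta_h, x), np.outer(delta_out, h)

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), np.array([0.0, 1.0])
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
gW1, gW2 = backprop(x, y, W1, W2)

# Sanity check one entry against a central finite difference.
eps = 1e-6
W1p, W1m = W1.copy(), W1.copy()
W1p[0, 0] += eps
W1m[0, 0] -= eps
numeric = (loss_fn(x, y, W1p, W2) - loss_fn(x, y, W1m, W2)) / (2 * eps)
print(np.isclose(gW1[0, 0], numeric))  # True
```

The backward pass costs about as much as the forward pass, while the finite-difference route needs two extra loss evaluations per weight; that asymmetry is what made training networks with hidden layers practical.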
Phi-4 Technical Report

Using synthetic data in pre-training to improve reasoning & problem-solving abilities

OpenAI o1 System Card

o1: is giving an LLM more time to think the future? I don’t think so; it’s more like squeezing the last few percent of improvement out of a model without changing its underlying architecture or how it works.
GLU Variants Improve Transformer

Origin of SwiGLU ~ its success attributed to “divine benevolence”
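SwiGLU replaces the feed-forward layer's single activation with a gated product: one projection goes through Swish and gates a second, purely linear projection. A minimal NumPy sketch of the bias-free form FFN_SwiGLU(x) = (Swish(xW) ⊗ xV)W2 from the paper; the sizes are illustrative assumptions.

```python
import numpy as np

def swish(z, beta=1.0):
    """Swish_beta(z) = z * sigmoid(beta * z); beta = 1 gives SiLU."""
    return z / (1.0 + np.exp(-beta * z))

def ffn_swiglu(x, W, V, W2):
    """FFN_SwiGLU(x) = (Swish(x W) * (x V)) W2, bias-free as in the paper."""
    return (swish(x @ W) * (x @ V)) @ W2

d_model, d_ff = 8, 16   # illustrative sizes
rng = np.random.default_rng(0)
W = rng.standard_normal((d_model, d_ff))
V = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))
x = rng.standard_normal((2, d_model))   # batch of 2 token vectors
y = ffn_swiglu(x, W, V, W2)
print(y.shape)  # (2, 8)
```

Since the gated layer has three weight matrices instead of two, the paper shrinks d_ff (to about 2/3 of the usual size) so that parameter count and compute stay comparable with the standard feed-forward layer.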
From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Graph-based retrieval-augmented generation (Graph RAG) using LLMs

MILU: A Multi-task Indic Language Understanding Benchmark

A new benchmark from AI4Bharat to evaluate LLMs across multiple Indic languages
Qwen2.5-Coder Technical Report

Technical report for the best open-source coding model!