6 March, 2025
-
Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data!
Great minds discuss flops per watt; average minds discuss data; small minds discuss architecture.
-
You could have designed state of the art Positional Encoding
A complex system that works is invariably found to have evolved from a simple system that worked
-
Smuggling arbitrary data through an emoji
Is it really possible to encode arbitrary data in a single emoji? ~ YES
-
The Tragedies of Reality Are Coming for You
"If you could take any machine learning subfield, and give all their resources to a different one, what are you killing and what are you boosting?" - Robotics
-
Being a High-Leverage Generalist
We're told to pick a lane early, specialise hard, and climb the ladder in our chosen field. This advice made sense in a world of stable, well-defined industries. But that world is dead.
20 February, 2025
-
New Junior Developers Can't Actually Code
Don't let those AI tools blind you into thinking you're a good developer; get in the dirt, lurk on Stack Overflow, and fight with that random guy on a random Reddit post about that thing
-
Deep Research, information vs. insight, and the nature of science
To an LLM, a novel discovery is indistinguishable from an error
-
Situational Awareness - The Decade Ahead
For the pro-AGI peeps
-
Making Deep Learning Go Brrrr From First Principles
How to efficiently make your GPUs go brrrrr
7 January, 2025
-
Why Deep Learning Works Even Though It Shouldnāt
Why do models always get better as they grow bigger and deeper, even when the amount of data they consume stays the same or shrinks?
-
200Bn Weights of Responsibility
The mental stress of developing LLMs.
-
Things we learned about LLMs in 2024
2024 wrapped for LLM space!
-
Is AI progress slowing down?
Is model-scaling dead and inference scaling the way forward?
20 December, 2024
-
OpenAI's o3: The grand finale of AI in 2024
An initial blog on the newly released o3 by OpenAI: A step change as influential as the release of GPT-4. Reasoning language models are the current and next big thing.
-
Building effective agents
A (relatively) short guide on what agents are and how to use them in real-world scenarios.
-
Outperforming Llama 70B with Llama 3B by scaling test-time compute
Researchers at Hugging Face outperform Llama 3.1 70B with Llama 3.2 3B on MATH-500 using test-time compute (an attempt to reverse-engineer o1).
A really good blog showing that, in future, small LMs may outperform large LMs when given more compute time/generations at inference, provided efficient search (and reward) algorithms are used.
-
The Invisible OS
The ultimate evolution: the invisible AI operating system.
-
An intuitive introduction to text embeddings
A good read on how text-embeddings work.
6 December, 2024
-
AI research journey and advice by Jason Wei
Some advice on doing research work in AI
-
Goodbye, Clean Code
Let clean code guide you. Then let it go.
-
What is SwiGLU?
A good & short explanation of how SwiGLU works (not the why part)
27 November, 2024
-
The Problem with Reasoners
Current scenario around reasoning and the secret to getting better LLM results: bigger size, longer inference
-
Convolutional Neural Networks (CNNs / ConvNets) - CS231n
Gentle introduction to the what and how of CNNs - Stanford CS231n
-
The Unreasonable Effectiveness of Recurrent Neural Networks
A great introduction to Recurrent Neural Networks (RNNs) by Andrej Karpathy
-
A Visual Guide to Quantization
The ins and outs of LLM quantization with nice visuals!
27 October, 2024
-
Software 2.0 by Andrej Karpathy
Neural Networks: The next Leap for Software?
14 July, 2024
-
GPT in 60 Lines of Numpy
Implement a GPT from scratch in just 60 lines of NumPy