7 January, 2025
-
Why Deep Learning Works Even Though It Shouldn't
Why do models keep getting better as they get bigger and deeper, even when the amount of data they consume stays the same or shrinks?
-
200Bn Weights of Responsibility
The mental stress of developing LLMs.
-
Things we learned about LLMs in 2024
2024 wrapped for the LLM space!
-
Is AI progress slowing down?
Is model-scaling dead and inference scaling the way forward?
20 December, 2024
-
OpenAI's o3: The grand finale of AI in 2024
An initial blog post on the newly released o3 from OpenAI: a step change as influential as the release of GPT-4. Reasoning language models are the current and next big thing.
-
Building effective agents
A (relatively) short guide on what agents are and how to use them in real-world scenarios.
-
Outperforming Llama 70B with Llama 3B by scaling test-time compute
Researchers at Hugging Face outperform Llama 3.1 70B with Llama 3.2 3B on MATH-500 by scaling test-time compute (an attempt to reverse-engineer o1).
A really good blog post showing that, in the future, small LMs may outperform large LMs when given more compute time or more generations at inference, provided efficient search (and reward) algorithms are used.
-
The Invisible OS
The ultimate evolution: the invisible AI operating system.
-
An intuitive introduction to text embeddings
A good read on how text embeddings work.
6 December, 2024
-
AI research journey and advice by Jason Wei
Some advice on doing research in AI.
-
Goodbye, Clean Code
Let clean code guide you. Then let it go.
-
What is SwiGLU?
A good, short explanation of how SwiGLU works (not the why part).
27 November, 2024
-
The Problem with Reasoners
The current state of reasoning models and the secret to getting better LLM results:
bigger size, longer inference
-
Convolutional Neural Networks (CNNs / ConvNets) - CS231n
A gentle introduction to the what and how of CNNs, from Stanford CS231n.
-
The Unreasonable Effectiveness of Recurrent Neural Networks
A great introduction to Recurrent Neural Networks (RNNs) by Andrej Karpathy.
-
A Visual Guide to Quantization
The ins and outs of LLM quantization, with nice visuals!
27 October, 2024
-
Software 2.0 by Andrej Karpathy
Neural Networks: The next Leap for Software?
14 July, 2024
-
GPT in 60 Lines of Numpy
Implement a GPT from scratch in just 60 lines of NumPy.