Reading List

ML/AI papers I have read, am reading, or plan to read — with summaries in my own words.

# Title & Authors Category Year Status
1 A Neural Probabilistic Language Model
Bengio et al. · Journal of Machine Learning Research
NLP 2003 Done
2 Deep Residual Learning for Image Recognition
He et al. · arXiv
CV 2015 Done
3 Recursive Language Models
Alex L. Zhang et al. · arXiv
LLM 2026 Done
4 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy et al. · ICLR
CV 2020 Done
5 Batch Normalization
Ioffe & Szegedy · arXiv
NN 2015 Done
6 Learning Transferable Visual Models From Natural Language Supervision
Radford et al. · arXiv
CV 2021 Done
7 Adam Optimizer
Kingma & Ba · ICLR
NN 2014 Done
8 A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen et al. · arXiv
CV 2020 Done
9 Word2Vec
Mikolov et al. · arXiv
NLP 2013 Done
10 ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
Robert Geirhos et al. · ICLR
CV 2019 Done
11 Seq2Seq with Attention
Bahdanau et al. · ICLR
LLM 2014 Done
12 RandAugment: Practical automated data augmentation with a reduced search space
Ekin D. Cubuk et al. · arXiv
CV 2019 Done
13 ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky et al. · arXiv
CV 2015 Done
14 Generative Adversarial Networks (GAN)
Goodfellow et al. · arXiv
GAN 2014 Done
15 AutoAugment: Learning Augmentation Policies from Data
Ekin D. Cubuk et al. · arXiv
CV 2019 Done
16 Attention Is All You Need
Vaswani et al. · arXiv
LLM 2017 Done
17 A Simple Framework for Contrastive Learning of Visual Representations
Chen et al. · arXiv
CV 2020 Done
18 BERT
Devlin et al. · arXiv
LLM 2018 Done
19 GPT-2
Radford et al. · OpenAI
LLM 2019 Done
20 FaceNet: A Unified Embedding for Face Recognition and Clustering
Schroff et al. · arXiv
CV 2015 Done
21 Language Models are Few-Shot Learners
Brown et al. · arXiv
LLM 2020 Done
22 Dimensionality Reduction by Learning an Invariant Mapping
Hadsell et al. · CVPR
ML 2006 Done
23 Scaling Laws for Neural Language Models
Kaplan et al. · arXiv
LLM 2020 Done
24 Auto-Encoding Variational Bayes
Diederik P. Kingma, Max Welling · arXiv
GEN-AI 2013 Done
25 Chinchilla (Training Compute-Optimal LLMs)
Hoffmann et al. · arXiv
LLM 2022 Done
26 Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Sohl-Dickstein et al. · arXiv
CV 2015 Done
27 Denoising Diffusion Probabilistic Models
Jonathan Ho et al. · NeurIPS
CV 2020 Done
28 Switch Transformer (Mixture of Experts)
Fedus et al. · arXiv
LLM 2021 Done
29 Latent Diffusion (Stable Diffusion)
Rombach et al. · arXiv
CV 2022 Done
30 U-Net: Convolutional Networks for Biomedical Image Segmentation
Ronneberger et al. · arXiv
CV 2015 Done
31 Training language models to follow instructions with human feedback
Ouyang et al. · arXiv
LLM 2022 Done
32 Chain-of-Thought Prompting
Wei et al. · arXiv
LLM 2022 Done
33 LoRA
Hu et al. · arXiv
LLM 2021 Done
34 FlashAttention
Dao et al. · arXiv
LLM 2022 Done
35 DPO (Direct Preference Optimization)
Rafailov et al. · arXiv
RL 2023 To Read
36 LLaMA
Touvron et al.
2023 To Read
37 RAG (Retrieval-Augmented Generation)
Lewis et al.
2020 To Read
38 LLaVA (Visual Instruction Tuning)
Liu et al.
2023 To Read
39 Mamba (State Space Models)
Gu & Dao
2023 To Read
40 DeepSeek-R1
DeepSeek-AI
2025 To Read
41 Segment Anything (SAM)
Kirillov et al.
2023 To Read
42 Sentence-BERT
To Read
43 Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation
To Read
44 Generative Modeling by Estimating Gradients of the Data Distribution
To Read
45 Score-Based Generative Modeling through Stochastic Differential Equations
To Read
46 Bayesian Learning via Stochastic Gradient Langevin Dynamics
To Read
47 RoBERTa: A Robustly Optimized BERT Pretraining Approach
To Read
48 Flow Matching for Generative Modeling
Lipman et al.
To Read
49 An Introduction to Flow Matching and Diffusion Models
Peter Holderrieth, Ezra Erives
To Read
50 Training Compute-Optimal Large Language Models
To Read
51 DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI et al.
To Read
52 BloombergGPT: A Large Language Model for Finance
To Read
53 "Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro et al.
xAI 2016 In Progress
54 A Unified Approach to Interpreting Model Predictions
Scott Lundberg, Su-In Lee
xAI In Progress
55 SmoothGrad: removing noise by adding noise
xAI To Read
56 TabTransformer: Tabular Data Modeling Using Contextual Embeddings
To Read
57 Revisiting Deep Learning Models for Tabular Data (FT-Transformer)
To Read
58 SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training
To Read
59 TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
To Read
60 Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
To Read
61 TabNet: Attentive Interpretable Tabular Learning
To Read
62 MLP-Mixer: An all-MLP Architecture for Vision
To Read
63 DeepGBM: A Deep Learning Framework Distilled by GBDT for Online Prediction Tasks
To Read
64 Neural Additive Models: Interpretable Machine Learning with Neural Nets
To Read
65 NODE-GAM: Neural Generalized Additive Model for Interpretable Deep Learning
To Read
66 TabDDPM: Modelling Tabular Data with Diffusion Models
To Read