Why are Transformers replacing CNNs?
Why does a Transformer classify this cat as a cat… while a ResNet calls it a macaw?
In this video we break down one of the biggest shifts in computer vision: why Transformers are replacing Convolutional Neural Networks (CNNs), even though CNNs were designed for images and Transformers for language.
We’ll compare convolution vs self-attention, explore CNNs’ inductive biases (locality, translation invariance, hierarchical features), and see why self-attention is strictly more expressive than convolution. You’ll also learn how attention can exactly implement convolutional kernels using relative positional encodings.
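To make the "attention can implement convolution" claim concrete, here is a toy NumPy sketch: a simplified 1D version of the paper's 2D, multi-head construction. One hard-attention head per kernel offset stands in for what relative positional encodings achieve in a real Transformer; the sequence and kernel values are illustrative.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1D convolution (cross-correlation) of a sequence with a kernel."""
    k = len(kernel)
    return np.array([np.dot(kernel, x[i:i + k]) for i in range(len(x) - k + 1)])

def conv_as_attention(x, kernel):
    """Reproduce the same convolution with one hard-attention head per
    kernel offset: head d attends deterministically to position i + d
    (a pattern realizable with relative positional encodings), and the
    heads are combined using the kernel weights."""
    n, k = len(x), len(kernel)
    out = np.zeros(n - k + 1)
    for d in range(k):                      # one "head" per relative offset
        A = np.zeros((n - k + 1, n))        # attention matrix of head d
        for i in range(n - k + 1):
            A[i, i + d] = 1.0               # attend only to position i + d
        out += kernel[d] * (A @ x)          # value projection = kernel weight
    return out

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
kernel = np.array([0.25, 0.5, 0.25])
print(conv1d(x, kernel))
print(conv_as_attention(x, kernel))  # identical outputs
```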
📚 Resources:
- On the Relationship between Self-Attention and Convolutional Layers: https://arxiv.org/abs/1911.03584
- Backpropagation Applied to Handwritten Zipcode Recognition: http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
- AlexNet (the paper that popularized CNNs in deep learning): https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
- The Transformer: https://arxiv.org/abs/1706.03762
00:00 Intro
01:30 The convolution operation
03:34 Convolutional Neural Networks (CNNs)
05:51 The inductive bias in CNNs
07:22 Self-attention
10:39 Self-attention can implement convolutions
14:17 Computational power & multi-modality
16:03 ChatGPT can be funny
I asked them to show me their RAG pipeline...
RAG (Retrieval-Augmented Generation) is a widely adopted technique that gives LLMs access to external documents. We briefly discuss its roots in Information Retrieval and the transition from sparse to dense retrieval.
Continua AI shares how they leverage HyDE (a query augmentation technique) to enhance their RAG pipeline in the context of group conversations / social AI.
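For intuition, here is a minimal sketch of the HyDE idea: answer the query first, then retrieve with the answer. The generate_hypothetical_doc stub and the bag-of-words embedding are placeholders for illustration only; a real pipeline would call an LLM and a dense encoder.

```python
import numpy as np
from collections import Counter

def embed(text, vocab):
    """Toy bag-of-words embedding; a real pipeline would use a dense encoder."""
    counts = Counter(text.lower().split())
    v = np.array([counts[w] for w in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def generate_hypothetical_doc(query):
    """Stand-in for an LLM call: HyDE asks a model to *answer* the query,
    producing a hypothetical document that lives closer to real documents
    in embedding space than the short query does."""
    return query + " retrieval augmented generation gives models access to documents"

docs = [
    "retrieval augmented generation gives language models access to external documents",
    "convolutional networks exploit locality in images",
]
vocab = sorted({w for d in docs for w in d.lower().split()})

query = "how does rag work"
hyde_doc = generate_hypothetical_doc(query)

# Retrieve by cosine similarity against the hypothetical document's embedding
q_emb = embed(hyde_doc, vocab)
scores = [q_emb @ embed(d, vocab) for d in docs]
best = docs[int(np.argmax(scores))]
print(best)
```

Note that the raw query shares almost no words with the target document, while the hypothetical answer does; that vocabulary (or, with dense encoders, semantic) overlap is the whole trick.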
📖 HyDE paper: https://arxiv.org/abs/2212.10496
🔗 Continua AI: https://continua.ai/
🎤 Interview with David Petrou, CEO of Continua: https://www.patreon.com/posts/interview-with-143106016
🔗 Olga Dorabiala on LinkedIn: https://www.linkedin.com/in/olga-dorabiala-140930151/
00:00 What is RAG?
02:02 RAG is rooted in Information Retrieval
03:50 RAG challenges
05:44 Continua AI uses HyDE for their RAG pipeline
Transformers & Diffusion LLMs: What's the connection?
Diffusion-based LLMs are a new paradigm for text generation; they progressively refine gibberish into a coherent response. But what's their connection to Transformers?
In this video, I unpack how Transformers evolved from a simple machine translation tool into the universal backbone of modern AI — powering everything from auto-regressive models like GPT to diffusion-based models like LLaDA.
We’ll go step-by-step through:
• How the Transformer architecture actually works (encoder, decoder, attention)
• Why attention replaced recurrence in natural language processing
• How GPT training differs from diffusion-based text generation
• How BERT’s masked language modeling inspired diffusion LLMs
• A concrete walkthrough of LLaDA’s masked diffusion process
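To preview the LLaDA walkthrough, here is a toy sketch of confidence-based iterative unmasking. The model function is a stub standing in for a masked-prediction Transformer; the token names and confidence scores are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = ["diffusion", "models", "refine", "noise", "into", "text"]
MASK = "<mask>"

def model(tokens):
    """Stub for a masked-prediction Transformer: for each masked position,
    return a 'prediction' and a confidence score. A real model (e.g. LLaDA)
    predicts all masked tokens in parallel from the full context."""
    preds, confs = {}, {}
    for i, t in enumerate(tokens):
        if t == MASK:
            preds[i] = TARGET[i]                 # stand-in prediction
            confs[i] = rng.uniform(0.5, 1.0)     # stand-in confidence
    return preds, confs

# Start from an all-mask sequence and unmask the most confident token per step
tokens = [MASK] * len(TARGET)
steps = 0
while MASK in tokens:
    preds, confs = model(tokens)
    i = max(confs, key=confs.get)   # keep only the highest-confidence prediction
    tokens[i] = preds[i]
    steps += 1
print(tokens, steps)
```

Real samplers unmask many tokens per step, which is where the speed advantage over one-token-at-a-time decoding comes from.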
If you’re new here, check out my previous videos for an intuition-driven introduction to diffusion models and how physical diffusion inspired them: https://youtube.com/playlist?list=PL4bm2lr9UVG3SN79Y6WBe4OOlEiO88vie&si=RcTREWUyVSAZRriv
📚 Free slide deck: https://patreon.com/juliaturc
📚 Papers:
• Original GPT: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
• BERT: https://arxiv.org/abs/1810.04805
• LLaDA: https://arxiv.org/abs/2502.09992
▶️ My previous video on Transformers: https://youtu.be/LE3NfEULV6k?si=SAaHbw6jD14nc7IM
00:00 Intro
01:25 The Transformer origin story
03:52 The alignment problem & attention
06:26 The architecture: encoder vs decoder
11:25 Auto-regressive LLMs & GPT
16:09 Text classification & BERT
18:51 Diffusion LLMs & LLaDA
24:17 Outro
Text diffusion: A new paradigm for LLMs
Text diffusion is a new paradigm for LLMs. As opposed to mainstream auto-regressive models like GPT, Claude or Gemini (which predict one token at a time), diffusion-based LLMs draft an entire response and refine it progressively. This can yield up to 10x faster inference.
Models like Gemini Diffusion, Mercury Coder from Inception Labs and Seed Diffusion from ByteDance are already competitive on coding benchmarks.
Inspired by physical diffusion, such models make use of Markov chains to model data generation as a particle hopping through discrete states.
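As a tiny illustration of that idea, here is a sketch of an absorbing-state Markov chain over discrete tokens, in the spirit of D3PM's forward process. The token ids and masking rate are chosen arbitrarily for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK_ID = 0
tokens = np.array([5, 9, 2, 7, 4])   # toy token ids; 0 is reserved for [MASK]

def forward_step(x, beta):
    """One step of an absorbing-state Markov chain: each token independently
    jumps to the [MASK] state with probability beta, and once masked it
    stays masked (the absorbing state)."""
    jump = rng.random(x.shape) < beta
    return np.where(jump, MASK_ID, x)

x = tokens.copy()
for t in range(20):                   # after enough steps, (almost) everything is masked
    x = forward_step(x, beta=0.3)
print(x)
```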
📖 Papers:
Full reading list: https://www.patreon.com/c/JuliaTurc
D3PM: https://arxiv.org/abs/2107.03006
LLaDA: https://arxiv.org/abs/2502.09992
Scaling up Masked Diffusion Models on Text: https://arxiv.org/abs/2410.18514
▶️ The physics behind diffusion models: https://youtu.be/R0uMcXsfo2o?si=OqdGg4TPefSNTK3t
00:00 Intro
01:04 Auto-regressive vs diffusion LLMs
02:06 Why bother with diffusion for text?
06:30 The probability landscape
07:57 Diffusion in latent embedding space
11:00 Diffusion in token embedding space
12:13 Diffusion in text token space
13:49 Markov chains
16:46 Paper study: D3PM
19:42 Paper study: LLaDA
22:30 Evaluation
Hierarchical Reasoning Model: Substance or Hype?
📚 Free resources (reading list + visuals): https://www.patreon.com/c/JuliaTurc
📃 HRM paper: https://arxiv.org/abs/2506.21734
▶️ Yacine's YouTube channel: https://www.youtube.com/@deeplearningexplained
In this video, we dive into the Hierarchical Reasoning Model (HRM), a new architecture from Sapient Intelligence that challenges scaling as the only way to advance AI. With only 27M parameters, 1000 training examples, and no pretraining, HRM still manages to place on the notoriously difficult ARC-AGI leaderboard, right next to models from OpenAI and Anthropic.
Together with Yacine Mahdid (neuroscience researcher & ML practitioner), we’ll explore:
• Why vanilla Transformers plateau on tasks like Sudoku and Maze solving
• How latent recurrence and hierarchical loops give HRM more reasoning depth
• The neuroscience inspiration (theta–gamma coupling in the hippocampus 🧠)
• HRM’s controversial evaluation on ARC-AGI: was it a breakthrough or bending the rules?
• What this means for the future of reasoning in AI models
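As a structural sketch only (the toy tanh updates below are not HRM's actual equations), the hierarchical loop looks roughly like this: a fast low-level module takes several steps per single update of a slow high-level module, giving the network more effective depth per parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
Wl = rng.normal(size=(4, 4)) * 0.1   # toy weights for the fast, low-level module
Wh = rng.normal(size=(4, 4)) * 0.1   # toy weights for the slow, high-level module

def low_step(zL, zH, x):
    return np.tanh(Wl @ zL + zH + x)   # fast module sees the input + slow state

def high_step(zH, zL):
    return np.tanh(Wh @ zH + zL)       # slow module updates from the fast module's result

x = rng.normal(size=4)
zL, zH = np.zeros(4), np.zeros(4)
for n in range(8):            # slow, high-level cycles
    for t in range(4):        # fast, low-level steps within each cycle
        zL = low_step(zL, zH, x)
    zH = high_step(zH, zL)    # H updates once per L-cycle: two timescales
print(zH)
```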
Timestamps:
00:00 Introducing HRM
01:23 Why Sudoku breaks Transformers
03:07 Recurrence via Chain-of-Thought
04:22 HRM: bird's eye view
06:30 Latent recurrence
08:23 The neuroscience backing
11:43 The H and L modules
12:32 Backprop-through-time approximation
13:48 The outer loop
19:31 Training data augmentation
22:59 Evaluation on Sudoku
24:07 Evaluation on ARC-AGI
The physics behind diffusion models
Full reading list: https://www.patreon.com/posts/physics-behind-136741238
Diffusion models build on the same mathematical framework as physical diffusion. In this video, we get to the core of the connection between the physics of motion and generative AI.
Topics covered:
• The intuition of probability landscapes (data as peaks, noise as valleys)
• Forward diffusion: how real data is gradually noised into chaos
• Brownian motion, Wiener processes, and the physics of particle motion
• Stochastic differential equations (SDEs) and the noise schedule
• Training a score function model (a “compass” in the probability landscape)
• Reverse diffusion and Anderson’s reverse SDE (sampling from noise to data)
• Probability flow ODEs for faster, deterministic sampling
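As a small companion to the forward-diffusion discussion, here is the closed-form DDPM noising step in NumPy, using the linear beta schedule from the DDPM paper; the toy data vector is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: the betas control how fast data diffuses into noise
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise_to_t(x0, t):
    """Closed-form forward diffusion: sample x_t directly from x_0
    via x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.array([2.0, -1.0, 0.5])
print(noise_to_t(x0, 10))    # early: still close to the data
print(noise_to_t(x0, 999))   # late: essentially pure Gaussian noise
```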
🔗 Main resources:
• DDPM: Denoising Diffusion Probabilistic Models (https://arxiv.org/abs/2006.11239)
• Score-Based Generative Modeling through Stochastic Differential Equations (https://arxiv.org/abs/2011.13456)
00:00 Intro
01:06 Diffusion as a time-variant probability landscape
04:03 Where diffusion fits in the life of a model
04:34 Forward diffusion (training data generation)
06:25 The physics of diffusion
08:23 The forward SDE (Stochastic Differential Equation)
10:24 Case study: DDPM and noise schedules
13:17 The ML model as a local compass
14:43 Reverse diffusion and the reverse SDE
16:15 Samplers
17:27 Probability-flow ODE (Ordinary Differential Equation)
19:26 Outro
Reverse-engineering GGUF | Post-Training Quantization
The first comprehensive explainer for the GGUF quantization ecosystem.
GGUF quantization is currently the most popular tool for Post-Training Quantization. GGUF itself is a binary file format for quantized models, used by GGML (a lean tensor library that serves as a PyTorch alternative) and llama.cpp (an LLM inference engine built on top of GGML).
Due to its ad-hoc open-source nature, GGUF is poorly documented and misunderstood. Currently, information is scattered across Reddit threads and GitHub pull requests.
📌 Main topics covered in this video:
- The ecosystem: GGML, llama.cpp, GGUF
- Legacy quants vs K-quants vs I-quants
- The importance matrix
- Mixed precision (_S, _M, _L, _XL variants)
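To illustrate the "type 0" idea in the simplest possible terms, here is a sketch of per-block, scale-only 4-bit quantization. It mirrors the spirit of GGUF's Q4_0 but is not its exact binary layout or rounding scheme; block size and mapping range are simplified.

```python
import numpy as np

def quantize_blocks(x, block_size=32):
    """Simplified 'type 0' block quantization (in the spirit of GGUF's Q4_0):
    each block of 32 weights stores one float scale d plus 4-bit integers q,
    reconstructed as x ~ d * q. Real GGUF packs these into a binary layout."""
    blocks = x.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0   # map into [-7, 7]
    scales[scales == 0] = 1.0
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_blocks(q, scales):
    return (q * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=64).astype(np.float32)
q, scales = quantize_blocks(w)
w_hat = dequantize_blocks(q, scales)
print(np.max(np.abs(w - w_hat)))   # error bounded by half a quantization step
```

"Type 1" quants add a per-block minimum (x ~ d * q + m), which is what the video contrasts against this scale-only scheme.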
If you enjoyed this video, watch my entire series on model quantization: https://www.youtube.com/playlist?list=PL4bm2lr9UVG0HvePBXvsceO4yuLC8HhUh
📬 Have feedback or spotted an error? Contribute to the GitHub repo or leave a comment!
https://github.com/iuliaturc/gguf-docs
00:00 Intro
01:36 The stack: GGML, llama.cpp, GGUF
04:05 End-to-end workflow
05:29 Overview: Legacy, K-quants, I-quants
06:03 Legacy quants (Type 0, Type 1)
10:57 K-quants
13:43 I-quants
17:42 Importance Matrix
22:51 Recap
23:35 Mixed precision (_S, _M, _L, _XL)
Training models with only 4 bits | Fully-Quantized Training
Can you really train a large language model in just 4 bits? In this video, we explore the cutting edge of model compression: fully quantized training in FP4 (4-bit floating point). While quantization has traditionally focused on inference, new research pushes the limits of training efficiency — reducing memory, compute, and cost.
🧠 We cover:
✅ NVIDIA TensorCores for mixed precision training
✅ Micro-scaling (MX) data formats
✅ Modeling tricks for 4-bit gradients (e.g. Stochastic Rounding)
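Here is a quick sketch of stochastic rounding, the trick that keeps tiny gradient updates from being deterministically rounded away to zero in low-precision training; the grid (integers) and update size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x):
    """Round x down or up at random, with probability proportional to the
    distance to each neighbor, so the rounding is unbiased: E[SR(x)] = x.
    Round-to-nearest would map a small update like 0.1 to 0 every time."""
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < (x - floor))

x = np.full(100_000, 0.1)          # a small update, below half a grid step
print(stochastic_round(x).mean())  # close to 0.1 on average
```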
📎 Resources:
🔵 Main paper: https://arxiv.org/abs/2505.19115
🔵 US congressional report on DeepSeek: https://selectcommitteeontheccp.house.gov/sites/evo-subsites/selectcommitteeontheccp.house.gov/files/evo-media-document/DeepSeek%20Final.pdf
🔵 Slide deck and full reading list: https://www.patreon.com/c/JuliaTurc
Watch the entire quantization series here: https://youtube.com/playlist?list=PL4bm2lr9UVG0HvePBXvsceO4yuLC8HhUh&si=xLu7vxMfNdJxkB0S
00:00 Intro
01:00 Motivation (training is expensive)
03:06 Mixed precision
05:40 Hardware support: FP4 in NVIDIA Blackwell
13:51 Microscaling formats (MXFP4 & NVFP4)
17:45 Why not INT4?
19:51 Modeling tricks: Stochastic Rounding
22:26 Outro
The myth of 1-bit LLMs | Quantization-Aware Training
Are 1-bit LLMs the future of efficient AI? Or just a catchy Microsoft metaphor? In this video, we break down BitNet, the so-called “1-bit LLM” that isn’t really 1-bit, yet still delivers massive speed and memory gains through extreme quantization.
🔍 What you’ll learn:
• What fractional (1.58) bits are
• How BitNet works under the hood (BitLinear, ELUT, TL1/TL2)
• The role of quantization-aware training (QAT) and the Straight-Through Estimator (STE)
• Optimizations for ternary matrix multiplication
• How 1-bit LLMs scale with parameter count
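The "1.58 bits" comes from rounding each weight to one of three values, since log2(3) ≈ 1.585. Here is a sketch of the absmean ternarization described in the BitNet b1.58 paper; the toy weight matrix is arbitrary.

```python
import numpy as np

def ternarize(W, eps=1e-8):
    """BitNet b1.58-style weight quantization: scale by the mean absolute
    value, then round every weight to {-1, 0, +1}."""
    gamma = np.abs(W).mean() + eps          # absmean scale
    Wq = np.clip(np.round(W / gamma), -1, 1)
    return Wq, gamma

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
Wq, gamma = ternarize(W)
print(Wq)   # every entry is -1, 0, or +1
# At inference, x @ (Wq * gamma) needs only additions/subtractions plus one scale,
# which is what makes ternary matrix multiplication so cheap.
```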
📄 Main paper: https://arxiv.org/abs/2402.17764
Get my full paper reading list from here 👉 https://www.patreon.com/posts/130059217
👉 Watch the full Model Quantization Series: https://youtube.com/playlist?list=PL4bm2lr9UVG0HvePBXvsceO4yuLC8HhUh&si=Wd5vK6B2HQNAL67J
00:00 Intro
01:05 Inspiration and motivation
05:20 BitNet model architecture
10:21 Quantization-Aware Training
15:21 Storing fractional bits: bitpacking & ELUT
18:12 Open-weights models on Hugging Face
19:52 Ternary matrix multiplication
21:20 Demo & evaluation
23:59 Outro
How LLMs survive in low precision | Quantization Fundamentals
In this video, we discuss the fundamentals of model quantization, the technique that allows us to run inference on massive LLMs like DeepSeek-R1 or Qwen.
Among others, we'll discuss:
⚆ What quantization really means (hint: it’s more than just rounding)
⚆ Why integers are faster than floats (with a deep dive into their internal structure)
⚆ How quantization preserves model accuracy
⚆ When to quantize: during training vs after training (PTQ vs QAT)
⚆ A hands-on explanation of scale, zero point, clipping ranges, and fixed-point math
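Here is a minimal sketch of the scale/zero-point mechanics discussed in the video, using asymmetric uint8 quantization; real implementations differ in how they pick the clipping range.

```python
import numpy as np

def quantize_uint8(x):
    """Affine (asymmetric) quantization to uint8: pick a scale and zero point
    so the clipping range [min, max] maps onto [0, 255]."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, -0.2, 0.0, 0.4, 1.5], dtype=np.float32)
q, scale, zp = quantize_uint8(x)
x_hat = dequantize(q, scale, zp)
print(q, x_hat)   # round-trip error is on the order of one quantization step
```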
If you enjoyed this, consider subscribing for upcoming videos on:
⚆ Post-training quantization (PTQ)
⚆ Quantization-aware training (QAT)
⚆ Training in low precision (e.g., FP4)
⚆ 1-bit LLMs
#Quantization #MachineLearning #AIOptimization #LLM #NeuralNetworks #QAT #PTQ #DeepLearning #EdgeAI #FixedPoint #BFloat16 #TensorRT #ONNX #AIAccelerators
00:00 Intro
00:50 What
02:10 Why
03:50 Integer vs floating-point formats
06:45 When
09:21 How
14:40 Fixed point arithmetic
18:00 Matrix multiplications
20:07 Outro
Knowledge Distillation: How LLMs train each other
In this video, we break down knowledge distillation, the technique that powers models like Gemma 3, LLaMA 4 Scout & Maverick, and DeepSeek-R1. Distillation was prominently discussed at LlamaCon 2025.
You’ll learn:
• What knowledge distillation really is (and what it’s not)
• How it helps scale LLMs without bloating inference cost
• The origin story from ensembles and model compression (2006) to Hinton’s "dark knowledge" paper (2015)
• Why "soft labels" carry more information than one-hot targets
• How companies like Google, Meta, and DeepSeek apply distillation differently
• The true meaning behind terms like temperature, behavioral cloning, and co-distillation
Whether you’re building, training, or just trying to understand modern AI systems, this video gives you a deep but accessible introduction to how LLMs teach each other.
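For a taste of the "soft labels" and temperature ideas, here is a toy NumPy sketch; the logits and the T=4 choice are illustrative, not taken from any particular model.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

# Teacher logits for classes [cat, dog, car]
teacher_logits = np.array([5.0, 3.0, -2.0])

hard = softmax(teacher_logits)          # T=1: nearly one-hot
soft = softmax(teacher_logits, T=4.0)   # higher T: "dark knowledge" becomes visible,
print(hard, soft)                       # e.g. dog is far more likely than car

def distill_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions (the usual T**2 factor keeps the gradient scale comparable
    to the hard-label loss)."""
    p, q = softmax(teacher_logits, T), softmax(student_logits, T)
    return (T ** 2) * np.sum(p * np.log(p / q))

print(distill_loss(np.array([4.0, 3.5, -1.0]), teacher_logits))
```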
👉 Slide deck and paper list available for free on Patreon: https://www.patreon.com/c/juliaturc
00:00 – Intro
00:45 – Why distillation matters for scaling
02:26 – The 2006 origins: ensembles and model compression
05:45 – Hinton's 2015 paper: soft labels & dark knowledge
08:26 – What temperature really means
09:37 – Distillation in modern LLMs (Gemma, LLaMA, DeepSeek)
10:53 – Proper distillation vs. behavioral cloning
13:18 – Computational costs of distillation
14:16 – Co-distillation explained
15:32 – Outro
Mixture of Experts: How LLMs get bigger without getting slower
Mixture of Experts (MoE) is everywhere: Meta / Llama 4, DeepSeek, Mistral. But how does it actually work? Do experts specialize? Why does this design scale better than dense models?
In this video, we go deep:
🔹 Walk through the full history of MoE—from vowel recognition in 1991 to trillion-parameter models
🔹 Reproduce the original paper live in Colab
🔹 Dissect modern architectures like Switch Transformer, DeepSeek-MoE, and Mixtral
🔹 Explain why sparsity works, how gating networks operate, and whether experts actually specialize
🔹 Explore training tricks like noise injection and load balancing
🔹 Cover how MoE models are parallelized across devices
Whether you’re an ML researcher, engineer, or just LLM-curious—you'll find value in this video.
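For intuition about sparse gating, here is a toy sketch of a top-k MoE layer. The scalar "experts" and the hand-picked router logits are placeholders; in a real MoE each expert is a feed-forward network and the router is learned.

```python
import numpy as np

def expert(i, x):
    """Toy expert: in a real MoE, each expert is a feed-forward network."""
    return (i + 1) * x

def moe_layer(x, router_logits, k=2):
    """Sparse top-k gating: softmax the router's logits, keep only the top-k
    experts, renormalize their weights, and mix the chosen experts' outputs.
    Each token therefore pays the compute cost of k experts, not all of them."""
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[-k:]             # indices of the k best experts
    weights = probs[top] / probs[top].sum()  # renormalize over the top-k
    return sum(w * expert(i, x) for w, i in zip(weights, top))

x = np.array([1.0, 2.0])
router_logits = np.array([0.1, 2.0, -1.0, 1.5])  # produced by a learned router
y = moe_layer(x, router_logits, k=2)
print(y)   # only experts 1 and 3 contribute
```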
🧠 Free resources (slides, reading list, Colab) are available on my Patreon for free 👉 https://www.patreon.com/c/juliaturc
00:00 Intro & Motivation
01:00 The Scaling Problem
01:49 The Original MoE Paper (1991)
03:43 Colab Repro of Original Paper
09:54 Sparse MoE Revival (2017)
16:03 Switch Transformer & K=1 (2021)
20:28 Modern Open-Source MoEs (Mixtral, DeepSeek, LLaMA 4)
23:02 Do experts specialize?
25:41 Parallelization
