AbstractPhil
AI & ML interests
datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.
Recent Activity
reacted to OzTianlu's post with 🧠 about 23 hours ago:
ResNet is Explicit Euler. GPT is Implicit Euler. What Else is Hiding in Plain Sight?
Read online: https://datawhalechina.github.io/learning-terrain/
I wrote an open-source monograph on learning dynamics — The Terrain of Learning. Bilingual (Chinese/English), 4 volumes, 12 chapters, 30+ print-grade figures. Completely free (CC BY-NC-SA 4.0).
The core argument: gradient descent is not optimization. It's terrain motion. The loss function is a landscape. The gradient is the direction of slope. The optimizer is how you choose each step. Once you see it this way, everything clicks:
ResNet = explicit Euler integration on a vector field. The residual branch is the vector field. Each layer takes one Euler step.
GPT autoregression = implicit-state Euler iteration. Stable where explicit Euler explodes. That's why transformers handle long-range dependencies.
DEQ = the Banach fixed-point theorem in production. The forward pass is root-finding. There are no layers to backprop through.
KL divergence = a Bregman divergence on the entropy landscape. Your belief space is curved, not flat.
Chain-of-thought reasoning = hidden states flowing along a reasoning field toward an attractor basin. Correct answers have wide basins. The number of reasoning steps is determined by the terrain, not by the problem.
Diffusion models = systems flowing downhill along a score vector field, from noise to structure, from high energy to low energy.
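The first item in the list above, the residual update read as one explicit Euler step, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the monograph: the scalar vector field `f(x) = -x`, the step size, the layer count, and the names `vector_field` and `residual_stack` are all hypothetical.

```python
import math


def vector_field(x):
    # Hypothetical residual branch: f(x) = -x, i.e. the ODE dx/dt = -x.
    return -x


def residual_stack(x0, num_layers, step):
    # Each "layer" applies the residual update x <- x + step * f(x),
    # which has the same form as one explicit (forward) Euler step.
    x = x0
    for _ in range(num_layers):
        x = x + step * vector_field(x)
    return x


x0 = 1.0
out = residual_stack(x0, num_layers=10, step=0.1)

# Exact solution of dx/dt = -x at t = num_layers * step = 1.
exact = x0 * math.exp(-1.0)

print(out, exact)
```

With this toy field, ten residual layers land close to the exact solution of the ODE at t = 1, and the gap is the usual first-order Euler discretization error. A real residual branch is a learned, high-dimensional function, so this only shows the shape of the analogy.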
The book traces one idea across 337 years — from F=ma (Newton, 1687) to H=T+V (Hamilton, 1833) to loss landscape + gradient field (2020s). Hamilton replaced a catalog of forces with one geometric object. This book does the same for deep learning.
GitHub: https://github.com/datawhalechina/learning-terrain
Discussion: https://github.com/datawhalechina/learning-terrain/discussions/2
Convergence is not hope. Convergence is geometry. You see.
replied to OzTianlu's post about 23 hours ago: ResNet is Explicit Euler. GPT is Implicit Euler. What Else is Hiding in Plain Sight?
Articles
geolip-aleph-void: The First Relational Geometric Vocabulary Patchwork
Reading the Voids: Topological Contribution Signals in Frozen Geometric Codebooks
The geolip-svae-transformer
Fused Batched Thin SVD, Part II: Extending the Jacobi Pipeline to N=6 with Configurable Convergence (published about 1 month ago)
H2 Omega Confirmed, Paradigm Shift: Attempting to Disprove Omega As A Whole (published about 2 months ago)
The Polygonal Omega: Trained Sphere-Solvers Are Projective Codebooks (published about 2 months ago)
Three Geometric Bands in a Sphere-Normalized Patch Autoencoder (published about 2 months ago)
The Geometric Engine: Structural Attractors in Neural Network Weight Space
FL Hybrid Eigendecomposition Beating cuSOLVER's Mathematical Purity with Compilable PyTorch
Ryan Spearman: Geometric Variant Effect Prediction Through Quaternion-Composed Dual Expert Alignment
Fused Batched Thin SVD: Engineering a 5000× Speedup with Triton Kernels
A geometric encoder's toolkit: deterministic primitives for hyperspherical image encoding
Constellation Relay, Geometric Bottleneck, and the Re-Emergence of the Potential 0.29154 Binding Constant
Procrustes ViT Shared Manifold Alignment Experimentation
Geometric Memory III: Resonant Optimization, Consensus Distillation, and Evolutionary Training Paradigms
Geometric Memory II: Sequence Reconstruction, Diffusion Integration, and the Numerical Topology of Alignment
Geometric Memory: Context Extension and Cross-Model Alignment Through Pentachoron Regularization
Geometric Fusion: Cross-Modal Alignment Through Shared Pentachoron Geometry
QWEN 3.5 Residual Thinking Embeddings: How Language Models Transform Text Through Deliberative Generation