General
March 23, 2026
Abstract: Large language models (LLMs) trained on next-token prediction objectives exhibit a striking directional asymmetry in factual recall: a model…
General
March 23, 2026
Abstract: The uniform application of computational resources across all tokens in a transformer sequence is a fundamental inefficiency: not every…
General
March 23, 2026
Abstract: Grokking — the phenomenon whereby neural networks first overfit to training data and only later, after extensive additional training,…
General
March 23, 2026
Abstract: Understanding the internal representations of large language models (LLMs) is a central challenge in mechanistic interpretability. A dominant hypothesis…
General
March 23, 2026
Abstract: Memory bandwidth is the dominant constraint on autoregressive inference in large language models. As sequence lengths grow and batch…
General
March 23, 2026
Abstract: The standard self-attention mechanism in transformer models incurs $O(N^2)$ time and space complexity with respect to sequence length $N$,…
General
March 22, 2026
Abstract: Reinforcement Learning from Human Feedback (RLHF) has become the dominant paradigm for aligning large language models with human values.…
General
March 22, 2026
Abstract: Activation engineering has emerged as a principled framework for causally intervening in the internal representations of large language models…
General
March 22, 2026
Abstract: The dominant paradigm in large language model (LLM) development has concentrated intelligence in training: more parameters, more data, more…