Abstract: Chain-of-thought (CoT) prompting—eliciting intermediate reasoning steps from large language models before producing a final answer—has become one of the…
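To make the technique concrete, a minimal sketch of a few-shot chain-of-thought prompt: the model is shown a worked example whose answer spells out intermediate steps, then asked a new question. The exemplar, question, and "Let's think step by step" cue are illustrative, not taken from the abstract above.

```python
# Hypothetical few-shot CoT prompt for a generic text-completion model.
# The exemplar demonstrates the step-by-step answer format the model
# is expected to imitate on the final, unanswered question.
cot_prompt = (
    "Q: A shop has 3 boxes with 4 apples each. How many apples in total?\n"
    "A: Let's think step by step. There are 3 boxes, each with 4 apples, "
    "so 3 * 4 = 12. The answer is 12.\n"
    "\n"
    "Q: A train has 5 cars with 20 seats each. How many seats in total?\n"
    "A: Let's think step by step."
)
print(cot_prompt.count("Q:"))  # 2: one solved exemplar, one open question
```

The completion the model produces after the trailing cue is its chain of thought; the final answer is usually parsed out of that text.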
Abstract: Multi-head attention is the central computational primitive of transformer architectures, yet the question of what individual attention heads actually…
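For reference, a minimal NumPy sketch of the primitive itself: queries, keys, and values are projected, split across heads that attend independently, and the per-head outputs are concatenated and projected back. Weights are random stand-ins for learned parameters; shapes and scaling follow the standard formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, n_heads, rng):
    # x: (seq_len, d_model); random weights stand in for learned ones.
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))

    def split(h):  # (seq, d_model) -> (heads, seq, d_head)
        return h.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # (heads, seq, seq)
    out = softmax(scores) @ v                              # each head attends independently
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model) # concatenate heads
    return out @ Wo

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))
y = multi_head_attention(x, n_heads=4, rng=rng)
print(y.shape)  # (5, 16)
```

The per-head attention matrices in `scores` are exactly the objects whose interpretation the abstract above is asking about.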
Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as the dominant post-training paradigm for aligning large language models (LLMs)…
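A small piece of the RLHF pipeline that can be shown compactly is the reward-model objective: given scalar rewards for a human-preferred and a rejected response, the commonly used Bradley-Terry pairwise loss is -log sigmoid(r_chosen - r_rejected). The sketch below implements only that loss, not the full pipeline.

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry pairwise loss for reward-model training:
    # -log(sigmoid(r_chosen - r_rejected)). Minimizing it pushes the
    # reward model to score the human-preferred response higher.
    margin = r_chosen - r_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Correct ordering (chosen scored higher) yields a small loss ...
print(round(float(preference_loss(2.0, 0.0)), 4))  # 0.1269
# ... while a reversed ordering is penalized more heavily.
print(round(float(preference_loss(0.0, 2.0)), 4))  # 2.1269
```

The trained reward model then supplies the scalar signal that the policy is optimized against (e.g. with PPO) in the later RLHF stage.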
Abstract: Mechanistic interpretability aims to reverse-engineer the algorithms implemented by neural networks by identifying interpretable computational units — circuits, features,…
Abstract: Mixture-of-Experts (MoE) architectures have emerged as one of the most computationally compelling approaches to scaling large language models without…
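The scaling argument rests on sparse routing: a learned router scores all experts but only the top-k run per token, so parameter count grows with the number of experts while per-token compute stays roughly fixed. A minimal NumPy sketch of top-k routing, with tiny tanh "experts" standing in for real feed-forward blocks:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, router_w, expert_ws, top_k=2):
    # x: (tokens, d). The router scores every expert, but only the
    # top_k experts per token are actually evaluated.
    probs = softmax(x @ router_w)              # (tokens, n_experts)
    top = np.argsort(-probs, axis=-1)[:, :top_k]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = probs[t, top[t]]
        gates = gates / gates.sum()            # renormalize selected gates
        for g, e in zip(gates, top[t]):
            # Toy stand-in for an expert feed-forward block.
            out[t] += g * np.tanh(x[t] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.standard_normal((6, d))
router_w = rng.standard_normal((d, n_experts))
expert_ws = rng.standard_normal((n_experts, d, d))
y = moe_forward(x, router_w, expert_ws)
print(y.shape)  # (6, 8)
```

With `top_k=2` of 4 experts, each token touches half the expert parameters; production MoE layers add load-balancing losses so routing does not collapse onto a few experts.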