General
March 23, 2026
Abstract: Large language models (LLMs) trained on next-token prediction objectives exhibit a striking directional asymmetry in factual recall: a model…
General
March 23, 2026
Abstract: The uniform application of computational resources across all tokens in a transformer sequence is a fundamental inefficiency: not every…
General
March 23, 2026
Abstract: Grokking — the phenomenon whereby neural networks first overfit to training data and only later, after extensive additional training,…
General
March 23, 2026
Abstract: Understanding the internal representations of large language models (LLMs) is a central challenge in mechanistic interpretability. A dominant hypothesis…
General
March 23, 2026
Abstract: Memory bandwidth is the dominant constraint on autoregressive inference in large language models. As sequence lengths grow and batch…
General
March 23, 2026
Abstract: The standard self-attention mechanism in transformer models incurs $O(N^2)$ time and space complexity with respect to sequence length $N$,…
General
March 22, 2026
Abstract: Reinforcement Learning from Human Feedback (RLHF) has become the dominant paradigm for aligning large language models with human values.…
General
March 22, 2026
Abstract: Activation engineering has emerged as a principled framework for causally intervening in the internal representations of large language models…
General
March 22, 2026
Abstract: The dominant paradigm in large language model (LLM) development has concentrated intelligence in training: more parameters, more data, more…