This article explores how BluelightAI uses Topological Data Analysis (TDA) to understand internal mechanisms of large language models, specifically the Qwen3 family.
## Core Argument
The researchers address the "black box" nature of LLMs by introducing Cross-Layer Transcoders (CLTs) and their proprietary Cobalt software. They argue that "understanding how models construct concepts is vital for control and diagnosis" and present their work as advancing mechanistic interpretability.
## Methodology
The team:
- Identified feature clusters within single layers showing coactivation patterns
- Computed mean encoder vectors for target clusters
- Scanned preceding layers for influential features
- Filtered for frequently activating features (firing more than once per 10,000 tokens)
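The steps above can be sketched in code. This is a minimal illustration, not BluelightAI's actual pipeline: the array layouts, the use of cosine similarity between encoder vectors as a stand-in for cross-layer influence, and all names (`activations`, `encoders`, `influential_predecessors`, etc.) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: activations[layer] is (n_tokens, n_features) of
# sparse feature activations; encoders[layer] is (n_features, d_model) of
# encoder vectors. Random data here, purely for illustration.
n_tokens, n_features, d_model = 50_000, 64, 32
layers = (4, 5)
activations = {
    L: rng.random((n_tokens, n_features)) * (rng.random((n_tokens, n_features)) < 0.001)
    for L in layers
}
encoders = {L: rng.normal(size=(n_features, d_model)) for L in layers}

def mean_encoder_vector(layer, cluster):
    """Mean of the encoder rows for the features in a coactivating cluster."""
    return encoders[layer][cluster].mean(axis=0)

def frequent_features(layer, min_rate=1e-4):
    """Keep features that fire more than once per 10,000 tokens."""
    rates = (activations[layer] > 0).mean(axis=0)
    return np.where(rates > min_rate)[0]

def influential_predecessors(target_layer, cluster, prev_layer, top_k=5):
    """Rank frequent prev-layer features by cosine similarity of their
    encoder vectors to the cluster's mean encoder vector. (Similarity is
    only a proxy; the article does not specify the attribution method.)"""
    target = mean_encoder_vector(target_layer, cluster)
    cands = frequent_features(prev_layer)
    enc = encoders[prev_layer][cands]
    sims = enc @ target / (np.linalg.norm(enc, axis=1) * np.linalg.norm(target) + 1e-9)
    return cands[np.argsort(sims)[::-1][:top_k]]

preds = influential_predecessors(5, cluster=[0, 3, 7], prev_layer=4)
print(preds)
```

The frequency filter mirrors the stated threshold of roughly one activation per 10,000 tokens; everything else is scaffolding to show how the scan over preceding layers could be wired together.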
## Case Study 1: Software Exceptions
Traced how "problem severity" concepts (medical context) evolve through validation and problem-fixing stages before materializing as software exception handling in code.
## Case Study 2: Progress Metaphors
Demonstrated how physical movement features transform into abstract concepts like "one step further" and "step in the right direction" through intermediate layers handling comparative analysis and pathfinding.
## Key Insight
"Topological approaches respect high-dimensional data shape" rather than forcing data into rigid clusters, which lets the analysis reveal transitional states and branching concept paths across model layers.
The authors released CLTs for Qwen3-0.6B and Qwen3-1.7B models alongside an interactive explorer tool at qwen3.bluelightai.com.