Cross Layer Transcoders for the Qwen3 LLM Family

Digging Into Interpretable Features

This post was originally published on LessWrong

Sparse autoencoders (SAEs) and cross-layer transcoders (CLTs) have recently been used to decode the activation vectors in large language models into more interpretable features. Such analyses have been performed by Goodfire, Anthropic, DeepMind, and OpenAI. BluelightAI has constructed CLT features for two models in the Qwen3 family, Qwen3-0.6B-Base and Qwen3-1.7B-Base, and made them available for exploration and discovery at https://qwen3.bluelightai.com. Beyond constructing the features themselves, we provide topological data analysis (TDA) methods for interacting with and analyzing them.
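For readers new to the idea, here is a minimal sketch of the general shape of a cross-layer transcoder, as the term is commonly used: features at a layer read from that layer's residual stream and write to the MLP outputs of that layer and all later ones. This is not BluelightAI's implementation. The sizes, the random weights, the plain ReLU activation, and the names `W_enc`, `W_dec`, and `clt_forward` are all assumptions made for illustration. In a trained CLT the activations are sparse because of the training objective, which this toy does not include.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_model, n_features = 3, 8, 16  # toy sizes, not Qwen3's

# One encoder per layer (hypothetical random weights).
W_enc = [rng.normal(size=(d_model, n_features)) for _ in range(n_layers)]
# One decoder per (source layer, target layer) pair with target >= source.
W_dec = {
    (s, t): rng.normal(size=(n_features, d_model))
    for s in range(n_layers)
    for t in range(s, n_layers)
}

def clt_forward(residuals):
    """residuals: list of (d_model,) vectors, one per layer."""
    # Feature activations at each layer (ReLU is an assumption here).
    acts = [np.maximum(x @ W_enc[l], 0.0) for l, x in enumerate(residuals)]
    # The MLP output at layer t is reconstructed from features at layers <= t.
    mlp_hat = [
        sum(acts[s] @ W_dec[(s, t)] for s in range(t + 1))
        for t in range(n_layers)
    ]
    return acts, mlp_hat

residuals = [rng.normal(size=d_model) for _ in range(n_layers)]
acts, mlp_hat = clt_forward(residuals)
```

The "cross-layer" part is the dictionary of decoders: a feature read at layer 0 contributes to the reconstruction at layers 0, 1, and 2, whereas a per-layer transcoder would only have the `(s, s)` entries.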

Anecdotally, we have found that clear, conceptually abstract features are easier to find among the CLT features we construct than in other analyses we have seen. Here are a couple of examples from Qwen3-1.7B-Base:

Layer 20, feature 847: Meta-level judgment of conceptual or interpretive phrases, often with strong evaluative language. It fires on text that evaluates how something is classified, framed, or interpreted, especially when it says that a commonly used label or interpretation is wrong.

You might be tempted to paraphrase Churchill and say it was the end of the beginning, but it wasn’t that either.
This is peculiar objection to imprisonment – rather like complaining that your TV is not working because it does not defrost chickens
Well, yeah, that’s like saying that you owe money on your mortgage because you borrowed it. The real question is “why do we have to keep running such large deficits?”

Layer 20, feature 179: Fires on phrases about criteria or conditions that must be fulfilled, and is multilingual.

Also, strong skin pigmentation or tattoo at the measurement location was regarded as exclusion criterion as it might interfere with the green light-based PPG.
Protect doctrine should conditions be favorable and calling for unilateral limited military efforts to establish safe-zones in February 2012
Computerprogramme sind jedoch nur von der Patentierbarkeit ausgeschlossen, soweit sie nicht die allgemeinen Patentierbarkeitskrifterien erfüllen
Es realizado por los pediatras que atienden al neonato siguiendo los criterios protocolizados

In addition, a number of CLT features activate strongly and almost exclusively on stop words and punctuation, as was also observed in this analysis.

We use topological data analysis methods to identify and analyze groups of features. Although the CLT features we construct are often meaningful by themselves, ideas and concepts will be identified more precisely by groups of features. Through a visual interface, TDA lets one find groups of features that are close under a similarity measure.
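To make "close in a similarity measure" concrete, here is a deliberately simple stand-in, not the TDA pipeline behind the interface: it links features whose cosine similarity exceeds a threshold and reads off the connected components as groups. The choice of cosine similarity, the threshold of 0.8, the toy vectors, and the function name `group_features` are all assumptions for illustration. The post does not say which similarity measure or grouping method is used.

```python
import numpy as np

def group_features(vectors, threshold=0.8):
    """Group rows by connected components of a thresholded cosine-similarity graph.

    Assumes every row of `vectors` is nonzero.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    adjacent = unit @ unit.T >= threshold  # boolean adjacency matrix
    n = len(vectors)
    labels = [-1] * n
    groups = []
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a group
        labels[start] = len(groups)
        stack, members = [start], []
        while stack:  # depth-first search over similar features
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(adjacent[i]):
                if labels[j] == -1:
                    labels[j] = len(groups)
                    stack.append(int(j))
        groups.append(sorted(members))
    return groups

# Four toy "feature vectors": two near the x-axis, two near the y-axis.
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
groups = group_features(vectors)
print(groups)  # [[0, 1], [2, 3]]
```

A fixed threshold gives one flat partition. Part of the appeal of topological methods is that they can show how such groups overlap and connect rather than forcing a single hard clustering.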

Here is an illustration of the interface. Each node in the graph corresponds to a group of features, so groups of nodes also correspond to groups of features. The circled group is at least partially explained by the phrases on the right.

We also believe that TDA can be an effective tool for circuit tracing in LLMs. Circuit tracing is currently a largely manual procedure: one selects individual features and examines the individual features they connect to in subsequent layers. Connections between groups of features are something one would very much like to analyze, and we will return to that in a future post.

Try it: https://qwen3.bluelightai.com