BluelightAI Research Fellowship

At BluelightAI, we believe that understanding how AI models work will be a key factor in ensuring that these models benefit humanity. We’re using topological data analysis (TDA) and mechanistic interpretability to gain insight into models’ internal functioning, and we’re building tools to apply those insights in real-world scenarios. Our recent work includes training cross-layer transcoders (CLTs) for Qwen 3, using CLT and sparse autoencoder (SAE) features to train interpretable classifiers, and using SAE features to investigate patterns in model performance.

We’re excited to open a number of research fellowship positions for students, postdocs, and others who want to go deeper into mechanistic interpretability and TDA. Fellows will collaborate at high velocity with BluelightAI team members to make discoveries about how AI models work. The fellowships are fully remote, and we welcome applicants from all around the world. If you have experience with TDA, LLM training, or mechanistic interpretability, we’d love to work with you!

Scope

Research projects will last 1-3 months and will use topological data analysis and mechanistic interpretability to answer questions about large language models. We expect projects to involve 10-20 hours of work per week.

What we’ll provide:
A one-time research stipend of $5000
At least weekly 1-1 mentorship meetings to advance the research project
Access to compute resources
Early access to Cobalt (our TDA toolkit) and our mechanistic interpretability tooling

Each project will at minimum produce a blog post that will be shared on our website and LessWrong, and we anticipate that many projects will develop into publishable research papers.

Application process

You’ll need to provide a CV/resume, a personal statement, a short (1-2 paragraph) proposal for a research project (see the project ideas for inspiration), and references to any related work you’ve done previously (including informal things like blog posts). We’ll hold one or two interviews, and if you’re selected, we’ll flesh out a research plan with you and begin ASAP. Applications will be reviewed on a rolling basis, and we’ll select fellows based on applicant quality and our current capacity. We expect to have 3-4 participants in our first batch of fellowships.

Project ideas

Here are a few things we’ve been thinking about that might serve as inspiration for your proposals.
Use Cobalt to develop a thorough taxonomy of features for one of our cross-layer transcoder models
Identify how different LLMs differ in “vibes” or capabilities using libraries of features from SAEs or other interpreter models
Develop generalizations of sparse autoencoders that take into account feature geometry
Help automate circuit tracing by developing techniques to automatically group features into “supernodes” of related features
Improve feature autointerpretation by incorporating information about related features
Fine-tune SAEs or CLTs on domain-specific data to investigate particular behaviors in more detail
Mechanistic investigation of prompt injections or jailbreaks
Search for feature manifolds like those uncovered in “When Models Manipulate Manifolds”
Analyze how the latent space evolves over the course of a reasoning chain of thought
Investigate components of models with some degree of built-in sparsity (e.g. expert routing, MLP gating, LoRA adapters) to identify interpretable patterns
Investigate a specific model capability of interest, e.g. basic arithmetic, tracking parts of speech, maintaining indentation/nesting state in code
Develop techniques for engineering new features and injecting them into AI models

Apply here

Questions? Email jakob@bluelightai.com