Opening the AI Black Box: Anthropic Open-Sources Circuit Tracer to Advance Interpretability Research on Large Language Models

Transparency of model decisions has long been a core challenge in artificial intelligence. Circuit Tracer, an open-source tool released by Anthropic, gives researchers a key to the AI "black box" by visualizing the internal computations of large language models (LLMs). The tool generates attribution graphs that trace a model's reasoning path from input to output, supports popular open-weight models such as Gemma-2-2b and Llama-3.2-1b, and integrates with the Neuronpedia interactive frontend for user-friendly graph exploration and annotation.

Circuit Tracer was developed jointly by Anthropic's Fellows program and Decode Research. Its key capabilities include:
1. Interactive hypothesis testing: modify feature values and observe how outputs change, supporting causal analysis;
2. Analysis of complex behaviors: already applied to frontier topics such as multi-step reasoning and multilingual representations;
3. A collaborative community ecosystem: a demo notebook and code repository encourage developers to advance AI interpretability together.

Anthropic emphasizes that AI capabilities are advancing far faster than our understanding of their internal mechanisms, and that open-sourcing this tool is meant to close that gap. Researchers can now generate their own attribution graphs through Neuronpedia, or dig into the codebase for more advanced work, pushing AI transparency forward.


In the world of artificial intelligence, understanding how models make decisions is crucial. Anthropic’s Circuit Tracer is an open-source tool designed to help researchers visualize the internal computations of large language models (LLMs). The attribution graphs it generates let users trace the steps a model takes to arrive at a specific output, shedding light on the often opaque processes of AI.
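To make the idea concrete, an attribution graph can be pictured as a weighted directed graph whose nodes are input tokens, internal features, and output tokens, and whose edge weights measure how strongly one node influenced another. The sketch below is purely conceptual (it is not the Circuit Tracer API, and the node names and weights are invented for illustration); it follows the kind of multi-step path Anthropic has described, where a model asked about Dallas routes through a "Texas" feature to produce "Austin":

```python
# Hypothetical attribution graph: edges map (source, destination) -> weight.
# Node names and weights are illustrative, not real model internals.
edges = {
    ("input:Dallas", "feat:Texas"): 0.9,
    ("feat:Texas", "feat:state-capital"): 0.7,
    ("feat:state-capital", "output:Austin"): 0.8,
    ("input:Dallas", "output:Austin"): 0.1,  # weak direct shortcut
}

def strongest_path(edges, src, dst):
    """Greedily walk from src toward dst, always taking the
    highest-weight outgoing edge; returns None if stuck."""
    path, node = [src], src
    while node != dst:
        nxt = max(
            ((d, w) for (s, d), w in edges.items() if s == node),
            key=lambda t: t[1],
            default=None,
        )
        if nxt is None:
            return None
        node = nxt[0]
        path.append(node)
    return path

# Trace the dominant reasoning path from input to output.
print(strongest_path(edges, "input:Dallas", "output:Austin"))
# -> ['input:Dallas', 'feat:Texas', 'feat:state-capital', 'output:Austin']
```

The point of the real tool is that these nodes and weights are derived from the model itself, so the recovered path is evidence about how the model actually computed its answer.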

The Circuit Tracer library supports popular open-weight models and is complemented by an interactive frontend hosted on Neuronpedia. This allows users to explore their generated attribution graphs in a user-friendly manner. The tool was developed by participants in Anthropic’s Fellows program in collaboration with Decode Research, showcasing a community-driven effort to enhance AI interpretability.

With the Circuit Tracer, researchers can trace circuits on supported models, visualize and annotate graphs, and even test hypotheses by modifying feature values to observe changes in model outputs. This capability is not just theoretical; Anthropic has already utilized these tools to study complex behaviors like multi-step reasoning and multilingual representations in models such as Gemma-2-2b and Llama-3.2-1b. Interested users can access a demo notebook for examples and further analysis.
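The hypothesis-testing workflow above can be sketched in miniature. The toy below is not the Circuit Tracer API (the model, weights, and `clamp` parameter are all invented for illustration): it runs a tiny two-layer linear "model" once normally, then again with one hidden "feature" clamped, and reads the output difference as the effect attributable to that feature:

```python
def toy_model(x, clamp=None):
    """Toy two-layer linear model. `clamp` optionally overrides
    hidden feature values, given as {feature_index: value}."""
    w1 = [[1.0, -1.0], [0.5, 0.5]]  # input -> hidden features
    w2 = [2.0, 1.0]                 # hidden features -> scalar output
    hidden = [sum(wi * xi for wi, xi in zip(row, x)) for row in w1]
    if clamp:
        for i, v in clamp.items():  # intervene on chosen features
            hidden[i] = v
    return sum(w * h for w, h in zip(w2, hidden))

x = [1.0, 0.5]
baseline = toy_model(x)                 # unmodified forward pass -> 1.75
ablated = toy_model(x, clamp={0: 0.0})  # feature 0 zeroed out    -> 0.75
print(baseline - ablated)               # output change caused by feature 0
```

In the real tool the same move (editing feature values and re-running the model) turns a correlational attribution graph into a causal test: if the hypothesized feature truly drives the behavior, clamping it should change the output accordingly.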

By open-sourcing these tools, Anthropic aims to bridge the gap in our understanding of AI’s inner workings, which currently lags behind the rapid advancements in AI capabilities. The hope is that the broader community will leverage these tools to explore model behaviors and contribute to the ongoing development of AI transparency. For those eager to dive deeper, the Neuronpedia interface allows for generating and viewing personalized attribution graphs, while the code repository offers resources for more sophisticated research.

Explore the potential of the Circuit Tracer and join the movement towards greater transparency in AI by visiting Neuronpedia.
