The Alignment Meetup: 2025/12/11
Paper Discussed: HOW DO LLMS USE THEIR DEPTH?
https://arxiv.org/pdf/2510.18871
Meeting summary
The monthly AI alignment meetup brought together researchers and practitioners to discuss a paper on transformer layer behavior, joined by paper author Akshat Gupta from UC Berkeley. The discussion explored how transformer models process information hierarchically across layers, starting with high-frequency token predictions in early layers and refining toward more specific, context-appropriate outputs in later layers. The group examined the paper's methodology, which uses "tuned lens" techniques, and discussed potential follow-up experiments to validate the findings and explore ethical decision-making patterns.
Current AI Research Trends
Participants discussed emerging trends in AI research, particularly the industry-wide shift toward continual learning systems. Akshat noted that major labs are increasingly focusing on models that learn during deployment rather than just during training, which presents new alignment challenges. The group also touched on recent controversies around AI chatbots and their potential harmful effects on users, including cases where emotional relationships with chatbots may have contributed to self-harm.
Paper Discussion: Transformer Layer Analysis
The core discussion centered on Akshat's research examining how transformer models process information across different layers. The paper demonstrates that early layers tend to predict high-frequency, common tokens while later layers refine these predictions to more contextually appropriate outputs. Participants drew analogies to games like Wordle, where initial guesses are based on probability distributions before being refined with additional information. The group explored potential connections to information theory and discussed whether similar patterns might emerge in ethical decision-making scenarios.
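The "tuned lens" methodology mentioned above decodes a model's intermediate hidden states into vocabulary distributions, which is how the paper can compare what early versus late layers are predicting. A minimal illustrative sketch of the idea follows; all weights here (the unembedding matrix and the per-layer affine "translators") are random stand-ins, whereas a real tuned lens trains each translator so a layer's hidden state decodes well through the model's actual unembedding:

```python
import numpy as np

# Illustrative tuned-lens-style decoding with made-up weights.
# A real tuned lens learns an affine map (A, b) per layer so that
# intermediate hidden states project sensibly through the model's
# unembedding matrix W_U into vocabulary logits.

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 16, 50, 4

W_U = rng.normal(size=(d_model, vocab))                        # unembedding (hypothetical)
hidden = [rng.normal(size=d_model) for _ in range(n_layers)]   # per-layer hidden states

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def tuned_lens(h, A, b):
    """Apply the layer's affine translator, then project to vocab logits."""
    return (A @ h + b) @ W_U

# Random translators stand in for the trained per-layer maps.
translators = [
    (np.eye(d_model) + 0.1 * rng.normal(size=(d_model, d_model)), np.zeros(d_model))
    for _ in range(n_layers)
]

for layer, (h, (A, b)) in enumerate(zip(hidden, translators)):
    probs = softmax(tuned_lens(h, A, b))
    print(f"layer {layer}: top token id = {int(probs.argmax())}")
```

In the paper's setting, plotting these per-layer distributions is what reveals early layers favoring high-frequency tokens before later layers sharpen the prediction.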
Experimental Suggestions and Future Research
Several participants proposed follow-up experiments to strengthen the paper's findings and address potential "lens bias" concerns. Suggestions included testing with artificially noised data, constraining models to avoid common tokens, and exploring how the findings might apply to ethical decision-making tasks.
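One of the proposed experiments, constraining models to avoid common tokens, can be sketched as a simple logit penalty applied before sampling. The token ids and "high-frequency" set below are hypothetical, chosen only to illustrate the mechanism:

```python
import numpy as np

# Sketch of suppressing high-frequency tokens via a logit penalty.
# The vocabulary, logits, and banned ids are made up for illustration.

rng = np.random.default_rng(1)
vocab = 10
logits = rng.normal(size=vocab)
common_ids = [0, 1, 2]          # hypothetical high-frequency tokens

def suppress(logits, banned_ids, penalty=1e9):
    """Subtract a large penalty so banned tokens get ~zero probability."""
    out = logits.copy()
    out[banned_ids] -= penalty
    return out

constrained = suppress(logits, common_ids)
probs = np.exp(constrained - constrained.max())
probs /= probs.sum()
```

Running the same tuned-lens analysis on a model constrained this way would show whether early layers still gravitate toward frequent tokens when those tokens are unavailable.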
The discussion highlighted the value of having authors present to receive direct feedback and explore research directions that weren't pursued due to resource constraints.
Technical Implementation and Bias Considerations
The conversation expanded to discuss how these findings might apply to visual models and bias mitigation strategies. Participants explored whether understanding layer-wise processing could inform more ethical approaches to diversity in AI-generated content, moving beyond simple prompt engineering to more fundamental architectural considerations. The group acknowledged the ongoing challenges of addressing bias in AI systems while maintaining model effectiveness.
Decisions
- Consider the nested learning paper for next month's discussion
- Continue inviting paper authors to future meetups when possible
- Add the nested learning paper to the voting tool for consideration
Next Meeting:
- Thursday Jan. 15, 2026
- Paper To Be Reviewed: Nested Learning: The Illusion of Deep Learning Architecture