The Alignment Meetup

A Seattle-based group exploring the "Alignment Problem" in AI and machine learning systems. We welcome everyone, technical and non-technical alike, from all fields, who is interested in understanding how to ensure advanced AI systems remain aligned with human values.

When & Where

Schedule: Monthly, typically the second Thursday
Time: 5:00 PM – 7:00 PM Pacific Time
Location: Amazon - Doppler, 2021 7th Ave, Seattle, WA 98121
Format: Hybrid (in-person and online via Zoom)

Join us on Meetup

What We Do

Our meetups center on paper reading and discussion. A typical session includes:

  • 5:00–5:15 PM: Arrival and registration
  • 5:15–5:30 PM: Introductions and networking
  • 5:30–6:45 PM: Paper discussion (hybrid via Zoom)
  • 6:45–7:00 PM: Wrap-up and planning for the next session

Beyond learning, we aim to identify promising areas of research and find collaborators for related work.

Papers Reviewed (Chronological)

2024/01/25 What Does It 'Feel' Like to Be a Chatbot?

Exploring the question of consciousness and subjective experience in AI systems.

2024/02/08 Toy Models of Superposition

Understanding how neural networks represent more features than they have dimensions.

2024/03/07 Model Organisms of Misalignment

A case for studying misalignment through deliberately constructed model organisms.

2024/04/04 Sleeper Agents

Training deceptive LLMs that persist through safety training.

2024/05/08 Weak-to-Strong Generalization

Eliciting strong capabilities from models using weak supervision.

2024/06/05 Practices for Governing Agentic AI Systems

Guidelines for safely deploying AI systems that can take actions autonomously.

2024/07/11 Filler Tokens in Chain-of-Thought

Investigating the role of filler tokens in reasoning chains.

2024/08/22 Value Graph

A method for specifying and representing values in AI systems.

2024/09/18 Scaling Monosemanticity

Extracting interpretable features from Claude 3 Sonnet.

2024/10/16 Circuit Breakers

Representation engineering for preventing harmful outputs.

2024/12/18 Introducing the Model Spec

OpenAI's framework for specifying model behavior and values.

2025/01/22 Rule-Based Rewards

Using rule-based approaches for reward specification.

2025/03/12 Alignment Faking

Studying how models might fake alignment during training in order to preserve their existing preferences.

2025/04/23 Utility Engineering

Analyzing and controlling emergent value systems in AI models.

2025/05/28 Values in the Wild

Studying how values emerge and manifest in deployed AI systems.

2025/06/25 Monitoring Reasoning Models via Chain-of-Thought

Using reasoning traces to monitor model behavior and alignment.

2025/08/06 Neural Self-Other Overlap

Investigating self-other representations in neural networks.

2025/09/10 Persona Vectors

Understanding and controlling model personas through vector representations.

2025/10/08 Subliminal Learning

How language models can transmit behavioral traits through hidden signals in training data.

2025/11/13 Misalignment Behavior Study

Empirical analysis of misaligned behavior in AI systems.

2025/12/11 Interpretability Research

Advancing mechanistic interpretability techniques.

Papers by Category

1. Goals/Purpose/Values

What should AI systems ultimately optimize for? This category explores the fundamental question of defining objectives—both immediate operational goals and longer-term aspirations for AI that genuinely benefits humanity.

No papers reviewed yet in this category.

2. Consciousness/Awareness/Agency

Do AI systems have subjective experiences, and does it matter for alignment? This category examines questions of machine consciousness, self-awareness, and what it means for an AI to be an agent with its own perspective.

3. Understanding Components and Behavior

How do neural networks actually work internally, and what are they learning? This category covers research into the mechanisms, representations, and emergent properties of AI systems—essential for predicting and controlling their behavior.

Interpretability

Opening the black box to understand what models are computing. Research on extracting meaningful features, understanding circuits, and making model internals human-comprehensible.

Emergent Value Learning & Expression

How do values spontaneously arise in trained models? Research on understanding what preferences and behaviors emerge from training, and how models express learned values in practice.

4. Misalignment Testing

How do we proactively discover alignment failures before deployment? This category covers adversarial testing, red teaming methodologies, and systematic approaches to finding cases where AI systems behave contrary to intended values.

No papers reviewed yet in this category.

5. Studying Misaligned Behavior

What does misalignment look like in practice, and how does it arise? This category examines empirical studies of deceptive behavior, goal misgeneralization, and other failure modes where AI systems pursue unintended objectives.

6. Alignment Techniques: Science

The scientific foundations for building aligned AI systems. This category covers theoretical frameworks and empirical methods for specifying what we want AI to do and verifying that it actually does it.

Value Specification

How do we formally express human values in a form AI can use? Research on representing complex, contextual human preferences in structured formats that can guide AI behavior.

Learning Generally

How can AI systems learn robust values that generalize beyond training? Research on transferring alignment from weaker to stronger systems and ensuring learned values apply broadly.

Learning to Manage Edge Cases

How should AI handle unusual or adversarial inputs safely? Research on building robustness to distribution shift, handling ambiguous situations, and failing gracefully when uncertain.

Monitoring

How do we detect alignment failures in deployed systems? Research on runtime monitoring, anomaly detection, and using model outputs like chain-of-thought to verify aligned behavior.

7. Alignment Techniques: Engineering

Practical engineering approaches for building aligned systems. This category covers implementation patterns, architectural choices, and development practices that make alignment easier to achieve and maintain in production systems.

No papers reviewed yet in this category.

8. Governance

How should organizations and society manage AI development responsibly? This category covers frameworks for AI governance, organizational practices, policy recommendations, and standards for safe deployment of capable AI systems.

9. Co-Alignment/Existence

How do humans and AI systems align with each other over time? This category explores the long-term dynamics of human-AI collaboration, mutual adaptation, and what a future of beneficial coexistence might look like.

No papers reviewed yet in this category.

Join The Alignment Meetup

Whether you're a researcher, engineer, or simply curious about AI alignment, you're welcome to join our discussions.

Join on Meetup