ARBOx (Alignment Research Bootcamp Oxford)
When: January 6th-17th, 2025
What: A 2-week intensive bootcamp to rapidly build skills in ML safety, including building GPT-2-small from scratch, learning interpretability techniques, understanding RLHF, and replicating key research papers.
Who: Ideal for those new to mechanistic interpretability who have basic familiarity with linear algebra, Python, and AI safety (e.g. AI Safety Fundamentals). Oxford students are particularly encouraged to apply.
(Applications close end of day anywhere on Earth, 13 December 2024.)
Programme Details
- Dates: January 6th-17th, 2025
- What we provide: Accommodation in central Oxford, meals, and partial support for travel costs
The curriculum includes content from ARENA, whose alumni have gone on to become MATS scholars, LASR participants, and AI safety engineers at organizations such as Apollo Research, Anthropic, METR, and OpenAI; some have founded their own AI safety initiatives.
Who should apply?
We’re looking for applicants who are new to mechanistic interpretability. Basic familiarity with linear algebra, Python programming, and AI safety is expected (e.g., having completed AI Safety Fundamentals or another fellowship).
You do not need to be an Oxford student to participate. ARBOx is designed to upskill participants in ML safety, targeting those who would benefit from this training, regardless of their background.
Syllabus
We’ll cover:
- building GPT-2-small from scratch
- the attention mechanism (see the sketch after this list)
- replicating a paper or two (e.g. Redwood Research's indirect object identification (IOI) paper)
- and a brief introduction to RL and RLHF.
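To give a flavour of the material, below is a minimal sketch of a single-head causal self-attention layer in PyTorch. This is illustrative only, not ARBOx course code: GPT-2-small itself uses 12 such heads per attention layer across 12 layers, with d_model = 768 and d_head = 64, and the class and variable names here are our own choices.

```python
# Minimal single-head causal self-attention sketch (illustrative, not course material).
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int = 768, d_head: int = 64):
        super().__init__()
        self.W_Q = nn.Linear(d_model, d_head, bias=False)  # query projection
        self.W_K = nn.Linear(d_model, d_head, bias=False)  # key projection
        self.W_V = nn.Linear(d_model, d_head, bias=False)  # value projection
        self.W_O = nn.Linear(d_head, d_model, bias=False)  # output projection
        self.d_head = d_head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q, k, v = self.W_Q(x), self.W_K(x), self.W_V(x)        # (batch, seq, d_head)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (batch, seq, seq)
        # Causal mask: each position may only attend to itself and earlier positions.
        seq = x.shape[1]
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        pattern = scores.softmax(dim=-1)                       # attention pattern
        return self.W_O(pattern @ v)                           # (batch, seq, d_model)

attn = CausalSelfAttention()
out = attn(torch.randn(2, 10, 768))  # batch of 2 sequences of length 10
print(out.shape)                     # torch.Size([2, 10, 768])
```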
We’ll have some talks covering aspects of the syllabus, but most of each day will be spent pair-programming. We also plan to run a couple of socials for participants.
Teaching assistants
David Quarel - Teaching assistant
David Quarel is a PhD student at the Australian National University (ANU), supervised by Marcus Hutter, focusing on AI safety, Universal Artificial Intelligence, and mechanistic interpretability. He holds a BSc in physics and mathematics (2013-2017) and an MComp specialising in AI and ML (2017-2019). Prior to starting his PhD, he spent two years teaching full-time at the ANU in mathematics, theoretical computer science, and digital hardware design. He has delivered guest lectures, has years of experience developing and delivering course content, and co-authored a textbook. David has also taught at previous iterations of ARENA as well as CaMLAB, and worked as a research assistant with KASL, an AI safety lab based at Cambridge University.
David enjoys road cycling, rock climbing, teaching, rationalism, and is trying to get more into the habit of writing.
Nicky Pochinkov - Teaching assistant
Nicky is an independent AI safety researcher, focusing mostly on higher-level mechanistic interpretability of Language Models. In particular, he’s developing frameworks for long-term behaviour modelling in language models, and exploring modularity and capability separability (including Machine Unlearning).
Before getting started in AI safety and taking part in SERI MATS in 2022 under John Wentworth, Nicky competed internationally in both the Mathematical and Chemistry Olympiads, studied theoretical physics, interned in software engineering, and took part in a startup accelerator program (Patch).
He enjoys cooking and eating vegan cuisine, rock-climbing, travelling, listening to audiobooks, viewing anime (at 3x speed), playing video and board games, and self-hosting various services.