ARBOx (Alignment Research Bootcamp Oxford)


When: January 6th-17th, 2025

What: A 2-week intensive bootcamp to rapidly build skills in ML safety, including building GPT-2 Small, learning interpretability techniques, understanding RLHF, and replicating key research papers.

Who: Ideal for those new to mechanistic interpretability, with basic familiarity with linear algebra, Python, and AI safety (e.g. AI Safety Fundamentals). Oxford students are particularly encouraged to apply.

Applications have now closed.

Express interest in future iterations here.


Programme Details

The curriculum includes content from ARENA, whose alumni have gone on to become MATS scholars, LASR participants, AI safety engineers at organizations such as Apollo Research, Anthropic, METR, and OpenAI, and even founders of their own AI safety initiatives.

Who should apply?

We’re looking for applicants who are new to mechanistic interpretability. Basic familiarity with linear algebra, Python programming, and AI safety is expected (e.g., having completed AI Safety Fundamentals or a similar fellowship).

You do not need to be an Oxford student to participate. ARBOx is designed to upskill anyone who would benefit from training in ML safety, regardless of their background.


Syllabus

We’ll cover building GPT-2 Small, interpretability techniques, RLHF, and replications of key research papers.

We’ll have some talks covering aspects of the syllabus, but most of each day will be spent pair-programming. We also plan to run a couple of socials for participants.


Teaching assistants

David Quarel - Teaching assistant

David Quarel is a PhD student at the Australian National University (ANU), supervised by Marcus Hutter, focusing on AI safety, Universal Artificial Intelligence, and mechanistic interpretability. He holds a BSc in physics and mathematics (2013-2017) and an MComp specialising in AI and ML (2017-2019). Before starting his PhD, he spent two years teaching full-time at the ANU in mathematics, theoretical computer science, and digital hardware design. He has also delivered guest lectures, has years of experience developing and delivering course content, and co-authored a textbook. David has taught at previous iterations of ARENA as well as CaMLAB, and worked as a research assistant with KASL, an AI safety lab based at Cambridge University.

David enjoys road cycling, rock climbing, teaching, and rationalism, and is trying to get into the habit of writing more.


Nicky Pochinkov - Teaching assistant

Nicky is an independent AI safety researcher focusing mostly on higher-level mechanistic interpretability of language models. In particular, he’s developing frameworks for modelling long-term behaviour in language models, and exploring modularity and capability separability (including machine unlearning).

Before getting started in AI safety and taking part in SERI MATS in 2022 under John Wentworth, Nicky competed internationally in both the Mathematical and Chemistry Olympiads, studied theoretical physics, interned in software engineering, and took part in a startup accelerator program (Patch).

He enjoys cooking and eating vegan cuisine, rock climbing, travelling, listening to audiobooks, watching anime (at 3x speed), playing video and board games, and self-hosting various services.