ARBOx (Alignment Research Bootcamp Oxford)
When: January 6th-17th, 2025
What: A 2-week intensive bootcamp to rapidly build skills in ML safety, including building GPT-2-small from scratch, learning interpretability techniques, understanding RLHF, and replicating key research papers.
Who: Ideal for those new to mechanistic interpretability who have basic familiarity with linear algebra, Python, and AI safety (e.g. AI Safety Fundamentals). Oxford students are particularly encouraged to apply.
(Applications close end of day anywhere on Earth, 13 December 2024.)
Programme Details
- Dates: January 6th-17th, 2025
- What we provide: Accommodation in central Oxford, meals, and partial support for travel costs
The curriculum includes content from ARENA, whose alumni have gone on to become MATS scholars, LASR participants, and AI safety engineers at organizations such as Apollo Research, Anthropic, METR, and OpenAI; some have founded their own AI safety initiatives.
Who should apply?
We’re looking for applicants who are new to mechanistic interpretability. Basic familiarity with linear algebra, Python programming, and AI safety is expected (e.g., having completed AI Safety Fundamentals or another fellowship).
You do not need to be an Oxford student to participate. ARBOx is designed to upskill participants in ML safety, targeting those who would benefit from this training, regardless of their background.
Syllabus
We’ll cover:
- building GPT-2-small from scratch
- the attention mechanism (see the sketch after this list)
- replicating a paper or two (e.g. Redwood Research's indirect object identification (IOI) paper)
- and a brief introduction to RL and RLHF.
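To give a flavour of the material, below is a minimal sketch of a single-head causal self-attention layer in PyTorch. This is illustrative only, not ARBOx course code: GPT-2-small itself uses 12 such heads per attention layer across 12 layers, with d_model = 768 and d_head = 64, and the class and variable names here are our own choices.

```python
# Minimal single-head causal self-attention sketch (illustrative, not course material).
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int = 768, d_head: int = 64):
        super().__init__()
        self.W_Q = nn.Linear(d_model, d_head, bias=False)  # query projection
        self.W_K = nn.Linear(d_model, d_head, bias=False)  # key projection
        self.W_V = nn.Linear(d_model, d_head, bias=False)  # value projection
        self.W_O = nn.Linear(d_head, d_model, bias=False)  # output projection
        self.d_head = d_head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q, k, v = self.W_Q(x), self.W_K(x), self.W_V(x)        # (batch, seq, d_head)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (batch, seq, seq)
        # Causal mask: each position may only attend to itself and earlier positions.
        seq = x.shape[1]
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        pattern = scores.softmax(dim=-1)                       # attention pattern
        return self.W_O(pattern @ v)                           # (batch, seq, d_model)

attn = CausalSelfAttention()
out = attn(torch.randn(2, 10, 768))  # batch of 2 sequences of length 10
print(out.shape)                     # torch.Size([2, 10, 768])
```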
We’ll have some talks covering aspects of the syllabus, but most of each day will be spent pair-programming. We also plan to run a couple of socials for participants.
Teaching assistants
David Quarel - Teaching assistant
David Quarel is a PhD student at the Australian National University (ANU), supervised by Marcus Hutter, focusing on AI safety, Universal Artificial Intelligence, and mechanistic interpretability. He holds a BSc in physics and mathematics (2013-2017) and an MComp specialising in AI and ML (2017-2019). Prior to starting his PhD, he spent two years teaching full-time at the ANU in mathematics, theoretical computer science, and digital hardware design. He has delivered guest lectures, has years of experience developing and delivering course content, and co-authored a textbook. David has also taught at previous iterations of ARENA as well as CaMLAB, and worked as a research assistant with KASL, an AI safety lab based at Cambridge University.
David enjoys road cycling, rock climbing, teaching, rationalism, and is trying to get more into the habit of writing.
Nicky Pochinkov - Teaching assistant
Nicky is an independent AI safety researcher, focusing mostly on higher-level mechanistic interpretability of Language Models. In particular, he’s developing frameworks for long-term behaviour modelling in language models, and exploring modularity and capability separability (including Machine Unlearning).
Before getting started in AI safety and taking part in SERI MATS in 2022 under John Wentworth, Nicky competed internationally in both the Mathematical and Chemistry Olympiads, studied theoretical physics, interned in software engineering, and took part in a startup accelerator program (Patch).
He enjoys cooking and eating vegan cuisine, rock-climbing, travelling, listening to audiobooks, viewing anime (at 3x speed), playing video and board games, and self-hosting various services.