angelica[dot]chen[at]nyu.edu
Hi! I’m a research scientist at Google DeepMind, currently working on Gemini training. My research broadly focuses on LLM training dynamics and reinforcement learning. I am also interested in the application of LLMs to biomedical fields, including both drug design and hospital decision-making. Before DeepMind, I completed my PhD at NYU in the Machine Learning for Language group, where I was fortunate to be advised by Kyunghyun Cho.
Outside of my research, I enjoy running, thrift flipping, and baking more pastries than I can feasibly eat. I also volunteer as a NYSDOH-certified rape and domestic violence crisis counselor/victim advocate for the NYC Crime Victims Treatment Center at local hospital EDs, and work with the NYC chapter of the DSA.
My work falls into three broad directions: understanding LLM training, improving how LLMs learn from feedback, and evaluating LLMs. For a more complete list of my papers, please see Semantic Scholar.
Preference Learning Algorithms Do Not Learn Preference Rankings
NeurIPS 2024
Oral at ICML 2024 Workshop on Models of Human Feedback for AI Alignment (MHFAIA)
Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho
[arXiv] [GitHub]
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
ICLR 2024 (Spotlight)
Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra
[OpenReview] [arXiv] [GitHub]
Latent State Models of Training Dynamics
Transactions on Machine Learning Research
Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
[arXiv] [OpenReview]
Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks
ICML 2025
Spotlight at NeurIPS 2024 Workshop on AI for New Drug Modalities (AIDrugX)
Angelica Chen, Samuel D. Stanton, Frances Ding, Robert G. Alberstein, Andrew M. Watkins, Richard Bonneau, Vladimir Gligorijević, Kyunghyun Cho, Nathan C. Frey
[arXiv] [GitHub]
EvoPrompting: Language Models for Code-Level Neural Architecture Search
NeurIPS 2023 (Poster)
Angelica Chen, David M. Dohan, David R. So
[OpenReview] [arXiv]
Learning from Natural Language Feedback
Transactions on Machine Learning Research
Angelica Chen*, Jérémy Scheurer*, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez (*equal contribution)
[OpenReview] [GitHub]
Pretraining Language Models with Human Preferences
ICML 2023 (Oral)
Tomasz Korbak, Kejian Shi, Angelica Chen, Rasika Bhalerao, Christopher L. Buckley, Jason Phang, Samuel R. Bowman, Ethan Perez
[arXiv]
Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection
NAACL 2022 (Oral)
Angelica Chen, Victoria Zayats, Daniel David Walker, Dirk Ryan Padfield
[ACL Anthology]
Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs
Transactions on Machine Learning Research
Angelica Chen, Jason Phang, Alicia Parrish, Vishakh Padmakumar, Chen Zhao, Samuel R. Bowman, Kyunghyun Cho
[arXiv] [OpenReview]
QuALITY: Question Answering with Long Input Texts, Yes!
NAACL 2022
Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel R. Bowman
[ACL Anthology]
BBQ: A hand-built bias benchmark for question answering
ACL Findings 2022
Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Samuel R. Bowman
[ACL Anthology]