Our Publications

2025

Intelligence Per Watt: A Study of Local Intelligence Efficiency

Jon Saad-Falcon*, Avanika Narayan*, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré

Preprint, 2025

Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers

Jon Saad-Falcon*, E. Kelly Buchanan*, Mayee F. Chen*, Tzu-Heng Huang, Brendan McLaughlin, Tanvir Bhathal, Shang Zhu, Ben Athiwaratkun, Frederic Sala, Scott Linderman, Azalia Mirhoseini, Christopher Ré

Conference on Neural Information Processing Systems, 2025
ES-FoMo III @ICML 2025

SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models

Emil Biju*, Shayan Talaei*, Zhemin Huang*, Mohammadreza Pourreza, Amin Saberi, Azalia Mirhoseini

Conference on Neural Information Processing Systems, 2025

Cartridges: Lightweight and general-purpose long context representations via self-study

Sabri Eyuboglu*, Ryan Ehrlich*, Simran Arora*, Neel Guha, Dylan Zinsley, Emily Liu, Will Tennien, Atri Rudra, James Zou, Azalia Mirhoseini, Christopher Ré

ES-FoMo III (Oral) @ICML 2025

Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use

Anna Goldie*, Azalia Mirhoseini*, Hao Zhou, Irene Cai, Christopher D. Manning

Conference on Language Modeling, 2025

Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models

Caia Costello, Simon Guo, Anna Goldie, Azalia Mirhoseini

International Conference on LLM-Aided Design, 2025
SSI-FM @ICLR 2025

How Do Large Language Monkeys Get Their Power (Laws)?

Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo

International Conference on Machine Learning, 2025
Oral

KernelBench: Can LLMs Write Efficient GPU Kernels?

Anne Ouyang*, Simon Guo*, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini

International Conference on Machine Learning, 2025
DL4C (Best Paper) & SSI-FM @ ICLR 2025

2024

Archon: An Architecture Search Framework for Inference-Time Techniques

Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini

International Conference on Machine Learning, 2025
SSI-FM (Oral) @ ICLR 2025

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

Je-Yong Lee*, Donghyun Lee*, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini

Conference on Language Modeling, 2024

CHESS: Contextual Harnessing for Efficient SQL Synthesis

Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, Amin Saberi

MAS@ICML 2025

Selected Prior Publications

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosiute, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Jared Kaplan

Anthropic, 2022

Learning to Design Accurate Deep Learning Accelerators with Inaccurate Multipliers

Paras Jain, Safeen Huda, Martin Maas, Joseph Gonzalez, Ion Stoica, Azalia Mirhoseini

Design, Automation and Test in Europe Conference and Exhibition, 2022

A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators

Dan Zhang, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, Azalia Mirhoseini

International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

A graph placement methodology for fast chip design

Azalia Mirhoseini*, Anna Goldie*, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter, Jeff Dean

Nature, 2021

Representing Long-Range Context for Graph Neural Networks with Global Attention

Zhanghao Wu, Paras Jain, Matthew Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica

Conference on Neural Information Processing Systems, 2021

Transferable Graph Optimizers for ML Compilers

Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter Ma, Qiumin Xu, Hanxiao Liu, Phitchaya Phothilimthana, Shen Wang, Anna Goldie, Azalia Mirhoseini, James Laudon

Conference on Neural Information Processing Systems, 2020

A Hierarchical Model for Device Placement

Azalia Mirhoseini*, Anna Goldie*, Hieu Pham, Benoit Steiner, Quoc V. Le, Jeff Dean

International Conference on Learning Representations, 2018

Device Placement Optimization with Reinforcement Learning

Azalia Mirhoseini*, Hieu Pham*, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

International Conference on Machine Learning, 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Noam Shazeer*, Azalia Mirhoseini*, Krzysztof Maziarz*, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean

International Conference on Learning Representations, 2017