Our Publications
2025
Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers
Conference on Neural Information Processing Systems, 2025
ES-FoMo III @ICML 2025
PDF · Paper · Codebase · Blog post · Datasets and Models
RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models
Conference on Robot Learning, 2025
PDF · Paper · Codebase · Datasets and Models · Serving Engine
SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models
Conference on Neural Information Processing Systems, 2025
Cartridges: Lightweight and general-purpose long context representations via self-study
ES-FoMo III (Oral) @ICML 2025
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Conference on Language Modeling, 2025
Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models
International Conference on LLM-Aided Design, 2025
SSI-FM @ICLR 2025
How Do Large Language Monkeys Get Their Power (Laws)?
International Conference on Machine Learning, 2025 (Oral)
KernelBench: Can LLMs Write Efficient GPU Kernels?
International Conference on Machine Learning, 2025
DL4C (Best Paper) & SSI-FM @ICLR 2025
PDF · Paper · KernelBench Codebase · KernelBench Dataset · Blog post
2024
Archon: An Architecture Search Framework for Inference-Time Techniques
International Conference on Machine Learning, 2024
SSI-FM (Oral) @ICLR 2025
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Conference on Language Modeling, 2024
Selected Prior Publications
Learning to Design Accurate Deep Learning Accelerators with Inaccurate Multipliers
Design, Automation and Test in Europe Conference and Exhibition, 2022
PDF
A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
International Conference on Architectural Support for Programming Languages and Operating Systems, 2021
PDF
Representing Long-Range Context for Graph Neural Networks with Global Attention
Conference on Neural Information Processing Systems, 2021
PDF
Transferable Graph Optimizers for ML Compilers
Conference on Neural Information Processing Systems, 2020
A Hierarchical Model for Device Placement
International Conference on Learning Representations, 2018
Device Placement Optimization with Reinforcement Learning
International Conference on Machine Learning, 2017
PDF
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
International Conference on Learning Representations, 2017
PDF