Scaling Verification Can Be More Effective than Scaling Policy Learning for Vision-Language-Action Alignment

Jacky Kwok* Stanford University

Xilun Zhang* Stanford University

Mengdi Xu Stanford University

Yuejiang Liu Stanford University

Azalia Mirhoseini Stanford University

Chelsea Finn Stanford University

Marco Pavone Stanford University, NVIDIA

Preprint, 2026


CoVer-VLA introduces a contrastive verifier for vision-language-action alignment and shows that scaling test-time verification can yield larger gains than scaling policy pre-training.