H3L-BENCH (Human-in-Loop Benchmark)
4/13/2026 · Muhammad Elfsaid, Tu Trinh, Kelvin Lian, Guanpé Lun, Nathan Hunt, Ernesto Hernández, Nandan Marasha, Yan...
Tags: Research
Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders
3/12/2026 · David Campbell, Neil Kale, Udari Madhushani Sehwag, Bert Huang, Nick Price, Dan Borges, Alex Levine...
Tags: Safety
LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
2/26/2026 · Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros, Nathaniel...
Tags: Safety
VeRO: An Evaluation Harness for Agents to Optimize Agents
2/25/2026 · Varun Ursekar, Apaar Shanker, Veronica Chatrath, Yuan Xue, Sam Denton
Tags: Agents, Post-Training, Evaluation and Alignment
LHAW: Controllable Underspecification for Long-Horizon Tasks
2/12/2026 · George Pu, Michael S. Lee, Udari Madhushani Sehwag, David Lee, Bryan Zhu, Yash Maurya, Mohit Raghavend...
Tags: Agents, Safety, Evaluation and Alignment
SciPredict: Can LLMs Predict the Outcomes of Research Experiments in Natural Sciences?
1/15/2026 · Udari Madhushani Sehwag, Elaine Lau, Haniyeh Ehsani Oskoule, Shayan Shabbi, Erich Liang, Andrea Tokad
Tags: Safety, Evaluation and Alignment
Agentic Rubrics as Contextual Verifiers for SWE Agents
1/6/2026 · Mohit Raghavendra, Arosha Denul, Bing Liu, Tonghong He
Tags: Agents, Safety, Evaluation and Alignment
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
12/22/2025 · Yu Yang Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Paula Rodriguez, C...
Tags: Reasoning, Safety