👤 H3L-BENCH (Human-in-Loop Benchmark)
4/13/2026
Muhammad Elfsaid, Tu Trinh, Kelvin Lian, Guanpé Lun, Nathan Hunt, Ernesto Hernández, Nandan Marasha, Yan...
Research

🛡️ Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders
3/12/2026
David Campbell, Neil Kale, Udari Madhushani Sehwag, Bert Huang, Nick Price, Dan Borges, Alex Levine...
Safety

🧬 LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
2/26/2026
Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros, Nathaniel...
Safety

🤖 VeRO: An Evaluation Harness for Agents to Optimize Agents
2/25/2026
Varun Ursekar, Apaar Shanker, Veronica Chatrath, Yuan Xue, Sam Denton
Agents, Post-Training, Evaluation and Alignment

🎯 LHAW: Controllable Underspecification for Long-Horizon Tasks
2/12/2026
George Pu, Michael S. Lee, Udari Madhushani Sehwag, David Lee, Bryan Zhu, Yash Maurya, Mohit Raghavend...
Agents, Safety, Evaluation and Alignment

🔬 SciPredict: Can LLMs Predict the Outcomes of Research Experiments in Natural Sciences?
1/15/2026
Udari Madhushani Sehwag, Elaine Lau, Haniyeh Ehsani Oskoule, Shayan Shabbi, Erich Liang, Andrea Tokad
Safety, Evaluation and Alignment

Agentic Rubrics as Contextual Verifiers for SWE Agents
1/6/2026
Mohit Raghavendra, Arosha denul, Bing Liu, Tonghong He
Agents, Safety, Evaluation and Alignment

⚖️ MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
12/22/2025
Yu Yang Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Paula Rodriguez, C...
Reasoning, Safety