This page shows how state-of-the-art AI models perform on medical
questions when faced with realistic challenges: authority figures, peer
pressure, or conflicting information.
How to navigate:
Interactive Chart: Shows overall performance drops
across all models and intervention types. Use the dropdown to explore
specific challenges.
Follow-ups: Descriptions of each intervention used in
the evaluation.
Real Examples: See actual model responses
side-by-side. Watch how one model flips its answer while another stays
correct on the identical question.
Interactive chart: Use the dropdown menu to select
different intervention types • Hover over a bar for details
💬 Explore Real Examples
See how different models respond to the same intervention on the
same question, with one example for each of the 8 intervention types.
Click any card to view the full conversation and reasoning.