MedQA Deep Robustness: Results

👋 Welcome to MedQA Deep Robustness Results

This page shows how state-of-the-art AI models perform on medical questions when they face realistic challenges: pressure from authority figures, peer pressure, and conflicting information.

How to navigate:

Interactive chart: Use the dropdown menu to select an intervention type, and hover over a bar for details.

💬 Explore Real Examples

See how different models respond to the same intervention on the same question. There is one example for each of the 8 intervention types. Click any card to view the full conversation and reasoning.