MISP-Bench: Decomposing User-Provided False Priors into Answer, Rationale, and Guard Effects
Large language models in clinical and educational settings routinely receive user-provided context containing incorrect prior beliefs. Existing benchmarks measure aggregate susceptibility to such priors, but they neither disentangle which structural component (the asserted answer, the supporting rationale, or their combination) drives the damage nor test whether safety meta-prompts such as "verify the reasoning first" consistently mitigate it. We introduce MISP-Bench, a factorial benchmark of 1,724 audited multiple-choice items (1,430 MedMCQA medical + 294 GSM8K quantitative) evaluated under 13 prompt conditions across 10 open-weight instruction-tuned models (1B-27B) in chain-of-thought and direc