medRxiv preprint

Asymmetry between warmth and clinical substance in multilingual consumer health AI

The same patient question can yield different clinical quality across languages. Across 504 forum-derived patient queries in six languages and four chatbots, language-matched clinicians rated responses on five clinical dimensions (1,008 ratings; 5,040 dimension scores). Patient language outweighed chatbot identity across the four clinical-substance dimensions (composite language partial η² = 0.275 vs chatbot 0.035; robust to investigator-rating exclusion: η² = 0.260) but not for empathy (η² = 0.029): clinical substance was language-associated; warmth was relatively preserved. Catastrophic safety ratings ranged 4.3-fold by language (3.6% English, 15.5% Thai an

health informatics