medRxiv preprint

Prompt-engineering improves clinical safety of large language models for opioid equipotency conversion

Background: Large language models (LLMs) are increasingly used in medical education and clinical decision-making, but their reliability in high-risk medication dosing remains unclear. Opioid rotation is a common task requiring precise calculations where errors may result in overdose or inadequate pain relief.

Methods: Thirteen LLMs were tested using an API-based framework to ensure independent queries across trials. First, fictional clinical scenarios were tested to simulate real-world clinical situations involving opioid rotation; to test the effects of changes in wording, scenarios were revised into 4 "vignettes" showing the same clinical situation. Next, opioid pairs were tested with a random

pain medicine