Standardised evaluation and monitoring of site-specific AI performance with physical CT phantoms
Artificial intelligence (AI) applications in computed tomography (CT) imaging require objective and continuous testing, yet standardised methods for this purpose have not been established. Here, we present a framework that uses physical phantoms for standardised testing and monitoring of AI, demonstrated on liver lesion detection. We begin by designing phantoms tailored to the anatomical input domain expected by AI algorithms, and then systematically assess how variations in scanner technology and operation affect AI performance across two clinical CT systems. Next, we perform longitudinal monitoring, which yields consistent results over fifteen months on both systems. Finally, we validate c