medRxiv preprint

Developing an OMOP-Standardized Prostate Cancer Database and Improving Data Quality Using NLP and PSA-Based Algorithms

Objective: To develop and evaluate an Observational Medical Outcomes Partnership (OMOP)-standardized prostate cancer database from the University of Texas Medical Branch (UTMB) Epic electronic health record (EHR) and to improve its data quality using natural language processing (NLP) and prostate-specific antigen (PSA)-based algorithms. Materials and Methods: We built a data pipeline to transform UTMB Epic EHR data from 2010 to 2021 into the OMOP Common Data Model (CDM) v5.4. Data quality was assessed by comparing the OMOP-standardized data with Galveston Cancer Registry data using availability agreement, Cohen's kappa, and the intraclass correlation coefficient. NLP was used to extract PSA, Gleason score,

cancer, health informatics