Predicting county-level diagnosed diabetes prevalence in the United States using explainable gradient boosting and geographic interpretation
Diabetes affects approximately 38.4 million Americans, but its burden is not evenly distributed across U.S. counties. Existing machine-learning studies have mainly focused on individual-level risk prediction using biometric, clinical, or survey variables, and are therefore less suited to explaining why diagnosed diabetes prevalence differs geographically across counties. We developed an explainable gradient-boosting framework for predicting county-level diagnosed diabetes prevalence across 2,957 U.S. counties using an ecological cross-sectional design. The analysis integrated food-environment, socioeconomic, occupational, demographic, health-behavior, and clinical indicators from five