medRxiv preprint

Complex structural variation, phylogeny, and disease associations of the mucin pangenome

Mucins are large glycoproteins that provide hydration and barrier function to epithelial tissues. Although genetically heterogeneous, all mucins harbor a large exon composed of variable number tandem repeats (VNTRs). Short-read sequencing has limited our understanding of mucin VNTR diversity and has made disease association studies challenging. We leverage 296 long-read phased genome assemblies to characterize 14 mucin family members, achieving ≥97% accuracy across 572 haplotypes. Phylogenetic haplogroup analysis reveals extraordinary structural heterozygosity, with MUC4 harboring the greatest allelic diversity (n = 240 distinct lengths) and MUC12 the greatest size range (Δ = 55,233 bp;

cell biology; genetic and genomic medicine; genomics