bioRxiv preprint 2016-09-06

A Common Class of Transcripts with 5′-Intron Depletion, Distinct Early Coding Sequence Features, and N1-Methyladenosine Modification

Introns are found in 5′ untranslated regions (5′UTRs) for 35% of all human transcripts. These 5′UTR introns are not randomly distributed: genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5′UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5′UTR intron status, we developed a classifier that can predict 5′UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5′ proximal-intron-minus-like-coding regions ("5IM" transcript
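
To make the idea of "sequence features in the early coding region" concrete, the sketch below turns the start of a coding sequence into a fixed-length numeric vector that a standard classifier could consume. The abstract does not say which features or model were used, so everything here is an assumption for illustration: the function name `kmer_features`, the use of k-mer frequencies, and the default values of `k` and `window` are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_features(cds, k=3, window=99):
    """Return k-mer frequencies for the first `window` nucleotides of a CDS.

    Hypothetical helper: `k` and `window` are illustrative defaults,
    not values taken from the preprint.
    """
    # Normalise case and RNA/DNA alphabet, then keep only the early region.
    early = cds.upper().replace("U", "T")[:window]

    # Count every overlapping k-mer in the early region.
    counts = Counter(early[i:i + k] for i in range(len(early) - k + 1))
    total = sum(counts.values()) or 1  # avoid division by zero on short input

    # Fixed feature order over all 4**k A/C/G/T k-mers so that vectors from
    # different transcripts are comparable. K-mers containing other symbols
    # (e.g. N) are counted in `total` but have no feature of their own.
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


if __name__ == "__main__":
    # Toy sequence, not real data.
    vector = kmer_features("AUGGCCGCUGCUGCC", k=2)
    print(len(vector))
```

Vectors like this, paired with a label for whether each transcript has a 5′UTR intron, could be passed to any off-the-shelf classifier. Whether k-mer frequencies resemble the features actually used is not stated in the text above.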

Bioinformatics

Original source: https://doi.org/10.1101/057455