Accurate, comprehensive gene annotation and ortholog identification across thousands of vertebrate genomes with TOGA2
Inferring orthologs and annotating coding genes remain central challenges in genomics, as evidenced by the growing gap between assembled and annotated genomes. TOGA (Tool to infer Orthologs from Genome Alignments) addresses this challenge by integrating gene annotation and orthology inference. Here, we present TOGA2, the next generation of TOGA, which substantially improves annotation completeness, accuracy, scalability, and orthology inference. TOGA2 leverages exon-level orthology and introduces an exon-wise annotation procedure that reduces memory usage 513-fold and runtime 6.1-fold. We show that human-trained deep learning models for splice site prediction generalize across vertebrates. Integr