bioRxiv preprint 2016-08-14

Peptide partitions and protein identification: a computational analysis

Peptide sequences from a proteome can be partitioned into N mutually exclusive sets and used to identify their parent proteins in a sequence database. This is illustrated with the human proteome (http://www.uniprot.org; id UP000005640), which is partitioned into eight subsets: KZ*R, KZ*D, KZ*E, KZ*, Z*R, Z*D, Z*E, and Z*, where Z ∈ {A, N, C, Q, G, H, I, L, M, F, P, S, T, W, Y, V} (the 16 standard residues other than K, R, D, and E) and Z* ≡ zero or more occurrences of Z. If the full peptide sequence is known, over 98% of the proteins in the proteome can be identified from such sequences. The rate exceeds 78% if the positions of four internal residue types are known. When the standard set of 20 amino acids is replaced with an alphabe…
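The eight subset labels imply that every peptide has the shape: optional leading K, then a run of Z residues, then an optional terminal R, D, or E. The Python sketch below illustrates that structure under stated assumptions. It assumes the peptides arise from cutting before every K and after every R, D, or E. It also assumes a protein counts as identified when at least one of its peptides occurs in no other protein. The cutting rule, this identification criterion, the helper names `cleave`, `subset_of`, and `identified_fraction`, and the toy sequences are all hypothetical. They are not the paper's stated method or data. Residues outside the 20 standard ones are not assigned to any subset.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Z: the 16 standard residues other than K, R, D, E (as defined in the abstract).
Z_RESIDUES = "ANCQGHILMFPSTWYV"

# Optional leading K, a run of Z residues, optional trailing R/D/E.
_SUBSET_RE = re.compile(rf"(K?)[{Z_RESIDUES}]*([RDE]?)")


def cleave(protein: str) -> list[str]:
    """Split a sequence before every K and after every R, D or E.

    ASSUMPTION: this cutting rule is inferred from the subset names only.
    """
    pieces, current = [], []
    for aa in protein:
        if aa == "K" and current:  # K always starts a new piece
            pieces.append("".join(current))
            current = []
        current.append(aa)
        if aa in "RDE":  # R/D/E always ends the current piece
            pieces.append("".join(current))
            current = []
    if current:
        pieces.append("".join(current))
    return pieces


def subset_of(peptide: str) -> str | None:
    """Return a label such as 'KZ*R', or None if a non-Z residue is inside."""
    m = _SUBSET_RE.fullmatch(peptide)
    if m is None:
        return None
    head, tail = m.groups()
    return f"{head}Z*{tail}"


def identified_fraction(proteome: dict[str, str]) -> float:
    """Fraction of proteins with at least one peptide found in no other protein.

    ASSUMPTION: 'identified' means 'owns a peptide unique within the database'.
    """
    owners = defaultdict(set)  # peptide -> set of protein ids containing it
    for pid, seq in proteome.items():
        for pep in cleave(seq):
            owners[pep].add(pid)
    hits = sum(
        any(len(owners[pep]) == 1 for pep in cleave(seq))
        for seq in proteome.values()
    )
    return hits / len(proteome)


# Usage on made-up toy sequences (not UniProt entries):
toy = {"P1": "MAKRDE", "P2": "GGKRSE", "P3": "MAKRSE"}
print(cleave("MAKRDE"))                          # ['MA', 'KR', 'D', 'E']
print([subset_of(p) for p in cleave("MAKRDE")])  # ['Z*', 'KZ*R', 'Z*D', 'Z*E']
print(identified_fraction(toy))                  # 2 of 3: P3 shares every peptide
```

Cutting with an explicit loop rather than a zero-width `re.split` keeps the boundary rule easy to read and to change if a different set of cut sites is assumed.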

Bioinformatics

Original source: https://doi.org/10.1101/069526