Statistics

Datasets

We ran IsoRank and IsoRankN on five eukaryotic PPI networks: H. sapiens (Human), M. musculus (Mouse), D. melanogaster (Fly), C. elegans (Worm), and S. cerevisiae (Yeast). Two kinds of input data were required: PPI networks and sequence similarity scores. The PPI networks were constructed by combining data from the DIP, BioGRID, and HPRD databases. In total, the five networks contain 87,737 proteins and 98,945 known interactions. The sequence similarity score for each pair of proteins is the BLAST bit score of their sequences, as retrieved from Ensembl.

 

Species            Number of Proteins    Number of Interactions
H. sapiens         22369                 36387
M. musculus        24855                 255
D. melanogaster    14098                 25831
C. elegans         19756                 4752
S. cerevisiae      6659                  31899

Evaluation

We evaluated the biological relevance of our results against two functional annotation databases: GO and KEGG. We first measured the consistency of the predicted network alignment by computing the mean entropy of the predicted clusters. The entropy of a given cluster S*v is:

H(S*_v) = -\sum_i p_i \log p_i

where p_i is the fraction of S*v with GO or KEGG group ID i. Thus a cluster has lower entropy when its GO and KEGG annotations are more consistent within the cluster. We also measured the fraction of clusters that are exact, i.e. those in which all proteins have the same GO or KEGG ID. For GO annotation, we restricted to the deepest categories, which removes questions of multiplicity and specificity of annotations. Note that only 60-70% of the proteins in any of the aligned networks have an assigned GO or KEGG ID, comparable to the fraction of all known proteins included in GO or KEGG. Additionally, the relative performance under either consistency measure does not change when restricted to GO or KEGG individually.
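The two consistency measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the reported numbers: each cluster is represented as a list with one annotation ID per annotated protein, unannotated proteins are assumed to have been dropped beforehand, and the natural logarithm is used because the base of the logarithm is not stated here. The function names and the toy IDs are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(annotations):
    """Shannon entropy of the annotation IDs in one cluster.

    `annotations` holds one GO or KEGG group ID per annotated protein.
    Natural log is an assumption; the source does not state the base.
    """
    counts = Counter(annotations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log(1/p) summed over the IDs present; equals -sum(p * log(p)).
    return sum((c / total) * math.log(total / c) for c in counts.values())


def is_exact(annotations):
    """A cluster is exact when all of its proteins share the same ID."""
    return len(annotations) > 0 and len(set(annotations)) == 1


def consistency(clusters):
    """Return (mean entropy, exact cluster ratio) over annotated clusters."""
    annotated = [c for c in clusters if c]
    if not annotated:
        raise ValueError("no annotated clusters to evaluate")
    mean_entropy = sum(cluster_entropy(c) for c in annotated) / len(annotated)
    exact_ratio = sum(is_exact(c) for c in annotated) / len(annotated)
    return mean_entropy, exact_ratio


# Toy clusters with made-up annotation IDs (illustrative only).
clusters = [
    ["GO:A", "GO:A", "GO:A"],          # exact cluster, entropy 0
    ["GO:A", "GO:B"],                  # two IDs, evenly split
    ["GO:A", "GO:A", "GO:B", "GO:C"],  # three IDs, uneven split
]

mean_entropy, exact_ratio = consistency(clusters)
print(f"mean entropy = {mean_entropy:.3f}, exact cluster ratio = {exact_ratio:.3f}")
```

In this toy input only the first cluster is exact, so one of the three clusters contributes zero entropy and the others raise the mean.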

Consistency             IsoRank & IsoRankN
Mean entropy            0.274
Exact cluster ratio*    0.380 (3079 of 8095)
Exact protein ratio*    0.261 (9284 of 35604)
*The fraction of predicted clusters (proteins) which are exact.

Coverage* (# of species)    IsoRank & IsoRankN
Total                       12848/48978
2                           3844/8739
3                           4022/13533
4                           2926/13991
5                           2056/12715
*The number of predicted clusters that contain proteins from exactly # species, and the number of constituent proteins in those clusters (#clusters / #proteins).
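As a rough sketch of how a coverage table like the one above could be tallied, the snippet below groups clusters by the number of distinct species they span and counts clusters and proteins per group. The input layout (a list of clusters, each a list of `(species, protein_id)` pairs), the function name, and the toy identifiers are assumptions made for illustration; they are not taken from the IsoRank or IsoRankN output format.

```python
from collections import defaultdict


def coverage_by_species(clusters):
    """Map species count -> (number of clusters, number of proteins).

    `clusters` is a list of clusters; each cluster is a list of
    (species, protein_id) pairs. This layout is a hypothetical one.
    """
    table = defaultdict(lambda: [0, 0])
    for cluster in clusters:
        n_species = len({species for species, _ in cluster})
        table[n_species][0] += 1             # one more cluster in this row
        table[n_species][1] += len(cluster)  # its proteins
    return {k: tuple(v) for k, v in sorted(table.items())}


# Toy clusters with made-up protein IDs (illustrative only).
clusters = [
    [("human", "P1"), ("mouse", "Q1")],
    [("human", "P2"), ("human", "P3"), ("mouse", "Q2"), ("fly", "R2")],
    [("fly", "R3"), ("worm", "W3")],
]

for n_species, (n_clusters, n_proteins) in coverage_by_species(clusters).items():
    print(f"{n_species} species: {n_clusters}/{n_proteins}")
```

The "Total" row then follows by summing the cluster counts and the protein counts over all rows.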

GO/KEGG             IsoRank & IsoRankN
p-value             1.28e-90
GO/KEGG category    712/2490
Human               632/2200
Mouse               605/2124
Fly                 574/1787
Worm                552/1698
Yeast               368/938

The number of GO/KEGG categories enriched by IsoRank & IsoRankN, as computed by GO TermFinder. Note that this excludes proteins tagged IEA (inferred from electronic annotation).