1 http://www.uniprot.org/locations/?query=*.

2 http://www.proteinatlas.org/.

3 http://www.uniprot.org/.

4 Note that a real protein sequence should have more than six amino acid residues. For ease of presentation, we truncate the length to six.

5 To distinguish the PairAA-based vector from the AA-based vector, we use $[f_{i,21}, f_{i,22}, \ldots]$ instead of $[f_{i,1}, f_{i,2}, \ldots]$ to represent the elements of a PairAA-based vector. Similar notations apply to GapAA.
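
The indexing convention in this note can be illustrated with a minimal sketch. It assumes that the AA-based vector holds the 20 amino-acid frequencies and that the PairAA-specific elements are the 400 dipeptide frequencies numbered from 21 onward; the function names, the residue ordering, and the normalization by the number of adjacent pairs are illustrative assumptions, not the book's implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (ordering is an assumption)
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides


def aa_composition(seq):
    """Return the 20 amino-acid frequencies (elements 1..20 in 1-based notation)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def pair_aa_features(seq):
    """Return AA frequencies followed by dipeptide frequencies (420 values).

    The dipeptide part starts at 1-based position 21, i.e. Python index 20.
    """
    if len(seq) < 2:
        raise ValueError("need at least two residues to form a dipeptide")
    n_pairs = len(seq) - 1
    counts = dict.fromkeys(PAIRS, 0)
    # Slide a window of width 2 so that overlapping dipeptides are counted.
    for i in range(n_pairs):
        dipeptide = seq[i:i + 2]
        if dipeptide in counts:  # skip pairs containing non-standard residues
            counts[dipeptide] += 1
    pair_freqs = [counts[p] / n_pairs for p in PAIRS]
    return aa_composition(seq) + pair_freqs


toy_sequence = "ACDACD"  # toy six-residue sequence, as in the truncated examples
features = pair_aa_features(toy_sequence)
```

Here `features[20]` corresponds to the element written as $f_{i,21}$, since Python lists are zero-indexed.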

6 When m = 1, the feature vectors represent the sequence-order correlation factor of dipeptides.

7 http://www.geneontology.org.

8 ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro2go.

9 http://www.ebi.ac.uk/GOA.

10 http://www.ebi.ac.uk/GOA.

11 http://www.geneontology.org.

12 ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/.

13 http://geneontology.org/page/experimental-evidence-codes.

14 Note that these four types of features are used independently rather than being combined for classification. Through performance evaluation, we would like to find out which of the four feature types performs best.

15 http://www.ebi.ac.uk/Tools/pfa/iprscan/#.

16 An SVM score larger than one means that the test protein falls beyond the margin of separation; therefore, the confidence of the prediction is fairly high.
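
The margin criterion in this note can be sketched for a linear binary SVM. The sketch assumes the usual decision function $f(\mathbf{x}) = \mathbf{w}^{\mathsf{T}}\mathbf{x} + b$ with margin hyperplanes at $f(\mathbf{x}) = \pm 1$; the weights, bias, and test points below are made-up illustrative values, not results from the book.

```python
def svm_score(w, x, b):
    """Linear SVM decision value f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def is_beyond_margin(score):
    """True when the score exceeds one, i.e. the point lies beyond the
    positive-class margin and the prediction is considered confident."""
    return score > 1.0


# Hypothetical trained parameters and two hypothetical test points.
w = [2.0, -1.0]
b = -0.5
confident_point = [1.5, 0.5]   # score = 3.0 - 0.5 - 0.5 = 2.0
uncertain_point = [0.5, 0.2]   # score is about 0.3, inside the margin

confident_score = svm_score(w, confident_point, b)
uncertain_score = svm_score(w, uncertain_point, b)
```

A score between zero and one still yields a positive prediction, but the point lies inside the margin, so the decision is less certain.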

17 Strictly speaking, [image] should be [image], where $k_i$ is the $k_i$-th homolog used to retrieve the GO terms in Section 4.1.2 for the $i$-th protein. To simplify notations, we write it as [image].

18 http://www.geneontology.org.

19 http://www.ebi.ac.uk/GOA.

20 http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html.

21 Here, N = 207 for the virus dataset, N = 978 for the plant dataset, and N = 7766 for the eukaryotic dataset.

22 The dimensionality of the original feature vectors for the plant dataset is 1541.

23 http://geneontology.org/page/go-consortium-contributors-list.

24 Note that our proposed methods do not incorporate this information. In experiments (results not shown) that included it in the feature vectors, the performance remained almost the same.

25 We excluded Resnik’s measure because it ignores the distance between the terms and their common ancestors in the GO hierarchy.

26 $x_i$ here is equivalent to $q_i$ in equations (4.2)–(4.5).
