[1] H. Abdi and L. J. Williams, Jackknife, Encyclopedia of Research Design, Sage, 2010, pp. 655–660.
[2] D. Achlioptas, Database-friendly random projections: Johnson-Lindenstrauss with binary coins, Journal of Computer and System Sciences 66 (2003), 671–687.
[3] A. Adelfio, V. Volpato and G. Pollastri, SCLpredT: Ab initio and homology-based prediction of subcellular localization by N-to-1 neural networks, SpringerPlus 2 (2013), 1–11.
[4] B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts and P. Walter, Molecular Biology of the Cell, 4th ed., ch. 10–18, Garland Science, 2002.
[5] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller and D. J. Lipman, Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res. 25 (1997), 3389–3402.
[6] R. Apweiler, A. Bairoch, C. H. Wu, W. C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopez, M. Magrane, M. J. Martin, D. A. Ntale, C. O’Donovan, N. Redaschi and L. S. Yeh, UniProt: the Universal Protein knowledgebase, Nucleic Acids Res 32 (2004), D115–D119.
[7] T. K. Attwood, M. D. R. Croning, D. R. Flower, A. P. Lewis, J. E. Mabey, P. Scordis, J. Selley and W. Wright, PRINTS-S: the database formerly known as PRINTS, Nucleic Acids Res. 28 (2000), 225–227.
[8] R. Auckenthaler, M. Carey and H. Lloyd-Thomas, Score normalization for text-independent speaker verification systems, Digital Signal Processing 10 (2000), 42–54.
[9] T. Bakheet and A. Doig, Properties and identification of human protein drug targets, Bioinformatics 25 (2009), 451–457.
[10] D. Barrell, E. Dimmer, R. P. Huntley, D. Binns, C. O’Donovan and R. Apweiler, The GOA database in 2009 – an integrated Gene Ontology Annotation resource, Nucleic Acids Res. 37 (2009), D396–D403.
[11] Z. Barutcuoglu, R. E. Schapire and O. G. Troyanskaya, Hierarchical multi-label prediction of gene function, Bioinformatics 22 (2006), 830–836.
[12] A. Bateman, E. Birney, R. Durbin, S. Eddy, K. L. Howe and E. L. Sonnhammer, The Pfam Protein Families Database, Nucleic Acids Res. 28 (2000), 263–266.
[13] J. D. Bendtsen, H. Nielsen, G. von Heijne and S. Brunak, Improved prediction of signal peptides: SignalP 3.0, J. Mol. Biol. 340 (2004), 783–795.
[14] Y. Bengio, O. Delalleau and N. Le Roux, The curse of dimensionality for local kernel machines, Université de Montréal, Technical Report, 2005.
[15] M. Bhasin and G. P. S. Raghava, ESLpred: SVM based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST, Nucleic Acids Res. 32 (2004), 414–419.
[16] E. Bingham and H. Mannila, Random projection in dimension reduction: Applications to image and text data, in: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’01), pp. 245–250, 2001.
[17] D. Binns, E. Dimmer, R. Huntley, D. Barrell, C. O’Donovan and R. Apweiler, QuickGO: a web-based tool for Gene Ontology searching, Bioinformatics 25 (2009), 3045–3046.
[18] T. Blum, S. Briesemeister and O. Kohlbacher, MultiLoc2: Integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction, BMC Bioinformatics 10 (2009), 274.
[19] O. Bodenreider, M. Aubry and A. Burgun, Non-lexical approaches to identifying associative relations in the gene ontology, in: Pac. Symp. Biocomput., pp. 91–102, 2005.
[20] B. Boeckmann, A. Bairoch, R. Apweiler, M. C. Blatter, A. Estreicher, E. Gasteiger, M. J. Martin, K. Michoud, C. O’Donovan, I. Phan, S. Pilbout and M. Schneider, The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, Nucleic Acids Res. 31 (2003), 365–370.
[21] M. Boutell, J. Luo, X. Shen and C. Brown, Learning multi-label scene classification, Pattern Recognition 37 (2004), 1757–1771.
[22] S. Brady and H. Shatkay, EpiLoc: a (working) text-based system for predicting protein subcellular location, in: Pac. Symp. Biocomput., pp. 604–615, 2008.
[23] C. I. Branden and J. Tooze, Introduction to Protein Structure, pp. 251–281, Garland Science, 1991.
[24] S. Briesemeister, T. Blum, S. Brady, Y. Lam, O. Kohlbacher and H. Shatkay, SherLoc2: A high-accuracy hybrid method for predicting subcellular localization of proteins, Journal of Proteome Research 8 (2009), 5363–5366.
[25] G. S. Butler and C. M. Overall, Proteomic identification of multitasking proteins in unexpected locations complicates drug targeting, Nat. Rev. Drug Discov. 8 (2009), 935–948.
[26] E. Camon, M. Magrane, D. Barrell, D. Binns, W. Fleischmann, P. Kersey, N. Mulder, T. Oinn, J. Maslen and A. Cox, The gene ontology annotation (GOA) project: Implementation of GO in SWISS-PROT, TrEMBL and InterPro, Genome Res. 13 (2003), 662–672.
[27] E. J. Candes and T. Tao, Near-optimal signal recovery from random projections: Universal encoding strategies?, IEEE Transactions on Information Theory 52 (2006), 5406–5425.
[28] J. Cedano, P. Aloy, J. A. Perez-Pons and E. Querol, Relation between amino acid composition and cellular location of proteins, J. Mol. Biol. 266 (1997), 594–600.
[29] J. Chabalier, J. Mosser and A. Burgun, A transversal approach to predict gene product networks from ontology-based similarity, BMC Bioinformatics 8 (2007), 235.
[30] Y. Chen, C. F. Chen, D. J. Riley, D. C. Allred, P. L. Chen, D. V. Hoff, C. K. Osborne and W. H. Lee, Aberrant Subcellular Localization of BRCA1 in Breast Cancer, Science 270 (1995), 789–791.
[31] J. Cheng, M. Cline, J. Martin, D. Finkelstein, T. Awad et al., A knowledge-based clustering algorithm driven by gene ontology, Journal of Biopharmaceutical Statistics 14 (2004), 687–700.
[32] S.-M. Chi and D. Nam, WegoLoc: accurate prediction of protein subcellular localization using weighted Gene Ontology terms, Bioinformatics 28 (2012), 1028–1030.
[33] K. C. Chou, Prediction of protein structural classes and subcellular locations, Current Protein and Peptide Science 1 (2000), 171–208.
[34] K. C. Chou, Prediction of Protein Subcellular Locations by Incorporating Quasi-Sequence-Order Effect, Biochem. Biophys. Res. Commun. 278 (2000), 477–483.
[35] K. C. Chou, Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: Structure, Function, and Genetics 43 (2001), 246–255.
[36] K. C. Chou, Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics 21 (2005), 10–19.
[37] K. C. Chou, Some remarks on protein attribute prediction and pseudo amino acid composition (50th Anniversary Year Review), Journal of Theoretical Biology 273 (2011), 236–247.
[38] K. C. Chou, Some remarks on predicting multi-label attributes in molecular biosystems, Molecular BioSystems 9 (2013), 1092–1100.
[39] K. C. Chou and Y. D. Cai, Prediction and classification of protein subcellular location: sequence-order effect and pseudo amino acid composition, J. of Cell. Biochem. 90 (2003), 1250–1260.
[40] K. C. Chou and Y. D. Cai, Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo-amino acid composition, Journal of Cellular Biochemistry 91 (2004), 1197–1203.
[41] K. C. Chou and Y. D. Cai, Prediction of protein subcellular locations by GO-FunD-PseAA predictor, Biochem. Biophys. Res. Commun. 320 (2004), 1236–1239.
[42] K. C. Chou and Y. D. Cai, Predicting protein localization in budding yeast, Bioinformatics 21 (2005), 944–950.
[43] K. C. Chou and D. W. Elrod, Protein subcellular location prediction, Protein Eng. 12 (1999), 107–118.
[44] K. C. Chou and H. B. Shen, Hum-PLoc: A novel ensemble classifier for predicting human protein subcellular localization, Biochem Biophys Res Commun 347 (2006), 150–157.
[45] K. C. Chou and H. B. Shen, Large-scale predictions of gram-negative bacterial protein subcellular locations, Journal of Proteome Research 5 (2006), 3420–3428.
[46] K. C. Chou and H. B. Shen, Predicting Eukaryotic Protein Subcellular Location by Fusing Optimized Evidence-Theoretic K-Nearest Neighbor Classifiers, J. of Proteome Research 5 (2006), 1888–1897.
[47] K. C. Chou and H. B. Shen, Euk-mPLoc: A fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites, Journal of Proteome Research 6 (2007), 1728–1734.
[48] K. C. Chou and H. B. Shen, Cell-PLoc: A package of web-servers for predicting subcellular localization of proteins in various organisms, Nature Protocols 3 (2008), 153–162.
[49] K. C. Chou and H. B. Shen, A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple site: Euk-mPLoc 2.0, PLoS ONE 5 (2010), e9931.
[50] K. C. Chou and H. B. Shen, Cell-PLoc 2.0: An improved package of web-servers for predicting subcellular localization of proteins in various organisms, Nat. Sci. 2 (2010), 1090–1103.
[51] K. C. Chou and H. B. Shen, Plant-mPLoc: A top-down strategy to augment the power for predicting plant protein subcellular localization, PLoS ONE 5 (2010), e11335.
[52] K. C. Chou, Z. C. Wu and X. Xiao, iLoc-Euk: A multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins, PLoS ONE 6 (2011), e18258.
[53] K. C. Chou, Z. C. Wu and X. Xiao, iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites, Molecular BioSystems 8 (2012), 629–641.
[54] K. C. Chou and C. T. Zhang, Review: Prediction of protein structural classes, Critical Reviews in Biochemistry and Molecular Biology 30 (1995), 275–349.
[55] A. Clare and R. D. King, Knowledge discovery in multi-label phenotype data, in: Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pp. 42–53, 2001.
[56] The InterPro Consortium, The InterPro database, an integrated documentation resource for protein families, domains and functional sites, Nucleic Acids Res. 29 (2001), 37–40.
[57] F. Corpet, F. Servant, J. Gouzy and D. Kahn, ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons, Nucleic Acids Res. 28 (2000), 267–269.
[58] F. M. Couto, M. J. Silva and P. M. Coutinho, Semantic similarity over the gene ontology: Family correlation and selecting disjunctive ancestors, in: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 343–344, 2005.
[59] S. Dasgupta, Learning mixtures of Gaussians, in: 40th Annual IEEE Symposium on Foundations of Computer Science, pp. 634–644, 1999.
[60] K. Dembczynski, W. Waegeman, W. Cheng and E. Hullermeier, On label dependence and loss minimization in multi-label classification, Machine Learning 88 (2012), 5–45.
[61] T. G. Dietterich and G. Bakiri, Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research 2 (1995), 263–286.
[62] A. Elisseeff and J. Weston, Kernel methods for Multi-labelled classification and Categorical regression problems, in: Advances in Neural Information Processing Systems 14, pp. 681–687, MIT Press, 2001.
[63] O. Emanuelsson, Predicting protein subcellular localisation from amino acid sequence information, Briefings in Bioinformatics 3 (2002), 361–376.
[64] O. Emanuelsson, S. Brunak, G. von Heijne and H. Nielsen, Locating proteins in the cell using TargetP, SignalP, and related tools, Nature Protocols 2 (2007), 953–971.
[65] O. Emanuelsson, H. Nielsen, S. Brunak and G. von Heijne, Predicting subcellular localization of proteins based on their N-terminal amino acid sequence, J. Mol. Biol. 300 (2000), 1005–1016.
[66] O. Emanuelsson, H. Nielsen and G. von Heijne, ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites, Protein Science 8 (1999), 978–984.
[67] O. Emanuelsson, G. von Heijne and G. Schneider, Analysis and prediction of mitochondrial targeting peptides, Methods in Cell Biology 65 (2001), 175–187.
[68] D. Faria, C. Pesquita, F. M. Couto and A. Falcão, ProteInOn: A web tool for protein semantic similarity, 2007.
[69] G. Forman, An extensive empirical study of feature selection metrics for text classification, The Journal of Machine Learning Research 3 (2003), 1289–1305.
[70] L. J. Foster, C. L. de Hoog, Y. Zhang, Y. Zhang, X. Xie, V. K. Mootha and M. Mann, A mammalian organelle map by protein correlation profiling, Cell 125 (2006), 187–199.
[71] P. Frankl and H. Maehara, The Johnson-Lindenstrauss lemma and the sphericity of some graphs, Journal of Combinatorial Theory, Series B 44 (1988), 355–362.
[72] Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence 14 (1999), 771–780.
[73] A. Fyshe, Y. Liu, D. Szafron, R. Greiner and P. Lu, Improving subcellular localization prediction using text classification and the gene ontology, Bioinformatics 24 (2008), 2512–2517.
[74] W. Gao and Z.-H. Zhou, On the consistency of multi-label learning, Artificial Intelligence 199–200 (2013), 22–44.
[75] A. Garg, M. Bhasin and G. P. S. Raghava, Support Vector Machine-based method for subcellular localization of human proteins using amino acid compositions, their order and similarity search, J. Biol. Chem. 280 (2005), 14427–14432.
[76] N. Ghamrawi and A. McCallum, Collective multi-label classification, in: Proceedings of the 2005 ACM Conference on Information and Knowledge Management (CIKM’05), pp. 195–200, 2005.
[77] L. M. Gierasch, Signal sequences, Biochemistry 28 (1989), 923–930.
[78] L. Gillick and S. J. Cox, Some statistical issues in the comparison of speech recognition algorithms, in: 1989 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’89), IEEE, pp. 532–535, 1989.
[79] S. Godbole and S. Sarawagi, Discriminative Methods for Multi-Labeled Classification, in: Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 22–30, Springer, 2004.
[80] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science 286 (1999), 531–537.
[81] B. M. Gumbiner, Cell adhesion: the molecular basis of tissue architecture and morphogenesis, Cell 84 (1996), 345–357.
[82] X. Guo, R. Liu, C. D. Shriver, H. Hu and M. N. Liebman, Assessing semantic similarity measures for the characterization of human regulatory pathways, Bioinformatics 22 (2006), 967–973.
[83] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning 46 (2002), 389–422.
[84] P. H. Guzzi, M. Mina, C. Guerra and M. Cannataro, Semantic similarity analysis of protein data: assessment with biological features and issues, Briefings in Bioinformatics 13 (2012), 569–585.
[85] A. Hadgu, An application of ridge regression analysis in the study of syphilis data, Statistics in Medicine 3 (1984), 293–299.
[86] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer-Verlag, 2001.
[87] A. Hayama, T. Rai, S. Sasaki and S. Uchida, Molecular mechanisms of Bartter syndrome caused by mutations in the BSND gene, Histochem. and Cell Biol. 119 (2003), 485–493.
[88] J. He, H. Gu and W. Liu, Imbalanced multi-modal multi-label learning for subcellular localization prediction of human proteins with both single and multiple sites, PLoS ONE 7 (2012), e37155.
[89] G. von Heijne, J. Steppuhn and R. G. Herrmann, Domain structure of mitochondrial and chloroplast targeting peptides, European Journal of Biochemistry 180 (1989), 535–545.
[90] K. Hiller, A. Grote, M. Scheer, R. Munch and D. Jahn, PrediSi: Prediction of signal peptides and their cleavage positions, Nucleic Acids Research 32 (2004), 375–379.
[91] K. Hofmann, P. Bucher, L. Falquet and A. Bairoch, The PROSITE database, its status in 1999, Nucleic Acids Res. 27 (1999), 215–219.
[92] A. Hoglund, P. Donnes, T. Blum, H. Adolph and O. Kohlbacher, MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition, Bioinformatics 22 (2006), 1158–1165.
[93] T. P. Hopp and K. R. Woods, Prediction of protein antigenic determinants from amino acid sequences, Proceedings of the National Academy of Sciences 78 (1981), 3824–3828.
[94] P. Horton, K. J. Park, T. Obayashi and K. Nakai, Protein subcellular localization prediction with WOLF PSORT, in: Proc. 4th Annual Asia Pacific Bioinformatics Conference (APBC06), pp. 39–48, 2006.
[95] D. W. Hosmer and S. Lemeshow, Applied Logistic Regression, 2nd ed., Wiley, 2000.
[96] D. Hsu, S. M. Kakade, J. Langford and T. Zhang, Multi-label prediction via compressed sensing, in: Advances in Neural Information Processing Systems 22, pp. 772–780, 2009.
[97] Y. Hu, T. Li, J. Sun, S. Tang, W. Xiong, D. Li, G. Chen and P. Cong, Predicting Gram-positive bacterial protein subcellular localization based on localization motifs, Journal of Theoretical Biology 308 (2012), 135–140.
[98] D. W. Huang, B. T. Sherman, Q. Tan, J. R. Collins, W. G. Alvord, J. Roayaei, R. Stephens, M. W. Baseler, H. C. Lane and R. A. Lempicki, The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists, Genome Biology 8 (2007).
[99] W. L. Huang, C. W. Tung, S. W. Ho, S. F. Hwang and S. Y. Ho, ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization, BMC Bioinformatics 9 (2008), 80.
[100] W. L. Huang, C. W. Tung, S. W. Ho, S. F. Hwang and S. Y. Ho, Predicting protein subnuclear localization using GO-amino-acid composition features, Biosystems 98 (2009), 73–79.
[101] Y. Huang and Y. D. Li, Prediction of protein subcellular locations using fuzzy K-NN method, Bioinformatics 20 (2004), 21–28.
[102] M. C. Hung and W. Link, Protein localization in disease and therapy, J. of Cell Sci. 124 (2011), 3381–3392.
[103] J. J. Jiang and D. W. Conrath, Semantic similarity based on corpus statistics and lexical taxonomy, in: Proceedings of International Conference Research on Computational Linguistics (ROCLING X), pp. 19–33, 1997.
[104] W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, in: Conference in Modern Analysis and Probability, pp. 599–608, 1984.
[105] I. Katakis, G. Tsoumakas and I. Vlahavas, Multilabel text classification for automated tag suggestion, in: Proceedings of the ECML/PKDD 2008 Discovery Challenge, 2008.
[106] M. D. Kaytor and S. T. Warren, Aberrant Protein Deposition and Neurological Disease, J. Biol. Chem. 274 (1999), 37507–37510.
[107] J. K. Kim, G. P. S. Raghava, S. Y. Bang and S. Choi, Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine, Pattern Recog. Lett. 27 (2006), 996–1001.
[108] J. Kittler, M. Hatef, R. P. W. Duin and J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998), 226–239.
[109] T. Kleffmann, D. Russenberger, A. von Zychlinski, W. Christopher, K. Sjolander, W. Gruissem and S. Baginsky, The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions, Current Biology 14 (2004), 354–362.
[110] U. Kressel, Pairwise classification and support vector machines, in: Advances in Kernel Methods: Support Vector Learning, Chap. 15. MIT Press, 1999.
[111] V. Krutovskikh, G. Mazzoleni, N. Mironov, Y. Omori, A. M. Aguelon, M. Mesnil, F. Berger, C. Partensky and H. Yamasaki, Altered homologous and heterologous gap-junctional intercellular communication in primary human liver tumors associated with aberrant protein localization but not gene mutation of connexin 32, Int. J. Cancer 56 (1994), 87–94.
[112] S. Y. Kung and M. W. Mak, On consistent fusion of multimodal biometrics, in: 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’06), 5, IEEE, pp. V1085–V1088, 2006.
[113] S. Y. Kung, M. W. Mak and S. H. Lin, Biometric authentication: a machine learning approach, Prentice Hall Press, 2005.
[114] M. Kurimo, Indexing audio documents by using latent semantic analysis and SOM, in: Kohonen Maps, pp. 363–374, Elsevier, 1999.
[115] K. Y. Lee, H. Y. Chuang, A. Beyer, M. K. Sung, W. K. Huh, B. Lee and T. Ideker, Protein networks markedly improve prediction of subcellular localization in multiple eukaryotic species, Nucleic Acids Research 36 (2008), e136.
[116] K. Y. Lee, D. W. Kim, D. K. Na, K. H. Lee and D. H. Lee, PLPD: Reliable protein localization prediction from imbalanced and overlapped datasets, Nucleic Acids Research 34 (2006), 4655–4666.
[117] X. Lee, J. C. J. Keith, N. Stumm, I. Moutsatsos, J. M. McCoy, C. P. Crum, D. Genest, D. Chin, C. Ehrenfels, R. Pijnenborg, F. A. Van Assche and S. Mi, Downregulation of placental syncytin expression and abnormal protein localization in pre-eclampsia, Placenta 22 (2001), 808–812.
[118] Z. Lei and Y. Dai, Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction, BMC Bioinformatics 7 (2006), 491.
[119] F. M. Li and Q. Z. Li, Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach, Protein and Peptide Letters 15 (2008), 612–616.
[120] L. Q. Li, Y. Zhang, L. Y. Zou, C. Q. Li, B. Yu, X. Q. Zheng and Y. Zhou, An ensemble classifier for eukaryotic protein subcellular location prediction using Gene Ontology categories and amino acid hydrophobicity, PLoS ONE 7 (2012), e31057.
[121] L. Q. Li, Y. Zhang, L. Y. Zou, Y. Zhou and X. Q. Zheng, Prediction of protein subcellular multi-localization based on the general form of Chou’s pseudo amino acid composition, Protein and Peptide Letters 19 (2012), 375–387.
[122] T. Li and M. Ogihara, Toward intelligent music information retrieval, IEEE Transactions on Multimedia 8 (2006), 564–574.
[123] D. Lin, An information-theoretic definition of similarity, in: Proceedings of the 15th International Conference on Machine Learning, pp. 296–304, 1998.
[124] H. Lin, H. Ding, F. B. Guo, A. Y. Zhang and J. Huang, Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition, Protein and Peptide Letters 15 (2008), 739–744.
[125] T. Liu, X. Geng, X. Zheng, R. Li and J. Wang, Accurate prediction of protein structural class using auto covariance transformation of PSI-BLAST profiles, Amino Acids 42 (2011), 2243–2249.
[126] P. W. Lord, R. D. Stevens, A. Brass and C. A. Goble, Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation, Bioinformatics 19 (2003), 1275–1283.
[127] R. Lotlikar and R. Kothari, Adaptive linear dimensionality reduction for classification, Pattern Recognition 33 (2000), 185–194.
[128] Z. Lu and L. Hunter, GO molecular function terms are predictive of subcellular localization, in: Proc. of Pac. Symp. Biocomput. (PSB’05), pp. 151–161, 2005.
[129] Z. Lu, D. Szafron, R. Greiner, P. Lu, D. S. Wishart, B. Poulin, J. Anvik, C. Macdonell and R. Eisner, Predicting subcellular localization of proteins using machine-learned classifiers, Bioinformatics 20 (2004), 547–556.
[130] G. Lubec, L. Afjehi-Sadat, J. W. Yang and J. P. John, Searching for hypothetical proteins: Theory and practice based upon original data and literature, Prog. Neurobiol. 77 (2005), 90–127.
[131] S. R. Maetschke, M. Towsey and M. B. Boden, BLOMAP: An encoding of amino acids which improves signal peptide cleavage site prediction, in: 3rd Asia Pacific Bioinformatics Conference (Y. P. P. Chen and L. Wong, eds.), pp. 141–150, Singapore, 17-21 Jan 2005.
[132] M. W. Mak, J. Guo and S. Y. Kung, PairProSVM: Protein Subcellular Localization Based on Local Pairwise Profile Alignment and SVM, IEEE/ACM Trans. on Computational Biology and Bioinformatics 5 (2008), 416–422.
[133] M. W. Mak and S. Y. Kung, Conditional random fields for the prediction of signal peptide cleavage sites, in: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’09), IEEE, pp. 1605–1608, 2009.
[134] M. W. Mak, W. Wang and S. Y. Kung, Fast subcellular localization by cascaded fusion of signal-based and homology-based methods, Proteome Science 9 (2011), S8.
[135] W. Margolin, Green fluorescent protein as a reporter for macromolecular localization in bacterial cells, Methods 20 (2000), 62–72.
[136] D. W. Marquardt and R. D. Snee, Ridge regression in practice, The American Statistician 29 (1975), 3–20.
[137] B. Martoglio and B. Dobberstein, Signal sequences: more than just greasy peptides, Trends in Cell Biology 8 (1998), 410–415.
[138] B. Matthews, Comparison of predicted and observed secondary structure of T4 phage lysozyme, Biochim. Biophys. Acta 405 (1975), 442–451.
[139] T. M. Mayhew and J. M. Lucocq, Developments in cell biology for quantitative immunoelectron microscopy based on thin sections: a review, Histochemistry and Cell Biology 130 (2008), 299–313.
[140] Q. McNemar, Note on the sampling error of the difference between correlated proportions or percentages, Psychometrika 12 (1947), 153–157.
[141] S. Mei, Multi-label multi-kernel transfer learning for human protein subcellular localization, PLoS ONE 7 (2012), e37716.
[142] S. Y. Mei, W. Fei and S. G. Zhou, Gene ontology based transfer learning for protein subcellular localization, BMC Bioinformatics 12 (2011), 44.
[143] A. H. Millar, C. Carrie, B. Pogson and J. Whelan, Exploring the function-location nexus: using multiple lines of evidence in defining the subcellular location of plant proteins, Plant Cell 21 (2009), 1625–1631.
[144] M. Mistry and P. Pavlidis, Gene ontology term overlap as a measure of gene functional similarity, BMC Bioinformatics 9 (2008), 327.
[145] R. Moskovitch, S. Cohenkashi, U. Dror, I. Levy, A. Maimon and Y. Shahar, Multiple hierarchical classification of free-text clinical guidelines, Artificial Intelligence in Medicine 37 (2006), 177–190.
[146] R. Mott, J. Schultz, P. Bork and C. Ponting, Predicting protein cellular localization using a domain projection method, Genome Research 12 (2002), 1168–1174.
[147] J. C. Mueller, C. Andreoli, H. Prokisch and T. Meitinger, Mechanisms for multiple intracellular localization of human mitochondrial proteins, Mitochondrion 3 (2004), 315–325.
[148] N. J. Mulder and R. Apweiler, The InterPro database and tools for protein domain analysis, Current Protocols in Bioinformatics 2 (2008), 1–18.
[149] N. J. Mulder, R. Apweiler, T. K. Attwood, A. Bairoch, D. Barrell, A. Bateman, D. Binns, M. Biswas, P. Bradley and P. Bork, The InterPro Database, 2003 brings increased coverage and new features, Nucleic Acids Res. 31 (2003), 315–318.
[150] R. F. Murphy, Communicating subcellular distributions, Cytometry 77 (2010), 686–692.
[151] R. Nair and B. Rost, Sequence conserved for subcellular localization, Protein Science 11 (2002), 2836–2847.
[152] R. Nair and B. Rost, Protein subcellular localization prediction using artificial intelligence technology, Functional Proteomics, Springer, 2008, pp. 435–463.
[153] K. Nakai, Protein sorting signals and prediction of subcellular localization, Advances in Protein Chemistry 54 (2000), 277–344.
[154] K. Nakai and M. Kanehisa, Expert system for predicting protein localization sites in gram-negative bacteria, Proteins: Structure, Function, and Genetics 11 (1991), 95–110.
[155] H. Nakashima and K. Nishikawa, Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies, J. Mol. Biol. 238 (1994), 54–61.
[156] H. Nielsen, S. Brunak and G. von Heijne, Machine learning approaches for the prediction of signal peptides and other protein sorting signals, Protein Eng. 12 (1999), 3–9.
[157] H. Nielsen, J. Engelbrecht, S. Brunak and G. von Heijne, A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites, Int. J. Neural Sys. 8 (1997), 581–599.
[158] H. Nielsen and A. Krogh, Prediction of signal peptides and signal anchors by a hidden Markov model, in: Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology (J. G. et al., ed.), pp. 122–130, AAAI Press, 1998.
[159] J. Odell, Six different kinds of aggregation, Advanced object-oriented analysis and design using UML, Cambridge University Press, 1998, pp. 139–149.
[160] Y. X. Pan, Z. Z. Zhang, Z. M. Guo, G. Y. Feng, Z. D. Huang and L. He, Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach, J. of Protein Chem. 22 (2003), 395–402.
[161] C. H. Papadimitriou, P. Raghavan, H. Tamaki and S. Vempala, Latent semantic indexing: A probabilistic analysis, in: Proceedings of the 17th ACM Symposium on the Principles of Database Systems, pp. 159–168, 1998.
[162] K. J. Park and M. Kanehisa, Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs, Bioinformatics 19 (2003), 1656–1663.
[163] G. R. Pasha and M. A. A. Shah, Application of ridge regression to multicollinear data, Journal of Research (Science) 15 (2004), 97–106.
[164] C. Pesquita, D. Faria, A. O. Falcao, P. Lord and F. M. Couto, Metrics for GO based protein semantic similarity: a systematic evaluation, BMC Bioinformatics 9 (2008), S4.
[165] C. Pesquita, D. Faria, A. O. Falcao, P. Lord and F. M. Couto, Semantic similarity in biomedical ontologies, PLoS Computational Biology 5 (2009), e1000443.
[166] C. Pesquita, D. Pessoa, D. Faria and F. Couto, CESSM: Collaborative evaluation of semantic similarity measures, JB2009: Challenges in Bioinformatics 157 (2009).
[167] A. Pierleoni, P. L. Martelli, P. Fariselli and R. Casadio, BaCelLo: a balanced subcellular localization predictor, Bioinformatics 22 (2006), e408–e416.
[168] A. del Pozo, F. Pazos and A. Valencia, Defining functional distances over gene ontology, BMC Bioinformatics 9 (2008), 50.
[169] J. R. Quinlan, C4.5: programs for machine learning, 1, Morgan Kaufmann, 1993.
[170] H. Rangwala and G. Karypis, Profile-based direct kernels for remote homology detection and fold recognition, Bioinformatics 21 (2005), 4239–4247.
[171] S. Rea and D. James, Moving GLUT4: the biogenesis and trafficking of GLUT4 storage vesicles, Diabetes 46 (1997), 1667–1677.
[172] J. Read, B. Pfahringer, G. Holmes and E. Frank, Classifier chains for multi-label classification, in: Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 254–269, 2009.
[173] A. Reinhardt and T. Hubbard, Using neural networks for prediction of the subcellular location of proteins, Nucleic Acids Res. 26 (1998), 2230–2236.
[174] P. Resnik, Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language, Journal of Artificial Intelligence Research 11 (1999), 95–130.
[175] R. M. Riensche, B. L. Baddeley, A. P. Sanfilippo, C. Posse and B. Gopalan, XOA: Web-enabled cross-ontological analytics, in: 2007 IEEE Congress on Services, pp. 99–105, 2007.
[176] B. Rost, J. Liu, R. Nair, K. O. Wrzeszczynski and Y. Ofran, Automatic prediction of protein function, Cellular and Molecular Life Sciences 60 (2003), 2637–2650.
[177] J. Rousu, C. Saunders, S. Szedmak and J. Shawe-Taylor, Kernel-based learning of hierarchical multilabel classification models, Journal of Machine Learning Research 7 (2006), 1601–1626.
[178] R. Russell, R. Bergeron, G. Shulman and H. Young, Translocation of myocardial GLUT-4 and increased glucose uptake through activation of AMPK by AICAR, American Journal of Physiology 277 (1999), H643–H649.
[179] R. E. Schapire and Y. Singer, BoosTexter: A boosting-based system for text categorization, Machine Learning 39 (2000), 135–168.
[180] G. Schatz and B. Dobberstein, Common principles of protein translocation across membranes, Science 271 (1996), 1519–1526.
[181] A. Schlicker, F. S. Domingues, J. Rahnenfuhrer and T. Lengauer, A new measure for functional similarity of gene products based on Gene Ontology, BMC Bioinformatics 7 (2006), 302.
[182] B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, 2002.
[183] J. Schultz, R. R. Copley, T. Doerks, C. Ponting and P. Bork, SMART: A Web-based tool for the study of genetically mobile domains, Nucleic Acids Research 28 (2000), 231–234.
[184] M. Scott, D. Thomas and M. Hallett, Predicting subcellular localization via protein motif co-occurrence, Genome Research 14 (2004), 1957–1966.
[185] J. L. Sevilla, V. Segura, A. Podhorski, E. Guruceaga, J. M. Mato, L. A. Martinez-Cruz, F. J. Corrales and A. Rubio, Correlation between gene expression and GO semantic similarity, IEEE/ACM Transactions on Computational Biology and Bioinformatics 2 (2005), 330–338.
[186] B. Sheehan, A. Quigley, B. Gaudin and S. Dobson, A relation based measure of semantic similarity for Gene Ontology annotations, BMC Bioinformatics 9 (2008), 468.
[187] H. B. Shen and K. C. Chou, Virus-PLoc: A fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells, Biopolymers 85 (2007), 233–240.
[188] H. B. Shen and K. C. Chou, PseAAC: A flexible web-server for generating various kinds of protein pseudo amino acid composition, Analytical Biochemistry 373 (2008), 386–388.
[189] H. B. Shen and K. C. Chou, Virus-mPLoc: A fusion classifier for viral protein subcellular location prediction by incorporating multiple sites, J. Biomol. Struct. Dyn. 26 (2010), 175–186.
[190] H. B. Shen and K. C. Chou, Gpos-PLoc: An ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins, Protein Engineering, Design and Selection 20 (2007), 39–46.
[191] S. K. Shevade and S. S. Keerthi, A simple and efficient algorithm for gene selection using sparse logistic regression, Bioinformatics 19 (2003), 2246–2253.
[192] J. Y. Shi, S. W. Zhang, Q. Pan, Y. M. Cheng and J. Xie, Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition, Amino Acids 33 (2007), 69–74.
[193] I. Small, N. Peeters, F. Legeai and C. Lurin, Predotar: A tool for rapidly screening proteomes for N-terminal targeting sequences, Proteomics 4 (2004), 1581–1590.
[194] T. F. Smith and M. S. Waterman, Comparison of biosequences, Adv. Appl. Math. 2 (1981), 482–489.
[195] C. G. M. Snoek, M. Worring, J. C. van Gemert, J. M. Geusebroek and A. W. M. Smeulders, The challenge problem for automated detection of 101 semantic concepts in multimedia, in: Proceedings of the 14th annual ACM International Conference on Multimedia, pp. 421–430, 2006.
[196] O. Stepanenko, V. V. Verkhusha, I. M. Kuznetsova, V. N. Uversky and K. K. Turoverov, Fluorescent Proteins as Biomarkers and Biosensors: Throwing Color Lights on Molecular and Cellular Processes, Current Protein and Peptide Science 9 (2008), 338–369.
[197] D. L. Swets and J. J. Weng, Efficient content-based image retrieval using automatic feature selection, in: Proceedings of International Symposium on Computer Vision, IEEE, pp. 85–90, 1995.
[198] C. Tanford, Contribution of hydrophobic interactions to the stability of the globular conformation of proteins, Journal of the American Chemical Society 84 (1962), 4240–4247.
[199] Y. Tao, L. Sam, J. Li, C. Friedman and Y. A. Lussier, Information theory applied to the sparse gene ontology annotation network to predict novel gene function, Bioinformatics 23 (2007), i529–i538.
[200] The Gene Ontology Consortium, The Gene Ontology in 2010: extensions and refinements, Nucleic Acids Res. 38 (2010), D331–D335.
[201] The Gene Ontology Consortium, The Gene Ontology: enhancements for 2011, Nucleic Acids Res. 40 (2012), D559–D564.
[202] K. Trohidis, G. Tsoumakas, G. Kalliris and I. Vlahavas, Multilabel classification of music into emotions, in: Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[203] G. Tsoumakas and I. Katakis, Multi-label classification: An overview, International Journal of Data Warehousing and Mining 3 (2007), 1–13.
[204] G. Tsoumakas, I. Katakis and I. Vlahavas, Mining multi-label data, in: Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach (Eds.), Springer, 2nd edition, pp. 667–685, 2010.
[205] V. N. Vapnik, Statistical Learning Theory, John Wiley and Sons, 1998.
[206] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 2000.
[207] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski and H. Blockeel, Decision trees for hierarchical multi-label classification, Machine Learning 73 (2008), 185–214.
[208] G. von Heijne, The signal peptides, Journal of Membrane Biology 115 (1990), 195–201.
[209] G. von Heijne, Patterns of amino acids near signal-sequence cleavage sites, Eur. J. Biochem. 133 (1983), 17–21.
[210] G. von Heijne, A new method for predicting signal sequence cleavage sites, Nucleic Acids Research 14 (1986), 4683–4690.
[211] S. Wan, M. W. Mak and S. Y. Kung, Protein subcellular localization prediction based on profile alignment and Gene Ontology, in: 2011 IEEE International Workshop on Machine Learning for Signal Processing (MLSP’11), pp. 1–6, Sept 2011.
[212] S. Wan, M. W. Mak and S. Y. Kung, GOASVM: Protein subcellular localization prediction based on gene ontology annotation and SVM, in: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’12), pp. 2229–2232, 2012.
[213] S. Wan, M. W. Mak and S. Y. Kung, mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines, BMC Bioinformatics 13 (2012), 290.
[214] S. Wan, M. W. Mak and S. Y. Kung, Adaptive thresholding for multi-label SVM classification with application to protein subcellular localization prediction, in: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’13), pp. 3547–3551, 2013.
[215] S. Wan, M. W. Mak and S. Y. Kung, GOASVM: A subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou’s pseudo-amino acid composition, Journal of Theoretical Biology 323 (2013), 40–48.
[216] S. Wan, M. W. Mak and S. Y. Kung, Semantic similarity over gene ontology for multi-label protein subcellular localization, Engineering 5 (2013), 68–72.
[217] S. Wan, M. W. Mak and S. Y. Kung, HybridGO-Loc: Mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins, PLoS ONE 9 (2014), e89545.
[218] S. Wan, M. W. Mak, B. Zhang, Y. Wang and S. Y. Kung, An ensemble classifier with random projection for predicting multi-label protein subcellular localization, in: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 35–42, 2013.
[219] G. Wang and R. L. Dunbrack, PISCES: A protein sequence culling server, Bioinformatics 19 (2003), 1589–1591.
[220] J. Z. Wang, Z. Du, R. Payattakool, P. S. Yu and C. F. Chen, A new method to measure the semantic similarity of GO terms, Bioinformatics 23 (2007), 1274–1281.
[221] W. Wang, M. W. Mak and S. Y. Kung, Speeding up Subcellular Localization by Extracting Informative Regions of Protein Sequences for Profile Alignment, in: Proc. Computational Intelligence in Bioinformatics and Computational Biology (CIBCB’10), pp. 147–154, 2010.
[222] X. Wang and G. Z. Li, A multi-label predictor for identifying the subcellular locations of singleplex and multiplex eukaryotic proteins, PLoS ONE 7 (2012), e36317.
[223] X. Wang and G. Z. Li, Multilabel Learning via Random Label Selection for Protein Subcellular Multilocations Prediction, IEEE/ACM Transactions on Computational Biology and Bioinformatics 10 (2013), 436–446.
[224] Y. Wang and K. N. Plataniotis, An Analysis of Random Projection for Changeable and Privacy-Preserving Biometric Verification, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 40 (2010), 1280–1293.
[225] M. Winston, R. Chaffin and D. Herrmann, A taxonomy of part-whole relations, Cognitive Science 11 (1987), 417–444.
[226] C. H. Wu, H. Huang, L. S. Yeh and W. C. Barker, Protein family classification and functional annotation, Comput. Biol. Chem. 27 (2003), 37–47.
[227] C. H. Wu and J. M. McLarty, Neural Networks and Genome Informatics, Elsevier Science, 2000.
[228] H. Wu, Z. Su, F. Mao, V. Olman and Y. Xu, Prediction of functional modules based on comparative genome analysis and gene ontology application, Nucleic Acids Res. 33 (2005), 2822–2837.
[229] X. Wu, L. Zhu, J. Guo, D. Y. Zhang and K. Lin, Prediction of yeast protein-protein interaction network: insights from the gene ontology and annotations, Nucleic Acids Res. 34 (2006), 2137–2150.
[230] Z. C. Wu, X. Xiao and K. C. Chou, iLoc-Plant: A multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites, Molecular BioSystems 7 (2011), 3287–3297.
[231] Z. C. Wu, X. Xiao and K. C. Chou, iLoc-Gpos: A multi-layer classifier for predicting the subcellular localization of singleplex and multiplex gram-positive bacterial proteins, Protein & Peptide Letters 19 (2012), 4–14.
[232] X. Xiao, Z. C. Wu and K. C. Chou, A multi-label learning classifier for predicting the subcellular localization of gram-negative bacterial proteins with both single and multiple sites, PLoS ONE 6 (2011), e20592.
[233] X. Xiao, Z. C. Wu and K. C. Chou, iLoc-Virus: A multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites, Journal of Theoretical Biology 284 (2011), 42–51.
[234] T. Xu, L. Du and Y. Zhou, Evaluation of GO-based functional similarity measures using S. cerevisiae protein interaction and expression profile data, BMC Bioinformatics 9 (2008), 472.
[235] B. Yang, D. Hartung, K. Simoens and C. Busch, Dynamic random projection for biometric template protection, in: 2010 Fourth IEEE International Conference on Biometrics: Theory Applications and Systems (BTAS), pp. 1–7, 2010.
[236] D. Yang, Y. Li, H. Xiao, Q. Liu, M. Zhang, J. Zhu, W. Ma, C. Yao, J. Wang, D. Wang, Z. Guo and B. Yang, Gaining confidence in biological interpretation of the microarray data: the functional consistence of the significant GO categories, Bioinformatics 24 (2008), 265–271.
[237] G. Yu, F. Li, Y. Qin, X. Bo, Y. Wu and S. Wang, GOSemSim: An R package for measuring semantic similarity among GO terms and gene products, Bioinformatics 26 (2010), 976–978.
[238] H. Yu, L. Gao, K. Tu and Z. Guo, Broadly predicting specific gene function with expression similarity and taxonomy similarity, Gene 352 (2005), 75–81.
[239] R. Yuste, Fluorescence microscopy today, Nature Methods 2 (2005), 902–904.
[240] E. M. Zdobnov and R. Apweiler, InterProScan – an integration platform for the signature-recognition methods in InterPro, Bioinformatics 17 (2001), 847–848.
[241] M. L. Zhang and Z. H. Zhou, A k-nearest neighbor based algorithm for multi-label classification, in: IEEE International Conference on Granular Computing, pp. 718–721, 2005.
[242] S. Zhang, X. F. Xia, J. C. Shen, Y. Zhou and Z. Sun, DBMLoc: A database of proteins with multiple subcellular localizations, BMC Bioinformatics 9 (2008), 127.
[243] S. W. Zhang, Y. L. Zhang, H. F. Yang, C. H. Zhao and Q. Pan, Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies, Amino Acids 34 (2008), 565–572.
[244] J. Zhu, Classification of gene microarrays by penalized logistic regression, Biostatistics 5 (2004), 427–443.
[245] J. Zhu and T. Hastie, Kernel logistic regression and the import vector machine, in: Advances in Neural Information Processing Systems 14, pp. 1081–1088, MIT Press, 2001.
[246] M. Zhu, L. Gao, Z. Guo, Y. Li, D. Wang, J. Wang and C. Wang, Globally predicting protein functions based on co-expressed protein-protein interaction networks and ontology taxonomy similarities, Gene 391 (2007), 113–119.