Contents

Preface

List of Abbreviations

1   Introduction

1.1    Proteins and their subcellular locations

1.2    Why computationally predict protein subcellular localization?

1.2.1 Significance of the subcellular localization of proteins
1.2.2 Conventional wet-lab techniques
1.2.3 Computational prediction of protein subcellular localization

1.3    Organization of this book

2   Overview of subcellular localization prediction

2.1    Sequence-based methods

2.1.1 Composition-based methods
2.1.2 Sorting signal-based methods
2.1.3 Homology-based methods

2.2    Knowledge-based methods

2.2.1 GO-term extraction
2.2.2 GO-vector construction

2.3    Limitations of existing methods

2.3.1 Limitations of sequence-based methods
2.3.2 Limitations of knowledge-based methods

3   Legitimacy of using gene ontology information

3.1    Direct table lookup?

3.1.1 Table-lookup procedure for single-label prediction
3.1.2 Table-lookup procedure for multi-label prediction
3.1.3 Problems of table lookup

3.2    Using only cellular component GO terms?

3.3    Equivalent to homologous transfer?

3.4    More reasons for using GO information

4   Single-location protein subcellular localization

4.1    Extracting GO from the Gene Ontology Annotation Database

4.1.1 Gene Ontology Annotation Database
4.1.2 Retrieval of GO terms
4.1.3 Construction of GO vectors
4.1.4 Multiclass SVM classification

4.2    FusionSVM: Fusion of gene ontology and homology-based features

4.2.1 InterProGOSVM: Extracting GO from InterProScan
4.2.2 PairProSVM: A homology-based method
4.2.3 Fusion of InterProGOSVM and PairProSVM

4.3    Summary

5   From single- to multi-location

5.1    Significance of multi-location proteins

5.2    Multi-label classification

5.2.1 Algorithm-adaptation methods
5.2.2 Problem transformation methods
5.2.3 Multi-label classification in bioinformatics

5.3    mGOASVM: A predictor for both single- and multi-location proteins

5.3.1 Feature extraction
5.3.2 Multi-label multiclass SVM classification

5.4    AD-SVM: An adaptive decision multi-label predictor

5.4.1 Multi-label SVM scoring
5.4.2 Adaptive decision for SVM (AD-SVM)
5.4.3 Analysis of AD-SVM

5.5    mPLR-Loc: A multi-label predictor based on penalized logistic regression

5.5.1 Single-label penalized logistic regression
5.5.2 Multi-label penalized logistic regression
5.5.3 Adaptive decision for LR (mPLR-Loc)

5.6    Summary

6   Mining deeper on GO for protein subcellular localization

6.1    Related work

6.2    SS-Loc: Using semantic similarity over GO

6.2.1 Semantic similarity measures
6.2.2 SS vector construction

6.3    HybridGO-Loc: Hybridizing GO frequency and semantic similarity features

6.3.1 Hybridization of two GO features
6.3.2 Multi-label multiclass SVM classification

6.4    Summary

7   Ensemble random projection for large-scale predictions

7.1    Random projection

7.2    RP-SVM: A multi-label classifier with ensemble random projection

7.2.1 Ensemble multi-label classifier
7.2.2 Multi-label classification

7.3    R3P-Loc: A compact predictor based on ridge regression and ensemble random projection

7.3.1 Limitation of using current databases
7.3.2 Creating compact databases
7.3.3 Single-label ridge regression
7.3.4 Multi-label ridge regression

7.4    Summary

8   Experimental setup

8.1    Prediction of single-label proteins

8.1.1 Dataset construction
8.1.2 Performance metrics

8.2    Prediction of multi-label proteins

8.2.1 Dataset construction
8.2.2 Dataset analysis
8.2.3 Performance metrics

8.3    Statistical evaluation methods

8.4    Summary

9   Results and analysis

9.1    Performance of GOASVM

9.1.1 Comparing GO vector construction methods
9.1.2 Performance of successive-search strategy
9.1.3 Comparing with methods based on other features
9.1.4 Comparing with state-of-the-art GO methods
9.1.5 GOASVM using old GOA databases

9.2    Performance of FusionSVM

9.2.1 Comparing GO vector construction and normalization methods
9.2.2 Performance of PairProSVM
9.2.3 Performance of FusionSVM
9.2.4 Effect of the fusion weights on the performance of FusionSVM

9.3    Performance of mGOASVM

9.3.1 Kernel selection and optimization
9.3.2 Term-frequency for mGOASVM
9.3.3 Multi-label properties for mGOASVM
9.3.4 Further analysis of mGOASVM
9.3.5 Comparing prediction results of novel proteins

9.4    Performance of AD-SVM

9.5    Performance of mPLR-Loc

9.5.1 Effect of adaptive decisions on mPLR-Loc
9.5.2 Effect of regularization on mPLR-Loc

9.6    Performance of HybridGO-Loc

9.6.1 Comparing different features

9.7    Performance of RP-SVM

9.7.1 Performance of ensemble random projection
9.7.2 Comparison with other dimension-reduction methods
9.7.3 Performance of single random-projection
9.7.4 Effect of dimensions and ensemble size

9.8    Performance of R3P-Loc

9.8.1 Performance on the compact databases
9.8.2 Effect of dimensions and ensemble size
9.8.3 Performance of ensemble random projection

9.9    Comprehensive comparison of proposed predictors

9.9.1 Comparison of benchmark datasets
9.9.2 Comparison of novel datasets

9.10    Summary

10   Properties of the proposed predictors

10.1    Noise data in the GOA Database

10.2    Analysis of single-label predictors

10.2.1 GOASVM vs FusionSVM
10.2.2 Can GOASVM be combined with PairProSVM?

10.3    Advantages of mGOASVM

10.3.1 GO-vector construction
10.3.2 GO subspace selection
10.3.3 Capability of handling multi-label problems

10.4    Analysis for HybridGO-Loc

10.4.1 Semantic similarity measures
10.4.2 GO-frequency features vs SS features
10.4.3 Bias analysis

10.5    Analysis for RP-SVM

10.5.1 Legitimacy of using RP
10.5.2 Ensemble random projection for robust performance

10.6    Comparing the proposed multi-label predictors

10.7    Summary

11   Conclusions and future directions

11.1    Conclusions

11.2    Future directions

A   Webservers for protein subcellular localization

A.1    GOASVM webserver

A.2    mGOASVM webserver

A.3    HybridGO-Loc webserver

A.4    mPLR-Loc webserver

B   Support vector machines

B.1    Binary SVM classification

B.2    One-vs-rest SVM classification

C   Proof of no bias in LOOCV

D   Derivatives for penalized logistic regression

Bibliography

Index
