Article (Scientific journals)
On the Relevance of Sophisticated Structural Annotations for Disulfide Connectivity Pattern Prediction
Becker, Julien; Maes, Francis; Wehenkel, Louis
2013 In PLoS ONE, 8 (2), p. e56621
Peer Reviewed verified by ORBi
 

Files


Full Text
journal.pone.0056621.pdf
Publisher postprint (1.12 MB)

Details



Keywords :
Disulfide connectivity pattern prediction; extremely randomized trees; feature selection; web service; disulfide bridge prediction
Abstract :
[en] Disulfide bridges strongly constrain the native structure of many proteins, and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches to this prediction problem adopt the following pipeline: first, they enrich the primary sequence with structural annotations; second, they apply a binary classifier to each candidate pair of cysteines to predict disulfide bonding probabilities; and finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of a protein. In this paper, we adopt this three-step pipeline and propose an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, three of which are novel in the context of disulfide bridge prediction. To be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four different feature encodings based on local windows and on different kinds of histograms. The combination of structural annotations with these possible encodings leads to a large number of possible feature functions. To identify a minimal subset of relevant feature functions among those, we propose an efficient and interpretable feature function selection scheme, designed to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines and extremely randomized trees. Our results indicate that the PSSM (position-specific scoring matrix) together with the CSP (cysteine separation profile) is sufficient to construct a high-performance disulfide pattern predictor, and that extremely randomized trees reach a disulfide pattern prediction accuracy on the benchmark dataset SPX+ that corresponds to a +3.2% improvement over the state of the art.
A web application is available at http://m24.giga.ulg.ac.be:81/x3CysBridges.
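The last step of the pipeline described in the abstract, turning pairwise bonding probabilities into a connectivity pattern, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `best_connectivity_pattern`, the `bond_prob` dictionary and the example probabilities are hypothetical, and an exhaustive search stands in for a proper maximum weight graph matching algorithm, which is only practical for the handful of cysteines found in a typical protein.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_connectivity_pattern(
    cysteines: List[int], bond_prob: Dict[Pair, float]
) -> Tuple[float, List[Pair]]:
    """Return (total weight, bridges) of a maximum-weight matching.

    cysteines: sequence positions of the cysteines.
    bond_prob: hypothetical classifier output, keyed by (i, j) with i < j;
               missing pairs are treated as probability 0.
    Exhaustive search: each cysteine is either bonded to one partner
    or left free.
    """
    positions = tuple(sorted(cysteines))

    @lru_cache(maxsize=None)
    def solve(remaining: Tuple[int, ...]) -> Tuple[float, Tuple[Pair, ...]]:
        if len(remaining) < 2:
            return 0.0, ()
        first, rest = remaining[0], remaining[1:]
        # Option 1: leave the first remaining cysteine unpaired.
        best = solve(rest)
        # Option 2: pair it with each later cysteine in turn.
        for k, partner in enumerate(rest):
            weight = bond_prob.get((first, partner), 0.0)
            sub_weight, sub_pairs = solve(rest[:k] + rest[k + 1:])
            total = weight + sub_weight
            if total > best[0]:
                best = (total, ((first, partner),) + sub_pairs)
        return best

    weight, pairs = solve(positions)
    return weight, list(pairs)


# Made-up probabilities for four cysteines at positions 3, 12, 25 and 40.
example_prob = {
    (3, 12): 0.2, (3, 25): 0.9, (3, 40): 0.1,
    (12, 25): 0.3, (12, 40): 0.8, (25, 40): 0.4,
}
_, example_pairs = best_connectivity_pattern([3, 12, 25, 40], example_prob)
print(example_pairs)  # [(3, 25), (12, 40)]
```

Selecting the pattern globally matters because each cysteine can take part in at most one bridge: picking the best partner for each cysteine independently could assign the same residue to two bridges.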
Research center :
GIGA-Bioinformatics
Disciplines :
Computer science
Biotechnology
Author, co-author :
Becker, Julien; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation
Maes, Francis; University of Liège > Electrical Engineering and Computer Science > Systèmes et modélisation
Wehenkel, Louis; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation
Language :
English
Title :
On the Relevance of Sophisticated Structural Annotations for Disulfide Connectivity Pattern Prediction
Publication date :
15 February 2013
Journal title :
PLoS ONE
eISSN :
1932-6203
Publisher :
Public Library of Science, San Francisco, United States - California
Volume :
8
Issue :
2
Pages :
e56621
Peer reviewed :
Peer Reviewed verified by ORBi
Funders :
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture [BE]
Available on ORBi :
since 17 June 2013

Statistics


Number of views
291 (16 by ULiège)
Number of downloads
198 (7 by ULiège)

Scopus citations®
7
Scopus citations® without self-citations
6
OpenCitations
8
