Doctoral thesis (Dissertations and theses)
Protein Structural Annotation: Multi-Task Learning and Feature Selection
Becker, Julien
2014
 

Full text: These.pdf (publisher postprint, 4.02 MB)

All documents in ORBi are protected by a user license.

Keywords :
Protein structure prediction; Multi-task learning; Feature selection; Disordered regions; Secondary structure; Disulfide bonds
Abstract :
[en] Experimentally determining the three-dimensional structure of a protein is a slow and expensive process. Nowadays, supervised machine learning techniques are widely used to predict protein structures, and in particular to predict surrogate annotations, which are much less complex than 3D structures. This dissertation presents, on the one hand, methodological contributions for learning multiple tasks simultaneously and for selecting relevant feature representations, and, on the other hand, biological contributions derived from the application of these techniques to several protein annotation problems. Our first methodological contribution introduces a multi-task formulation for learning various protein structural annotation tasks. Unlike the traditional methods proposed in the bioinformatics literature, which mostly treated these tasks independently, our framework exploits the natural idea that multiple related prediction tasks should be learned simultaneously. Our experiments on a set of five sequence labeling tasks clearly highlight the benefit of our multi-task approach over single-task approaches in terms of correctly predicted labels. Our second methodological contribution focuses on how best to identify a minimal subset of feature functions, i.e., functions that encode properties of complex objects, such as sequences or graphs, into forms suitable for learning algorithms (typically, vectors of features). Our experiments on disulfide connectivity pattern prediction and disordered region prediction show that carefully selected feature functions, combined with ensembles of extremely randomized trees, lead to very accurate models. Our biological contributions mainly derive from the results obtained by applying our feature function selection algorithm to the problems of predicting disulfide connectivity patterns and predicting disordered regions.
In both cases, our approach identified a relevant representation of the data that should play a role in the prediction of disulfide bonds (respectively, disordered regions) and, consequently, in protein structure-function relationships. In particular, the main biological contribution of our method is the discovery of a novel feature function which, to the best of our knowledge, has never been highlighted in the context of predicting disordered regions. These representations were carefully assessed against several baselines, including those of the 10th Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition.
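The idea of selecting a small subset of feature functions can be illustrated with a greedy forward-selection sketch. This is a minimal sketch under stated assumptions, not the algorithm of the thesis: the candidate feature functions (`length`, `gly_frac`, `cys_count`), the toy sequences and their labels are all hypothetical, and the scoring step uses leave-one-out 1-nearest-neighbour accuracy in place of the ensembles of extremely randomized trees mentioned in the abstract, so that the example runs with the Python standard library alone.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A feature function maps a raw object (here a toy amino-acid string)
# to a fixed-length list of numbers.
FeatureFn = Callable[[str], List[float]]


def encode(obj: str, fns: Sequence[FeatureFn]) -> List[float]:
    """Concatenate the outputs of the given feature functions into one vector."""
    vec: List[float] = []
    for fn in fns:
        vec.extend(fn(obj))
    return vec


def loo_1nn_accuracy(objs: Sequence[str], labels: Sequence[int],
                     fns: Sequence[FeatureFn]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Assumption: stand-in for a cross-validated extremely randomized trees score.
    """
    if not fns or len(objs) < 2:
        return 0.0
    X = [encode(o, fns) for o in objs]
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = -1, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if d < best_d:  # ties keep the earliest neighbour
                best_j, best_d = j, d
        correct += int(labels[best_j] == labels[i])
    return correct / len(X)


def forward_select(candidates: Dict[str, FeatureFn], objs: Sequence[str],
                   labels: Sequence[int]) -> Tuple[List[str], float]:
    """Greedily add the feature function that most improves the score;
    stop as soon as no remaining candidate gives a strict improvement."""
    selected: List[str] = []
    best_score = 0.0
    while True:
        best_name = None
        for name in candidates:
            if name in selected:
                continue
            trial = [candidates[n] for n in selected + [name]]
            score = loo_1nn_accuracy(objs, labels, trial)
            if score > best_score:
                best_score, best_name = score, name
        if best_name is None:
            return selected, best_score
        selected.append(best_name)


# Hypothetical candidate feature functions and toy data
# (label 1 = the sequence contains cysteines).
CANDIDATES: Dict[str, FeatureFn] = {
    "length": lambda s: [float(len(s))],
    "gly_frac": lambda s: [s.count("G") / len(s)],
    "cys_count": lambda s: [float(s.count("C"))],
}
OBJS = ["ACCA", "CCGC", "AAGA", "GAAG", "CACG", "AGGA"]
LABELS = [1, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    print(forward_select(CANDIDATES, OBJS, LABELS))  # (['cys_count'], 1.0)
```

On this toy data the search keeps only `cys_count`, since no other candidate can improve on its perfect score. Greedy forward selection returns a small subset, but it does not guarantee the smallest possible one, because it never revisits earlier choices.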
Research Center/Unit :
GIGA‐R - Giga‐Research - ULiège
Disciplines :
Computer science
Author, co-author :
Becker, Julien ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Language :
English
Title :
Protein Structural Annotation: Multi-Task Learning and Feature Selection
Defense date :
27 January 2014
Institution :
ULiège - Université de Liège
Degree :
Doctor of Philosophy in Computer Science
Promotor :
Wehenkel, Louis  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
President :
Dehareng, Dominique ;  Université de Liège - ULiège > Centres généraux > Centre d'ingénierie des protéines
Jury member :
Maes, Francis
Denoyer, Ludovic
Charloteaux, Benoit
Geurts, Pierre  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Funders :
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture
Available on ORBi :
since 20 January 2014
