Protein Structural Annotation: Multi-Task Learning and Feature Selection

Becker, Julien

Doctoral thesis (Dissertations and theses)

2014

Permalink
https://hdl.handle.net/2268/161584

Files

Full Text

These.pdf

Publisher postprint (4.02 MB)

All documents in ORBi are protected by a user license.

Details

Keywords :

Protein structure prediction; Multi-task learning; Feature selection; Disordered regions; Secondary structure; Disulfide bonds

Abstract :

[en] Experimentally determining the three-dimensional structure of a protein is a slow and expensive process. Supervised machine learning techniques are now widely used to predict protein structures, and in particular to predict surrogate annotations, which are much less complex than 3D structures. This dissertation presents, on the one hand, methodological contributions for learning multiple tasks simultaneously and for selecting relevant feature representations and, on the other hand, biological contributions derived from applying these techniques to several protein annotation problems.

Our first methodological contribution introduces a multi-task formulation for learning various protein structural annotation tasks. Unlike the traditional methods proposed in the bioinformatics literature, which mostly treat these tasks independently, our framework exploits the natural idea that multiple related prediction tasks should be addressed together. Our experiments on a set of five sequence labeling tasks show that the multi-task approach predicts more labels correctly than single-task approaches.

Our second methodological contribution focuses on the best way to identify a minimal subset of feature functions, i.e., functions that encode properties of complex objects, such as sequences or graphs, into forms appropriate for learning algorithms (typically, vectors of features). Our experiments on disulfide connectivity pattern prediction and disordered regions prediction show that combining carefully selected feature functions with ensembles of extremely randomized trees leads to very accurate models.

Our biological contributions derive mainly from the results obtained by applying our feature function selection algorithm to the problems of predicting disulfide connectivity patterns and of predicting disordered regions. In both cases, our approach identified a relevant representation of the data that should play a role in the prediction of disulfide bonds (respectively, disordered regions) and, consequently, in protein structure-function relationships. For example, the major biological contribution made by our method is the discovery of a novel feature function which, to the best of our knowledge, has never been highlighted in the context of predicting disordered regions. These representations were carefully assessed against several baselines, such as those from the 10th Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition.
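
The idea of selecting a minimal subset of feature functions can be illustrated with a small sketch. It is not taken from the thesis: it assumes a greedy forward selection loop, which is one common strategy and is not confirmed by this abstract, and it swaps the ensembles of extremely randomized trees for a dependency-free nearest-centroid scorer. The feature function names (`cysteine_fraction`, `length`), the toy sequences and their labels are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A feature function maps a raw object (here, a sequence string) to a feature vector.
FeatureFunction = Callable[[str], List[float]]
Dataset = List[Tuple[str, int]]


def encode(seq: str, names: List[str], functions: Dict[str, FeatureFunction]) -> List[float]:
    """Concatenate the outputs of the chosen feature functions."""
    vector: List[float] = []
    for name in names:
        vector.extend(functions[name](seq))
    return vector


def accuracy(train: Dataset, valid: Dataset, names: List[str],
             functions: Dict[str, FeatureFunction]) -> float:
    """Stand-in scorer: nearest-centroid accuracy on the validation set.
    (Assumption: the thesis uses extremely randomized trees instead.)"""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for seq, label in train:
        vec = encode(seq, names, functions)
        total = sums.get(label, [0.0] * len(vec))
        sums[label] = [a + b for a, b in zip(total, vec)]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in tot] for lab, tot in sums.items()}

    def predict(seq: str) -> int:
        vec = encode(seq, names, functions)
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])))

    hits = sum(1 for seq, label in valid if predict(seq) == label)
    return hits / len(valid)


def forward_select(train: Dataset, valid: Dataset,
                   functions: Dict[str, FeatureFunction],
                   min_gain: float = 1e-9) -> Tuple[List[str], float]:
    """Greedy forward selection: repeatedly add the feature function that
    improves validation accuracy the most; stop when nothing helps."""
    selected: List[str] = []
    best = 0.0
    remaining = sorted(functions)
    while remaining:
        scored = [(accuracy(train, valid, selected + [n], functions), n) for n in remaining]
        score, name = max(scored, key=lambda pair: pair[0])
        if score - best <= min_gain:
            break
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best


# Hypothetical feature functions and toy data (label 1 = cysteine-rich).
FUNCTIONS: Dict[str, FeatureFunction] = {
    "cysteine_fraction": lambda s: [s.count("C") / len(s)],
    "length": lambda s: [float(len(s))],
}
TRAIN: Dataset = [("CCACC", 1), ("CACCA", 1), ("AAGAA", 0), ("GAAGA", 0)]
VALID: Dataset = [("CCCAC", 1), ("AGAAA", 0)]

selected, score = forward_select(TRAIN, VALID, FUNCTIONS)
print(selected, score)
```

On this toy data the loop keeps only `cysteine_fraction`, because adding `length` does not improve validation accuracy. A tree ensemble could be substituted for `accuracy` without changing the selection loop.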

Research Center/Unit :

GIGA‐R - Giga‐Research - ULiège

Disciplines :

Computer science

Author, co-author :

Becker, Julien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

Protein Structural Annotation: Multi-Task Learning and Feature Selection

Defense date :

27 January 2014

Institution :

ULiège - Université de Liège

Degree :

Doctor of Philosophy in Computer Science

Promotor :

Wehenkel, Louis ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

President :

Dehareng, Dominique ; Université de Liège - ULiège > Centres généraux > Centre d'ingénierie des protéines

Jury member :

Maes, Francis

Denoyer, Ludovic

Charloteaux, Benoit

Geurts, Pierre ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Funders :

FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture

Available on ORBi :

since 20 January 2014

Statistics

Number of views

276 (12 by ULiège)

Number of downloads

550 (9 by ULiège)
