Prédiction de structures de macromolécules par apprentissage automatique

Machine learning; Optimization; Protein; Simulated annealing; EDA; Bioinformatics; Apprentissage automatique; Optimisation; Protéine; Recuit simulé; Bioinformatique

Abstract :

[en] Proteins are an essential constituent of cellular life, and most of their function is determined by their three-dimensional shape. At present, however, no method can efficiently predict the three-dimensional structure of a protein from its amino acid sequence alone. We propose here an "ab initio" approach based on the concept of learning for search. Protein structure prediction is modeled as an optimization problem and solved by an iterative algorithm: at each step, a structure modification operator is selected and applied to the current structure. The quality of the new structure is then assessed by an oracle that determines whether the structure is accepted. Repeating this scheme is intended to lead eventually to the sought structure. The critical point of this approach is the choice of the modification operator, which must be made very carefully to avoid the classical pitfalls of optimization problems. This operator selection step is the one subjected to machine learning, which justifies the term "learning for search" for the proposed method. The goal of this thesis is to show that machine learning can improve the results obtained with a simple optimization procedure. Our experiments show that this goal is fulfilled. We are aware, however, that many of the choices we made should be revisited, for both the optimization and the machine learning procedures. Finally, the application domain of this work extends beyond protein structure prediction. The scientific literature contains many optimization problems for which neither an exact nor an approximation algorithm exists, and which are therefore still very poorly solved. Such problems could greatly benefit from a "learning for search" approach such as the one described in this work.
[fr] Les protéines sont un constituant essentiel de la vie cellulaire dont l'essentiel des fonctions et des propriétés est déterminé par leur structure tridimensionnelle. Il n'existe pourtant, à l'heure actuelle, aucune méthode permettant de prédire efficacement la structure tridimensionnelle des protéines à partir de la description de leur contenu en acides aminés. Nous proposons ici une approche "ab initio" basée sur le concept du "learning for search". La prédiction de structure de protéines est modélisée sous la forme d'un problème d'optimisation dans lequel l'algorithme d'optimisation suit un schéma itératif où un opérateur de modification de structure est sélectionné et appliqué à la structure courante. La qualité de la structure ainsi obtenue est ensuite évaluée à l'aide d'un oracle qui déterminera si celle-ci est acceptée. La répétition de ce schéma a pour objet de trouver, à terme, la structure optimale recherchée. Le point critique de ce raisonnement se situe au niveau du choix de l'opérateur qui doit être réalisé très précisément afin d'éviter les écueils classiques des problèmes d'optimisation. C'est cette étape qui fera l'objet de l'apprentissage automatique justifiant ainsi le qualificatif "learning for search" de l'approche proposée. Le but de ce travail est de prouver que l'apprentissage automatique permet d'améliorer les résultats obtenus via un algorithme d'optimisation naïf. Les résultats de nos expérimentations montrent que cet objectif est atteint. Néanmoins, avant que cette procédure ne soit réellement utile, de nombreux choix doivent être remis en question aussi bien au niveau de l'algorithme d'optimisation que des procédés d'apprentissage. Finalement, notons que le domaine d'application de ce travail s'étend au-delà de la prédiction de structures de protéines. Il existe de très nombreux problèmes de la littérature scientifique qui sont, à ce jour, encore très mal résolus et pour lesquels aucun algorithme d'approximation n'existe. De tels problèmes pourraient grandement bénéficier d'une approche "learning for search" telle que celle mise en avant dans ce travail.
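
The iterative scheme described in the abstract (select an operator, apply it, let an oracle accept or reject the result) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the toy `energy` function stands in for a structure-quality score, the two operators `small_move` and `large_move` are hypothetical, the oracle is a Metropolis-style acceptance test in the spirit of the "simulated annealing" keyword, and the "learning" is a crude reinforcement of operator weights rather than the method actually used by the author.

```python
import math
import random


def energy(x):
    """Toy stand-in for a structure-quality score (lower is better)."""
    return sum(v * v for v in x)


# Hypothetical modification operators: each returns a modified copy.
def small_move(x, rng):
    i = rng.randrange(len(x))
    y = list(x)
    y[i] += rng.gauss(0.0, 0.1)
    return y


def large_move(x, rng):
    i = rng.randrange(len(x))
    y = list(x)
    y[i] += rng.gauss(0.0, 2.0)
    return y


OPERATORS = [small_move, large_move]


def accept(delta, temperature, rng):
    """Metropolis-style oracle: accept improvements, sometimes accept worse."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def search(x0, steps=2000, learn=True, seed=0):
    """Run the select / apply / accept loop and return (best energy, weights)."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best = e
    weights = [1.0] * len(OPERATORS)  # operator-selection preferences
    for step in range(steps):
        # Linear cooling schedule, floored to keep the division safe.
        temperature = max(1e-3, 1.0 - step / steps)
        # 1. Select an operator according to the current weights.
        k = rng.choices(range(len(OPERATORS)), weights=weights)[0]
        # 2. Apply it to the current structure.
        y = OPERATORS[k](x, rng)
        ey = energy(y)
        # 3. Let the oracle decide whether the new structure is kept.
        if accept(ey - e, temperature, rng):
            if learn and ey < e:
                weights[k] += 1.0  # reinforce operators that improved the score
            x, e = y, ey
            best = min(best, e)
    return best, weights
```

Calling `search(x0, learn=False)` keeps the operator weights uniform, which gives a naive baseline against which the weighted selection can be compared on the same seed. The sketch makes no claim about which variant performs better on a real problem.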

Disciplines :

Computer science

Author, co-author :

Marcos Alvarez, Alejandro ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

French

Title :

Prédiction de structures de macromolécules par apprentissage automatique

Alternative titles :

[en] Macromolecule structure prediction using machine learning

Defense date :

27 June 2011

Number of pages :

ii - 96

Institution :

ULiège - Université de Liège

Degree :

Master en ingénieur civil électricien, à finalité approfondie

Promotor :

Wehenkel, Louis ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

President :

Destiné, Jacques ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore)

Jury member :

Dehareng, Dominique ; Université de Liège - ULiège > Centres généraux > Centre d'ingénierie des protéines

Geurts, Pierre ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Louveaux, Quentin ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Maes, Francis ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Méthodes stochastiques

Available on ORBi :

since 24 November 2011

