Artificial Intelligence; Deep learning; genetic code deciphering; Natural language processing; codon bias; data efficiency; embeddings; cross-entropy loss; neural networks; multi-layer perceptron
Abstract :
[en] The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could re-discover, on its own, the mapping between codons and amino acids and build the complete deciphering dictionary upon presentation of training pairs of transcript and protein data. We compared different Deep Learning neural network architectures and quantitatively estimated the size of the human transcriptomic training set required to achieve the best possible accuracy in the codon-to-amino-acid mapping. We also investigated the effect of a codon embedding layer, which captures semantic similarity between codons, on the rate at which training accuracy increases. We further investigated the benefit of quantifying and exploiting the unbalanced representation of amino acids within real human proteins to decipher rare amino acid codons faster. Deep neural networks require huge amounts of data to train them, and deciphering the genetic code by a neural network is no exception. A test accuracy of 100% and the unequivocal deciphering of rare codons, such as the tryptophan codon or the stop codons, require a training dataset of the order of 4–22 million cumulative codon–amino-acid pairs presented to the neural network over around 7–40 training epochs, depending on the architecture and settings. We confirm that the broad generic capacities and modularity of deep neural networks allow them to be easily customized to learn the genetic code deciphering task efficiently.
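For illustration only, the following minimal PyTorch-style sketch shows the kind of model the abstract describes: a codon embedding layer feeding a small multi-layer perceptron trained with cross-entropy loss to map each of the 64 codons to one of 21 classes (20 amino acids plus stop). All layer sizes, variable names, and the toy data are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

# Illustrative dimensions (assumed): 64 codons, 21 output classes, small MLP.
NUM_CODONS, NUM_CLASSES, EMB_DIM, HIDDEN = 64, 21, 8, 32

# Codon embedding layer followed by a multi-layer perceptron classifier.
model = nn.Sequential(
    nn.Embedding(NUM_CODONS, EMB_DIM),   # learns a dense vector per codon
    nn.Linear(EMB_DIM, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, NUM_CLASSES),      # logits over amino acids + stop
)

# Cross-entropy loss; a class-weight vector could be passed here to reflect
# the unbalanced amino-acid frequencies mentioned in the abstract.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One toy training step on random (codon index, amino-acid class) pairs;
# real training would iterate over transcript/protein-derived pairs.
codons = torch.randint(0, NUM_CODONS, (256,))
amino_acids = torch.randint(0, NUM_CLASSES, (256,))
loss = loss_fn(model(codons), amino_acids)
optimizer.zero_grad()
loss.backward()
optimizer.step()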
Research Center/Unit :
GIGA In Silico Medicine - Biomechanics Research Unit