Article (Scientific journals)
Cracking the genetic code with neural networks
Joiret, Marc; Leclercq, Marine; Lambrechts, Gaspard et al.
2023 In Frontiers in Artificial Intelligence, 6
Peer Reviewed verified by ORBi
 

Files


Full Text
frai-06-1128153published.pdf
Publisher postprint (3.58 MB)

Details



Keywords :
Artificial intelligence; genetic code deciphering; codon usage; codon embedding; deep neural network; data efficiency; natural language processing
Abstract :
The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could rediscover, on its own, the mapping between codons and amino acids and build the complete deciphering dictionary when presented with training pairs of transcripts and proteins. We compared different deep learning neural network architectures and quantitatively estimated the size of the human transcriptomic training set required to achieve the best possible accuracy in the codon-to-amino-acid mapping. We also investigated the effect of a codon embedding layer, which assesses the semantic similarity between codons, on the rate of increase of the training accuracy. We further investigated the benefit of quantifying and using the unbalanced representation of amino acids within real human proteins for faster deciphering of the codons of rare amino acids. Deep neural networks require huge amounts of data to train, and deciphering the genetic code with a neural network is no exception. A test accuracy of 100% and the unequivocal deciphering of rare codons, such as the tryptophan codon or the stop codons, require a training dataset on the order of 4-22 million cumulative codon/amino-acid pairs presented to the neural network over around 7-40 training epochs, depending on the architecture and settings. We confirm that the wide generic capacities and modularity of deep neural networks allow them to be customized easily to learn the deciphering task of the genetic code efficiently.
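To make the learning task in the abstract concrete, the following minimal sketch builds (codon, amino acid) training pairs from a coding sequence and its protein, then recovers a partial dictionary by simple counting. This is an illustrative counting baseline, not the neural network architectures compared in the paper. The function names `codon_pairs` and `learn_mapping`, the toy sequence, the use of the DNA alphabet (T rather than U), and the `*` symbol for stop are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Standard genetic code in compact form: bases ordered T, C, A, G,
# with '*' marking the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_pairs(cds, protein):
    """Split a coding sequence into codons and pair each with its amino acid."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return list(zip(codons, protein))


def learn_mapping(pairs):
    """Assign each observed codon the amino acid it is most often paired with."""
    counts = defaultdict(Counter)
    for codon, amino_acid in pairs:
        counts[codon][amino_acid] += 1
    return {codon: c.most_common(1)[0][0] for codon, c in counts.items()}


# Hypothetical toy example: one short coding sequence and its translation.
toy_cds = "ATGTGGTTTTAA"
toy_protein = "MWF*"
learned = learn_mapping(codon_pairs(toy_cds, toy_protein))

# Every learned entry agrees with the standard code, but only the codons
# actually seen in the data are covered.
agrees = all(STANDARD_CODE[codon] == aa for codon, aa in learned.items())
coverage = len(learned) / len(STANDARD_CODE)
```

With this toy input only 4 of the 64 codons are covered, which illustrates why rare codons such as the tryptophan codon (TGG) or the stop codons need a large number of training pairs before they are seen often enough to be deciphered.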
Disciplines :
Physical, chemical, mathematical & earth Sciences: Multidisciplinary, general & others
Biochemistry, biophysics & molecular biology
Computer science
Author, co-author :
Joiret, Marc ; Université de Liège - ULiège > GIGA > GIGA In silico medecine - Biomechanics Research Unit
Leclercq, Marine ; Université de Liège - ULiège > GIGA > GIGA Stem Cells - Cancer Signaling
Lambrechts, Gaspard ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Rapino, Francesca ; Université de Liège - ULiège > GIGA > GIGA Stem Cells - Cancer Signaling
Close, Pierre ; Université de Liège - ULiège > GIGA > GIGA Stem Cells - Cancer Signaling
Louppe, Gilles ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Big Data
Geris, Liesbet ; Université de Liège - ULiège > GIGA > GIGA In silico medecine - Biomechanics Research Unit
Language :
English
Title :
Cracking the genetic code with neural networks
Publication date :
06 April 2023
Journal title :
Frontiers in Artificial Intelligence
eISSN :
2624-8212
Publisher :
Frontiers Media S.A., Switzerland
Volume :
6
Peer reviewed :
Peer Reviewed verified by ORBi
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]
WELBIO - Walloon Excellence in Life Sciences and Biotechnology [BE]
ERC - European Research Council [BE]
Télévie [BE]
Funding number :
FNRS-FWO EOS grant n° 30480119 (Joint-t-against-Osteoarthritis); FNRS-Welbio grant n° WELBIO-CR-2017S-02 (THERAtRAME)
Funding text :
This work was supported by the FNRS-FWO EOS grant n° 30480119 (Joint-t-against-Osteoarthritis), the FNRS-Welbio grant n° WELBIO-CR-2017S-02 (THERAtRAME) in Belgium and the European Research Council under the European Union's Horizon 2020 Framework program (H2020/2014-2020)/ERC grant agreement n°772418 (INSITE) and FNRS Télévie grant agreement (R.FNRS.5131/Télévie 7.4625.20).
Available on ORBi :
since 01 May 2023

Statistics


Number of views
122 (20 by ULiège)
Number of downloads
84 (5 by ULiège)

Scopus citations®
1
Scopus citations® without self-citations
1
OpenCitations
0
