Computerized applications worldwide now generate a tremendous amount of bio-data, which has
encouraged scholars to develop effective methods for extracting the knowledge hidden in these
data. This has opened a challenging and valuable research area in artificial intelligence.
Bioinformatics develops heuristic approaches and complex algorithms, drawing on artificial
intelligence and information technology, to solve biological problems. Intelligent interpretation
of the data can accelerate biological knowledge discovery. Data mining, as a form of biological
intelligence, attempts to find reliable, new, useful and meaningful patterns in huge amounts of
data. Hence, there is considerable potential to strengthen the interaction between artificial
intelligence and bio-data mining. The present paper discusses how artificial intelligence can
assist bio-data analysis and gives an up-to-date review of different applications of bio-data
mining. It also highlights some future perspectives of data mining in bioinformatics that can
inspire further development of data mining tools. Important and new techniques for intelligent
knowledge discovery from different types of raw datasets are critically discussed, with applicable
examples in human, plant and animal sciences. Finally, a broad overview of this hot topic in data
science is given.
Research Center/Unit :
Marie-Curie
Disciplines :
Biotechnology
Author, co-author :
Golestan Hashemi, Farahnaz Sadat; Université de Liège - ULiège > Agronomie, Bio-ingénierie et Chimie (AgroBioChem) > Ingénierie des productions végétales et valorisation
Ismail, Mohd Razi; Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia
Yusop, Mohd Rafii; Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia
Golestan Hashemi, Mahboobe Sadat; Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University
Nadimi Shahrak, Mohammad Hossein; Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University
Rastegar, Hamid; Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University
Miah, Gous; Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia
Aslani, Farzad; Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia
Language :
English
Title :
Intelligent mining of large-scale bio-data: Bioinformatics applications