Keywords: multivariate analysis; surprisal analysis; high-order SVD
Abstract:
We consider here multivariate data, by which we mean data where each data point i is measured for two or more distinct variables. In a typical situation there are many data points i, while the range of the different variables is more limited. If there is only one variable, the data can be arranged as a rectangular matrix in which i indexes the rows and the values of the variable label the columns. We begin with this case and then proceed to the more general case, with special emphasis on two variables, where the data can be organized as a tensor. An analysis of such multivariate data by a maximal entropy approach is discussed and illustrated for gene expression in four different cell types from six different patients. The different genes are indexed by i, and there are 24 (4 × 6) entries for each i. We use an unbiased, thermodynamic, maximal-entropy-based approach (surprisal analysis) to analyze the multivariate transcriptional profiles. The measured microarray data are organized as a tensor array in which the two minor orthogonal directions are the different patients and the different cell types. The entries are the transcription levels on a logarithmic scale. We identify a disease signature of prostate cancer and determine the degree of variability between individual patients. Surprisal analysis determines a baseline expression level common to all cells and patients. We identify the transcripts in the baseline as the “housekeeping” genes that ensure cell stability. The baseline and two surprisal patterns satisfactorily recover (99.8%) the multivariate data. The two patterns characterize the individuality of the patients and, to a lesser extent, the commonality of the disease. The immune response was identified as the most significant pathway contributing to the cancer disease pattern.
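The tensor case can be illustrated with a small numerical sketch. It assumes, as our own simplification and not necessarily the paper's exact construction, that the baseline and the surprisal patterns can be read off a truncated higher-order SVD (HOSVD, a Tucker decomposition built from the SVD of each mode unfolding) of the genes × patients × cell types array of log expression levels. The data are synthetic (500 hypothetical genes, 6 patients, 4 cell types), the helper names `unfold`, `mode_product`, `hosvd` and `reconstruct` are illustrative, the choice of three components per mode is an assumption, and "recovered" is measured here as the fraction of the squared Frobenius norm, which may not be how the 99.8% figure was computed.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the others."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M (shape r x T.shape[mode]) along axis `mode`."""
    out = np.tensordot(M, T, axes=([1], [mode]))  # new axis of size r comes first
    return np.moveaxis(out, 0, mode)              # put it back at position `mode`

def hosvd(T, ranks):
    """Truncated HOSVD: one orthonormal factor per mode from the SVD of its unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)      # project onto the retained subspace
    return core, factors

def reconstruct(core, factors):
    """Map the core tensor back to the original index space."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T

# Hypothetical log-expression tensor: genes x patients x cell types.
rng = np.random.default_rng(0)
n_genes, n_patients, n_cells = 500, 6, 4
baseline = rng.normal(6.0, 1.0, n_genes)          # common to all cells and patients
g_pat, g_cell = rng.normal(size=(2, n_genes))     # gene loadings of two perturbations
p = rng.normal(size=n_patients)
p -= p.mean()                                     # patient-specific deviation
q = rng.normal(size=n_cells)
q -= q.mean()                                     # cell-type-specific deviation
log_x = (baseline[:, None, None]
         + 0.3 * g_pat[:, None, None] * p[None, :, None]
         + 0.1 * g_cell[:, None, None] * q[None, None, :]
         + 0.01 * rng.normal(size=(n_genes, n_patients, n_cells)))

core, factors = hosvd(log_x, ranks=(3, 3, 3))     # leading three components per mode
approx = reconstruct(core, factors)
recovered = 1.0 - np.sum((log_x - approx) ** 2) / np.sum(log_x ** 2)
print(f"fraction of squared norm recovered: {recovered:.4f}")

# The leading gene-mode vector plays the role of the baseline (unit vector vs. baseline).
cos = abs(factors[0][:, 0] @ baseline) / np.linalg.norm(baseline)
print(f"|cos(leading gene vector, baseline)| = {cos:.4f}")
```

Because the synthetic baseline dominates the log levels, the leading gene-mode vector lines up with it, which mirrors the role the abstract assigns to the baseline; the recovered fraction printed here says nothing about the real prostate data.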
Delineating patient variability is a central issue in personalized diagnostics, and it remains to be seen whether additional data will confirm the power of multivariate analysis to address this key point. The collapsed limits, in which the data are compacted into two-dimensional arrays, are contained within the proposed formalism.
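The collapsed (two-dimensional) limit can be sketched by reshaping the genes × patients × cell types array into a genes × 24 matrix, one column per (patient, cell type) pair, and applying matrix surprisal analysis to it. As we understand the SVD formulation of surprisal analysis, ln X_i(k) = Σ_α G_iα λ_α(k), with the α = 0 term as the baseline; that sign convention, the helper name `surprisal_svd`, and the synthetic data (one baseline plus one patient-dependent pattern) are assumptions made for illustration only.

```python
import numpy as np

def surprisal_svd(log_x, n_terms):
    """Matrix surprisal analysis via SVD: log_x[i, k] ~ sum_a G[i, a] * lam[a, k]."""
    U, s, Vt = np.linalg.svd(log_x, full_matrices=False)
    G = U[:, :n_terms]                      # gene weights, one column per pattern a
    lam = s[:n_terms, None] * Vt[:n_terms]  # lam[a, k]: weight of pattern a in column k
    # SVD fixes each pattern only up to a sign; G[:, a] and lam[a] flip together.
    return G, lam

rng = np.random.default_rng(1)
n_genes, n_patients, n_cells = 500, 6, 4
# Hypothetical tensor: baseline + one patient-dependent pattern + small noise.
baseline = rng.normal(6.0, 1.0, n_genes)
g = rng.normal(size=n_genes)
p = rng.normal(size=n_patients)
tensor = (baseline[:, None, None]
          + 0.3 * g[:, None, None] * p[None, :, None]
          + 0.01 * rng.normal(size=(n_genes, n_patients, n_cells)))

# Collapse: column index = patient * n_cells + cell type -> genes x 24 matrix.
matrix = tensor.reshape(n_genes, n_patients * n_cells)

G, lam = surprisal_svd(matrix, n_terms=2)   # baseline (a = 0) plus one pattern
approx = G @ lam
recovered = 1.0 - np.sum((matrix - approx) ** 2) / np.sum(matrix ** 2)
print(f"recovered with baseline + 1 pattern: {recovered:.4f}")
```

The reshape used here is the gene-mode unfolding of the tensor, so the gene weights G coincide (up to sign) with the leading gene-mode factors of a higher-order SVD of the same array; what the collapse loses is the separate resolution of patient and cell-type directions.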
Research center/unit:
Theoretical Physical Chemistry
Disciplines:
Life sciences: Multidisciplinary, general & others
Author, co-author:
Remacle, Françoise; Université de Liège > Department of Chemistry (Sciences) > Laboratory of Theoretical Physical Chemistry
Goldstein, Andrew S.
Levine, Raphael D.
Document language:
English
Title:
Multivariate Surprisal Analysis of Gene Expression Levels
Publication/release date:
2017
Journal title:
Entropy
eISSN:
1099-4300
Publisher:
MDPI, Basel, Switzerland
Volume:
18
Issue:
12
Pagination:
445
Peer reviewed:
Peer reviewed, verified by ORBi
European project:
FP7 - 618024 - BAMBI - Bottom-up Approaches to Machines dedicated to Bayesian Inference