Statistical treatment of 2D NMR COSY spectra in metabolomics: data preparation, clustering-based evaluation of the Metabolomic Informative Content and comparison with 1H-NMR
Compared with the widely used 1H-NMR spectroscopy, two-dimensional NMR experiments provide more sophisticated spectra, which should facilitate the identification of relevant spectral zones or biomarkers in metabolomics. This paper focuses on 1H-1H COrrelation SpectroscopY (COSY) spectral data. In spite of inherently longer acquisition times, it is commonly accepted by users (biologists, healthcare professionals) that the introduction of an additional dimension probably represents a huge qualitative step for investigations in terms of metabolite identification. Moreover, it seems natural that more information leads to more predictive power. Until now, however, very few statistical studies have clearly proved this assumption. A fundamental question is therefore: “Is this supplementary information relevant?” In order to extend the statistical properties developed for 1D spectroscopy to the challenges raised by 2D spectra, a rigorous study of the performance of COSY spectra is needed as a prerequisite. Having introduced new pre-processing concepts, such as the Global Peak List or an ad hoc 2D “bucketing”, this paper presents an innovative methodology based on multivariate clustering algorithms to address this question. Numerical clustering quality indexes and graphical results are proposed, based both on the spectral presence or absence of peaks (binary position vectors) and on peak intensities, and at different levels of spectral resolution. The second goal of this paper is to compare the clustering performance obtained on COSY and on 1H-NMR spectra, with the aim of understanding to what extent COSY spectra carry more Metabolomic Informative Content about the signal than 1D ones. The methodology is applied to two real experimental designs involving different groups of spectra (which define the signal): a four-mixture cell culture media design containing various supervised metabolites and a complex design based on human serum.
It is shown that COSY spectra appear to be statistically powerful and, in addition, provide better clustering results than the corresponding 1H-NMR spectra when using unlabeled information. Consequently, the additional information appears to be relevant for metabolomics applications.
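As a rough illustration of the clustering-based evaluation described in the abstract, the sketch below clusters binary peak presence/absence vectors and scores the resulting partition against the known experimental groups with the adjusted Rand index. It is not the authors' pipeline: the Jaccard distance, the average-linkage rule, the toy vectors and all function names are assumptions chosen to keep the example short and dependency-free.

```python
from itertools import combinations
from math import comb


def jaccard_distance(a, b):
    """Jaccard distance between two binary peak-presence vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(vectors, k):
    """Agglomerative clustering (average linkage) stopped at k clusters."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            # Mean pairwise distance between the members of two clusters.
            d = sum(jaccard_distance(vectors[i], vectors[j])
                    for i in clusters[p] for j in clusters[q])
            d /= len(clusters[p]) * len(clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best
        clusters[p] += clusters[q]
        del clusters[q]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two partitions of the same items."""
    cells, rows, cols = {}, {}, {}
    for t, p in zip(truth, pred):
        cells[(t, p)] = cells.get((t, p), 0) + 1
        rows[t] = rows.get(t, 0) + 1
        cols[p] = cols.get(p, 0) + 1
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(len(truth), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case: both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy data (hypothetical): 6 spectra, 8 buckets, 1 = peak present.
spectra = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 1],
]
groups = [0, 0, 0, 1, 1, 1]  # known experimental groups ("the signal")

labels = average_linkage(spectra, k=2)
score = adjusted_rand_index(groups, labels)
print(labels, score)
```

An index close to 1 means the unsupervised partition recovers the experimental groups. Running the same procedure on 1D and on 2D bucketed data would give one simple way to compare their informative content, again under the assumptions stated above.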
Disciplines :
Pharmacy, pharmacology & toxicology
Author, co-author :
Feraud, Baptiste; Université Catholique de Louvain - UCL > Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA)
Govaerts, Bernadette; Université Catholique de Louvain - UCL > Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA)
Verleysen, Michel; Université Catholique de Louvain - UCL > Machine learning group
De Tullio, Pascal; Université de Liège > Département de pharmacie > Chimie pharmaceutique
Language :
English
Title :
Statistical treatment of 2D NMR COSY spectra in metabolomics: data preparation, clustering-based evaluation of the Metabolomic Informative Content and comparison with 1H-NMR