Two data pre‑processing workflows to facilitate the discovery of biomarkers by 2D NMR metabolomics

Abstract

Introduction: The pre-processing of analytical data in metabolomics must be considered as a whole, so that a single global object can be built for any subsequent simultaneous data analysis or multivariate statistical modelling. For 1D 1H-NMR metabolomics experiments, best practices for data pre-processing are well defined, but this is not yet the case for 2D experiments (COSY in this paper).

Objective: Taking into account the added value of a second dimension, the objective is to propose two workflows dedicated to 2D NMR data handling and preparation (the Global Peak List and Vectorization approaches) and to compare them with each other and with 1D standards. This makes it possible to determine which methodology retains the most metabolomic content, and to explore the advantages of the selected workflow in distinguishing among treatment groups and identifying relevant biomarkers. This paper therefore explores the need for novel 2D pre-processing workflows, the evaluation of their quality, and the evaluation of their performance in the subsequent determination of accurate (2D) biomarkers.

Methods: To select the most informative data source, MIC (Metabolomic Informative Content) indexes, based on clustering and inertia measures of quality, are used. Then, to highlight biomarkers or critical spectral zones, a PLS-DA model is used, along with more advanced sparse algorithms (sPLS and L-sOPLS).

Results: Results are discussed for two different experimental designs: one unsupervised, based on human urine samples, and one controlled, based on spiked serum media. MIC indexes are reported and lead to the choice of the most relevant workflow to use thereafter. Finally, biomarkers are provided for each case, and the predictive power of each candidate model is assessed with cross-validated RMSEP (root mean square error of prediction).
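
The Vectorization workflow is only named in the abstract, not specified. As a rough illustration of what vectorizing a 2D spectrum generally involves, the sketch below sums intensities over a regular grid of rectangular buckets and flattens the grid row by row, so that each sample becomes one row of a samples × variables matrix usable by PLS-DA-type models. It assumes spectra that are already phased, referenced and sampled on a common grid, stored as plain nested lists; the function names (`bucket_2d`, `vectorize`, `build_data_matrix`) and the bucket sizes are hypothetical, and this is not the authors' implementation.

```python
from typing import List

# Assumption: a 2D spectrum is a list of rows (indirect dimension, F1),
# each row a list of intensities (direct dimension, F2), on a grid shared by all samples.
Matrix = List[List[float]]


def bucket_2d(spectrum: Matrix, b1: int, b2: int) -> Matrix:
    """Sum intensities over non-overlapping b1 x b2 rectangular buckets.

    Trailing rows or columns that do not fill a complete bucket are dropped.
    """
    n1, n2 = len(spectrum), len(spectrum[0])
    m1, m2 = n1 // b1, n2 // b2
    buckets = [[0.0] * m2 for _ in range(m1)]
    for i in range(m1 * b1):
        for j in range(m2 * b2):
            buckets[i // b1][j // b2] += spectrum[i][j]
    return buckets


def vectorize(spectrum: Matrix, b1: int, b2: int) -> List[float]:
    """Bucket a 2D spectrum, then flatten the bucket grid row by row."""
    return [value for row in bucket_2d(spectrum, b1, b2) for value in row]


def build_data_matrix(spectra: List[Matrix], b1: int, b2: int) -> Matrix:
    """One row per sample, one column per 2D bucket (same grid for every sample)."""
    return [vectorize(s, b1, b2) for s in spectra]


# Hypothetical 4 x 4 toy "spectrum" bucketed into 2 x 2 blocks.
toy = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(vectorize(toy, b1=2, b2=2))  # [14.0, 22.0, 46.0, 54.0]
```

Bucket size trades resolution against robustness: larger buckets tolerate small peak shifts between samples but can merge neighbouring cross peaks into a single variable.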

Conclusion: It is shown that no single solution is universally the best, but that 2D experiments make it possible to clearly identify relevant cross-peak biomarkers even when the initial separability between groups is poor. The MIC measures associated with the candidate workflows (2D GPL, 2D vectorization and 1D, with specific parameters) show which data set should be prioritized to find biomarkers more easily. The diversity of data sources, mainly 1D versus 2D, may often lead to complementary or confirmatory results.
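
The abstract describes MIC indexes only as being based on clustering and inertia measures of quality; their exact definition is given in the paper. A generic inertia criterion of that family is the share of total inertia explained by a partition of the samples, B/T, where T = Σ_i ||x_i − g||² is the total inertia around the global centroid g and B = Σ_k n_k ||g_k − g||² is the between-group inertia of the group centroids g_k. The sketch computes this ratio for a data matrix and a vector of labels (known groups or, in a clustering-based variant, assignments from k-means or Ward clustering). The function names and toy data are hypothetical, and the ratio should not be read as the paper's MIC formula.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(rows: Sequence[Vector]) -> Vector:
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def between_inertia_ratio(data: Sequence[Vector], labels: Sequence[str]) -> float:
    """Return B / T, the share of total inertia explained by the partition.

    T = sum_i ||x_i - g||^2 around the global centroid g.
    B = sum_k n_k ||g_k - g||^2 over the group centroids g_k.
    Returns 0.0 when all samples are identical (T == 0).
    """
    g = centroid(data)
    total = sum(sq_dist(x, g) for x in data)
    if total == 0:
        return 0.0
    groups: Dict[str, List[Vector]] = {}
    for x, label in zip(data, labels):
        groups.setdefault(label, []).append(x)
    between = sum(len(rows) * sq_dist(centroid(rows), g) for rows in groups.values())
    return between / total


# Toy comparison of two hypothetical data sources measured on the same four samples.
groups_of_samples = ["a", "a", "b", "b"]
source_1 = [[0, 0], [0, 1], [10, 10], [10, 11]]  # groups far apart
source_2 = [[0, 0], [1, 0], [1, 1], [0, 1]]      # groups less separated
print(round(between_inertia_ratio(source_1, groups_of_samples), 3))  # 0.995
print(between_inertia_ratio(source_2, groups_of_samples))            # 0.5
```

The ratio is bounded between 0 and 1 and unchanged by a uniform rescaling of intensities, which makes it easier to compare candidate data sets (for example a 1D bucket table and a 2D vectorized table) computed on the same samples; a higher value means the partition captures more of the variability.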

Disciplines :

Human health sciences: Multidisciplinary, general & others

Author, co-author :

Feraud, Baptiste

Leenders, Justine ; Université de Liège - ULiège > Département des sciences cliniques > Labo de biologie des tumeurs et du développement

Martineau, Estelle

Giraudeau, Patrick

Govaerts, Bernadette

De Tullio, Pascal ; Université de Liège - ULiège > CIRM > Metabolomics group

Language :

English

Publication date :

2019

Journal title :

Metabolomics

ISSN :

1573-3882

eISSN :

1573-3890

Publisher :

Springer, Germany

Peer reviewed :

Peer Reviewed verified by ORBi

Available on ORBi :

since 24 April 2019
