A probabilistic class-modelling method based on prediction bands for functional spectral data: Methodological approach and application to near-infrared spectroscopy

Avohou, Tonakpon Hermane; Sacre, Pierre-Yves; Lebrun, Pierre; Hubert, Philippe; Ziemons, Eric

doi:10.1016/j.aca.2020.11.039

Article (Scientific journals)

2021 • In Analytica Chimica Acta, 1144, p. 130-149

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/254315

Files

Full Text

A probabilistic class-modelling method based on prediction.pdf

Author postprint (2.55 MB)

Creative Commons License - Attribution, Non-Commercial

Details

Keywords :

Class-modelling; Functional data analysis; Bayesian chemometrics; Spectral predictive distribution; Prediction band; Depth statistic; Multivariate data analysis

Abstract :

Class-modelling methods aim to predict the conformity of new unknown samples with a single target class, using statistical decision rules built exclusively with objects of that class. This article introduces a novel class-modelling method for spectral data. The method uses the concept of a beta(%)-prediction band for functional data to classify spectra. The band is defined by upper and lower limiting spectra, which delimit the critical trajectories for beta(%) of future spectra of the target class. It is constructed in three main steps. Firstly, a naïve bootstrap sample of calibration spectra is projected onto a parsimonious principal component (PC) basis and the scores are estimated. The posterior predictive distribution of the scores on each PC is estimated using a Bayesian zero-mean normal model. This procedure is repeated on naïve bootstrap estimates of the PCs to obtain the predictive distribution of the scores. This makes it possible to account for all modelling uncertainties, including the random deviation of scores from their zero mean on each PC, uncertainty in the variance of the scores (eigenvalue) on each PC, and uncertainty in the PC estimates. Secondly, the predicted scores are back-transformed to the original signal scale to obtain the predictive distribution of future spectra. Thirdly, the predicted spectra are ranked to select the beta(%) most central ones as a typical set, whose ranges of variation are used to construct the simultaneous limits of the band. Once the band is constructed, reconstructions of future unknown test spectra by bootstrap PC models are projected onto it, and the extent to which they overlap with it is used to decide whether to accept or reject them. The statistical properties and classification performance of the proposed prediction band are evaluated on real near-infrared datasets and compared to the well-known soft independent modelling of class analogy (SIMCA) model.
The results of the evaluation provide evidence that the proposed prediction band has satisfactory predictive performance. It even outperforms SIMCA while offering attractive advantages such as risk management and straightforward physical interpretability of the outlyingness patterns of tested spectra.
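
The three construction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a Jeffreys-type prior on the score variance (the abstract states only "a Bayesian zero-mean normal model"), a rank-based modified band depth as the centrality ranking (the abstract does not name the depth statistic), and a simple pointwise overlap measure that skips the reconstruction of test spectra by bootstrap PC models. The function names and default parameters are hypothetical.

```python
import numpy as np


def modified_band_depth(curves):
    """Rank-based modified band depth (pairs of curves); assumed ranking statistic."""
    n = curves.shape[0]
    # Rank of each curve at each wavelength, from 1 (lowest) to n (highest).
    ranks = curves.argsort(axis=0).argsort(axis=0) + 1
    below, above = ranks - 1, n - ranks
    n_pairs = n * (n - 1) / 2
    # Share of curve pairs that enclose the curve, averaged over wavelengths.
    return ((below * above + (n - 1)) / n_pairs).mean(axis=1)


def prediction_band(X, n_pcs=2, beta=0.95, n_boot=50, n_draws=20, seed=0):
    """Sketch of a beta prediction band from calibration spectra X (samples x wavelengths)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    predicted = []
    for _ in range(n_boot):
        # Step 1a: naive bootstrap of the calibration spectra and PCA on the resample.
        Xb = X[rng.integers(0, n, size=n)]
        mean = Xb.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xb - mean, full_matrices=False)
        V = Vt[:n_pcs]
        scores = (Xb - mean) @ V.T
        # Step 1b: zero-mean normal model per PC. Under the assumed prior
        # p(sigma^2) ~ 1/sigma^2, the posterior of sigma^2 is scaled inverse chi-square.
        s2 = (scores ** 2).mean(axis=0)
        sigma2 = n * s2 / rng.chisquare(n, size=(n_draws, n_pcs))
        new_scores = rng.normal(0.0, np.sqrt(sigma2))
        # Step 2: back-transform the predicted scores to the spectral scale.
        predicted.append(mean + new_scores @ V)
    predicted = np.vstack(predicted)
    # Step 3: keep the beta most central predicted spectra and take their envelope.
    depth = modified_band_depth(predicted)
    n_keep = int(np.ceil(beta * len(predicted)))
    central = predicted[np.argsort(depth)[::-1][:n_keep]]
    return central.min(axis=0), central.max(axis=0), predicted


def overlap_fraction(spectrum, lower, upper):
    """Share of wavelengths at which a test spectrum lies inside the band (simplified)."""
    return float(np.mean((spectrum >= lower) & (spectrum <= upper)))
```

In this simplified form, a test spectrum would be accepted when its overlap fraction exceeds a user-chosen threshold; the paper's actual decision rule and its handling of PC reconstruction uncertainty are not reproduced here.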

Research Center/Unit :

CIRM - Centre Interdisciplinaire de Recherche sur le Médicament - ULiège

Disciplines :

Pharmacy, pharmacology & toxicology
Mathematics

Author, co-author :

Avohou, Tonakpon Hermane ; Université de Liège - ULiège > Département de pharmacie > Chimie analytique

Sacre, Pierre-Yves ; Université de Liège - ULiège > Département de pharmacie > Chimie analytique

Lebrun, Pierre ; Université de Liège - ULiège > Département de pharmacie > Chimie analytique

Hubert, Philippe ; Université de Liège - ULiège > Département de pharmacie > Chimie analytique

Ziemons, Eric ; Université de Liège - ULiège > Département de pharmacie > Chimie analytique

Language :

English

Publication date :

01 February 2021

Journal title :

Analytica Chimica Acta

ISSN :

0003-2670

eISSN :

1873-4324

Publisher :

Elsevier, Netherlands

Volume :

1144

Pages :

130-149

Name of the research project :

Vibra4Fake (convention 7517)

Funders :

DGTRE - Région wallonne. Direction générale des Technologies, de la Recherche et de l'Énergie

Available on ORBi :

since 22 December 2020
