Article (Scientific journals)
Tile-Based Random Forest Analysis for Analyte Discovery in Balanced and Unbalanced GC × GC-TOFMS Data Sets.
Gaida, Meriem; Cain, Caitlin N; Synovec, Robert E et al.
2023. In Analytical Chemistry, 95 (36), p. 13519 - 13527
Peer Reviewed verified by ORBi
 

Full Text
acs.analchem.3c01872.pdf
Publisher postprint (3.25 MB)
Keywords :
Analysis method; Analytes; Data set; Fisher-ratio; GC×GC-TOFMS; Machine learning algorithms; Non-targeted; Random forests; Ratio analysis; Unbalanced data; Analytical Chemistry
Abstract :
In this study, we introduce a new nontargeted, tile-based supervised analysis method that combines the four-grid tiling scheme previously established for Fisher ratio (F-ratio) analysis (FRA) with the estimation of tile hit importance using the machine learning (ML) algorithm Random Forest (RF). This approach is termed tile-based RF analysis. Unlike standard tile-based F-ratio analysis, the RF approach can be extended to unbalanced data sets, i.e., data sets with different numbers of samples per class. Tile-based RF computes out-of-bag (oob) tile hit importance estimates for every summed chromatographic signal within each tile on a per-mass-channel (m/z) basis. These estimates are then used to rank tile hits in descending order of importance. In the present investigation, the RF approach was applied to a two-class comparison of stool samples collected from omnivore (O) subjects and stored under two different storage conditions: liquid (Liq) and lyophilized (Lyo). Two final hit lists were generated using balanced (8 vs 8 comparison) and unbalanced (8 vs 9 comparison) data sets and compared to the hit list generated by the standard F-ratio analysis. Similar class-distinguishing analytes (p < 0.01) were discovered by both methods. However, while the FRA discovered a more comprehensive hit list (65 hits), the RF approach discovered only hits with concentration ratios, [OLiq]/[OLyo], greater than 2 (or less than 0.5): 31 hits for the balanced data set comparison and 29 hits for the unbalanced data set comparison. This difference is attributed to the more stringent feature selection process used by the RF algorithm. Moreover, our findings suggest that the RF approach is a promising method for identifying class-distinguishing analytes in settings characterized by both high between-class variance and high within-class variance, making it an advantageous method in the study of complex biological matrices.
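The two quantities the abstract compares hits on, an F-ratio (between-class over within-class variance) and a concentration ratio with a cutoff of 2 (or 0.5), can be illustrated with a minimal sketch. This is not the authors' software and it does not implement Random Forest: it uses the textbook one-way ANOVA F-ratio for two classes, and the tile names, signal values, and helper functions (`f_ratio`, `concentration_ratio`, `passes_ratio_filter`) are hypothetical and invented for illustration only.

```python
from statistics import mean

def f_ratio(class_a, class_b):
    """Textbook two-class F-ratio: between-class mean square
    divided by pooled within-class mean square."""
    n_a, n_b = len(class_a), len(class_b)
    mean_a, mean_b = mean(class_a), mean(class_b)
    grand = (sum(class_a) + sum(class_b)) / (n_a + n_b)
    # Between-class mean square (degrees of freedom = 2 classes - 1 = 1).
    ms_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    # Within-class mean square (degrees of freedom = samples - 2 classes).
    ss_within = (sum((x - mean_a) ** 2 for x in class_a)
                 + sum((x - mean_b) ** 2 for x in class_b))
    ms_within = ss_within / (n_a + n_b - 2)
    return ms_between / ms_within

def concentration_ratio(class_a, class_b):
    """Ratio of class means, standing in for [OLiq]/[OLyo]."""
    return mean(class_a) / mean(class_b)

def passes_ratio_filter(ratio, threshold=2.0):
    """True if the ratio is greater than 2 or less than 0.5 (by default)."""
    return ratio > threshold or ratio < 1.0 / threshold

# Hypothetical summed tile signals (arbitrary units), 4 samples per class.
# First tuple element: "Liq" class; second: "Lyo" class.
tiles = {
    "tile_A": ([10.0, 12.0, 11.0, 13.0], [4.0, 5.0, 6.0, 5.0]),
    "tile_B": ([7.0, 8.0, 9.0, 8.0], [6.0, 7.0, 8.0, 7.0]),
    "tile_C": ([1.0, 2.0, 1.5, 1.5], [5.0, 6.0, 7.0, 6.0]),
}

# Rank tiles in descending order of F-ratio.
ranked = sorted(tiles, key=lambda name: f_ratio(*tiles[name]), reverse=True)

# Keep only ranked tiles whose concentration ratio is > 2 or < 0.5.
kept = [name for name in ranked
        if passes_ratio_filter(concentration_ratio(*tiles[name]))]

for name in ranked:
    print(name,
          round(f_ratio(*tiles[name]), 2),
          round(concentration_ratio(*tiles[name]), 2))
```

In this toy data, `tile_B` has a class-mean ratio close to 1 and is dropped by the ratio filter, while the other two tiles are retained. In the study itself, tile hits are ranked by RF out-of-bag importance estimates rather than by the F-ratio used in this sketch.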
Disciplines :
Chemistry
Author, co-author :
Gaida, Meriem  ;  Université de Liège - ULiège > Molecular Systems (MolSys)
Cain, Caitlin N ;  Department of Chemistry, University of Washington, Seattle, Washington 98195-1700, United States
Synovec, Robert E ;  Department of Chemistry, University of Washington, Seattle, Washington 98195-1700, United States
Focant, Jean-François  ;  Université de Liège - ULiège > Département de chimie (sciences) > Chimie analytique, organique et biologique
Stefanuto, Pierre-Hugues  ;  Université de Liège - ULiège > Département de chimie (sciences) > Chimie analytique, organique et biologique
Language :
English
Publication date :
12 September 2023
Journal title :
Analytical Chemistry
ISSN :
0003-2700
eISSN :
1520-6882
Publisher :
American Chemical Society, United States
Volume :
95
Issue :
36
Pages :
13519 - 13527
Funders :
ULiège - Université de Liège
FWO - Fonds Wetenschappelijk Onderzoek Vlaanderen
Fonds Léon Fredericq
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Funding text :
This research was funded by the FWO/FNRS Belgium EOS Grant 30897864 “Chemical Information Mining in a Complex World”, the University of Liège, F.R.S.-F.N.R.S, and Léon Fredericq Foundation scientific grants.
Available on ORBi :
since 08 October 2023
