Keywords :
mT-QSRR, Regressor chain, algorithm adaptation, Random forest
Abstract :
[en] A multi-target QSRR (mT-QSRR) approach to model retention time of small pharmaceutical compounds in RPLC
Priyanka Kumari*, Thomas Van Laethem(a,b), Philippe Hubert, Marianne Fillet, Pierre-Yves Sacré, Cédric Hubert(a)*
a. University of Liège (ULiege), CIRM, Laboratory of Pharmaceutical Analytical Chemistry, Liège, Belgium
b. University of Liège (ULiege), CIRM, Laboratory for the Analysis of Medicines, Liège, Belgium
Abstract:
Quantitative structure-retention relationship (QSRR) models have been used for retention time prediction as an alternative to expensive and time-consuming separation analyses and the associated experiments. So far, conventional QSRR approaches have predicted the targets (retention times) individually, which makes the process tedious when multiple targets need to be predicted. Despite the availability of multiple tools and approaches, fully accurate retention prediction remains out of reach. Hence, to improve on currently available prediction methods, in this study we explored a new approach, namely mT-QSRR models, that aims to predict retention times under multiple conditions simultaneously and more accurately. The usefulness of mT-QSRR models for retention prediction is twofold. First, they provide enhanced knowledge about the chemistry behind the compounds being separated and identify physicochemical attributes that may contribute to the different retention times observed under different separation conditions. Second, they allow the selection of a better prediction modelling approach that can be applied in any chromatographic field to predict multiple targets together.
In the current study, we used RPLC data of small pharmaceutical compounds whose retention times were to be predicted under ten conditions: five pH values, each at two different gradient times. We explored and compared two approaches to mT-QSRR modelling for retention prediction: (1) learning a separate model for each condition (single-target approach) and (2) learning one model for all conditions simultaneously (multitarget approach).
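The contrast between the two strategies can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the study's pipeline: the data are synthetic stand-ins (six random "descriptors" and ten correlated "retention times"), the forest size is arbitrary, and the pairing of random forests with a native multi-output fit and with a regressor chain is inferred from the keywords rather than from a described protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor, RegressorChain

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption, NOT the study's data):
# 200 "compounds", 6 "descriptors", 10 "conditions" (e.g. 5 pH x 2 gradient times).
n_compounds, n_descriptors, n_conditions = 200, 6, 10
X = rng.normal(size=(n_compounds, n_descriptors))
base = 5.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]  # shared driver of retention
Y = np.column_stack([
    base * (1.0 + 0.1 * j) + rng.normal(scale=0.3, size=n_compounds)
    for j in range(n_conditions)
])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)


def make_forest():
    """Hypothetical base learner; hyperparameters are arbitrary."""
    return RandomForestRegressor(n_estimators=50, random_state=0)


models = {
    # (1) Single-target: one independent forest per condition.
    "single-target": MultiOutputRegressor(make_forest()),
    # (2a) Multitarget by algorithm adaptation: one forest fits all targets.
    "multitarget (native forest)": make_forest(),
    # (2b) Multitarget by chaining: each model also sees earlier predictions.
    "multitarget (regressor chain)": RegressorChain(make_forest()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    Y_pred = model.predict(X_test)
    # Both metrics are averaged uniformly over the ten targets.
    rmse = float(np.sqrt(mean_squared_error(Y_test, Y_pred)))
    r2 = float(r2_score(Y_test, Y_pred))
    results[name] = (rmse, r2)
    print(f"{name}: RMSE={rmse:.3f}, R2={r2:.3f}")
```

On synthetic data like this the three variants may score similarly; the sketch shows only how the model structures differ, and says nothing about the performance differences reported for the real RPLC data.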
Our results demonstrate the advantages of the multitarget approach over single-target modelling, with a statistically significant difference in model performance as measured by RMSE and R². Of all features, MolLogP had the highest variable importance, indicating that it is the key structural descriptor for retention prediction.