Abstract :
[en] This PhD dissertation explores various Quantitative Structure-Retention Relationship (QSRR) modelling approaches to enhance the method development process in analytical chemistry. By establishing a predictive framework that relates the chemical structure of analytes to their chromatographic retention behaviours, this approach aims to minimize experimental efforts and increase efficiency in developing robust chromatographic methods for pharmaceutical compound separation and analysis.
For instance, instead of empirically testing a broad range of conditions, QSRR models enable the prediction of analytes' retention behaviour under diverse experimental conditions based on their molecular structure. This can significantly reduce the number of experimental trials needed by focusing efforts on the conditions most likely to enhance separation. This study addresses the challenges that analytical chemists encounter in retention prediction modelling when data availability varies. These challenges range from determining a starting point when data are scarce to selecting the optimal modelling strategy when large datasets are available for model training. Further complexities arise in choosing the appropriate modelling approach as experimental variations expand and the nature of the dataset evolves.
Recognizing the absence of a clear, definitive strategy for QSRR modelling, this study began with Single Target Retention Prediction Modelling. A detailed QSRR strategy was developed, incorporating a wide range of descriptor selection methods and a variety of regression algorithms (linear, non-linear, parametric, non-parametric, and ensemble methods) to predict retention times across different pH conditions in Reversed-Phase Liquid Chromatography (RPLC). Each condition, referred to as a target, was analysed individually. This comprehensive QSRR approach was intended to tackle the aforementioned challenges systematically and to set a foundation for future progress in this area.
After exploring single-target QSRR, the research progressed to Multitarget QSRR modelling. This phase compared the accuracy of retention predictions using two different approaches: one that creates a separate model for each condition (single-target) and another that uses a unified model to predict retention times across all conditions simultaneously (multitarget). The goal was to find a more efficient way to model multiple target properties at once, potentially making the process quicker and more compact. This has significant implications for improving chromatographic separation methods, offering analytical chemists a valuable tool in their method development efforts.
The thesis then extends QSRR modelling with Transfer Learning to investigate gains in both accuracy and model efficiency, particularly when data are scarce. This part of the research used both physicochemical properties and image-based features of small molecules for QSRR modelling, drawing on techniques from advanced Artificial Intelligence, with the aim of broadening the methodological framework and improving predictive capabilities.
In summary, this thesis offers valuable insights and tools for pharmaceutical research and development. By integrating computational modelling with RPLC, it introduces a systematic approach and several candidate strategies that analytical chemists can use to predict the separation of small molecules. This could ultimately lead to optimized compound separation with reduced time and cost.
Funding text :
This research was funded by the FWO/FNRS Belgium EOS programme, grant number 30897864 “Chemical Information Mining in a Complex World”, Belgium.