Source:
PubMed "essential OR oil extract"
J Am Soc Mass Spectrom. 2026 Jan 8. doi: 10.1021/jasms.5c00276. Online ahead of print.

ABSTRACT

Accurate prediction of Collision Cross-Section (CCS) values is essential for identifying molecular structures in complex environmental mixtures. This study integrates supervised machine learning and deep learning to predict CCS values for a diverse array of dissolved organic molecules, including carbohydrates, hydrocarbons, lignins, lipids, proteins, tannins, and unassigned molecules. We evaluated eight regression models (Gradient Boosted Regression, K-Nearest Neighbors, LASSO, Linear Regression, Partial Least Squares, Random Forest, Support Vector Regression (SVR), and a Voting Regressor) alongside a Graph Neural Network (GNN) trained on molecular fingerprints (SMILES) and structural descriptors (m/z, O/C, H/C, AImod, DBE). Model performance varied by molecular class and the characteristics of the data set. The best-performing models were as follows: Voting Regressor for carbohydrates and unknowns, Random Forest for hydrocarbons and proteins, SVR for lignins and lipids, and LASSO for tannins. The GNN consistently delivered competitive accuracy across all classes. Validation using High-Resolution Mass Spectrometry (HRMS) data from the Arctic Ocean confirmed the predictive power of these models, enabling more precise selection of correct molecular structures from candidate lists generated by conventional workflows. This work presents a robust, data-driven framework for CCS prediction that enhances molecular classification and improves contaminant detection in environmental samples.

PMID: 41505766 | DOI: 10.1021/jasms.5c00276
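
The structural descriptors named in the abstract (m/z, O/C, H/C, AImod, DBE) can all be derived from a molecular formula. The sketch below shows one way this might be done; it is not the authors' code, and the abstract does not state which definitions they used. It assumes the commonly cited expressions DBE = 1 + C - H/2 + N/2 + P/2 and the modified aromaticity index AImod = (1 + C - 0.5·O - S - 0.5·(N + P + H)) / (C - 0.5·O - N - S - P), set to 0 when the numerator or denominator is not positive. It also assumes a singly deprotonated ion [M-H]- for m/z. The function names `parse_formula` and `descriptors` are hypothetical.

```python
import re
from collections import Counter

# Monoisotopic masses in unified atomic mass units (CHNOPS only).
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207100,
}
PROTON_MASS = 1.00727647


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONO_MASS:
            raise ValueError(f"unsupported element: {element}")
        counts[element] += int(number) if number else 1
    return counts


def descriptors(formula):
    """Return m/z, O/C, H/C, AImod and DBE for a CHNOPS formula.

    Assumptions (not taken from the abstract): m/z is for the [M-H]- ion,
    and AImod is clamped to 0 when its numerator or denominator is <= 0.
    """
    n = parse_formula(formula)
    c, h, o = n["C"], n["H"], n["O"]
    nit, s, p = n["N"], n["S"], n["P"]
    if c == 0:
        raise ValueError("formula must contain carbon")

    neutral_mass = sum(MONO_MASS[el] * k for el, k in n.items())
    dbe = 1 + c - h / 2 + nit / 2 + p / 2

    ai_num = 1 + c - 0.5 * o - s - 0.5 * (nit + p + h)
    ai_den = c - 0.5 * o - nit - s - p
    ai_mod = ai_num / ai_den if ai_num > 0 and ai_den > 0 else 0.0

    return {
        "mz": neutral_mass - PROTON_MASS,  # assumed [M-H]- ion
        "O_C": o / c,
        "H_C": h / c,
        "AImod": ai_mod,
        "DBE": dbe,
    }
```

For example, `descriptors("C6H6")` gives a DBE of 4 and an AImod of about 0.67 for benzene, while a carbohydrate such as `"C6H12O6"` has O/C = 1, H/C = 2, and an AImod of 0. A feature vector like this could then be passed to any of the regression models listed in the abstract.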