Abstract
Gas chromatography coupled with electron impact mass spectrometry (GC-EI-MS) is a widely used analytical technique for identifying volatile and semi-volatile compounds in applications ranging from pharmaceutical research to materials science. However, because not every molecule is included in EI-MS databases, scientists often have to identify unknown chromatographic peaks solely from their EI-MS spectra. This manual interpretation is time-consuming and depends heavily on expert knowledge, and it often leads to ambiguous or inconclusive results. In this work, we introduce MASSISTANT, a novel deep learning model that predicts molecular structures de novo, directly from low-resolution EI-MS spectra, using SELFIES encoding. MASSISTANT is trained on compounds with molecular weights below 600 Da, and its performance is sensitive to dataset curation: training on the full NIST dataset (180k spectra) yields approximately 10 % exact predictions, whereas training on a more focused, chemically homogeneous subset raises this rate to as high as 54 % (Tanimoto score = 1). These results highlight the capability of deep neural networks to capture complex fragmentation patterns and generate chemically valid structures, offering mass spectrometry scientists a powerful tool for elucidating not only whole molecular structures but also substructures and functional groups in GC-EI-MS analyses.
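For readers unfamiliar with the metric, the exact-prediction rate quoted above (Tanimoto score = 1) can be illustrated with a minimal sketch. It assumes each molecule is represented as a set of fingerprint on-bit indices; the abstract does not state which fingerprint is used, and the function names and bit sets below are hypothetical illustrations, not the authors' evaluation code.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of fingerprint on-bit indices."""
    if not bits_a and not bits_b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def exact_match_rate(pairs):
    """Fraction of (predicted, reference) pairs whose Tanimoto score is 1."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for predicted, reference in pairs
               if tanimoto(predicted, reference) == 1.0)
    return hits / len(pairs)


# Hypothetical on-bit sets, for illustration only.
reference = {3, 17, 42, 101}
exact_prediction = {3, 17, 42, 101}    # same bits as the reference
partial_prediction = {3, 17, 42, 250}  # shares 3 of 5 distinct bits

print(tanimoto(exact_prediction, reference))
print(tanimoto(partial_prediction, reference))
print(exact_match_rate([(exact_prediction, reference),
                        (partial_prediction, reference)]))
```

Under this reading, a score of 1 means the predicted and reference fingerprints are identical, while scores below 1 indicate partial structural overlap, which is where substructure and functional-group information can still be recovered from an inexact prediction.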
| Original language | English |
|---|---|
| Article number | 466216 |
| Journal | Journal of Chromatography A |
| Volume | 1759 |
| DOIs | |
| Publication status | Published - 27 Sept 2025 |
Keywords
- Cheminformatics
- De novo structure prediction
- Deep learning
- Electron impact mass spectrometry
- GC-MS
- SELFIES