TY - JOUR
T1 - Utilizing CNN architectures for non-invasive diagnosis of speech disorders – further experiments and insights
AU - Ratajczak, Filip
AU - Najda, Mikolaj
AU - Szyc, Kamil
N1 - Publisher Copyright: © The Author(s).
PY - 2025
Y1 - 2025
N2 - This research investigated the application of deep neural networks for diagnosing diseases that affect the voice and speech mechanisms through the non-invasive analysis of vowel sound recordings. Using the Saarbruecken Voice Database, the voice recordings were converted to spectrograms to train the models, focusing specifically on the vowels /a/, /u/, and /i/. The study used Explainable Artificial Intelligence (XAI) methodologies to identify the spectrogram features essential for pathology identification, with the aim of giving medical professionals better insight into how diseases manifest in sound production. In the F1-score evaluation, the DenseNet model achieved 0.70 ± 0.03, with a best score of 0.74. The findings indicated that neither vowel selection nor data augmentation strategies significantly improved model performance. Additionally, signal splitting did not improve the models’ ability to extract features. This study builds on our previous research [1], offering a more comprehensive understanding of the topic.
AB - This research investigated the application of deep neural networks for diagnosing diseases that affect the voice and speech mechanisms through the non-invasive analysis of vowel sound recordings. Using the Saarbruecken Voice Database, the voice recordings were converted to spectrograms to train the models, focusing specifically on the vowels /a/, /u/, and /i/. The study used Explainable Artificial Intelligence (XAI) methodologies to identify the spectrogram features essential for pathology identification, with the aim of giving medical professionals better insight into how diseases manifest in sound production. In the F1-score evaluation, the DenseNet model achieved 0.70 ± 0.03, with a best score of 0.74. The findings indicated that neither vowel selection nor data augmentation strategies significantly improved model performance. Additionally, signal splitting did not improve the models’ ability to extract features. This study builds on our previous research [1], offering a more comprehensive understanding of the topic.
KW - Convolutional Neural Networks (CNNs)
KW - Explainable Artificial Intelligence (XAI)
KW - Voice Disorder Diagnosis
KW - Vowel Sound Analysis
U2 - 10.24425/ijet.2025.153621
DO - 10.24425/ijet.2025.153621
M3 - Article
SN - 2081-8491
VL - 71
JO - International Journal of Electronics and Telecommunications
JF - International Journal of Electronics and Telecommunications
IS - 3
ER -