TY - JOUR
T1 - Multi-Task Estimation of Age and Cognitive Decline from Speech
AU - Pan, Yilin
AU - Nallanthighal, Venkata Srikanth
AU - Blackburn, Daniel
AU - Christensen, Heidi
AU - Härmä, Aki
N1 - Funding Information:
This work is supported under the European Union's H2020 Marie Skłodowska-Curie programme TAPAS (Training Network for PAthological Speech processing; Grant Agreement No. 766287).
Publisher Copyright:
©2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Speech is a common physiological signal that can be affected by both ageing and cognitive decline. The two effects are often confounded, for example in people at very early stages of cognitive decline due to dementia. Despite this, the automatic prediction of age and of cognitive decline from cues in the speech signal is generally treated as two separate tasks. In this paper, multi-task learning is applied to the joint estimation of age and the Mini-Mental State Examination (MMSE) score commonly used to assess cognitive decline. To explore the relationship between age and MMSE, two neural network architectures are evaluated: a SincNet-based end-to-end architecture, and a system comprising a feature extractor followed by a shallow neural network. Both are trained with single-task or multi-task targets. For comparison, an SVM-based regressor is trained in a single-task setup. i-vector, x-vector and ComParE features are explored. Results are obtained with systems trained on the DementiaBank dataset and tested on an in-house dataset as well as the ADReSS dataset. The results show that both age and MMSE estimation are improved by applying multi-task learning, with state-of-the-art results achieved on the ADReSS dataset acoustic-only task.
AB - Speech is a common physiological signal that can be affected by both ageing and cognitive decline. The two effects are often confounded, for example in people at very early stages of cognitive decline due to dementia. Despite this, the automatic prediction of age and of cognitive decline from cues in the speech signal is generally treated as two separate tasks. In this paper, multi-task learning is applied to the joint estimation of age and the Mini-Mental State Examination (MMSE) score commonly used to assess cognitive decline. To explore the relationship between age and MMSE, two neural network architectures are evaluated: a SincNet-based end-to-end architecture, and a system comprising a feature extractor followed by a shallow neural network. Both are trained with single-task or multi-task targets. For comparison, an SVM-based regressor is trained in a single-task setup. i-vector, x-vector and ComParE features are explored. Results are obtained with systems trained on the DementiaBank dataset and tested on an in-house dataset as well as the ADReSS dataset. The results show that both age and MMSE estimation are improved by applying multi-task learning, with state-of-the-art results achieved on the ADReSS dataset acoustic-only task.
KW - Age estimation
KW - Cognitive decline estimation
KW - Multi-task learning
KW - SincNet
KW - X-vector
UR - http://www.scopus.com/inward/record.url?scp=85112089372&partnerID=8YFLogxK
U2 - 10.1109/ICASSP39728.2021.9414642
DO - 10.1109/ICASSP39728.2021.9414642
M3 - Conference article in journal
AN - SCOPUS:85112089372
SN - 1520-6149
VL - 2021-June
SP - 7258
EP - 7262
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing
Y2 - 6 June 2021 through 11 June 2021
ER -