Abstract
While automatic summarization of text has advanced significantly, research on speech summarization is still limited. In this work, we address the challenge of automatically generating teasers for TED talks. In the first step, we create a corpus for automatic summarization of TED and TEDx talks consisting of the talks' recordings, their transcripts, and their descriptions. The corpus is used to build a speech summarization system for the task. We adapt and combine pre-trained models for automatic speech recognition (ASR) and text summarization using the collected data. This initial work shows that it is more important to adapt the summarization model to the ASR transcripts than to adapt the ASR model to the talks.
Original language | English |
---|---|
Title of host publication | ICASSP 2022 - 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) |
Publisher | IEEE |
Pages | 8067-8071 |
Number of pages | 5 |
ISBN (Print) | 9781665405409 |
Publication status | Published - 2022 |
Event | 47th IEEE International Conference on Acoustics, Speech and Signal Processing - Online, Singapore, Singapore. Duration: 22 May 2022 → 27 May 2022. Conference number: 47. https://2022.ieeeicassp.org/ |
Publication series
Series | International Conference on Acoustics Speech and Signal Processing Proceedings |
---|---|
ISSN | 1520-6149 |
Conference
Conference | 47th IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Abbreviated title | ICASSP 2022 |
Country/Territory | Singapore |
City | Singapore |
Period | 22/05/22 → 27/05/22 |
Internet address | https://2022.ieeeicassp.org/ |
Keywords
- speech summarization
- automatic speech recognition
- abstractive summarization