LibriS2S: A German-English Speech-to-Speech Translation Corpus

P. Jeuris^*, J. Niehues

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Recently, we have seen an increasing interest in the area of speech-to-text translation, which has led to astonishing improvements in this area. In contrast, activity in the area of speech-to-speech translation is still limited, although it is essential for overcoming the language barrier. We believe that one of the limiting factors is the availability of appropriate training data. We address this issue by creating LibriS2S, to our knowledge the first publicly available speech-to-speech training corpus between German and English. For this corpus, we used independently created audio for German and English, leading to an unbiased pronunciation of the text in both languages. This allows the creation of a new text-to-speech and speech-to-speech translation model that directly learns to generate the speech signal based on the pronunciation of the source language. Using this corpus, we propose Text-to-Speech models, based on the recently proposed FastSpeech 2 model, that integrate source language information. We do this by adapting the model to take information such as the pitch, energy or transcript from the source speech as additional input.
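
As a rough illustration of what "taking energy from the source speech as additional input" could look like, the sketch below computes frame-level RMS energy from a source waveform and appends its utterance-level mean and standard deviation to every target-side input vector. The function names, the tiny frame sizes, and the concatenation scheme are all assumptions made for illustration; they are not taken from the paper, which may extract and combine these features differently.

```python
import math
from typing import List, Tuple


def frame_energy(wave: List[float], frame_len: int = 4, hop: int = 2) -> List[float]:
    """Return RMS energy per frame of a waveform.

    frame_len and hop are hypothetical toy values; real systems typically
    use much larger windows.
    """
    if not wave:
        return []
    energies = []
    # Slide a window over the signal; always produce at least one frame.
    for start in range(0, max(len(wave) - frame_len + 1, 1), hop):
        frame = wave[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def summarize(values: List[float]) -> Tuple[float, float]:
    """Return (mean, standard deviation) of a list, or (0.0, 0.0) if empty."""
    if not values:
        return (0.0, 0.0)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return (mean, math.sqrt(var))


def condition_on_source(
    target_inputs: List[List[float]], source_wave: List[float]
) -> List[List[float]]:
    """Append source-energy statistics to each target input vector.

    Assumption: simple concatenation of two summary values; this is one
    possible conditioning scheme, not the one described in the paper.
    """
    mean, std = summarize(frame_energy(source_wave))
    return [vec + [mean, std] for vec in target_inputs]


if __name__ == "__main__":
    source = [1.0] * 8                      # toy "source speech" signal
    targets = [[0.1, 0.2], [0.3, 0.4]]      # toy target-side embeddings
    print(condition_on_source(targets, source))
```

The same pattern would extend to pitch by swapping `frame_energy` for a pitch estimator; the transcript would instead need a text encoder, which is outside the scope of this sketch.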

Original language	English
Title of host publication	Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)
Publisher	European Language Resources Association
Pages	928-935
Number of pages	8
ISBN (Print)	9791095546726
Publication status	Published - 2022
Event	13th International Conference on Language Resources and Evaluation (LREC), Le Palais du Pharo, Marseille, France. Duration: 20 Jun 2022 → 25 Jun 2022. https://lrec2022.lrec-conf.org/en/

Conference

Conference	13th International Conference on Language Resources and Evaluation (LREC)
Abbreviated title	LREC 2022
Country/Territory	France
City	Marseille
Period	20/06/22 → 25/06/22
Internet address	https://lrec2022.lrec-conf.org/en/

Keywords

Speech-to-Speech translation
Speech synthesis
Dataset creation

Access to Document

http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.98.pdf
Licence: CC BY-NC

Cite this

@inproceedings{f85131a7cd5b452fbc61277db7d25580,

title = "LibriS2S: A German-English Speech-to-Speech Translation Corpus",

keywords = "Speech-to-Speech translation, Speech synthesis, Dataset creation",

author = "P. Jeuris and J. Niehues",

note = "Publisher Copyright: {\textcopyright} European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0; 13th International Conference on Language Resources and Evaluation (LREC), LREC 2022; Conference date: 20-06-2022 through 25-06-2022",

year = "2022",

language = "English",

isbn = "9791095546726",

pages = "928--935",

booktitle = "Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)",

publisher = "European Language Resources Association",

address = "Luxembourg",

url = "https://lrec2022.lrec-conf.org/en/",

}

Jeuris, P & Niehues, J 2022, LibriS2S: A German-English Speech-to-Speech Translation Corpus. in Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). European Language Resources Association, pp. 928-935, 13th International Conference on Language Resources and Evaluation (LREC), Marseille, France, 20/06/22. <http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.98.pdf>

TY - GEN

T1 - LibriS2S: A German-English Speech-to-Speech Translation Corpus

AU - Jeuris, P.

AU - Niehues, J.

PY - 2022

Y1 - 2022

KW - Speech-to-Speech translation

KW - Speech synthesis

KW - Dataset creation

M3 - Conference article in proceeding

SN - 9791095546726

SP - 928

EP - 935

BT - Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)

PB - European Language Resources Association

T2 - 13th International Conference on Language Resources and Evaluation (LREC)

Y2 - 20 June 2022 through 25 June 2022

ER -