Abstract
Many people are multilingual and may draw on multiple language varieties when writing their messages. This paper is a first step towards analyzing and detecting code-switching within words. We first segment words into smaller units. We then identify words composed of sequences of subunits associated with different languages. We demonstrate our method on Twitter data in which both Dutch and dialect varieties labeled as Limburgish, a minority language, are used.
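
The two steps named in the abstract (segment a word, then check whether its subunits point to different languages) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the segmentation method or the language resources, so the inventory `SUBUNIT_LANGS`, the labels `A` and `B`, the invented subunits, and the functions `segment` and `is_mixed` are all assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy inventory (an assumption, not data from the paper):
# each known subunit maps to the set of language labels it is associated with.
SUBUNIT_LANGS: Dict[str, Set[str]] = {
    "kra": {"A"},
    "mo": {"B"},
    "ti": {"A", "B"},
}


def segment(word: str, inventory: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one segmentation of `word` into known subunits, or None.

    Longer subunits are tried first, with backtracking when the
    remainder of the word cannot be segmented.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in inventory:
            rest = segment(word[end:], inventory)
            if rest is not None:
                return [head] + rest
    return None


def is_mixed(word: str, inventory: Dict[str, Set[str]]) -> bool:
    """Flag a word when no single language is associated with all its subunits."""
    units = segment(word, inventory)
    if not units:
        # Unsegmentable or empty words are left unflagged in this sketch.
        return False
    shared = set.intersection(*(inventory[u] for u in units))
    return not shared


for w in ["krati", "kramo", "xyz"]:
    print(w, segment(w, SUBUNIT_LANGS), is_mixed(w, SUBUNIT_LANGS))
```

In this sketch a word is flagged only when its subunits share no language label, so a subunit associated with both languages does not trigger a detection on its own. A realistic system would likely need a learned segmenter and larger lexicons.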
Original language | English |
---|---|
Title of host publication | Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, SIGMORPHON 2016 at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 |
Editors | M. Elsner, S. Kübler |
Publisher | The Association for Computational Linguistics |
Pages | 82-86 |
Number of pages | 5 |
ISBN (Print) | 9781945626081 |
Publication status | Published - 1 Jan 2016 |
Externally published | Yes |
Event | 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, SIGMORPHON 2016 at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Berlin, Germany; Duration: 11 Aug 2016 → 11 Aug 2016; https://www.ling.ohio-state.edu/sigmorphon/ |
Workshop
Workshop | 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, SIGMORPHON 2016 at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 |
---|---|
Country/Territory | Germany |
City | Berlin |
Period | 11/08/16 → 11/08/16 |