Carnegie Mellon University

A photo of signs indicating the name and price of baked goods in multiple languages

Large Scale Multilingual Speech Processing

By Shinji Watanabe

Language technologies have made significant strides in recent years, but progress has been concentrated in only a select few languages. To make these technologies widely accessible and useful, we need to create systems that work for all languages. Our focus is the domain of human speech: an everyday means of communication for people around the world.

Multilingual speech training is a potent method that can improve the performance of end-to-end systems on lower-resourced languages by exploiting cross-lingual representations within a single model. Most current work, however, focuses on multilingual training of a small set of languages, such as those that are geographically close or within the same language family. Development of large-scale multilingual speech models, meanwhile, often relies on private datasets, which prevents the release of the trained models.
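
One widely used way to let a single end-to-end model serve many languages is to pool all transcripts into one shared token vocabulary and prepend a language-ID token to each target sequence, so the model is conditioned on the language while sharing representations across them. The sketch below illustrates only this data-preparation idea with toy, hypothetical utterances and made-up helper names; it is not the project's actual pipeline.

```python
# Minimal sketch of one common multilingual-training ingredient: a shared token
# vocabulary plus a language-ID token prepended to every transcript, so a single
# end-to-end model can be conditioned on (and share representations across)
# many languages. Utterances and helper names here are hypothetical.

from collections import Counter

# Toy (language code, transcript) pairs standing in for a multilingual corpus.
utterances = [
    ("en", "the bakery opens at nine"),
    ("de", "die bäckerei öffnet um neun"),
    ("sw", "duka la mikate linafunguliwa saa tatu"),
]

def build_shared_vocab(data):
    """One character vocabulary shared by all languages, plus a <lang> token each."""
    chars = Counter(ch for _, text in data for ch in text)
    lang_tokens = [f"<{lang}>" for lang in sorted({lang for lang, _ in data})]
    return {tok: idx for idx, tok in enumerate(lang_tokens + sorted(chars))}

def encode(lang, text, vocab):
    """Prepend the language-ID token, then map characters to shared vocab IDs."""
    return [vocab[f"<{lang}>"]] + [vocab[ch] for ch in text]

vocab = build_shared_vocab(utterances)
for lang, text in utterances:
    print(lang, encode(lang, text, vocab)[:8], "...")
```

Because every language draws from the same vocabulary, lower-resourced languages can benefit from acoustic and linguistic patterns learned on higher-resourced ones, while the language-ID token keeps the outputs language-specific.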

The goal of this research project is to develop an open-source multilingual speech recognition model that is large scale in terms of both dataset size and languages covered. Our dataset spans over 3 terabytes of Creative Commons-licensed data covering more than 120 unique languages. This will allow the research community to effectively test new techniques on an open benchmark and to develop speech systems for low-resourced languages that are not often covered.