Streaming Pipelines for Neural Machine Translation

FOSDEM VZW

Marthi, Suneel Kottmann, Jörn

Formale Metadaten

Titel

Serientitel

FOSDEM 2019

Anzahl der Teile

561

Autor

Marthi, Suneel

Kottmann, Jörn

Lizenz

CC-Namensnennung 2.0 Belgien:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.

Identifikatoren

10.5446/44557 (DOI)

Herausgeber

FOSDEM VZW

Erscheinungsjahr

2019

Sprache

Englisch

Inhaltliche Metadaten

Fachgebiet

Informatik

Genre

Konferenz/Talk

Abstract

Machine Translation is important when having to cater to different geographies and locales for news or eCommerce website content. Machine Translation systems often need to handle a large volume of concurrent translation requests from multiple sources across multiple languages in near real time. Many Machine Translation preprocessing tasks like Text Normalization, Language Detection, Sentence Segmentation etc. can be performed at scale in a real time streaming pipeline utilizing Apache Flink. We will be looking at a few such streaming pipelines leveraging different NLP components and Flink’s dynamic processing capabilities for real time training and inference. We'll demonstrate and examine the end-to-end throughput and latency of a pipeline that detects language and translates news articles shared via twitter in real-time. Developers will come away with a better understanding of how Neural Machine Translation works, how to build pipelines for machine translation preprocessing tasks and Neural Machine Translation models. Speaker Bio Suneel Marthi: Suneel is a member of the Apache Software Foundation and is a PMC member on Apache OpenNLP, Apache Mahout, and Apache Streams. He has done talks at Hadoop Summit, Apache Big Data, Flink Forward, Berlin Buzzwords, and Big Data Tech Warsaw. He is a Principal Engineer at Amazon Web Services. Experience Suneel: He has done talks at Hadoop Summit, Apache Big Data, Flink Forward, Berlin Buzzwords, and Big Data Tech Warsaw. Speaker Bio Jörn Kottmann: Jörn is a member of the Apache Software Foundation. He contributed to Apache OpenNLP for 13 years and is PMC Chair and committer of the project. In his day jobs he used OpenNLP to process large document collections and streams, often in combination with Apache UIMA where he is a PMC member and committer as well.