“Alignment is All You Need”: Analyzing Cross-Lingual Document Similarity for Domain-Specific Applications

Leibniz Universität Hannover (LUH)

L3S Research Center

Cleopatra ITN

Dutta, Sourav

Formal Metadata

Title

“Alignment is All You Need”: Analyzing Cross-Lingual Document Similarity for Domain-Specific Applications

Title of Series

CLEOPATRA Workshop 2021

Number of Parts

Author

Dutta, Sourav

License

CC Attribution - NonCommercial - NoDerivatives 3.0 Germany:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/52946 (DOI)

Publisher

Leibniz Universität Hannover (LUH)

Release Date

Language

Production Year

2021

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Cross-lingual text similarity provides an important measure for judging the contextual and semantic similarity between documents in different languages. Extracting similar or aligned multilingual texts would enable efficient approaches for information retrieval and natural language processing applications. However, the diversity of linguistic constructs, coupled with domain specificity and scarce resources, poses a significant challenge. In this paper, we present a study analyzing the performance of different existing approaches, and show that Word Mover’s Distance on aligned language embeddings provides a reliable and cost-effective cross-lingual text similarity measure for tackling evolving domain information, even when compared to advanced machine learning models.

Keywords

cross-lingual document similarity

cross-lingual text alignment