“Alignment is All You Need”: Analyzing Cross-Lingual Document Similarity for Domain-Specific Applications

Formal Metadata

Title
“Alignment is All You Need”: Analyzing Cross-Lingual Document Similarity for Domain-Specific Applications
Number of Parts
7
License
CC Attribution - NonCommercial - NoDerivatives 3.0 Germany:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year
2021

Content Metadata

Abstract
Cross-lingual text similarity provides an important measure for judging the contextual and semantic similarity between documents in different languages. Extracting similar or aligned multilingual texts would enable efficient approaches for information retrieval and natural language processing applications. However, the diversity of linguistic constructs, coupled with domain specificity and low resource availability, poses a significant challenge. In this paper, we present a study analyzing the performance of several existing approaches and show that Word Mover’s Distance over aligned language embeddings provides a reliable and cost-effective cross-lingual text similarity measure for handling evolving domain information, even when compared to advanced machine learning models.
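The approach named in the abstract, Word Mover’s Distance computed over word embeddings aligned into a shared cross-lingual space, can be illustrated with a short sketch. The snippet below is a minimal example rather than the authors’ implementation: it assumes pre-aligned fastText-style vectors (the file names wiki.en.align.vec and wiki.de.align.vec are illustrative), the gensim loader, and the POT optimal-transport package, and it computes WMD as the optimal transport cost between the two documents’ word distributions.

```python
# Minimal sketch: cross-lingual Word Mover's Distance over pre-aligned embeddings.
# Assumes fastText-style aligned vectors and the POT package (pip install POT);
# the vector file names below are illustrative, not from the paper.
from collections import Counter

import numpy as np
import ot  # POT: Python Optimal Transport
from gensim.models import KeyedVectors
from scipy.spatial.distance import cdist

# Load two monolingual embeddings that were aligned into one shared space.
en_vecs = KeyedVectors.load_word2vec_format("wiki.en.align.vec")
de_vecs = KeyedVectors.load_word2vec_format("wiki.de.align.vec")


def doc_to_distribution(tokens, vectors):
    """Keep in-vocabulary tokens; return (embedding matrix, normalized word weights)."""
    counts = Counter(t for t in tokens if t in vectors)
    words = list(counts)
    weights = np.array([counts[w] for w in words], dtype=float)
    return np.array([vectors[w] for w in words]), weights / weights.sum()


def cross_lingual_wmd(tokens_a, vectors_a, tokens_b, vectors_b):
    """Word Mover's Distance between two documents in different languages."""
    X_a, w_a = doc_to_distribution(tokens_a, vectors_a)
    X_b, w_b = doc_to_distribution(tokens_b, vectors_b)
    cost = cdist(X_a, X_b, metric="euclidean")  # pairwise word-to-word distances
    return ot.emd2(w_a, w_b, cost)              # optimal transport cost = WMD


en_doc = "the patient shows symptoms of viral infection".split()
de_doc = "der patient zeigt symptome einer viralen infektion".split()
print(cross_lingual_wmd(en_doc, en_vecs, de_doc, de_vecs))
```

Because the embeddings share one vector space after alignment, the same distance can be computed between any language pair for which aligned vectors exist, without training a dedicated cross-lingual model.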