Multi-task Learning for Cross-Lingual Sentiment Analysis

Video in TIB AV-Portal: Multi-task Learning for Cross-Lingual Sentiment Analysis

Formal Metadata

Title
Multi-task Learning for Cross-Lingual Sentiment Analysis
Title of Series
Author
Contributors
License
CC Attribution - NonCommercial - NoDerivatives 3.0 Germany:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2021
Language
English
Production Year
2021

Content Metadata

Subject Area
Abstract
This paper presents a cross-lingual sentiment analysis of news articles using zero-shot and few-shot learning. The study aims to classify Croatian news articles as positive, negative, or neutral using the Slovene dataset. The system is based on a trilingual BERT-based model trained on three languages: English, Slovene, and Croatian. The paper analyses different setups for using datasets in two languages and proposes a simple multi-task model to perform sentiment classification. The evaluation is performed in few-shot and zero-shot scenarios, in single-task and multi-task experiments, for Croatian and Slovene.
Keywords: sentiment analysis, cross-lingual sentiment analysis, transfer learning
Hello everyone, I'm [unclear], and I'm presenting, together with my mentors [unclear], multi-task learning for cross-lingual sentiment analysis. OK, so it will be a short presentation with the introduction, the methodology, the experimental setup, the results and the conclusion.

Sentiment analysis is the field of study that analyses people's opinions, sentiments, appraisals, attitudes and emotions towards entities and their attributes expressed in text. Usually we see sentiment analysis being well studied in the English language, where there is a large amount of training data [unclear]. This paper deals with cross-lingual sentiment analysis for Croatian, where we are not using English training data; instead we leverage the dataset from another language, which is Slovene, so we perform a kind of cross-lingual sentiment transfer. Another reason we chose Slovene is the mutual intelligibility between the two languages. There was a paper [unclear] which showed that Croatian and Slovene have really high mutual intelligibility, which means that a speaker of one does not have to put in a really large amount of effort to understand the other. So we assumed that this also applies to the language transfer that could be possible.

So we have two datasets. Luckily we have one in Croatian, which is about two thousand documents only; these are Croatian news articles tagged at the document level. And then we have the Slovene news articles, which is a really big dataset where we have about ten thousand documents tagged at three different levels. The label distribution is as you can see in the table, and it's a massive dataset.

Coming on to the next part, we have a really simple methodology: we have a shared encoder with three different heads being trained on three different levels from the dataset, since in the Slovene news dataset the document, the paragraph and the sentence have each been tagged. So we have three different levels with their own losses. But since we assumed that the mutual intelligibility would play a role, we have the Croatian documents being combined with the Slovene documents and fed into the shared encoder, and then trained with the respective loss, where we cycle through each of these tasks one by one, randomly, during training.

We have around five different experiments that we performed with these datasets. First, single-task Slovene, where we train only on the Slovene documents and then test on the Croatian documents. Then multi-task Slovene, where we use all the levels from the Slovene data. We do not use Croatian data in these two settings, but we do test on the Croatian data. Then we train on only the Croatian document-level data and see how it works; this is the normal, classic fine-tuning of a transformer [unclear]. Then we have our multi-task model, where we combine the Croatian and Slovene documents, and we also train the other heads with their respective datasets, that is, paragraph and sentence. And last but not least, we also try to see what happens when we take away the paragraph and sentence levels, train with documents only, and compare the overall performance.

We do see that the datasets are a bit noisy, so we perform a little bit of preprocessing: we remove [unclear] strings and perform deduplication, which reduces the size of the dataset. We perform an 80/20 split [unclear] of the datasets. One thing that we changed from the typical multilingual language model scenario is that we used a trilingual language model called CroSloEngual BERT, which is trained on Slovene, Croatian and English. The reason is that this model has been shown to do quite well on tasks such as part-of-speech tagging and dependency parsing with respect to mBERT; it substantially outperforms mBERT when it comes to these tasks. We have three different labels, positive, negative and neutral, and we train with a batch size of thirty-two [unclear]. As a simple baseline we just use the majority class.

Coming on to the single-task setting, where we are only using documents, these are the scores. The single-task Slovene and the multi-task Slovene models perform well on Slovene, but when we do zero-shot on the Croatian, the F1 is around fifteen [unclear]. Training the Croatian model itself, which is obviously the classical fine-tuning case, gives an F1 of fifty-six [unclear]. When we start adding the Croatian data to the Slovene, we do see a drop in performance for Slovene, but then we start seeing an increase for Croatian. Finally, for the multi-task model, where we combine the documents of Slovene and Croatian, there is a slight decrease compared to the Slovene multi-task model, but overall we have better performance with respect to precision and recall for the Croatian classification.

Coming to the discussion, I have already said that the multi-task setup performs better than the other settings. The best document model for Slovene was Slovene itself, rather than when we add Croatian documents, and the worst-performing Croatian model was the one where we use only Slovene in the single-task setup. Nevertheless, it performs better with the Slovene set.
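The training scheme described in the talk, where the shared encoder is updated on one task at a time and the tasks are visited in random order, could be sketched roughly as below. This is a minimal illustration in plain Python, not the authors' code: the function names `make_batches` and `random_task_schedule`, the task names, and the choice to visit every batch exactly once per epoch are all assumptions made for the example. Only the batch size of 32 and the document, paragraph and sentence levels come from the talk.

```python
import random
from typing import Dict, Iterator, List, Tuple

Example = Tuple[str, str]  # (text, sentiment label); hypothetical format


def make_batches(examples: List[Example], batch_size: int) -> List[List[Example]]:
    """Split a list of examples into consecutive batches."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def random_task_schedule(
    task_data: Dict[str, List[Example]],
    batch_size: int = 32,
    seed: int = 0,
) -> Iterator[Tuple[str, List[Example]]]:
    """Yield (task_name, batch) pairs, picking the next task at random.

    Assumption: one epoch visits every batch of every task exactly once.
    The talk only says that tasks are cycled randomly during training.
    """
    rng = random.Random(seed)
    queues: Dict[str, List[List[Example]]] = {}
    for task, examples in task_data.items():
        shuffled = examples[:]
        rng.shuffle(shuffled)
        queues[task] = make_batches(shuffled, batch_size)

    while queues:
        task = rng.choice(sorted(queues))  # pick one remaining task at random
        yield task, queues[task].pop()
        if not queues[task]:
            del queues[task]  # task exhausted for this epoch
```

In a real training loop, each yielded batch would be passed through the shared encoder and then through the classification head for that task, and only that task's loss would be back-propagated. In the combined setting from the talk, the `"document"` entry would hold Slovene and Croatian documents together, while the paragraph and sentence entries would hold Slovene data only.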
With respect to conclusions, we presented a simple setup for cross-lingual sentiment analysis for Croatian, and we utilised the relatively large Slovene dataset in order to improve the final cross-lingual transfer. Obviously, we have checked this only with Slovene as the source language; we would like to see how we can utilise other languages, like English, along with Croatian and Slovene, and that is ongoing work. Also, as we saw, a Croatian dataset at the sentence level doesn't exist, so we are about to finish our work on that. Maybe soon we will have one, where we can compare the levels for Slovene and Croatian.

Coming to the error analysis, we also found that words play a major role [unclear]: the model tries to capture positive words even though it would be neutral news, so it puts most of the weight on these words, and we need to check how we can improve on such cases. The code is available at the link, and I'm open for questions.