MilaNLP 2021 in Review Part II: Text Analytics

Reviewing MilaNLP 2021 research papers on text analytics

Federico Bianchi
8 min read · Sep 28, 2021

In this blog post series, we review what MilaNLP has been doing during 2021, looking at the main research themes and the output the team has produced.

MilaNLP is the NLP Lab in Milano (Italy) led by Prof. Dirk Hovy at Bocconi University.

The first blog post covered what we did in the area of Bias and Ethics:

This second blog post is going to cover what we did in terms of more general text analytics.

Our MilaNLP logo. The left part represents the Duomo in Milan.

Part II: Text Analytics

A more general line of work aims at providing methods to extract information from text. We have developed a new family of topic models and methods to predict emotions from Italian and multilingual datasets. We have also studied the effect of disagreement in annotated datasets and developed methods to analyze political pledges.

The topic models and the Italian emotion recognition methods have been released as easy-to-use open-source applications.

This blog post has been compiled by different authors:

Our awesome team!

While a few of the papers we present are preprints, most have been peer-reviewed and presented at the field's most important conferences.

Note that papers tagged with “special mention” refer to work done by our new lab members with their former institutions.

1) Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

by Federico Bianchi, Silvia Terragni, Dirk Hovy

ACL2021

We find that our approach produces more meaningful and coherent topics than traditional bag-of-words topic models and recent neural models.

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret. Recently, neural topic models have shown improvements in overall coherence. Concurrently, contextual embeddings have advanced the state of the art of neural models in general. In this paper, we combine contextualized representations with neural topic models.
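For readers who want to try this, below is a minimal sketch using the contextualized-topic-models package released with this work. Treat it as illustrative: exact class and argument names may differ slightly across package versions, and the toy corpus should be replaced with a real one.

```python
# pip install contextualized-topic-models
# Minimal sketch; class and argument names may vary across package versions.
from contextualized_topic_models.models.ctm import CombinedTM
from contextualized_topic_models.utils.data_preparation import TopicModelDataPreparation

# Toy corpus: the raw text feeds the sentence encoder, the preprocessed
# version feeds the bag-of-words. Use a real corpus in practice.
documents = [
    "The cat sat on the mat and purred loudly.",
    "Stocks fell sharply after the central bank announcement.",
    "The dog chased the cat across the garden.",
    "Inflation figures pushed markets down again today.",
]
preprocessed = [
    "cat sat mat purred loudly",
    "stocks fell sharply central bank announcement",
    "dog chased cat garden",
    "inflation figures pushed markets down today",
]

qt = TopicModelDataPreparation("paraphrase-distilroberta-base-v2")
training_dataset = qt.fit(text_for_contextual=documents, text_for_bow=preprocessed)

# Combine the contextual embedding (size 768) with the bag-of-words input.
ctm = CombinedTM(bow_size=len(qt.vocab), contextual_size=768, n_components=2, num_epochs=10)
ctm.fit(training_dataset)
print(ctm.get_topic_lists(5))  # top-5 words per topic
```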

2) Cross-lingual Contextualized Topic Models with Zero-shot Learning

by Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini

EACL2021

Our results show that the transferred topics are coherent and stable across languages, which suggests exciting future research directions.

Many data sets (e.g., reviews, forums, news, etc.) exist in parallel in multiple languages. They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-words-based topic models. Models have to be either single-language or suffer from a huge but extremely sparse vocabulary. Both issues can be addressed by transfer learning. In this paper, we introduce a zero-shot cross-lingual topic model. Our model learns topics in one language (here, English) and predicts them for unseen documents in different languages (here, Italian, French, German, and Portuguese). We evaluate the quality of the topic predictions for the same document in different languages.
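As a rough usage sketch with the same contextualized-topic-models package (again, names are indicative and may differ between versions): train on English documents with a multilingual sentence encoder, then predict topics for documents in another language, with no shared vocabulary required.

```python
from contextualized_topic_models.models.ctm import ZeroShotTM
from contextualized_topic_models.utils.data_preparation import TopicModelDataPreparation

english_docs = ["The match ended in a draw.", "Parliament approved the new budget."]
english_bow = ["match ended draw", "parliament approved new budget"]
italian_docs = ["La partita è finita in pareggio.", "Il parlamento ha approvato il bilancio."]

# Multilingual sentence encoder (512-dimensional embeddings).
qt = TopicModelDataPreparation("distiluse-base-multilingual-cased")
training_dataset = qt.fit(text_for_contextual=english_docs, text_for_bow=english_bow)

ctm = ZeroShotTM(bow_size=len(qt.vocab), contextual_size=512, n_components=2, num_epochs=10)
ctm.fit(training_dataset)

# At test time only the contextual embedding is used, so unseen Italian
# documents can be assigned to the English-trained topics.
italian_dataset = qt.transform(text_for_contextual=italian_docs)
print(ctm.get_doc_topic_distribution(italian_dataset))
```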

Read the blog post:

3) BERTective: Language Models and Contextual Information for Deception Detection

by Tommaso Fornaciari, Federico Bianchi, Massimo Poesio, Dirk Hovy

EACL2021

BERT alone does not capture the implicit knowledge of deception cues.

We study a corpus of Italian dialogues containing deceptive statements and implement deep neural models that incorporate various linguistic contexts. We establish a new state of the art in identifying deception and find that not all context is equally useful for the task. Only the texts closest to the target boost performance, and only if they come from the same speaker (rather than questions by an interlocutor). We also find that the semantic information in language models such as BERT contributes to the performance.

4) Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning

by Tommaso Fornaciari, Alexandra Uma, Silviu Paun, Barbara Plank, Dirk Hovy, Massimo Poesio

NAACL2021

We find that the soft-label prediction auxiliary task reduces the penalty for errors on ambiguous entities, and thereby mitigates overfitting.

We propose a novel method to incorporate annotator disagreement as information: in addition to the standard error computation, we use soft labels (i.e., probability distributions over the annotator labels) as an auxiliary task in a multi-task neural network. We measure the divergence between the predictions and the target soft labels with several loss functions and evaluate the models on various NLP tasks.
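To make the idea concrete, here is a small, hypothetical PyTorch sketch (not the paper's actual architecture): a shared encoder with one head trained on the gold label via cross-entropy, and an auxiliary head trained to match the annotator label distribution via KL divergence.

```python
import torch.nn as nn
import torch.nn.functional as F

class SoftLabelMTL(nn.Module):
    """Hypothetical sketch: a shared encoder with a hard-label head (main task)
    and a soft-label head that predicts the annotator label distribution."""
    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.hard_head = nn.Linear(hidden_dim, num_classes)  # main task
        self.soft_head = nn.Linear(hidden_dim, num_classes)  # auxiliary task

    def forward(self, x):
        h = self.encoder(x)
        return self.hard_head(h), self.soft_head(h)

def mtl_loss(hard_logits, soft_logits, gold_labels, soft_labels, alpha=0.5):
    # Standard cross-entropy on the gold label ...
    ce = F.cross_entropy(hard_logits, gold_labels)
    # ... plus a divergence (here KL) between the predicted and the
    # observed annotator-label distributions.
    kl = F.kl_div(F.log_softmax(soft_logits, dim=-1), soft_labels, reduction="batchmean")
    return ce + alpha * kl
```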

5) MilaNLP @ WASSA: Does BERT Feel Sad When You Cry?

by Tommaso Fornaciari, Federico Bianchi, Debora Nozza, Dirk Hovy

WASSA2021

Our results suggest that emotion and empathy are not related tasks, at least for the purpose of prediction.

This paper describes the MilaNLP team's submission (Bocconi University, Milan) to the WASSA 2021 Shared Task on Empathy Detection and Emotion Classification. We focus on Track 2, Emotion Classification, which consists of predicting the emotion of reactions to English news stories at the essay level. We test different models based on multi-task and multi-input frameworks, with the goal of better exploiting all the correlated information in the data set. We find, though, that empathy as an auxiliary task in multi-task learning and demographic attributes as additional input yield worse performance than single-task learning.

6) Universal Joy: A Data Set and Results for Classifying Emotions Across Languages

by Sotiris Lamprinidis, Federico Bianchi, Daniel Hardt, Dirk Hovy

WASSA2021

We find that structural and typological similarity between languages facilitates cross-lingual learning, as does linguistic diversity of the training data. Our results suggest that there are commonalities underlying the expression of emotion in different languages.

While emotions are universal aspects of human psychology, they are expressed differently across different languages and cultures. We introduce a new data set of over 530k anonymized public Facebook posts across 18 languages, labeled with five different emotions. Using multilingual BERT embeddings, we show that emotions can be reliably inferred both within and across languages. Zero-shot learning produces promising results for low-resource languages. Following established theories of basic emotions, we provide a detailed analysis of the possibilities and limits of cross-lingual emotion classification.
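The general recipe, illustrated below, is: encode posts with multilingual BERT, train a classifier in one language, and evaluate it zero-shot in another. This sketch uses a simple logistic regression on mean-pooled embeddings and toy data; it is not the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc).last_hidden_state
    # Mean-pool over tokens (ignoring padding) to get one vector per post.
    mask = enc["attention_mask"].unsqueeze(-1)
    return ((out * mask).sum(1) / mask.sum(1)).numpy()

# Toy data: train on English posts, evaluate zero-shot on Italian ones.
en_posts, en_labels = ["I am so happy today!", "This makes me furious."], ["joy", "anger"]
it_posts, it_labels = ["Oggi sono felicissimo!", "Questo mi fa arrabbiare."], ["joy", "anger"]

clf = LogisticRegression(max_iter=1000).fit(embed(en_posts), en_labels)
print(clf.score(embed(it_posts), it_labels))  # zero-shot accuracy on Italian
```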

7) FEEL-IT: Emotion and sentiment classification for the Italian language

by Federico Bianchi, Debora Nozza, Dirk Hovy

WASSA2021

We release an open-source Python library, so researchers can use a model trained on FEEL-IT for inferring both sentiments and emotions from Italian text.

While sentiment analysis is a popular task for understanding people's reactions online, we often need more nuanced information: is the post negative because the user is angry or sad? Many approaches have been introduced for these tasks, including for Italian, but they all address only one of them. We introduce FEEL-IT, a novel benchmark corpus of Italian Twitter posts annotated with four basic emotions: anger, fear, joy, sadness. By collapsing the emotion labels, we can also perform sentiment analysis. We evaluate our corpus on benchmark datasets for both emotion and sentiment classification, obtaining competitive results.
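A quick usage sketch, assuming the feel-it Python package released with the paper and its EmotionClassifier/SentimentClassifier interface (check the repository for the current API and exact label names):

```python
# pip install feel-it
from feel_it import EmotionClassifier, SentimentClassifier

emotion_classifier = EmotionClassifier()
sentiment_classifier = SentimentClassifier()

texts = ["sono molto felice", "sono molto triste"]
print(emotion_classifier.predict(texts))    # e.g. ['joy', 'sadness']
print(sentiment_classifier.predict(texts))  # e.g. ['positive', 'negative']
```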

8) We Need to Consider Disagreement in Evaluation

by Valerio Basile, Michael Fell, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio, Alexandra Uma

ACL-BPPF2021

We suggest that we need to better capture the sources of disagreement to improve today’s evaluation practice.

We discuss three sources of disagreement: from the annotator, the data, and the context, and show how this affects even seemingly objective tasks. Datasets with multiple annotations are becoming more common, as are methods to integrate disagreement into modeling. The logical next step is to extend this to evaluation.

9) “We will Reduce Taxes” — Identifying Election Pledges with Language Models

by Tommaso Fornaciari, Dirk Hovy, Elin Naurin, Julia Runeson, Robert Thomson, Pankaj Adhikari

Findings-ACL2021

Our results indicate that year and party have predictive power even in Zero-Shot Learning, while context introduces some noise. We finally discuss the linguistic features of pledges.

We use election manifestos of Swedish and Indian political parties to learn neural models that distinguish actual pledges from generic political positions. Since pledges may vary by election year and party, we implement a Multi-Task Learning (MTL) setup, predicting the election year and the manifesto's party as auxiliary tasks. Pledges can also span several sentences, so we use hierarchical models that incorporate contextual information. Lastly, we evaluate the models in a Zero-Shot Learning (ZSL) framework across countries and languages.

10) SemEval-2021 Task 12: Learning with Disagreements

by Alexandra Uma, Tommaso Fornaciari, Anca Dumitrache, Tristan Miller, Jon Chamberlain, Barbara Plank, Edwin Simpson, Massimo Poesio

ACL-SEMEVAL2021

The aim of the SemEval-2021 shared task on learning with disagreements (Le-Wi-Di) was to provide a unified testing framework for methods that learn from data containing multiple, possibly contradictory annotations. It covers the best-known datasets with disagreement information, spanning both language interpretation and image classification.

Special Mentions

Below, we describe our “special mention” papers!

11) Special Mention: Visual Summary Identification From Scientific Publications via Self-Supervised Learning

by Shintaro Yamamoto, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš, Shigeo Morishima

Frontiers in Research Metrics and Analytics

Our self-supervised pre-training, executed on a large unlabeled collection of publications, attenuates the need for large annotated data sets for visual summary identification and facilitates domain transfer for this task.

We build a novel benchmark data set for visual summary identification from scientific publications, which consists of papers presented at conferences from several areas of computer science. We couple this contribution with a new self-supervised learning approach based on heuristically matching in-text references to figures with figure captions. We evaluate our self-supervised pretraining for visual summary identification on both the existing biomedical and our newly presented computer science data set. The experimental results suggest that the proposed method is able to outperform the previous state of the art without any task-specific annotations.

So Long

Thank you for reading our work! Feel free to contact us if you have any questions!

Looking for Part I?

If you find errors, you can send me a message on Twitter.

See you next week for Part III, which will cover more linguistics-related and theoretical papers!
