PoliTeam @ AMI: Improving Sentence Embedding Similarity with Misogyny Lexicons for Automatic Misogyny Identification in Italian Tweets

Giuseppe Attanasio, Eliana Pastor

November 2020

Abstract

We present a multi-agent classification solution for identifying misogynous and aggressive content in Italian tweets. A first agent uses modern sentence embedding techniques to encode tweets and an SVM classifier to produce initial labels. A second agent, based on TF-IDF and Italian misogyny lexicons, is adopted jointly to correct the first agent's uncertain predictions. We evaluate our approach in the Automatic Misogyny Identification Shared Task of the EVALITA 2020 campaign. Results show that TF-IDF and lexicons effectively improve the supervised agent trained on sentence embeddings.
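The two-agent idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the agents are combined, so the uncertainty band, the lexicon threshold, and the scoring rule below are assumptions, and the lexicon entries are placeholders. The first agent (sentence embeddings plus SVM) is represented only by the probability it would output, and all function names are hypothetical.

```python
import math
from collections import Counter

# Placeholder entries; a real system would load an Italian misogyny lexicon.
LEXICON = {"termine_a", "termine_b"}


def idf_weights(corpus):
    """Smoothed inverse document frequency for each token in the corpus."""
    n = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc.lower().split()))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def lexicon_score(text, idf, lexicon=LEXICON):
    """Share of the tweet's IDF mass that falls on lexicon terms (0.0 to 1.0)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    total = sum(idf.get(t, 1.0) for t in tokens)
    hits = sum(idf.get(t, 1.0) for t in tokens if t in lexicon)
    return hits / total


def combine(first_prob, text, idf, band=0.15, lexicon_threshold=0.2):
    """Keep the first agent's label when it is confident; otherwise defer.

    first_prob: probability of the positive class from the first agent
                (assumed here; the embedding + SVM step is not implemented).
    band, lexicon_threshold: illustrative values, not taken from the paper.
    """
    if abs(first_prob - 0.5) > band:
        return int(first_prob >= 0.5)
    return int(lexicon_score(text, idf) >= lexicon_threshold)


corpus = ["ciao mondo", "termine_a mondo"]
idf = idf_weights(corpus)
confident = combine(0.9, "ciao mondo", idf)         # first agent decides
overridden = combine(0.45, "termine_a mondo", idf)  # second agent decides
```

In this sketch the second agent is consulted only inside the uncertainty band, so confident predictions of the first agent are never changed.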

Type

Conference paper

Publication

EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020