Hate Speech

Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection

Transformer-based Natural Language Processing models have become the standard for hate speech detection. However, the unconscious use …

Giuseppe Attanasio, Debora Nozza, Eliana Pastor, Dirk Hovy

Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists

Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, …

Giuseppe Attanasio, Debora Nozza, Dirk Hovy, Elena Baralis
