Detecting Sexism Across Social Media with a BERT Model
As part of the EXIST 2025 lab at CLEF, Darren and I built a system for identifying and classifying sexism across tweets, memes, and videos. The core idea was deliberately contrarian to the competition’s framing: rather than building separate models tuned to each media type, we wanted to see how far a single model trained only on tweet text could get when applied to the other formats. The full paper is available here.
The Problem
The task had two parts. First, identify whether a given piece of social media content is sexist. Second, if it is, classify the intent behind it: is the author being directly sexist, condemning sexism, or reporting an experience of it? These are meaningfully different things, and conflating them is a problem both analytically and practically.
The data covered three media types: around 10,000 tweets, 5,000 memes, and 2,500 videos, split across English and Spanish. For the memes, text was extracted via OCR. For the videos, transcripts were provided. Both are obviously lossy ways of representing their respective media, but they’re what you’ve got if you want to feed the content into a language model.
What We Did
For English tweets we used DistilBERT, a smaller and faster distillation of the original BERT model that holds up surprisingly well on classification tasks. For Spanish, we used BETO, a BERT model pre-trained on three billion tokens of Spanish text. Multilingual BERT variants exist, but initial testing showed the single-language models performed better, which is not particularly surprising.
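As a minimal sketch of the per-language model choice, something like the following mapping drives which checkpoint gets fine-tuned. The Hugging Face checkpoint ids here are assumptions; the post names the models (DistilBERT for English, BETO for Spanish) but not the exact identifiers used.

```python
# Assumed checkpoint ids -- the post names DistilBERT and BETO,
# not these specific Hugging Face identifiers.
CHECKPOINT_BY_LANG = {
    "en": "distilbert-base-uncased",                  # DistilBERT
    "es": "dccuchile/bert-base-spanish-wwm-uncased",  # BETO
}

def checkpoint_for(lang: str) -> str:
    """Return the monolingual checkpoint to fine-tune for a tweet's language."""
    try:
        return CHECKPOINT_BY_LANG[lang]
    except KeyError:
        raise ValueError(f"no monolingual model configured for {lang!r}")
```

One model per language keeps each run simple and, per the initial testing mentioned above, outperformed the multilingual BERT variants.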
We tried two submission variants. The first (bergro_1) trained the model as a soft regressor, predicting a probability rather than a hard yes/no, and then thresholded at 0.5 for the hard evaluation. The second (bergro_2) trained it as a direct binary classifier. The soft approach won, which is a recurring theme in these kinds of tasks: treating label disagreement between annotators as signal rather than noise tends to help.
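The label handling behind the soft run can be sketched in a few lines, assuming annotator votes are available as booleans (the exact EXIST label format isn't shown here):

```python
def soft_target(annotator_votes):
    """Soft label: the share of annotators who marked the item sexist.

    The soft-regressor variant (bergro_1) trains against these fractional
    targets, treating annotator disagreement as signal rather than noise.
    """
    return sum(annotator_votes) / len(annotator_votes)

def harden(soft_score, threshold=0.5):
    """Collapse a soft score to a yes/no for the hard evaluation."""
    return soft_score >= threshold
```

For example, five of six annotators voting "sexist" gives a soft target of about 0.83, which the 0.5 threshold turns into a positive hard prediction; the direct binary classifier (bergro_2) never sees that intermediate value.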
For the memes, we also used Gemma 3 to generate image descriptions as an alternative to relying purely on OCR output. The LLM-generated descriptions outperformed OCR on the identification task, which makes sense — OCR captures only the text in an image, while a multimodal model can describe the visual context that often carries the meaning.
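A rough sketch of how the two meme-text sources might be reconciled before classification, with the proviso that both the prompt wording and the fallback logic are illustrative assumptions, not the paper's actual pipeline:

```python
# Hypothetical prompt -- the actual instruction given to Gemma 3
# is not reproduced in the post.
DESCRIBE_PROMPT = (
    "Describe this meme for a reader who cannot see it: transcribe any "
    "text and explain the visual context that carries its meaning."
)

def classifier_input(llm_description, ocr_text):
    """Choose the text fed to the tweet-trained classifier for a meme.

    Prefer the multimodal description when one was generated, since it
    captures visual context; fall back to raw OCR output otherwise.
    """
    text = llm_description.strip() if llm_description else ""
    return text or ocr_text
```

Either way, the downstream classifier sees plain text, which is what lets the tweet-trained model be reused unchanged.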
Results
The identification model ranked 39th out of 160 on the hard evaluation and 18th out of 67 on the soft evaluation. More interestingly, without any additional training, the tweet-trained model beat the majority-class baseline on both memes and videos. That's the result worth paying attention to: it makes a reasonable case that a single generalizable model, fine-tuned on the most accessible data type, can do useful work across formats it has never seen.
The source intention classification held up well on tweets and memes, but performed more modestly on videos. The likely reason is that videos and tweets share more surface-level textual similarity than videos and memes do — a video transcript looks a lot like a tweet in ways that a meme’s OCR output often doesn’t.
Reflections
The paper notes a few honest limitations. Sexist language actively tries to evade detection — slang, euphemisms, and in-group terminology shift constantly, which means any static model will degrade over time. The dataset also spans nearly a decade of social media, during which language usage changed considerably. Neither of these is a problem you can fully solve; they’re just worth keeping in mind when interpreting the numbers.
We also didn’t use the annotator demographic data, or any of the visual content from the videos beyond their transcripts. Both of those feel like reasonable places to look for further improvements.