Group mention economic attributes classifier

A multi-label classifier that detects the economic attribute categories referred to in a social group mention. It was trained with SetFit on top of the lightweight sentence-transformers/all-mpnet-base-v2 sentence embedding model.

The economic attributes classified are:

attribute	definition
class membership	People described with their membership in or belonging to a social class such as the upper class, the middle class, lower class, or the working class.
employment status	People described or categorized by their employment status such as employers, employees, self-employed, or unemployed people.
education level	People described with or categorized by their education level such as students, apprentices, higher education, tertiary education, vocational training or graduates.
income/wealth/economic status	People defined or categorized by their income, wealth, or economic status such as high/medium/low income groups, rich/poor people, homeowners/tenants/homeless.
occupation/profession	People referred to with or categorized according to their occupation or profession such as teachers, farmers, public servants, or police officers.
ecology of group	People categorized by their relation to the ecology of society such as carbon emitters, coal miners, green employers, green workers, sustainable farmers, or those working in the fossil sector.

Model Details

Model Description

Group mention economic attributes classifier

Developed by: Hauke Licht
Model type: mpnet
Language(s) (NLP): ['en']
License: apache-2.0
Finetuned from model: sentence-transformers/all-mpnet-base-v2
Funded by: The Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2126/1 – 390838866

Model Sources

Repository: tba
Paper: tba
Demo: [More Information Needed]

Uses

Bias, Risks, and Limitations

Evaluation of the classifier on held-out data shows that it makes mistakes.
The model has been finetuned only on human-annotated social group mentions in sentences sampled from party manifestos of European parties (mostly far-right and Green parties). Applying the classifier in other domains can lead to higher error rates.
The data used to finetune the model come from human annotators. Human annotators can be biased, and factors like gender and social background can affect their annotation judgments. This may lead to bias in the detection of specific social groups.

Recommendations

Users who want to apply the model outside its training data domain should evaluate its performance in the target data.
Users who want to apply the model outside its training data domain should continue to finetune this model on labeled data from the target domain.
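
As a rough illustration of the first recommendation, the sketch below computes a per-label F1 score from binary indicator rows, which is the shape of output the classifier's predictions take in the usage example. The function name `per_label_f1`, the two-label setup, and the toy annotations are assumptions made for illustration only; they are not part of this model or its evaluation. A library routine such as scikit-learn's `f1_score` with `average=None` could be used instead.

```python
from typing import Dict, List, Sequence


def per_label_f1(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    label_names: List[str],
) -> Dict[str, float]:
    """Compute the F1 score of each label from binary indicator rows."""
    scores = {}
    for j, name in enumerate(label_names):
        # Count true positives, false positives, and false negatives for label j
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Report 0.0 when a label never occurs and is never predicted
        scores[name] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical gold annotations and predictions for three mentions and two labels
label_names = ["class membership", "employment status"]
y_true = [[1, 0], [0, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [1, 0]]
print(per_label_f1(y_true, y_pred, label_names))
```

In practice, `y_pred` would be the classifier's predictions for mentions from the target domain and `y_true` the corresponding human annotations. Reporting scores per label helps show which attribute categories transfer poorly.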

How to Get Started with the Model

Use the code below to get started with the model.

Usage

You can use the model with the setfit Python library (>=1.1.0):

Note: It is recommended to use transformers version >=4.5.5,<=5.0.0 and sentence-transformers version >=4.0.1,<=5.1.0 for compatibility.

Classification

import torch
from setfit import SetFitModel

model_name = "haukelicht/all-mpnet-base-v2_economic-attributes-classifier"
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
classifier = SetFitModel.from_pretrained(model_name)
classifier.to(device)

# Example mentions
mentions = ["working class people", "highly-educated professionals", "people without a stable job"]

# Get predictions
with torch.no_grad():
    predictions = classifier.predict(mentions)
print(predictions)

# Map predictions to labels (one list of attribute names per mention)
labels = [
    [classifier.id2label[l] for l, p in enumerate(pred) if p == 1]
    for pred in predictions
]
print(labels)

Mention embedding

import torch
from sentence_transformers import SentenceTransformer

model_name = "haukelicht/all-mpnet-base-v2_economic-attributes-classifier"
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

# Load the sentence transformer component of the pre-trained classifier
model = SentenceTransformer(model_name, device=device)

# Example mentions
mentions = ["working class people", "highly-educated professionals", "people without a stable job"]

# Compute mention embeddings
with torch.no_grad():
    embeddings = model.encode(mentions)
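
One possible use of the mention embeddings is to compare mentions by cosine similarity. The sketch below is a dependency-free illustration that works on any two equal-length vectors, such as rows of `embeddings`. The helper name `cosine_similarity` and the 3-dimensional toy vectors are assumptions for illustration; real embeddings from this model have many more dimensions. The sentence-transformers library also ships `util.cos_sim` for the same purpose.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for mention embeddings (hypothetical values)
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
print(cosine_similarity(toy_embeddings[0], toy_embeddings[1]))
print(cosine_similarity(toy_embeddings[0], toy_embeddings[2]))
```

Values close to 1 indicate mentions that the embedding model places near each other, and values close to 0 indicate unrelated mentions.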

Training Details

Training Data

The train, dev, and test splits used for model finetuning and evaluation will be made available on GitHub upon publication of the associated research paper.

Training Procedure

Training Hyperparameters

num epochs: (1, 4)
train batch sizes: (16, 4)
body train max steps: 100
head learning rate: 0.030
L2 weight: 0.015
warmup proportion: 0.10
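
For readers who want to reproduce a similar setup, the values above could plausibly be expressed as keyword arguments for `setfit.TrainingArguments`. The mapping below is an assumption: the training script is not part of this card, and in particular reading "body train max steps" as `max_steps` is a guess. The variable name `training_kwargs` is hypothetical.

```python
# Assumed mapping of the listed hyperparameters onto setfit.TrainingArguments
# argument names; the actual training script is not shown in this card.
training_kwargs = {
    "num_epochs": (1, 4),        # (embedding body phase, classifier head phase)
    "batch_size": (16, 4),       # (embedding body phase, classifier head phase)
    "max_steps": 100,            # assumed to correspond to "body train max steps"
    "head_learning_rate": 0.030,
    "l2_weight": 0.015,
    "warmup_proportion": 0.10,
}

for name, value in training_kwargs.items():
    print(f"{name}: {value}")
```

With setfit installed, such a dictionary could be unpacked as `TrainingArguments(**training_kwargs)`. Settings not listed above would fall back to the library defaults, which may differ from what was used to train this model.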

Evaluation

Testing Data, Factors & Metrics

Testing Data

The train, dev, and test splits used for model finetuning and evaluation will be made available on GitHub upon publication of the associated research paper.