
AI-powered bias detection for a more informed world
Machine Learning
Master Data Set:
Built a bias classification dataset with 7 labels: IDENTITY_ABUSE, STEREOTYPE_NEGATIVE,
STEREOTYPE_BENEVOLENT, SYSTEMIC_BIAS, HATE_SPEECH, TOXICITY_GENERAL, and NO_BIAS.
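As a minimal sketch, the seven-label scheme can be written down as a simple constant (label names are from the list above; the validation helper is hypothetical):

```python
# The seven bias labels used in the master dataset.
LABELS = [
    "IDENTITY_ABUSE",
    "STEREOTYPE_NEGATIVE",
    "STEREOTYPE_BENEVOLENT",
    "SYSTEMIC_BIAS",
    "HATE_SPEECH",
    "TOXICITY_GENERAL",
    "NO_BIAS",
]

def is_valid_label(label: str) -> bool:
    """Check that a sample's label is one of the seven classes."""
    return label in LABELS

print(is_valid_label("HATE_SPEECH"))  # True
```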
Data Sources Used:
Successfully Integrated (~83K samples from real datasets):
Jigsaw/Civil Comments: ~38K samples (toxicity detection)
Social Bias Frames: ~27K samples (offensive content classification)
HatExplain: ~14K samples (hate speech with explanations)
StereoSet: ~2K samples (stereotype detection)
CrowS-Pairs: ~1.5K samples (stereotype pairs)
The remaining ~17K samples were synthetically generated.
Data Formatting for Training:
Single-label structure: each sample carries exactly one label (a bias type or NO_BIAS)
Split: 79,528 training / 20,472 validation samples (roughly 80/20)
JSONL format: each line contains {"text": "...", "target": ["LABEL"], "source": "..."} (target is a list, but holds a single label per sample)
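A minimal sketch of parsing this JSONL format with the standard library (field names follow the line above; the sample record is hypothetical):

```python
import json

def load_jsonl(lines):
    """Parse JSONL training samples into (text, label) pairs.

    Each line looks like:
    {"text": "...", "target": ["LABEL"], "source": "..."}
    """
    pairs = []
    for line in lines:
        record = json.loads(line)
        # Single-label setup: take the one entry in "target".
        pairs.append((record["text"], record["target"][0]))
    return pairs

sample = '{"text": "example sentence", "target": ["NO_BIAS"], "source": "synthetic"}'
print(load_jsonl([sample]))  # [('example sentence', 'NO_BIAS')]
```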
Training Rationale:
Fine-tuned DistilBERT: smaller and faster than BERT, with near-identical performance on text
classification.
Conservative settings: small batch size (8), few epochs (3), and a standard learning rate (5e-5) to avoid overfitting.
The dataset draws 7 overlapping labels from diverse sources, and these conservative choices help keep the model from
memorizing source-specific patterns.
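The conservative settings above map onto the Hugging Face Trainer configuration roughly as follows (a sketch, assuming the transformers library; the output directory name is hypothetical and dataset loading is omitted):

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
)

# Hyperparameters from the training rationale above.
args = TrainingArguments(
    output_dir="bias-distilbert",
    per_device_train_batch_size=8,  # small batch
    num_train_epochs=3,             # few epochs
    learning_rate=5e-5,             # standard learning rate
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=7,  # seven single-label classes
)
```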
Model Wins:
50% greater speed than the fastest non-thinking models
- Running on a MacBook M2 (with pipelining)
- Compared against Gemini 2.5 Flash
- Compared against ChatGPT 5 (non-thinking)
77x more power efficient than LLMs
- Conservative estimate of ~1.0 kWh per 1 million tokens for a hosted LLM
- Approximately 0.012 kWh per million tokens on average, across 100 test runs on a MacBook M2 (with
pipelining)
98.8% cheaper to run than LLMs
- Based on the energy estimates above
- Assumes an average U.S. electricity rate of $0.15 per kWh
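The cost figure follows directly from the two energy estimates; a worked sketch of the arithmetic (all numbers taken from the bullets above):

```python
# Energy per 1 million tokens (kWh), from the estimates above.
LLM_KWH_PER_M_TOKENS = 1.0      # conservative hosted-LLM estimate
LOCAL_KWH_PER_M_TOKENS = 0.012  # measured on a MacBook M2 with pipelining

RATE_USD_PER_KWH = 0.15  # assumed average U.S. electricity rate

llm_cost = LLM_KWH_PER_M_TOKENS * RATE_USD_PER_KWH      # dollars per 1M tokens
local_cost = LOCAL_KWH_PER_M_TOKENS * RATE_USD_PER_KWH  # dollars per 1M tokens

savings = 1 - local_cost / llm_cost
print(f"{savings:.1%} cheaper")  # 98.8% cheaper
```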
Accuracy on validation set: 0.8669
Training loss: 0.3746


