
AI-powered bias detection for a more informed world
Machine Learning
Master Data Set:
Built a bias classification dataset with 7 labels: IDENTITY_ABUSE, STEREOTYPE_NEGATIVE,
STEREOTYPE_BENEVOLENT, SYSTEMIC_BIAS, HATE_SPEECH, TOXICITY_GENERAL, and NO_BIAS.
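As a minimal sketch, the seven-label scheme can be written down as a simple constant (label names are from the list above; the validation helper is hypothetical):

```python
# The seven bias labels used in the master dataset.
LABELS = [
    "IDENTITY_ABUSE",
    "STEREOTYPE_NEGATIVE",
    "STEREOTYPE_BENEVOLENT",
    "SYSTEMIC_BIAS",
    "HATE_SPEECH",
    "TOXICITY_GENERAL",
    "NO_BIAS",
]

def is_valid_label(label: str) -> bool:
    """Check that a sample's label is one of the seven classes."""
    return label in LABELS

print(is_valid_label("HATE_SPEECH"))  # True
```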
Data Sources Used:
Successfully Integrated (~83K samples from real datasets):
Jigsaw/Civil Comments: ~38K samples (toxicity detection)
Social Bias Frames: ~27K samples (offensive content classification)
HatExplain: ~14K samples (hate speech with explanations)
StereoSet: ~2K samples (stereotype detection)
CrowS-Pairs: ~1.5K samples (stereotype pairs)
The remaining ~17K samples were synthetically generated.
Data Formatting for Training:
Single-label structure: each sample carries exactly one label (a bias type or NO_BIAS)
Split: 79,528 training / 20,472 validation samples (roughly 80/20)
JSONL format: each line contains {"text": "...", "target": ["LABEL"], "source": "..."} (target is a list, but holds a single label per sample)
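A minimal sketch of parsing this JSONL format with the standard library (field names follow the line above; the sample record is hypothetical):

```python
import json

def load_jsonl(lines):
    """Parse JSONL training samples into (text, label) pairs.

    Each line looks like:
    {"text": "...", "target": ["LABEL"], "source": "..."}
    """
    pairs = []
    for line in lines:
        record = json.loads(line)
        # Single-label setup: take the one entry in "target".
        pairs.append((record["text"], record["target"][0]))
    return pairs

sample = '{"text": "example sentence", "target": ["NO_BIAS"], "source": "synthetic"}'
print(load_jsonl([sample]))  # [('example sentence', 'NO_BIAS')]
```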
Training Rationale:
Fine-tuned DistilBERT: smaller and faster than BERT, with near-identical performance on text
classification.
Conservative settings: small batch size (8), few epochs (3), and a standard learning rate (5e-5) to avoid overfitting.
The dataset draws 7 overlapping labels from diverse sources, and these conservative choices help keep the model from
memorizing source-specific patterns.
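The conservative settings above map onto the Hugging Face Trainer configuration roughly as follows (a sketch, assuming the transformers library; the output directory name is hypothetical and dataset loading is omitted):

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
)

# Hyperparameters from the training rationale above.
args = TrainingArguments(
    output_dir="bias-distilbert",
    per_device_train_batch_size=8,  # small batch
    num_train_epochs=3,             # few epochs
    learning_rate=5e-5,             # standard learning rate
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=7,  # seven single-label classes
)
```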
Model Wins:
50% greater speed than the fastest non-thinking models
- Running on a MacBook M2 (with pipelining)
- Compared against Gemini 2.5 Flash
- Compared against ChatGPT 5 (non-thinking)
77x more power efficient than LLMs
- Conservative estimate of ~1.0 kWh per 1 million tokens for a hosted LLM
- Approximately 0.012 kWh per million tokens on average, across 100 test runs on a MacBook M2 (with
pipelining)
98.8% cheaper to run than LLMs
- Based on the energy estimates above
- Assumes an average U.S. electricity rate of $0.15 per kWh
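The cost figure follows directly from the two energy estimates; a worked sketch of the arithmetic (all numbers taken from the bullets above):

```python
# Energy per 1 million tokens (kWh), from the estimates above.
LLM_KWH_PER_M_TOKENS = 1.0      # conservative hosted-LLM estimate
LOCAL_KWH_PER_M_TOKENS = 0.012  # measured on a MacBook M2 with pipelining

RATE_USD_PER_KWH = 0.15  # assumed average U.S. electricity rate

llm_cost = LLM_KWH_PER_M_TOKENS * RATE_USD_PER_KWH      # dollars per 1M tokens
local_cost = LOCAL_KWH_PER_M_TOKENS * RATE_USD_PER_KWH  # dollars per 1M tokens

savings = 1 - local_cost / llm_cost
print(f"{savings:.1%} cheaper")  # 98.8% cheaper
```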
Accuracy on validation set: 0.8669
Training loss: 0.3746


