Positive Reviews: 52.8% · 1,056 of 2,000 reviews · Majority class
Negative Reviews: 34.7% · 694 require attention · Action needed
Priority Flagged: 596 · Safety / Legal / Fraud signals · Auto-detected
Model Accuracy: 87.6% · Held-out test set only · DistilBERT

Sentiment Distribution
Ground-truth label proportions across all 2,000 UK reviews
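Proportions like these fall out of a single pandas aggregation. A minimal sketch, using an inline stand-in frame built from the dashboard's own counts (the `sentiment` column name is an assumption):

```python
import pandas as pd

# Stand-in for the labelled dataset, using the dashboard's counts:
# 1,056 positive + 694 negative + 250 neutral = 2,000 reviews
reviews = pd.DataFrame({
    "sentiment": ["positive"] * 1056 + ["negative"] * 694 + ["neutral"] * 250
})

# Ground-truth label proportions as percentages of all reviews
dist = reviews["sentiment"].value_counts(normalize=True).mul(100).round(1)
print(dist)  # positive 52.8 · negative 34.7 · neutral 12.5
```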
Sentiment by Product Category
Which categories generate the most negative reviews
Monthly Sentiment Trend — 2022 to 2024
% positive vs % negative reviews per month — tracking shifts in customer satisfaction over time
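A monthly trend of this shape can be computed by bucketing review dates into calendar months and normalising sentiment counts within each bucket. A sketch with toy dates (all values are assumptions, not the real data):

```python
import pandas as pd

# Hypothetical review timestamps and labels (column names are assumptions)
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-10", "2022-01-20", "2022-02-05", "2022-02-25"]),
    "sentiment": ["positive", "negative", "positive", "positive"],
})

# % of each sentiment per calendar month
monthly = (
    df.assign(month=df["date"].dt.to_period("M"))
      .groupby("month")["sentiment"]
      .value_counts(normalize=True)
      .unstack(fill_value=0)
      .mul(100)
)
print(monthly)
```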
Topic Distribution
Delivery dominates at 60.9% — the single biggest driver of review volume
Negative Sentiment Rate by Topic
Which topics are most associated with negative customer experience
Topic × Sentiment Breakdown
Stacked positive / neutral / negative for each topic — safety and returns/refunds have highest negative rates
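A stacked breakdown like this is a row-normalised cross-tabulation. A sketch with toy labels (the topic and sentiment values below are assumptions):

```python
import pandas as pd

# Hypothetical topic/sentiment pairs standing in for the real labels
df = pd.DataFrame({
    "topic": ["delivery", "delivery", "returns", "returns", "safety"],
    "sentiment": ["positive", "negative", "negative", "neutral", "negative"],
})

# normalize="index" gives the share of each sentiment *within* each topic,
# i.e. the per-topic stacked-bar proportions
breakdown = pd.crosstab(df["topic"], df["sentiment"], normalize="index").mul(100).round(1)
print(breakdown)
```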
Company % Positive Sentiment — League Table
All 30 companies ranked by positive review rate · Red = below 50% · Green = above 60%
Average Star Rating by Company
Top 15 companies by mean star rating
Priority Complaints by Company
Flagged safety / legal / fraud complaints per company
Priority Complaint Queue — Auto-Flagged Reviews
Sorted by priority tier · Critical = safety / legal / fraud signals · detected by rule-based keyword scoring · independent of sentiment model
| Tier | Company | Category | Topic | Review Extract | Signals |
|---|---|---|---|---|---|
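Rule-based keyword scoring of the kind described above can be sketched in a few lines. The keyword lists and the two-tier scheme here are assumptions for illustration; the real scorer lives in the pipeline scripts:

```python
# Minimal sketch of rule-based priority flagging, independent of the
# sentiment model. Keyword lists and tier names are assumptions.
SIGNAL_KEYWORDS = {
    "safety": ["caught fire", "electric shock", "choking hazard"],
    "legal": ["solicitor", "trading standards", "legal action"],
    "fraud": ["scam", "unauthorised charge", "never refunded"],
}

def flag_priority(review: str) -> tuple[str, list[str]]:
    """Return a priority tier plus the signal categories that fired."""
    text = review.lower()
    signals = [
        category
        for category, keywords in SIGNAL_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    ]
    tier = "Critical" if signals else "Routine"
    return tier, signals

tier, signals = flag_priority("The charger caught fire; reporting to Trading Standards.")
print(tier, signals)  # Critical ['safety', 'legal']
```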
Confusion Matrix — Test Set Only
Rows = actual label · Cols = predicted · Neutral class hardest due to fewest samples (250 total)
Per-Class F1 Score
Neutral underperforms due to class imbalance — addressed with class weighting on train set
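Both the confusion matrix and the per-class F1 scores come straight from scikit-learn. A sketch with toy labels standing in for the 300-review test set (the label values below are assumptions):

```python
from sklearn.metrics import confusion_matrix, f1_score

labels = ["negative", "neutral", "positive"]  # fixed row/column order
y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative", "negative", "neutral"]

# Rows = actual label, columns = predicted, in the order given above
cm = confusion_matrix(y_true, y_pred, labels=labels)

# average=None returns one F1 per class instead of a single aggregate
per_class_f1 = f1_score(y_true, y_pred, labels=labels, average=None)
print(cm)
print(dict(zip(labels, per_class_f1.round(3))))
```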
Training Curve — Loss per Epoch (Simulated from Demo Run)
Train loss vs validation loss · Small gap (~0.03) confirms no overfitting · Early stopping monitors val loss, never test loss
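The "monitor val loss, never test loss" rule can be sketched as a simple patience loop. The patience value and loss sequence below are assumptions, not the demo run's actual values:

```python
# Sketch of early stopping driven by validation loss only
def early_stop_epoch(val_losses: list[float], patience: int = 2) -> int:
    """Return the 0-based epoch at which training would stop,
    or the last epoch if patience is never exhausted."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1

# Val loss bottoms out at epoch 2, then creeps upward: stop at epoch 4
print(early_stop_epoch([0.60, 0.45, 0.40, 0.41, 0.42, 0.43]))  # 4
```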
Model Card
Architecture
Base model: distilbert-base-uncased
Parameters: 66.4M
Task: Sequence classification (3 classes)
Max length: 128 tokens
Dropout: 0.1 (DistilBERT default)
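A setup fragment matching this card, using the standard Hugging Face loaders (model identifier and class count from the card; running it requires the transformers library and a network connection or local cache):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=3,  # positive / neutral / negative
)

# Reviews are truncated/padded to the card's 128-token maximum
batch = tokenizer(
    ["Arrived on time, great quality."],
    max_length=128, truncation=True, padding="max_length",
    return_tensors="pt",
)
logits = model(**batch).logits  # shape: (1, 3)
```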
Training Config
Optimizer: AdamW
Learning rate: 2e-5
Weight decay (L2): 0.01
Gradient clipping: 1.0
Warmup ratio: 10%
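The warmup schedule implied by this config (linear ramp to the 2e-5 peak over the first 10% of steps, then linear decay) can be written as a small function; the step counts below are assumptions:

```python
# Linear warmup then linear decay, matching the card's peak LR and warmup ratio
def lr_at_step(step: int, total_steps: int, peak_lr: float = 2e-5,
               warmup_ratio: float = 0.10) -> float:
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps           # ramp up from 0
    remaining = total_steps - warmup_steps
    return peak_lr * (total_steps - step) / remaining  # decay to 0

# Peak LR is reached exactly when warmup ends (step 100 of 1,000 here)
print(lr_at_step(100, 1000))  # 2e-05
```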
Data Splits
Strategy: Stratified 70 / 15 / 15
Train: 1,400 reviews
Validation: 300 reviews
Test: 300 reviews (held out)
Class weights: Computed on train only
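A stratified 70 / 15 / 15 split with train-only class weights can be sketched with scikit-learn: split off 30% first, then halve that remainder into validation and test. The labels below reuse the dataset's real class counts; everything else is an assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Toy labels in the dataset's proportions: 1,056 / 694 / 250
labels = np.array(["positive"] * 1056 + ["negative"] * 694 + ["neutral"] * 250)
idx = np.arange(len(labels))

# 70% train, then split the remaining 30% evenly into val and test,
# stratifying every split on the sentiment label
train_idx, rest_idx = train_test_split(
    idx, test_size=0.30, stratify=labels, random_state=42)
val_idx, test_idx = train_test_split(
    rest_idx, test_size=0.50, stratify=labels[rest_idx], random_state=42)

# Class weights are computed on the TRAIN split only, never on val/test
classes = np.unique(labels[train_idx])
weights = compute_class_weight("balanced", classes=classes, y=labels[train_idx])
print(len(train_idx), len(val_idx), len(test_idx))  # 1400 300 300
```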
Known Limitations
Neutral F1 is lowest (72.2%) — class is ambiguous and smallest
Trained on English UK text only — may not generalise to other dialects
Very short reviews (<5 words) may produce low-confidence outputs
Live Review Classifier
Type or paste any UK customer review · the pipeline classifies sentiment, extracts topics, and scores priority in real time using the rule-based demo classifier (run 02_train_model.py --demo and 04_inference.py --use_model for the full DistilBERT model)
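The rule-based demo path can be sketched as a pair of keyword lexicons, one for sentiment and one for topics. Every word list below is an assumption for illustration; the real lexicons live in the pipeline scripts:

```python
# Minimal sketch of the rule-based demo classifier (lexicons are assumptions)
POSITIVE = {"great", "excellent", "fast", "perfect", "love"}
NEGATIVE = {"broken", "late", "refund", "terrible", "awful"}
TOPICS = {"delivery": {"late", "courier", "arrived"},
          "returns": {"refund", "return", "exchange"}}

def classify(review: str) -> dict:
    words = [w.strip(".,!?") for w in review.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    sentiment = "Pos" if pos > neg else "Neg" if neg > pos else "Neu"
    topic = next((t for t, kws in TOPICS.items()
                  if any(w in kws for w in words)), "other")
    return {"sentiment": sentiment, "topic": topic, "words": len(words)}

print(classify("Parcel arrived late and the courier was terrible."))
```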
Output panels: Sentiment (Pos / Neg / Neu) · Primary Topic · Word Count / chars · Priority Tier
Quick Test Examples
Click any example to classify instantly