Data Verification

This page runs automated checks against Eurostat employment totals, Statistik Austria VSE earnings, and internal consistency rules (sectors, pay, AI exposure, outlook). Use it to see which claims the dataset satisfies and how each test is defined.

The checks cover totals, sector plausibility, pay statistics, and bounds on AI exposure and outlook. Re-run them after regenerating the data.

Last run:

30/30 passed
All tests passed!
Total employment matches Eurostat (2024)
We sum jobs across all occupation rows and compare the result to Eurostat’s Austrian employed-persons total (4,731,970). The summed total should stay within 95–105% of that figure, which leaves room for rounding and proportional splits.
Expected: 95–105% of 4,731,970
Actual: 4,741,781 (100.2%)
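As a rough illustration of the comparison described above, the check could be sketched as follows. The row layout (a dict with a `jobs` key) and the function names are assumptions made for this example, not the pipeline's actual code; only the reference total and the 95–105% band come from the page.

```python
EUROSTAT_TOTAL = 4_731_970  # Eurostat employed-persons total quoted on this page


def employment_share(rows, reference=EUROSTAT_TOTAL):
    """Return (summed jobs, summed jobs as a fraction of the reference total)."""
    total = sum(row["jobs"] for row in rows)
    return total, total / reference


def total_employment_ok(rows, low=0.95, high=1.05):
    """True when the summed jobs fall inside the 95-105% band."""
    _, share = employment_share(rows)
    return low <= share <= high


# Hypothetical two-row dataset chosen so the sum equals the reported actual.
rows = [{"slug": "a", "jobs": 2_741_781}, {"slug": "b", "jobs": 2_000_000}]
total, share = employment_share(rows)
print(total, f"{share:.1%}")  # 4741781 100.2%
```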
Every occupation has positive employment
Missing or zero job counts would signal a broken sector split or a dropped Eurostat row.
Expected: All occupations > 0 jobs
Actual: All > 0
Every occupation has VSE-based pay
Median gross annual pay should come from Statistik Austria VSE (ISCO or NACE), scaled with the 13th/14th-month factor used in this pipeline.
Expected: All occupations have pay data
Actual: All > 0
Annual pay stays in a realistic band
Across aggregated occupations, min/max gross annual pay should sit inside €15k–€150k so obvious data-entry errors fail fast.
Expected: €15,000 – €150,000
Actual: €24,798 – €65,196
Education labels map to the canonical list
Each row’s education string must be one of EDU_LEVELS_EN so filters and comparisons stay stable in EN/DE.
Expected: All in EDU_LEVELS_EN
Actual: All valid
AI exposure scores stay within 0–10
Exposure is an integer rubric score (curated, optionally merged with LLM overrides). Values outside 0–10 indicate a bad merge or a corrupt row.
Expected: All values 0–10
Actual: All 0–10
Job outlook scores stay within −10…+10
Outlook is a qualitative demand signal per occupation group, not a forecast model. It must remain inside the documented scale.
Expected: All values −10 to +10
Actual: All −10 to 10
Occupation slugs are unique
Duplicate slugs would break routing and merge logic in the UI.
Expected: No duplicates
Actual: All unique
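A uniqueness check of this kind is easiest to debug when it reports which slugs collide rather than just a pass/fail flag. The sketch below assumes each row is a dict with a `slug` key; that layout and the function name are illustrative only.

```python
from collections import Counter


def duplicate_slugs(rows):
    """Return the sorted list of slugs that appear more than once."""
    counts = Counter(row["slug"] for row in rows)
    return sorted(slug for slug, n in counts.items() if n > 1)


# Hypothetical rows: "nurses" appears twice and should be reported.
rows = [{"slug": "nurses"}, {"slug": "welders"}, {"slug": "nurses"}]
print(duplicate_slugs(rows))  # ['nurses']
```

The test passes when the returned list is empty.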
Each row carries a traceable source string
The combined employment + pay provenance string should be long enough to identify Eurostat/VSE references.
Expected: All occupations have source field
Actual: All have sources
Manufacturing (NACE C) matches Eurostat scale
Summed jobs in all NACE C rows should exceed 600k so the manufacturing total is in line with Eurostat’s Austria figure (~690k).
Expected: > 600,000 (Eurostat ≈ 690k)
Actual: 688,540
Information & communication (NACE J) matches Eurostat scale
Summed jobs across NACE J should exceed 140k, consistent with Eurostat’s IT/telecom employment for Austria (~155k).
Expected: > 140,000 (Eurostat ≈ 155k)
Actual: 152,910
Enough occupation groups for a useful treemap
We keep a minimum number of distinct occupation aggregates so the visualization is not dominated by a handful of cells.
Expected: ≥ 50
Actual: 62
Salary inequality (Gini) across occupations is plausible
Unweighted Gini of median annual pay across occupation rows should sit in 0.10–0.35 — Austria’s wage structure is relatively compressed.
Expected: 0.10–0.35
Actual: 0.136
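For readers unfamiliar with the statistic, an unweighted Gini coefficient can be computed from the sorted values alone. The sketch below uses the standard rank-based formula; whether the pipeline uses this exact variant (as opposed to a bias-corrected one) is not stated on this page, so treat it as an assumption.

```python
def gini(values):
    """Unweighted Gini coefficient via the rank formula on sorted values.

    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero sum")
    ranked_sum = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * ranked_sum / (n * total) - (n + 1) / n


# Identical pay everywhere gives 0; one row holding everything gives (n - 1) / n.
print(gini([40_000, 40_000, 40_000, 40_000]))  # 0.0
print(gini([0, 0, 0, 1]))                      # 0.75
```

Each occupation row counts once here regardless of its job count, which is what "unweighted" means in this check.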
Coefficient of variation of pay is plausible
CV (std/mean) of pay across occupations should stay between 0.15 and 0.50 — enough spread without wild outliers.
Expected: 0.15–0.50
Actual: 0.240
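The coefficient of variation is the standard deviation divided by the mean. A minimal sketch follows; it uses the population standard deviation, which is an assumption, since the page does not say whether the population or sample form is used.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean is zero; CV is undefined")
    return statistics.pstdev(values) / mean


# Textbook example: mean 5, population std 2, so CV = 0.4.
print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))  # 0.4
```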
AI exposure correlates positively with pay
Across occupations, more cognitive and digital roles tend to pay more, so Pearson r between exposure and pay should be > 0.
Expected: r > 0
Actual: r = 0.701
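Pearson's r is the covariance of the two series divided by the product of their standard deviations. The following sketch computes it directly from that definition; the function name and the sample numbers are illustrative and not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical exposure scores and pay values, for illustration only.
exposure = [1, 3, 5, 7, 9]
pay = [26_000, 31_000, 38_000, 52_000, 61_000]
print(pearson_r(exposure, pay) > 0)  # True
```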
Higher formal education → higher median pay
Weighted median pay for Master’s/PhD occupations should exceed weighted median for compulsory-school-only occupations.
Expected: High median > Low median
Actual: Low: €29,203 | High: €60,986
Services (G–S) employ more than goods (A–F)
Tertiarization check: employment in sections G–S should exceed A–F, consistent with Austria’s service-heavy economy.
Expected: Services > 50%
Actual: 74.6% in G–S (3,537,571 jobs)
Health & social work (NACE Q) is a large employer
Summed jobs in NACE Q should exceed 400k to match Eurostat’s ballpark (~440k) for Austria.
Expected: > 400,000
Actual: 522,600
Construction (NACE F) sits in the Eurostat band
Construction employment should fall between 250k and 350k, around Eurostat’s ~300k for Austria.
Expected: 250k–350k
Actual: 317,300
All ÖNACE sections A–S appear at least once
Every high-level section letter A through S should have ≥1 job row so sector coverage is complete.
Expected: All sections A–S represented
Actual: 19 sections A–S
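A coverage check like this reduces to a set difference between the expected letters and those present in the data. The sketch assumes each row carries its section letter under a `section` key, which is a hypothetical field name.

```python
import string

# Letters A through S: the first 19 uppercase letters.
EXPECTED_SECTIONS = set(string.ascii_uppercase[:19])


def missing_sections(rows):
    """Return the sorted section letters that have no row at all."""
    present = {row["section"] for row in rows}
    return sorted(EXPECTED_SECTIONS - present)


# Hypothetical rows covering A through R only, so S is reported as missing.
rows = [{"section": letter} for letter in "ABCDEFGHIJKLMNOPQR"]
print(missing_sections(rows))  # ['S']
```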
Employment-weighted median pay is near the national band
Jobs-weighted median gross annual pay should land in €35k–€45k, consistent with Statistik Austria’s national medians after the 13th/14th uplift.
Expected: €35,000–€45,000
Actual: €42,710
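A jobs-weighted median is the pay level at which half of all jobs sit at or below it. One simple way to compute it is to sort rows by pay and walk the cumulative job count, as sketched below. Returning the lower value on an exact tie is an assumption of this example; the page does not specify a tie rule.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")


# Hypothetical occupations: pay in euros, weights are job counts.
pay = [30_000, 40_000, 60_000]
jobs = [100, 300, 50]
print(weighted_median(pay, jobs))  # 40000
```

In this example the total is 450 jobs, so the halfway point (225) falls inside the 300-job occupation paying €40,000.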
AI exposure uses most of the 0–10 scale
Max − min exposure should span at least 7 points so physical vs cognitive jobs are visibly separated.
Expected: span ≥ 7
Actual: Range 1–9 (span 8)
Physical sectors (A, F) average low AI exposure
Jobs-weighted mean exposure for agriculture + construction should stay below 4 — these roles remain mostly manual/site-based.
Expected: < 4.0
Actual: Avg 1.96 (n=7)
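The sector averages in this test and the next one are jobs-weighted means over a subset of section letters. A minimal sketch, assuming hypothetical `section`, `exposure`, and `jobs` keys on each row:

```python
def weighted_mean_exposure(rows, sections):
    """Return (jobs-weighted mean exposure, number of rows) for the sections."""
    selected = [row for row in rows if row["section"] in sections]
    jobs = sum(row["jobs"] for row in selected)
    if jobs == 0:
        raise ValueError("no jobs in the selected sections")
    mean = sum(row["exposure"] * row["jobs"] for row in selected) / jobs
    return mean, len(selected)


# Hypothetical rows, not taken from the dataset.
rows = [
    {"section": "A", "exposure": 1, "jobs": 100},
    {"section": "F", "exposure": 3, "jobs": 300},
    {"section": "J", "exposure": 8, "jobs": 200},
]
print(weighted_mean_exposure(rows, {"A", "F"}))  # (2.5, 2)
```

The same helper would serve the knowledge-intensive check by passing `{"J", "K", "M"}` and comparing against 5 instead of 4.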
Knowledge-intensive sectors (J, K, M) average high exposure
Jobs-weighted mean exposure for IT, finance, and professional services should exceed 5.
Expected: > 5.0
Actual: Avg 7.39 (n=10)
Employment is not dominated by one ÖNACE section (HHI)
The Herfindahl index (sum of squared job shares) across section letters A–S should stay < 0.15 for a diversified economy.
Expected: < 0.15
Actual: HHI = 0.0869
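The Herfindahl index is the sum of squared shares, here the share of total jobs held by each section letter. The sketch below aggregates rows by a hypothetical `section` key before squaring; the field names are assumptions.

```python
from collections import defaultdict


def herfindahl(rows):
    """Sum of squared job shares per section letter."""
    jobs_by_section = defaultdict(int)
    for row in rows:
        jobs_by_section[row["section"]] += row["jobs"]
    total = sum(jobs_by_section.values())
    if total == 0:
        raise ValueError("no jobs in the dataset")
    return sum((jobs / total) ** 2 for jobs in jobs_by_section.values())


# Two equally sized sections give 0.5^2 + 0.5^2 = 0.5.
rows = [
    {"section": "C", "jobs": 50},
    {"section": "G", "jobs": 30},
    {"section": "G", "jobs": 20},
]
print(herfindahl(rows))  # 0.5
```

With 19 sections the index cannot fall below 1/19 ≈ 0.053 (all sections equal), which puts the reported 0.0869 fairly close to the even-spread floor.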
Extreme education levels are a minority of rows
Share of occupations tagged compulsory-only or PhD should stay below 30% so the ladder isn’t all extremes.
Expected: < 30%
Actual: 7 of 62 (11.3%)
Total wage bill (pay × jobs) is economically plausible
Sum of pay×jobs across rows should fall in €150B–€260B, near Eurostat compensation of employees for Austria.
Expected: €150B–€260B
Actual: €205.1B
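The wage-bill figure is a plain sum of pay times jobs over all rows. A short sketch under the assumption that each row exposes `pay` (gross annual, in euros) and `jobs`; both key names are hypothetical.

```python
def wage_bill_eur(rows):
    """Total of pay x jobs across rows, in euros."""
    return sum(row["pay"] * row["jobs"] for row in rows)


def wage_bill_ok(rows, low=150e9, high=260e9):
    """True when the total falls inside the 150B-260B euro band."""
    return low <= wage_bill_eur(rows) <= high


# Hypothetical single-row dataset: 5 million jobs at 40,000 euros each.
rows = [{"pay": 40_000, "jobs": 5_000_000}]
print(f"€{wage_bill_eur(rows) / 1e9:.1f}B", wage_bill_ok(rows))  # €200.0B True
```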
Wholesale & retail (NACE G) is the largest private trade block
Summed jobs in all G rows (G45–G47) should sit in 500k–700k, consistent with WKO/Eurostat ballparks.
Expected: 500k–700k
Actual: 693,530
Every exposure score has a written rationale
Exposure rationales should be at least ~20 characters so the rubric is explainable in the UI.
Expected: All rationales ≥ 20 chars
Actual: All ≥ 20 chars
Outlook is not systematically biased
Jobs-weighted mean outlook should sit near zero (±3). Strong drift would hint at one-sided scoring.
Expected: −3.0 to +3.0
Actual: 0.87