Fine-grained evaluation and ViT-guided editing of tactile graphics.
Tactile graphics for blind and visually impaired learners demand expert verification before embossing, yet existing datasets provide only coarse holistic ratings. TactileEval decomposes quality into five BANA-aligned dimensions (view, parts, background, texture, lines) across six object families, yielding 30 task families and 14,095 option-level annotations collected under a rigorous Amazon Mechanical Turk (AMT) protocol. A two-layer probe trained on frozen CLIP ViT-L/14 features reaches 85.7% test accuracy, with a consistent difficulty ordering across families. Building on these evaluations, we introduce a ViT-guided editing pipeline that routes classifier scores through family-specific prompt templates and applies targeted corrections via gpt-image-1, reducing issue probabilities in 14 of 15 high-confidence cases.
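The score-to-prompt routing step can be sketched as a small lookup keyed by task family and issue option. This is an illustrative assumption, not the released implementation: the template strings, the `route_edits` helper, and the 0.5 flagging threshold are hypothetical, and only the family/option names come from the results table.

```python
# Hypothetical sketch of the ViT-guided editing router: per-option probe
# scores are thresholded, and each flagged issue maps to a family-specific
# edit prompt that would be sent to the image-editing model.

# Family-specific prompt templates keyed by (family, issue option).
# These strings are illustrative placeholders, not the paper's templates.
TEMPLATES = {
    ("F2QT", "missing_texture"): "Add a distinct raised texture to the {obj} surface.",
    ("F5QL", "too_thick"): "Thin the outline of the {obj} to a single raised line.",
}

THRESHOLD = 0.5  # assumed flagging threshold, for illustration only

def route_edits(obj, family, scores):
    """Return edit prompts for every issue the probe flags on this graphic."""
    prompts = []
    for option, p in scores.items():
        if p > THRESHOLD and (family, option) in TEMPLATES:
            prompts.append(TEMPLATES[(family, option)].format(obj=obj))
    return prompts

# Example: probe scores for the Egg graphic in texture family F2QT.
print(route_edits("egg", "F2QT", {"missing_texture": 0.903, "too_rough": 0.12}))
```

With the scores above, only `missing_texture` clears the threshold, so a single texture-repair prompt is emitted.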
- 66 object classes, 6 families, 5 quality dimensions → 30 task families and 14,095 binary option labels with full vote metadata.
- Frozen CLIP ViT-L/14 + 2-layer probe: 85.7% overall accuracy (test split), with background checks reaching 100%.
- ViT-guided edits decrease issue probability in 14/15 high-confidence samples (mean drop 0.329, median drop 0.397).
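The probe architecture can be sketched as a 2-layer MLP over frozen CLIP ViT-L/14 image embeddings (768-d). A minimal forward-pass sketch, assuming a ReLU hidden layer and sigmoid output; the hidden width (256) and initialization are illustrative assumptions, and the backbone is frozen so only these two layers would be trained:

```python
import numpy as np

# Minimal sketch of the frozen-feature probe: a 2-layer MLP over CLIP
# ViT-L/14 image embeddings. Hidden width and init scale are assumptions.
rng = np.random.default_rng(0)
D_IN, D_HID, D_OUT = 768, 256, 1  # 768 = CLIP ViT-L/14 embedding size

W1 = rng.normal(0.0, 0.02, (D_IN, D_HID))
b1 = np.zeros(D_HID)
W2 = rng.normal(0.0, 0.02, (D_HID, D_OUT))
b2 = np.zeros(D_OUT)

def probe(feat):
    """feat: (N, 768) frozen CLIP features -> (N,) issue probabilities."""
    h = np.maximum(feat @ W1 + b1, 0.0)            # ReLU hidden layer
    logits = h @ W2 + b2                           # linear output layer
    return (1.0 / (1.0 + np.exp(-logits))).ravel() # sigmoid -> probability
```

Because the CLIP encoder never receives gradients, training reduces to fitting `W1, b1, W2, b2` on cached embeddings, which keeps the probe cheap to train per task family.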
| Task / Option | $p_{\text{before}}$ | $p_{\text{after}}$ | $\Delta$ (drop) |
|---|---|---|---|
| F2QT Egg — missing_texture | 0.903 | 0.693 | +0.211 |
| F2QT Planet — missing_texture | 0.739 | 0.099 | +0.640 |
| F5QL Scooty — too_thick | 0.934 | 0.679 | +0.255 |
| F4QP Laptop — missing_parts | 0.858 | 0.807 | +0.050 |
| F1QL Dinosaur — too_thick | 0.985 | 0.989 | -0.004 |
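The drop column above is $\Delta = p_{\text{before}} - p_{\text{after}}$, so positive values mean the edit reduced the issue probability. An illustrative recomputation over the five excerpted rows (not the full 15-sample set); tiny mismatches against the printed Δ are expected because the tabulated probabilities are themselves rounded:

```python
# Recompute per-row drops from the (p_before, p_after) pairs in the table.
rows = {
    "F2QT Egg / missing_texture":    (0.903, 0.693),
    "F2QT Planet / missing_texture": (0.739, 0.099),
    "F5QL Scooty / too_thick":       (0.934, 0.679),
    "F4QP Laptop / missing_parts":   (0.858, 0.807),
    "F1QL Dinosaur / too_thick":     (0.985, 0.989),
}
deltas = {k: before - after for k, (before, after) in rows.items()}
improved = sum(d > 0 for d in deltas.values())
print(f"{improved}/{len(deltas)} rows improved")  # 4/5 in this excerpt
```

Only the Dinosaur row regresses (Δ < 0), matching the one failure among the 15 high-confidence cases reported above.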