TactileEval

Fine-grained evaluation and ViT-guided editing of tactile graphics.

Paper (coming soon) · Code · Dataset

Abstract

Tactile graphics demand expert verification before embossing for blind and visually impaired learners, yet existing datasets only provide coarse holistic ratings. TactileEval decomposes quality into five BANA-aligned dimensions (view, parts, background, texture, lines) across six object families, yielding 30 task families and 14,095 option-level annotations collected with a rigorous AMT protocol. A frozen CLIP ViT-L/14 feature probe trained on this dataset delivers 85.7% test accuracy, with consistent difficulty ordering across families. Building on these evaluations, we introduce a ViT-guided editing pipeline that routes classifier scores through family-specific prompt templates and applies targeted corrections via gpt-image-1, reducing issue probabilities in 14/15 high-confidence cases.

Key Results

Structured Dataset

66 object classes, 6 families, 5 quality dimensions → 30 task families and 14,095 binary option labels with full vote metadata.
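The task-family count above follows directly from crossing families with quality dimensions. A minimal sketch (field names and the example record are illustrative assumptions, not the released schema):

```python
# Hypothetical sketch of TactileEval's label structure; identifiers and
# the sample record below are illustrative, not the released schema.
from itertools import product

FAMILIES = ["F1", "F2", "F3", "F4", "F5", "F6"]                   # 6 object families
DIMENSIONS = ["view", "parts", "background", "texture", "lines"]  # 5 BANA-aligned dimensions

# Each (family, dimension) pair defines one task family: 6 x 5 = 30.
task_families = list(product(FAMILIES, DIMENSIONS))

# One option-level annotation: a binary label plus raw AMT vote metadata.
annotation = {
    "task_family": ("F2", "texture"),
    "object_class": "egg",
    "option": "missing_texture",
    "label": 1,            # 1 = issue present, 0 = absent
    "votes": [1, 1, 0],    # per-worker votes retained in full
}
```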

Probe Performance

Frozen CLIP ViT-L/14 + 2-layer probe: 85.7% overall accuracy (test split), with background checks reaching 100%.
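The probe head can be sketched as a small MLP over precomputed CLIP features. Feature extraction is assumed done offline; the hidden width and two-way output here are illustrative assumptions, not the paper's exact hyperparameters (CLIP ViT-L/14 image embeddings are 768-dimensional):

```python
# Sketch of a 2-layer probe over frozen CLIP ViT-L/14 features.
# Hidden size, output size, and init scale are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
D_FEAT, D_HID, N_OPTS = 768, 256, 2  # 768-d CLIP features -> binary option label

W1 = rng.normal(scale=0.02, size=(D_FEAT, D_HID))
b1 = np.zeros(D_HID)
W2 = rng.normal(scale=0.02, size=(D_HID, N_OPTS))
b2 = np.zeros(N_OPTS)

def probe(feats: np.ndarray) -> np.ndarray:
    """Two-layer probe: linear -> ReLU -> linear -> softmax over options."""
    h = np.maximum(feats @ W1 + b1, 0.0)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

feats = rng.normal(size=(4, D_FEAT))  # a batch of 4 precomputed CLIP features
probs = probe(feats)                  # shape (4, 2), rows sum to 1
```

Keeping the backbone frozen means only `W1`, `b1`, `W2`, `b2` would be trained, which is what makes the 85.7% figure a probe of the representation rather than a fine-tuned model.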

Editing Impact

ViT-guided edits decrease issue probability in 14/15 high-confidence samples (mean drop 0.329, median drop 0.397).
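The routing step from classifier scores to edit prompts can be sketched as a template lookup gated by a confidence threshold. Template wording, the threshold value, and all function names are assumptions for illustration; the actual image edit call to gpt-image-1 is omitted:

```python
# Hedged sketch of score-to-prompt routing; templates, threshold, and
# names are hypothetical. The gpt-image-1 edit call itself is omitted.
TEMPLATES = {
    ("texture", "missing_texture"): "Add a clear tactile texture pattern to the {obj}.",
    ("lines", "too_thick"): "Thin the outlines of the {obj} to an embossable width.",
}

def route_edit_prompt(dimension, option, obj, p_issue, threshold=0.7):
    """Return a family-specific edit prompt only for high-confidence issues."""
    if p_issue < threshold:
        return None  # low-confidence predictions are not edited
    template = TEMPLATES.get((dimension, option))
    return template.format(obj=obj) if template else None

prompt = route_edit_prompt("texture", "missing_texture", "egg", 0.903)
```

Re-scoring the edited image with the same probe then gives the before/after probabilities reported in the table below.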

| Task / Option | $p_{\text{before}}$ | $p_{\text{after}}$ | $\Delta$ |
|---|---|---|---|
| F2QT Egg — missing_texture | 0.903 | 0.693 | +0.211 |
| F2QT Planet — missing_texture | 0.739 | 0.099 | +0.640 |
| F5QL Scooty — too_thick | 0.934 | 0.679 | +0.255 |
| F4QP Laptop — missing_parts | 0.858 | 0.807 | +0.050 |
| F1QL Dinosaur — too_thick | 0.985 | 0.989 | -0.004 |

Resources