Fine-grained evaluation and ViT-guided editing of tactile graphics.
Tactile graphics for blind and visually impaired learners demand expert verification before embossing, yet existing datasets provide only coarse holistic ratings. TactileEval decomposes quality into five BANA-aligned dimensions (view, parts, background, texture, lines) across six object families, yielding 30 task families and 14,095 option-level annotations collected under a rigorous Amazon Mechanical Turk (AMT) protocol. A two-layer probe trained on frozen CLIP ViT-L/14 features reaches 85.7% test accuracy, with a consistent difficulty ordering across families. Building on these evaluations, we introduce a ViT-guided editing pipeline that routes classifier scores through family-specific prompt templates and applies targeted corrections via gpt-image-1, reducing issue probabilities in 14 of 15 high-confidence cases.
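The score-to-prompt routing step can be sketched as a small lookup keyed by task family and issue option. This is an illustrative assumption, not the released implementation: the template strings, the `route_edits` helper, and the 0.5 flagging threshold are hypothetical, and only the family/option names come from the results table.

```python
# Hypothetical sketch of the ViT-guided editing router: per-option probe
# scores are thresholded, and each flagged issue maps to a family-specific
# edit prompt that would be sent to the image-editing model.

# Family-specific prompt templates keyed by (family, issue option).
# These strings are illustrative placeholders, not the paper's templates.
TEMPLATES = {
    ("F2QT", "missing_texture"): "Add a distinct raised texture to the {obj} surface.",
    ("F5QL", "too_thick"): "Thin the outline of the {obj} to a single raised line.",
}

THRESHOLD = 0.5  # assumed flagging threshold, for illustration only

def route_edits(obj, family, scores):
    """Return edit prompts for every issue the probe flags on this graphic."""
    prompts = []
    for option, p in scores.items():
        if p > THRESHOLD and (family, option) in TEMPLATES:
            prompts.append(TEMPLATES[(family, option)].format(obj=obj))
    return prompts

# Example: probe scores for the Egg graphic in texture family F2QT.
print(route_edits("egg", "F2QT", {"missing_texture": 0.903, "too_rough": 0.12}))
```

With the scores above, only `missing_texture` clears the threshold, so a single texture-repair prompt is emitted.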
- 66 object classes, 6 families, 5 quality dimensions → 30 task families and 14,095 binary option labels with full vote metadata.
- Frozen CLIP ViT-L/14 + 2-layer probe: 85.7% overall accuracy (test split), with background checks reaching 100%.
- ViT-guided edits decrease issue probability in 14/15 high-confidence samples (mean drop 0.329, median drop 0.397).
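The probe architecture can be sketched as a 2-layer MLP over frozen CLIP ViT-L/14 image embeddings (768-d). A minimal forward-pass sketch, assuming a ReLU hidden layer and sigmoid output; the hidden width (256) and initialization are illustrative assumptions, and the backbone is frozen so only these two layers would be trained:

```python
import numpy as np

# Minimal sketch of the frozen-feature probe: a 2-layer MLP over CLIP
# ViT-L/14 image embeddings. Hidden width and init scale are assumptions.
rng = np.random.default_rng(0)
D_IN, D_HID, D_OUT = 768, 256, 1  # 768 = CLIP ViT-L/14 embedding size

W1 = rng.normal(0.0, 0.02, (D_IN, D_HID))
b1 = np.zeros(D_HID)
W2 = rng.normal(0.0, 0.02, (D_HID, D_OUT))
b2 = np.zeros(D_OUT)

def probe(feat):
    """feat: (N, 768) frozen CLIP features -> (N,) issue probabilities."""
    h = np.maximum(feat @ W1 + b1, 0.0)            # ReLU hidden layer
    logits = h @ W2 + b2                           # linear output layer
    return (1.0 / (1.0 + np.exp(-logits))).ravel() # sigmoid -> probability
```

Because the CLIP encoder never receives gradients, training reduces to fitting `W1, b1, W2, b2` on cached embeddings, which keeps the probe cheap to train per task family.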
| Task / Option | $p_{\text{before}}$ | $p_{\text{after}}$ | $\Delta$ (drop) |
|---|---|---|---|
| F2QT Egg — missing_texture | 0.903 | 0.693 | +0.211 |
| F2QT Planet — missing_texture | 0.739 | 0.099 | +0.640 |
| F5QL Scooty — too_thick | 0.934 | 0.679 | +0.255 |
| F4QP Laptop — missing_parts | 0.858 | 0.807 | +0.050 |
| F1QL Dinosaur — too_thick | 0.985 | 0.989 | -0.004 |
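The drop column above is $\Delta = p_{\text{before}} - p_{\text{after}}$, so positive values mean the edit reduced the issue probability. An illustrative recomputation over the five excerpted rows (not the full 15-sample set); tiny mismatches against the printed Δ are expected because the tabulated probabilities are themselves rounded:

```python
# Recompute per-row drops from the (p_before, p_after) pairs in the table.
rows = {
    "F2QT Egg / missing_texture":    (0.903, 0.693),
    "F2QT Planet / missing_texture": (0.739, 0.099),
    "F5QL Scooty / too_thick":       (0.934, 0.679),
    "F4QP Laptop / missing_parts":   (0.858, 0.807),
    "F1QL Dinosaur / too_thick":     (0.985, 0.989),
}
deltas = {k: before - after for k, (before, after) in rows.items()}
improved = sum(d > 0 for d in deltas.values())
print(f"{improved}/{len(deltas)} rows improved")  # 4/5 in this excerpt
```

Only the Dinosaur row regresses (Δ < 0), matching the one failure among the 15 high-confidence cases reported above.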