MMTabQA is a large-scale multimodal table QA dataset containing 69,740 questions over 25,026 tables. It was created by augmenting four existing table QA datasets with images embedded in table cells: WikiSQL (21,472 questions, 9,784 tables), WikiTableQuestions (10,052 questions, 1,259 tables), FeTaQA (7,476 questions, 5,898 tables), and HybridQA (30,470 questions, 8,085 tables).
It inherits the reasoning styles of its source datasets: SQL-style semantic parsing (WikiSQL), complex multi-row/column reasoning (WikiTableQuestions), long-form answer generation (FeTaQA), and hybrid table-text reasoning over contextual passages (HybridQA).
Questions are categorized into four types: Explicit (24,797), Implicit (21,453), Visual (5,763), and Answer-Mention (17,727). With an average of 14.10 images per table, the dataset enables evaluation of entity parsing, visual grounding, and multimodal reasoning skills.
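To make the record structure concrete, the following is a minimal Python sketch of how such multimodal table QA examples could be represented and tallied. The class name MMTabQAExample and its fields (question_type, cell_images, and so on) are illustrative assumptions, not the dataset's released schema or loader API.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical record layout for one multimodal table QA example.
# Field names are illustrative assumptions, not the official MMTabQA schema.
@dataclass
class MMTabQAExample:
    question: str
    answer: str
    question_type: str          # e.g. "explicit", "implicit", "visual", "answer-mention"
    source_dataset: str         # e.g. "WikiSQL", "WikiTableQuestions", "FeTaQA", "HybridQA"
    table: list                 # list of rows; image cells hold a placeholder token
    cell_images: dict = field(default_factory=dict)  # (row, col) -> image path


def summarize(examples):
    """Print question-type counts and the mean number of images per table."""
    type_counts = Counter(ex.question_type for ex in examples)
    avg_images = sum(len(ex.cell_images) for ex in examples) / max(len(examples), 1)
    print(dict(type_counts))
    print(f"avg images per table: {avg_images:.2f}")


if __name__ == "__main__":
    demo = MMTabQAExample(
        question="Which country does the flag pictured in row 1 belong to?",
        answer="Japan",
        question_type="visual",
        source_dataset="WikiTableQuestions",
        table=[["Rank", "Flag", "Country"], ["1", "<img_0>", "Japan"]],
        cell_images={(1, 1): "images/flag_japan.png"},
    )
    summarize([demo])
```

Keeping the table as text with a (row, col) -> image-path map is one simple way to record where images stand in for entity mentions while leaving the rest of the table machine-readable; the actual dataset release may organize this differently.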