SEAR

Dataset Documentation

This directory contains context-enhanced table datasets used for temporal table reasoning research presented in “No Universal Prompt: Unifying Reasoning through Adaptive Prompting for Temporal Table Reasoning”.

Overview

The dataset collection comprises 9 benchmark datasets whose tables have been enhanced with improved formatting using GPT-4o-mini. Each dataset is stored as a JSON file containing question-answer pairs along with structured table data.

Dataset Files

| Filename | Description | Questions | Domain |
|----------|-------------|-----------|--------|
| fetaqa.context.json | FeTaQA - Question answering over Wikipedia tables | 1,582 | General Knowledge |
| finqa.context.json | FinQA - Financial question answering over tables | 962 | Finance |
| hitabs.context.json | HiTabs - Hierarchical table question answering | 897 | Structured Data |
| hybridqa.context.json | HybridQA - Multi-hop QA over tables and text | 1,528 | Hybrid Reasoning |
| multi.context.json | Multi-hop reasoning over tables | 1,587 | Complex Reasoning |
| sqa.context.json | SQA - Sequential question answering | 248 | Sequential Reasoning |
| squall.context.json | SQUALL - SQL-like natural language QA | 774 | Structured Queries |
| tatqa.context.json | TAT-QA - Tabular and textual question answering | 2,244 | Hybrid Data |
| wiki.context.json | WikiTableQuestions - Wikipedia table QA | 1,504 | General Knowledge |

Total: 11,326 questions across all datasets

Data Structure

Each JSON file contains an array of objects with the following fields:

{
  "_id": {
    "$oid": "unique_mongodb_object_id"
  },
  "q_num": 0,
  "question": "The question text",
  "table": "Raw table data in text/markdown format",
  "table_id": "source_table_identifier",
  "answer": "The answer or answer array",
  "improved_table_gpt4omini": "Enhanced table formatting with context"
}
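
For readers who prefer typed records, the schema above maps naturally onto a small dataclass. The sketch below is illustrative only; the TableQARecord and record_from_dict names are not part of the dataset, and it assumes every record carries exactly the fields shown.

from dataclasses import dataclass
from typing import List, Union

@dataclass
class TableQARecord:
    """One question-answer record as stored in the *.context.json files."""
    oid: str                       # value of _id.$oid
    q_num: int                     # question number
    question: str                  # natural-language question
    table: str                     # raw table text from the source dataset
    table_id: str                  # identifier of the source table
    answer: Union[str, List[str]]  # gold answer (string or array)
    improved_table: str            # GPT-4o-mini enhanced table

def record_from_dict(d: dict) -> TableQARecord:
    """Map a raw JSON object onto the dataclass, unwrapping the MongoDB _id."""
    return TableQARecord(
        oid=d["_id"]["$oid"],
        q_num=d["q_num"],
        question=d["question"],
        table=d["table"],
        table_id=d["table_id"],
        answer=d["answer"],
        improved_table=d["improved_table_gpt4omini"],
    )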

Field Descriptions

  _id: MongoDB ObjectId wrapper identifying the record
  q_num: Question number as given in the source data
  question: The natural-language question text
  table: Raw table data in text/markdown format from the original dataset
  table_id: Identifier of the source table
  answer: Gold answer; a single string or an array of answers, depending on the dataset
  improved_table_gpt4omini: Table re-formatted and context-enhanced by GPT-4o-mini (recommended input for experiments)

Dataset Characteristics

FeTaQA (Free-form Table Question Answering)

FinQA (Financial Question Answering)

HiTabs (Hierarchical Tables)

HybridQA

Multi (Multi-hop reasoning over tables)

SQA (Sequential Question Answering)

SQUALL

TAT-QA (Tabular and Textual QA)

WikiTableQuestions

Usage

Loading Data

import json

# Load a dataset
with open('dataset/fetaqa.context.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Access individual examples
example = data[0]
question = example['question']
table = example['improved_table_gpt4omini']  # Use enhanced version
answer = example['answer']
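
To load the full collection, the same pattern can be applied to every file listed in the table above. This is a quick sanity check rather than official tooling; it assumes the files sit in a dataset/ directory, as in the snippet above.

import json
from pathlib import Path

# Filenames from the Dataset Files table; adjust the directory if your layout differs.
DATASET_DIR = Path("dataset")
FILES = [
    "fetaqa.context.json", "finqa.context.json", "hitabs.context.json",
    "hybridqa.context.json", "multi.context.json", "sqa.context.json",
    "squall.context.json", "tatqa.context.json", "wiki.context.json",
]

total = 0
for name in FILES:
    with open(DATASET_DIR / name, "r", encoding="utf-8") as f:
        data = json.load(f)
    print(f"{name}: {len(data)} questions")
    total += len(data)

print(f"Total: {total} questions")  # should match the 11,326 reported above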

Preprocessing Recommendations

  1. Use improved_table_gpt4omini for better formatted tables
  2. Parse answer formats based on dataset type (answers may be a single string or an array)
  3. Handle sequential questions in the SQA dataset as conversation chains (see the sketch after this list)
  4. Consider table context and metadata when available
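
The following sketch illustrates recommendations 2 and 3. It is a minimal example, not part of the dataset tooling, and it assumes that sequential SQA turns share a table_id and are ordered by q_num; verify this against the data before relying on it.

import json
from collections import defaultdict

def normalize_answer(answer):
    # Recommendation 2: answers may be a single string or an array of strings.
    if isinstance(answer, list):
        return [str(a).strip() for a in answer]
    return [str(answer).strip()]

def build_sqa_chains(records):
    # Recommendation 3 (assumption): group SQA turns that share a table_id
    # and order them by q_num to recover the conversation chain.
    chains = defaultdict(list)
    for rec in records:
        chains[rec["table_id"]].append(rec)
    return {tid: sorted(recs, key=lambda r: r["q_num"]) for tid, recs in chains.items()}

with open("dataset/sqa.context.json", "r", encoding="utf-8") as f:
    sqa = json.load(f)

chains = build_sqa_chains(sqa)
first_id, first_chain = next(iter(chains.items()))
for turn in first_chain:
    # Each chain can now be fed to a model turn by turn.
    print(turn["q_num"], turn["question"], normalize_answer(turn["answer"]))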

Data Enhancement

All tables have been enhanced using GPT-4o-mini to improve formatting and add contextual information; the result is stored in the improved_table_gpt4omini field alongside the original table.

Citation

If you use these datasets, please cite the original paper:

@article{sear2025,
  title={No Universal Prompt: Unifying Reasoning through Adaptive Prompting for Temporal Table Reasoning},
  author={[Authors]},
  journal={arXiv preprint},
  year={2025},
  url={https://arxiv.org/abs/2506.11246}
}

Original Dataset Sources

Please also cite the papers for the original datasets listed in the table above.

License

Please refer to the original dataset licenses. This enhanced version maintains the same licensing as the source datasets.

Contact

For questions about this dataset collection, please refer to the main project README or open an issue on the GitHub repository.