SportSQL

SportSQL: Interactive System for Real-Time Sports Reasoning and Visualization

Paper License Python PostgreSQL

A modular, interactive system for natural language querying and visualization of dynamic sports data, with a focus on the English Premier League (EPL).

System Workflow

SportSQL translates user questions into executable SQL over a live, temporally indexed database constructed from real-time Fantasy Premier League (FPL) data. It supports both tabular and visual outputs, leveraging symbolic reasoning capabilities of Large Language Models (LLMs) for query parsing, schema linking, and visualization selection.

πŸ“„ Paper: SPORTSQL: An Interactive System for Real-Time Sports Reasoning and Visualization
🌐 Project page: https://coral-lab-asu.github.io/SportSQL/
🌐 Demo / Code: https://github.com/coral-lab-asu/SportSQL


🎯 Key Features

πŸ” Three Query Modes

  1. Single-Query NL2SQL - Direct translation of natural language to SQL
    • Fast, single-shot query execution
    • Ideal for simple questions about current season stats
    • Example: β€œHow many goals has Erling Haaland scored?”
  2. Deep Research Mode - Multi-query comprehensive analysis
    • Automatic query decomposition into sub-questions
    • Historical data analysis across multiple seasons
    • Player comparison and trend analysis
    • Example: β€œCompare Haaland and Salah’s offensive performance over the last 3 seasons”
  3. Interactive Visualization - Automatic chart generation
    • LLM-powered visualization selection
    • Dynamic chart generation from query results
    • Pre-built gallery of common visualizations

πŸ—οΈ System Architecture


πŸ“Š DSQABENCH: Dynamic Sport Question Answering Benchmark

To evaluate system performance, we introduce DSQABENCH, comprising:


πŸš€ Quick Start

Prerequisites

Installation

# Clone the repository
git clone https://github.com/coral-lab-asu/SportSQL.git
cd SportSQL

# Create conda environment
conda env create -f environment.yml
conda activate sportsql

# Or use pip
pip install -r requirements.txt

# Set up PostgreSQL (macOS)
brew install postgresql@15
brew services start postgresql@15

# Configure environment variables
cp .env.example .env
# Edit .env with your database credentials and API keys

Database Setup

# Initialize local database with FPL data
python src/database/setup_local_db.py

Run the Web Interface

# Start the Flask application
cd website
python app.py --server local --port 5000

# Open browser to http://localhost:5000

πŸ’¬ Example Queries

Simple Queries

"Who are the top 5 goal scorers this season?"
"How many assists does Saka have?"
"Which team has the most clean sheets?"

Deep Research Queries

"Compare Erling Haaland and Mohamed Salah's offensive performance over the last 3 seasons"
"Analyze Liverpool's defensive statistics and trends this season"
"Show me players who consistently outperform their expected goals"

Visualization Queries

"Show me a chart of top scorers"
"Visualize the relationship between expected goals and actual goals for Haaland"
"Plot team standings by strength"

πŸ“ Project Structure

SportSQL/
β”œβ”€β”€ src/                          # Core source code
β”‚   β”œβ”€β”€ database/                 # Database layer (PostgreSQL)
β”‚   β”œβ”€β”€ llm/                      # LLM integration (Gemini/OpenAI)
β”‚   β”œβ”€β”€ nl2sql/                   # Single-query NL2SQL
β”‚   β”œβ”€β”€ deep_research/            # Deep research mode
β”‚   └── visualization/            # Chart generation
β”‚
β”œβ”€β”€ website/                      # Web interface
β”‚   β”œβ”€β”€ app.py                    # Flask application
β”‚   β”œβ”€β”€ static/                   # CSS, JS, images
β”‚   └── templates/                # HTML templates
β”‚
β”œβ”€β”€ data/                         # Dataset (CSV files)
β”œβ”€β”€ docs/                         # Documentation
β”œβ”€β”€ scripts/                      # Utility scripts
β”œβ”€β”€ benchmarking/                 # Evaluation scripts & results
└── update_player_mappings/       # Ground truth tools

See STRUCTURE.md for detailed documentation.


πŸ”§ Configuration

Environment Variables

Create a .env file in the project root:

# PostgreSQL Configuration
LOCAL_DATABASE_HOST=localhost
LOCAL_DATABASE_PORT=5432
LOCAL_DATABASE_USER=your_username
LOCAL_DATABASE_PASSWORD=your_password
LOCAL_DATABASE_NAME=postgres

# LLM Configuration (choose one or both)
# Gemini (default)
API_KEY=your_gemini_api_key
GEMINI_MODEL=gemini-2.0-flash

# OpenAI (optional)
OPENAI_API_KEY=your_openai_api_key
GPT_MODEL=gpt-4o

LLM Provider Selection

# Use Gemini (default)
python website/app.py --server local

# Use OpenAI
python website/app.py --server local --llm openai

See docs/LLM_USAGE.md for detailed LLM configuration.


πŸ§ͺ Evaluation

Run the evaluation pipeline on DSQABENCH:

# Evaluate the full pipeline
python scripts/evaluate_pipeline.py

# Test specific components
python scripts/test_evaluation.py

# Run benchmarking scripts
python benchmarking/scripts/llm_sql_evaluator.py

πŸ“– Documentation


πŸŽ“ Citation

If you use SportSQL or DSQABENCH in your research, please cite:

@inproceedings{ahuja-etal-2025-sportsql,
    title = "{SPORTSQL}: An Interactive System for Real-Time Sports Reasoning and Visualization",
    author = "Ahuja, Naman and others",
    booktitle = "Proceedings of the 2025 International Joint Conference on Natural Language Processing: System Demonstrations",
    year = "2025",
    url = "https://aclanthology.org/2025.ijcnlp-demo.11",
    pages = "TBD"
}

πŸ› οΈ Development

Running Tests

# Test imports after reorganization
python test_imports.py

# Run evaluation tests
python scripts/test_evaluation.py

Database Management

# Refresh local database with latest FPL data
python src/database/setup_local_db.py

# Update specific player data
python scripts/update_db.py

Adding New Features

The modular architecture makes it easy to extend:

  1. New query types: Add to src/nl2sql/ or src/deep_research/
  2. New LLM providers: Extend src/llm/wrapper.py
  3. New visualizations: Add to src/visualization/
  4. New data sources: Extend src/database/operations.py

🀝 Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments


πŸ“§ Contact

For questions or issues:


🌟 Star History

If you find SportSQL useful, please consider giving it a star ⭐!

Star History Chart