Accepted to BioNLP 2026
Structured Clinical hybrid Planning for Evidence retrieval in clinical trials
Arizona State University, Mayo Clinic
We study clinical trial table reasoning, where answers are not stored directly in visible cells and must instead be inferred through normalization, classification, extraction, and lightweight domain reasoning. We introduce SCoPE, a multi-LLM planner-based framework that decomposes the problem into three stages: row selection, structured planning, and execution. Across 1,500 hybrid reasoning questions over oncology clinical-trial tables, explicit planning improves grounded row-level reasoning accuracy over direct prompting and strong tabular baselines, while maintaining a favorable accuracy-efficiency tradeoff.
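The three-stage decomposition can be illustrated with a toy sketch. This is not the SCoPE implementation: every name here (`Plan`, `select_rows`, `make_plan`, `execute`, `answer`) is hypothetical, the LLM calls are replaced by simple keyword rules, and the two table rows are made up for illustration. It only shows how selection, planning, and execution could be kept as separate steps.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    source_column: str  # column the answer is derived from
    operation: str      # hypothetical operation name, e.g. "classify_phase"


def select_rows(question, table, key="trial"):
    """Stage 1 (stub): keep rows whose key cell is named in the question."""
    q = question.lower()
    return [row for row in table if row[key].lower() in q]


def make_plan(question):
    """Stage 2 (stub): a keyword rule standing in for an LLM planner."""
    if "phase" in question.lower():
        return Plan(source_column="description", operation="classify_phase")
    return Plan(source_column="description", operation="extract")


def execute(plan, rows):
    """Stage 3 (stub): apply the planned operation to each selected row."""
    answers = []
    for row in rows:
        text = row[plan.source_column]
        if plan.operation == "classify_phase":
            # The answer is inferred from free text, not read from a cell.
            answers.append("Phase 3" if "phase iii" in text.lower() else "unknown")
        else:
            answers.append(text)
    return answers


def answer(question, table):
    """Run the three stages in order and return one answer per selected row."""
    rows = select_rows(question, table)
    plan = make_plan(question)
    return execute(plan, rows)


# Toy table with made-up rows, for illustration only.
TABLE = [
    {"trial": "TRIAL-A", "description": "Randomized phase III study in breast cancer"},
    {"trial": "TRIAL-B", "description": "Single-arm pilot study in melanoma"},
]

RESULT = answer("Which phase is TRIAL-A?", TABLE)
print(RESULT)
```

Keeping the plan as an explicit object means the selected rows and the chosen operation can each be inspected separately, which is one way to make row-level grounding checkable.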
The benchmark contains 1,500 programmatically augmented hybrid reasoning questions constructed from an expert-authored seed set of 500 questions over oncology clinical-trial data.
| Statistic | Value |
|---|---|
| Rows | 159 |
| Columns | 32 |
| Unique trials | 105 |
| Cancer types | 19 |
| Total questions | 1,500 |
| Target fields | 31 |
SCoPE improves grounded row-level reasoning over direct prompting and tabular baselines. In the reported results, it achieves the highest Table F1 on Qwen3 and GPT-OSS; on Llama-3.3 it ties the best baseline (CoT, 70.87) on Table F1 while improving grounding metrics.
| Method | Qwen3 F1 | Llama-3.3 F1 | GPT-OSS F1 |
|---|---|---|---|
| Zero Shot | 56.32 | 66.96 | 73.50 |
| CoT | 55.37 | 70.87 | 74.17 |
| Few-Shot | 54.74 | 69.38 | 73.99 |
| EHRAgent | 32.99 | 30.99 | 34.85 |
| SCoPE | 63.19 | 70.87 | 74.31 |
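
To make the margins in the table explicit, the short script below computes SCoPE's F1 difference against the best-performing baseline for each model. The scores are copied from the table; the function name `gain_over_best_baseline` is just an illustrative helper.

```python
# F1 scores copied from the results table.
SCORES = {
    "Zero Shot": {"Qwen3": 56.32, "Llama-3.3": 66.96, "GPT-OSS": 73.50},
    "CoT":       {"Qwen3": 55.37, "Llama-3.3": 70.87, "GPT-OSS": 74.17},
    "Few-Shot":  {"Qwen3": 54.74, "Llama-3.3": 69.38, "GPT-OSS": 73.99},
    "EHRAgent":  {"Qwen3": 32.99, "Llama-3.3": 30.99, "GPT-OSS": 34.85},
    "SCoPE":     {"Qwen3": 63.19, "Llama-3.3": 70.87, "GPT-OSS": 74.31},
}


def gain_over_best_baseline(scores, method="SCoPE"):
    """Return {model: (best baseline name, method F1 minus baseline F1)}."""
    gains = {}
    for model in scores[method]:
        # Find the strongest method other than the one being evaluated.
        best_name, best_f1 = max(
            ((name, row[model]) for name, row in scores.items() if name != method),
            key=lambda pair: pair[1],
        )
        gains[model] = (best_name, round(scores[method][model] - best_f1, 2))
    return gains


GAINS = gain_over_best_baseline(SCORES)
for model, (baseline, delta) in GAINS.items():
    print(f"{model}: {delta:+.2f} F1 vs. {baseline}")
```

On these numbers the gain is largest on Qwen3 (+6.87 over Zero Shot), small on GPT-OSS (+0.14 over CoT), and zero on Llama-3.3, where SCoPE and CoT tie.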
Supported by the Mayo Clinic and Arizona State University Alliance for Health Care Collaborative Research Seed Grant Program.