ACL Demo 2026 · Project Page

EviSearch: A Human-in-the-Loop System for Extracting and Auditing Clinical Evidence for Systematic Reviews

Systematic reviews depend on evidence that is not just extracted, but auditable. EviSearch turns clinical trial PDFs into structured evidence tables by combining direct PDF reasoning, retrieval-guided search, automated reconciliation, and a review interface built for human validation.

Watch Video · Open Demo · Paper Link Coming Soon

The public paper URL will be added once the release is live.

Naman Ahuja¹, Saniya Mulla², Muhammad Ali Khan², Zaryab Bin Riaz², Kaneez Zahra Rubab Khakwani², Mohamad Bassam Sonbol², Irbaz Bin Riaz¹, Vivek Gupta¹

¹Arizona State University · ²Mayo Clinic

Reading Note

Clinical evidence extraction is a document problem, not a plain-text problem

Trial evidence is scattered across prose, tables, figure captions, and visual plots. Some fields require document-level judgment, while others depend on finding the exact table cell or chart label that supports a reported value. That makes naive parsed-text extraction brittle, especially when downstream decisions depend on trustworthy evidence.

EviSearch is designed around that reality. It combines a direct PDF query agent with a retrieval-guided search agent, then reconciles disagreements through page-level verification. The final output is not just a table of answers, but an auditable record that a clinician or reviewer can inspect and revise.
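A minimal sketch of how that reconciliation step could look, assuming one answer per field from each extraction path. The names here (ExtractionResult, reconcile, verify_on_page) are illustrative placeholders, not EviSearch's actual API.

```python
# Sketch of two-path reconciliation: agree -> accept; disagree -> verify
# each candidate on its attributed page, then escalate to human review.
# All names are hypothetical, not EviSearch's real interfaces.
from dataclasses import dataclass

@dataclass
class ExtractionResult:
    value: str   # extracted answer, e.g. "0.72"
    page: int    # PDF page the answer was attributed to
    method: str  # "direct_pdf" or "retrieval_search"

def reconcile(direct: ExtractionResult, retrieval: ExtractionResult,
              verify_on_page) -> dict:
    """Accept agreeing answers; route disagreements through page-level checks."""
    if direct.value == retrieval.value:
        return {"status": "agreed", "value": direct.value,
                "pages": sorted({direct.page, retrieval.page})}
    # Disagreement: re-check each candidate against its attributed page,
    # then hand the surviving candidates to the human review layer.
    verified = [r for r in (direct, retrieval)
                if verify_on_page(r.value, r.page)]
    return {"status": "needs_review", "candidates": verified}
```

The design intent, per the pipeline described above, is that disagreements are surfaced rather than silently resolved: candidates that fail page-level verification are dropped, and anything still contested is routed to the human validation layer.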

Architecture

A multi-stage system centered on verification and human feedback

Users specify the schema or values to extract, the system runs parallel evidence gathering and reconciliation, and the review interface exposes source-grounded outputs for final validation.

EviSearch system architecture showing user inputs, AI processing and retrieval, reconciliation, and human validation interface
The pipeline couples direct PDF reasoning with retrieval-based search, then routes disagreements through a reconciliation module and a human validation layer that supports updates, feedback, and long-term storage.
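To make the input side concrete, here is a hypothetical schema a reviewer might submit. The field names and hint strings are illustrative examples of systematic-review fields, not a fixed EviSearch vocabulary.

```python
# Hypothetical user-specified extraction schema; fields and hints are
# illustrative only.
schema = {
    "sample_size":      {"type": "integer", "hint": "number of randomized participants"},
    "primary_endpoint": {"type": "string",  "hint": "e.g. overall survival"},
    "hazard_ratio":     {"type": "number",  "hint": "often reported in a table or forest plot"},
    "median_follow_up": {"type": "string",  "hint": "may appear only in running text"},
}
```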

Results

Performance depends on modality, so the page should show the evidence mix too

Donut chart showing evidence source modality distribution across text, table, and figure
In the benchmark, most reported values come from running text, but a large share still depends on structured tables. Figure-based evidence is less frequent, yet it is often the hardest to extract correctly.
Line chart comparing extraction performance across text, table, and figure modalities
EviSearch holds the strongest scores across all three evidence modalities, with the clearest separation on figure-based extraction, where layout and visual grounding matter most.

53.4%

of reported evidence in the benchmark comes from narrative text rather than cleanly structured fields.

41.8%

still relies on tables, which means robust extraction cannot ignore document structure.

86.7

overall score on figure-sourced evidence for EviSearch, outperforming the compared baselines shown in the chart.

Interface

The review loop is part of the system, not a cleanup step

Reviewers can inspect extracted values, compare answers from different extraction paths, open the attributed source region in the PDF, and correct outputs with feedback that can later support model improvement.

EviSearch extraction UI with extraction results table
The extraction view groups schema fields into batches, runs multiple methods in parallel, and exposes the resulting evidence table with method-level outputs.
EviSearch attribution interface showing source text from the PDF next to extracted answer
The attribution interface links answers back to source evidence in the PDF, making reconciliation and human auditing practical rather than aspirational.
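Assuming each extracted field keeps its competing answers, its attributed source region, and the reviewer's verdict together, an auditable record could look like the following sketch; every key name here is hypothetical.

```python
# Hypothetical audit record for one extracted field: the competing answers,
# the attributed source region, and the reviewer's verdict.
audit_record = {
    "field": "hazard_ratio",
    "answers": {
        "direct_pdf":       {"value": "0.72", "page": 6},
        "retrieval_search": {"value": "0.72", "page": 6},
    },
    "source_region": {"page": 6, "bbox": [102, 340, 418, 362]},  # region highlighted in the PDF view
    "review": {"accepted": True, "corrected_value": None,
               "feedback": "matches Table 2, 'HR (95% CI)' column"},
}
```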

Artifacts

Try the system, then come back for the paper release

This page is meant to work like a compact research note: a quick route into the live system, the demo video, and the main figures that explain why the approach matters.

Launch Demo · View Video · Paper Coming Soon