Performance comparison of different methods on Rotowire across models, using various string similarity metrics.
Performance comparison using TabEval and AutoQA on Rotowire across strategies and models.
Performance comparison using string similarity metrics across different categories, showing Error Rates (in %) and RMSE scores of GPT-4o and Gemini-2.0-flash-exp on Livesum.