Financial documents often bury critical information in complex, fragmented tables that span multiple pages, which makes automated extraction difficult. TASER addresses this with a schema-guided, agentic approach: specialized table agents perform detection, classification, extraction, and recommendation, all starting from an initial schema. A Recommender Agent then reviews the outputs, suggests schema revisions, and decides on the final recommendations, which lets the system learn continuously.

The authors report a 10.1% improvement over existing models such as Table Transformer. Within the continuous learning process, larger batch sizes more than double the number of actionable schema recommendations and increase extracted holdings by 9.8%.

TASER is trained on a large, manually labeled dataset of over 22,000 pages and 3,200 tables, representing more than $731 billion in holdings, which makes it one of the first real-world financial table datasets. The dataset is released as TASERTab to support further research on financial table extraction.

The results suggest that agentic, schema-guided extraction is a promising route to robust processing of complex financial tables, and they highlight the role of continuous learning and schema adaptation in handling heterogeneous, multi-page financial data. Possible future applications include financial data analytics, regulatory compliance, and automated reporting systems.
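To make the schema-guided loop described above more concrete, here is a minimal Python sketch of one extract-then-recommend pass. It is not TASER's implementation: the names `Schema`, `extract_rows`, `recommend_fields`, `run_batch`, and the `min_support` threshold are all hypothetical, and the "agents" are reduced to plain functions over already-parsed rows rather than model calls.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical schema: the set of column names the extractor accepts."""
    fields: set = field(default_factory=set)


def extract_rows(raw_rows, schema):
    """Split each raw row into schema-conforming values and leftover columns."""
    extracted, leftovers = [], []
    for row in raw_rows:
        extracted.append({k: v for k, v in row.items() if k in schema.fields})
        leftovers.append({k: v for k, v in row.items() if k not in schema.fields})
    return extracted, leftovers


def recommend_fields(leftovers, min_support):
    """Propose columns that appear in at least `min_support` rows of the batch."""
    counts = Counter(k for row in leftovers for k in row)
    return sorted(k for k, n in counts.items() if n >= min_support)


def run_batch(raw_rows, schema, min_support=2):
    """One pass: extract with the current schema, then return a revised schema."""
    extracted, leftovers = extract_rows(raw_rows, schema)
    proposals = recommend_fields(leftovers, min_support)
    revised = Schema(fields=schema.fields | set(proposals))
    return extracted, proposals, revised
```

In this toy version, a column that the initial schema does not know about is only proposed once it recurs across the batch, so a larger batch gives the recommender more evidence to work with. That is only a loose analogy to the batch-size effect reported in the summary, not a description of how TASER achieves it.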
👉 Read the original: arXiv AI Papers