Almanac
protocol

CC-BY-4.0

protocol · active · provisional · cc-by-4-0-653663c1 · 1 event · first seen 20d ago

Aliases: CC-BY-4.0

Recent events (1)

5 · arXiv · cs.CL · 20d ago

IPO-Mine: Toolkit and Dataset for Multimodal Analysis of Long IPO Filings

Researchers introduce IPO-Mine, an open-source toolkit and a large-scale dataset of over 109,000 IPO filings (1994–2026) with more than 76,000 extracted images, structured for section-level analysis. The toolkit parses long regulatory documents, which often exceed 500,000 tokens, into standardized text and image outputs. Benchmark tasks that assess the quality and misleadingness of financial charts show that state-of-the-art multimodal models frequently diverge from expert human judgments, exposing alignment gaps in long-document multimodal reasoning. The dataset and code are publicly released under CC-BY-4.0.