IPO-Mine: Toolkit and Dataset for Multimodal Analysis of Long IPO Filings
Researchers introduce IPO-Mine, comprising an open-source toolkit and a large-scale dataset of over 109,000 IPO filings (1994–2026) with 76,000+ extracted images, structured for section-level analysis. The toolkit parses long regulatory documents (often exceeding 500,000 tokens) into standardized text and image outputs. Benchmark tasks on financial chart quality and misleadingness assessment reveal that state-of-the-art multimodal models frequently diverge from expert human judgments, exposing alignment gaps in long-document multimodal reasoning. The dataset and code are publicly released under CC-BY-4.0.
Related events (8)
Stanford EDGAR Filings Dataset: 152B-token open corpus of SEC filings for LLM pretraining
Stanford researchers introduce the Stanford EDGAR Filings Dataset (SEFD), an open reconstruction of SEC filings into layout-faithful MultiMarkdown. They release a 152B-token initial snapshot and describe a larger 550B-token archive. The dataset targets the growing scarcity of high-quality long-context pretraining data and has less than 0.1% overlap with Common Crawl-derived corpora. Two derived benchmarks are also introduced: EDGAR-Forecast for filing-grounded numerical forecasting and EDGAR-OCR for complex financial table transcription. The work addresses a real gap in open long-context training data outside narrow domains like code.
Opik: open-source LLM observability and evaluation platform by Comet ML
Opik is an open-source toolkit from Comet ML for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows. It provides tracing, automated evaluations, and production dashboards. The project has accumulated nearly 20K GitHub stars, indicating meaningful adoption in the practitioner community.
Multi-domain benchmark for detecting AI-generated text-rich images from GPT-Image-2
Researchers introduce a new benchmark of 8,602 images across six categories (commercial posters, infographics, academic posters, receipts, tables, UI screenshots) specifically for detecting AI-generated text-rich images produced by OpenAI's GPT-Image-2. Five zero-shot detectors are evaluated, revealing highly domain-dependent performance and severe sensitivity to JPEG compression even in the strongest conventional detector. A multimodal VLM is also explored as a detector, showing promise but limitations on structured formats. The work highlights a gap in existing benchmarks that focus on object-centric rather than text-layout-centric images.
AUDITS: A Comprehensive Benchmark for Image Manipulation Localization Across Multiple Analysis Axes
Researchers introduce AUDITS (Analysis Under Domain-shifts, qualIty, Type, and Size), a benchmark of over 530K images designed to evaluate image manipulation detection across multiple axes including domain shift, manipulation type, and size. The dataset draws from user and news photos and incorporates recent diffusion-based inpaintings. Experiments assess the robustness of existing manipulation detection methods under various domain shifts, aiming to advance development of more generalizable detection approaches.
StakeBench: A Market-Commitment-Grounded Benchmark for Financial Language Understanding
StakeBench is a new evaluation framework linking 560,876 comments from 2,261 resolved prediction markets (Polymarket and Manifold) to verified trading positions, actions, and market-odds records, replacing human annotation with observable market behavior as supervision. Four diagnostic tasks test commitment detection, side identification, action anticipation, and collective odds projection, evaluated across 15 LLMs. Results reveal structural failures: models partially recover position-side signals (Directed Accuracy 0.506–0.599) but collapse on action anticipation and fail to beat naive baselines on odds projection. Notably, model scale shows no correlation with performance, and finance-domain fine-tuning does not improve revealed-side identification.
GPIC: Stanford Releases 28-Trillion-Pixel Permissively Licensed Image Corpus for Visual Generation Research
Stanford Vision Lab introduces GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels comprising 100M training, 200K validation, and 1M test images, all permissively licensed for research and commercial use. Images are captioned by a state-of-the-art vision-language model, safety-filtered, deduplicated, and hosted on Hugging Face. The release includes a benchmarking protocol for generative modeling and a reference baseline using pixel-space flow matching. The dataset addresses a key gap in scalable visual generative modeling research by providing a large, stable, and openly licensed resource.
Manga109-v2026: Revised Benchmark Dataset for Manga OCR and Multimodal Understanding
Researchers revisit the widely used Manga109 dataset and identify five categories of annotation issues including transcription errors, missing text regions, and under-segmented speech balloons. They construct Manga109-v2026 by combining OCR-based issue detection with manual revision, correcting approximately 29,000 dialogue annotations. The updated dataset is intended to better align with modern OCR and multimodal manga understanding systems while preserving manga-specific expressive structures.
Can the stockmarket swallow Anthropic, SpaceX and OpenAI?
The Economist examines whether public markets can absorb the potential IPOs of Anthropic, SpaceX, and OpenAI, three of the largest private companies in their respective sectors. The piece addresses valuation, liquidity, and structural questions around bringing frontier AI labs to public markets. With 368 HN points and 641 comments, the article has generated substantial community discussion. The framing reflects growing investor and analyst attention to the eventual public-market transition of major AI labs.


