Almanac
6 · Anthropic News · 17d ago

Anthropic introduces Contextual Retrieval to reduce RAG retrieval failures by up to 67%

Anthropic published a technical method called Contextual Retrieval that combines Contextual Embeddings and Contextual BM25 to address the context-loss problem in traditional RAG pipelines. The approach prepends chunk-level context before encoding, reducing failed retrievals by 49% standalone and 67% when combined with reranking. The post also highlights prompt caching as a simpler alternative for knowledge bases under 200K tokens, and provides a cookbook for deployment with Claude.
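As a rough illustration of the "prepend chunk-level context before encoding" idea on the lexical (BM25) side only, here is a minimal sketch. The company names, context strings, and helper names (`contextualize`, `bm25_scores`) are hypothetical; the contexts are hand-written here, and the scorer is a plain Okapi BM25 written from the textbook formula, not code from Anthropic's cookbook.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer that trims surrounding punctuation."""
    tokens = (t.strip(".,:;()").lower() for t in text.split())
    return [t for t in tokens if t]

def contextualize(chunk, context):
    """Prepend a short chunk-level context string to the chunk before indexing."""
    return f"{context}\n\n{chunk}"

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Two chunks that are indistinguishable without knowing their source document.
chunks = [
    "Revenue grew by 3% over the previous quarter.",
    "Revenue fell by 1% over the previous quarter.",
]
# Hypothetical, hand-written contexts (assumption for this sketch).
contexts = [
    "From ACME Corp's Q2 2023 report.",
    "From Globex Inc's Q2 2023 report.",
]

query = "ACME revenue"
plain = bm25_scores(query, chunks)
contextual = bm25_scores(query, [contextualize(c, x) for c, x in zip(chunks, contexts)])
```

Without context the two chunks tie on the query; with the prepended context only the first chunk matches "ACME", so it ranks first. The same contextualized strings would also be what gets passed to an embedding model in the embeddings half of the method.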

Related events (8)

4 · arXiv · cs.CL · 8d ago

UMG-RAG: Training-free hybrid retrieval with uncertainty-aware granularity fusion for long-document RAG

Researchers propose Uncertainty-aware Multi-Granularity RAG (UMG-RAG), a training-free hybrid retrieval framework that addresses the tension between coarse and fine-grained retrieval chunks in RAG pipelines. The system converts dense and sparse retriever scores across multiple chunk granularities into evidence distributions, estimates reliability via entropy, and fuses candidates using query-specific confidence signals. A variant called UMGP-RAG uses fine-grained hits to locate evidence while returning broader parent chunks for coherence. Experiments on QA benchmarks show improved generation quality with no changes to the underlying retriever or generator.
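The summary does not give the paper's formulas, so the following is only one plausible reading of "scores into distributions, reliability via entropy, confidence-weighted fusion". It assumes a softmax over each retriever's scores and a confidence of one minus normalized entropy; all function names and the example scores are hypothetical.

```python
import math

def to_distribution(scores, temperature=1.0):
    """Softmax over one retriever's raw scores (assumed normalization)."""
    top = max(scores.values())
    exps = {k: math.exp((v - top) / temperature) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def confidence(dist):
    """1 - normalized entropy: near 1 for a peaked list, near 0 for a flat one."""
    n = len(dist)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return 1.0 - entropy / math.log(n)

def fuse(score_lists):
    """Sum each retriever's distribution, weighted by its per-query confidence."""
    fused = {}
    for scores in score_lists:
        dist = to_distribution(scores)
        weight = confidence(dist)
        for chunk_id, p in dist.items():
            fused[chunk_id] = fused.get(chunk_id, 0.0) + weight * p
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical scores for one query: the dense retriever is decisive,
# the sparse retriever is nearly flat and mildly prefers a different chunk.
dense = {"a": 5.0, "b": 0.0, "c": 0.0}
sparse = {"a": 1.0, "b": 1.2, "c": 1.0}
ranking = fuse([dense, sparse])
```

Because the sparse list is almost uniform, its entropy is high and its vote is heavily discounted, so the fused ranking follows the confident dense retriever. The same `fuse` call could take one score dictionary per retriever and per chunk granularity.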

4 · Anthropic News · 19d ago

Anthropic Publishes Quantitative Case Study on Prompt Engineering for Long-Context Recall

Anthropic shares a quantitative case study evaluating prompting techniques to improve Claude's recall over 75,000–90,000 token contexts. Two techniques are tested: extracting reference quotes before answering, and providing few-shot examples of correctly answered questions. The study uses Claude Instant 1.2 on a government document dataset constructed via a 'randomized collage' method, with multiple-choice Q&A pairs generated by Claude itself. Results show measurable recall improvements over a baseline prompt, with methodology and notebooks shared publicly.

4 · arXiv · cs.CL · 12d ago

HKVM-RAG: Hypergraph key-value separation improves multi-hop retrieval-augmented generation

A new arXiv preprint introduces HKVM-RAG, an evidence-organization layer for multi-hop RAG that uses weighted hyperedges as retrieval keys while retaining passage text as answer values. Under a fixed-substrate protocol controlling for tuple cache, reader, and evaluation budget, the hypergraph key-value approach improves over KG-PPR by +3.4 F1 on 2WikiMultiHopQA and +3.6 F1 on MuSiQue. A dense-aware controller combining frozen ColBERTv2 with HKVM features reaches 88.8, 65.1, and 85.8 F1 on three benchmarks, outperforming ColBERTv2 alone by 5–11 F1 points. The work positions hypergraph organization as a reusable evidence-control mechanism rather than a dense-retrieval replacement.

6 · arXiv · cs.CL · 24d ago

Coverage Illusion: Post-Retrieval Cascade Design Reduces LLM Augmentation Overhead in Production RAG

A case study on the Danish National Encyclopedia's RAG system evaluates five retrieval workflows across 20,000 query-workflow pairs, revealing a 'Coverage Illusion': synthetic queries suggest that more than 90% of queries need LLM augmentation, whereas only 27.8% of real production queries do. Pre-retrieval routing cannot detect this gap because the need for augmentation is only revealed after the index has been searched. A post-retrieval cascade that runs workflows cheapest-first and escalates to LLM augmentation only on empty results improves quality by 0.140 Composite Overall points over Always-HyDE, reduces latency by 31.8%, and eliminates LLM augmentation for 72.2% of real queries. The work highlights a structural mismatch between synthetic and real query distributions that affects RAG system design assumptions.
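The cascade described above (cheapest workflow first, escalate only on empty results) can be sketched in a few lines. The toy index, the two workflows, and the rewrite table below are hypothetical stand-ins; in particular `llm_augmented_search` only imitates an LLM-backed step such as HyDE and makes no model call.

```python
from typing import Callable, Dict, List, Tuple

Workflow = Tuple[str, Callable[[str], List[str]]]

# Hypothetical toy index: document id -> text.
INDEX: Dict[str, str] = {
    "doc1": "copenhagen is the capital of denmark",
    "doc2": "the little mermaid statue",
}

llm_calls: List[str] = []  # records every escalation, to make the cost visible

def keyword_search(query: str) -> List[str]:
    """Cheap workflow: return documents sharing at least one term with the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in INDEX.items() if terms & set(text.split())]

def llm_augmented_search(query: str) -> List[str]:
    """Expensive workflow (stand-in): expand the query, then search again."""
    llm_calls.append(query)
    rewrites = {"sculpture": "statue"}  # assumed expansion, not a real model output
    expanded = query.lower().split()
    expanded += [rewrites[t] for t in list(expanded) if t in rewrites]
    return keyword_search(" ".join(expanded))

def cascade(query: str, workflows: List[Workflow]) -> Tuple[str, List[str]]:
    """Run workflows in the given (cheapest-first) order; stop at the first hit."""
    for name, search in workflows:
        hits = search(query)
        if hits:
            return name, hits
    return "none", []

WORKFLOWS: List[Workflow] = [
    ("keyword", keyword_search),
    ("llm_augmented", llm_augmented_search),
]

easy = cascade("capital of Denmark", WORKFLOWS)
hard = cascade("fairytale sculpture", WORKFLOWS)
```

The first query is answered by the cheap workflow and never touches the expensive one; only the second query, which returns nothing from the index, escalates. The routing decision is made after retrieval, which is the point the case study makes against pre-retrieval routers.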

4 · GitHub Trending · 34h ago

HippoRAG: RAG framework combining knowledge graphs and Personalized PageRank for continuous knowledge integration

HippoRAG is an open-source RAG framework published at NeurIPS 2024 by the OSU NLP Group that draws on models of human long-term memory to enable LLMs to continuously integrate knowledge across external documents. It combines retrieval-augmented generation with knowledge graphs and Personalized PageRank to improve multi-hop and associative retrieval. The repository has accumulated 3,742 GitHub stars.
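For readers unfamiliar with Personalized PageRank, here is a minimal power-iteration sketch over a tiny entity graph. It is not HippoRAG's code; the graph, node names, and parameter values are hypothetical, and the only assumption carried over from the summary is that query-linked nodes act as the restart set.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """Power iteration where the random walk restarts only at the seed nodes.

    graph: dict mapping each node to a list of neighbor nodes (all must be keys).
    seeds: non-empty set of nodes in the graph, e.g. entities found in the query.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if neighbors:
                share = alpha * rank[n] / len(neighbors)
                for m in neighbors:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the restart distribution.
                for m in nodes:
                    nxt[m] += alpha * rank[n] * restart[m]
        rank = nxt
    return rank

# Hypothetical entity graph with two disconnected components.
graph = {
    "stanford": ["thomas"],
    "thomas": ["stanford", "alzheimers"],
    "alzheimers": ["thomas"],
    "paris": ["france"],
    "france": ["paris"],
}
scores = personalized_pagerank(graph, seeds={"stanford", "alzheimers"})
```

The bridging node "thomas" is not a seed, yet it receives a high score because both seeds link to it, while the unrelated component stays at zero. This is the associative, multi-hop effect that graph-based retrieval of this kind relies on: passages can then be ranked by the scores of the entities they mention.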

5 · arXiv · cs.CL · 4d ago

ContextRL: Context-aware reinforcement learning improves grounding in agentic and multimodal LLMs

Researchers introduce ContextRL, a reinforcement learning method that trains LLMs to select the context that supports a given query-answer pair from two highly similar candidates, rather than supervising only final answers. The approach constructs contrastive context pairs in two domains: coding agent trajectories (1k pairs) and multimodal image pairs (7k pairs). ContextRL achieves +2.2% average gains over standard GRPO on 5 long-horizon benchmarks and +1.8% across 12 visual QA benchmarks, with ablations showing the gains stem from the context-selection objective rather than the contrastive data alone.

8 · Anthropic News · 17d ago

Anthropic expands Claude context window from 9K to 100K tokens

Anthropic announced a roughly 10x expansion of Claude's context window, from 9K to 100K tokens (~75,000 words), available via API. The capability enables processing of hundreds of pages of documents, full codebases, or hours of transcribed audio in under a minute. Anthropic positions this as superior to vector search for complex multi-document synthesis tasks, and partner AssemblyAI demonstrated the feature on a 58K-word podcast transcript.

6 · Anthropic News · 16d ago

Anthropic enables fine-tuning of Claude 3 Haiku via Amazon Bedrock

Anthropic announced that Claude 3 Haiku can now be fine-tuned through Amazon Bedrock using custom prompt-completion pairs, with general availability reached November 1, 2024. The capability targets specialized business workflows, with Anthropic citing a case study showing classification accuracy improvement from 81.5% to 99.6% and 85% token reduction on a content moderation task. Early enterprise adopters include SK Telecom and Thomson Reuters, both reporting measurable performance gains. Fine-tuning is available in the US West (Oregon) region with text support up to 32K context, with vision fine-tuning planned.