Almanac
technique

RAG

technique · active · rag-f3a6387b · 4 events · first seen 1mo ago

Aliases: RAG


Recent events (4)

6 · arXiv · cs.CL · 21d ago

Coverage Illusion: Post-Retrieval Cascade Design Reduces LLM Augmentation Overhead in Production RAG

A case study of the Danish National Encyclopedia's RAG system evaluates five retrieval workflows across 20,000 query-workflow pairs and reveals a "Coverage Illusion": synthetic queries overestimate the need for LLM augmentation (90%+) compared with real production traffic (27.8%). Pre-retrieval routing cannot detect this gap, because whether a query needs augmentation only becomes apparent after the index search. A post-retrieval cascade that runs workflows cheapest-first and escalates to LLM augmentation only when a search returns empty results improves quality by 0.140 Composite Overall points over an Always-HyDE baseline (HyDE augmentation applied to every query), reduces latency by 31.8%, and eliminates LLM augmentation for 72.2% of real queries. The work highlights a structural mismatch between synthetic and real query distributions that undermines common RAG system design assumptions.

7 · Qwen Research · 1mo ago

Generalizing an LLM from 8k to 1M Context using Qwen-Agent

Alibaba's Qwen team describes an agent built on Qwen2 (8k native context) that processes documents up to 1M tokens by decomposing retrieval and reasoning tasks, reportedly outperforming both RAG pipelines and native long-context models. The agent framework was also used to generate synthetic training data for fine-tuning new long-context Qwen models, creating a self-improvement loop. This positions agent-based context extension as a practical alternative to architectural long-context training.
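The summary does not describe Qwen-Agent's internals. As a hedged illustration of the general idea of fitting a long document into a small context window, the sketch below splits a document into chunks, scores each chunk against the query, and keeps the best-scoring chunks that fit a budget. Everything here is a hypothetical stand-in: word counts replace real tokens, term overlap replaces an LLM relevance judgment, and none of the function names come from Qwen-Agent.

```python
from typing import List, Set


def split_into_chunks(words: List[str], chunk_size: int) -> List[List[str]]:
    """Split a tokenized document into fixed-size chunks."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def relevance(chunk: List[str], query_terms: Set[str]) -> int:
    """Stand-in relevance score: number of query terms present in the chunk.
    An agent would typically ask the LLM instead (assumption)."""
    return len(query_terms & set(chunk))


def build_context(document: str, query: str, chunk_size: int, budget: int) -> str:
    """Select the most relevant chunks that fit in `budget` words,
    then restore document order so the model reads them coherently."""
    words = document.lower().split()
    query_terms = set(query.lower().split())
    chunks = split_into_chunks(words, chunk_size)
    scored = [(relevance(c, query_terms), i) for i, c in enumerate(chunks)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best first, earlier wins ties
    chosen, used = [], 0
    for score, i in scored:
        if score == 0:
            break  # remaining chunks share no terms with the query
        if used + len(chunks[i]) > budget:
            continue  # skip chunks that would overflow the context budget
        chosen.append(i)
        used += len(chunks[i])
    return " ".join(" ".join(chunks[i]) for i in sorted(chosen))


doc = "The cat sat on the mat dogs chase cats in parks Qwen reads long documents in chunks"
print(build_context(doc, "how does Qwen read long documents", chunk_size=4, budget=8))
# cats in parks qwen reads long documents in
```

Restoring document order after selection is a deliberate choice: relevance ranking decides what to keep, but the model reads the kept chunks in their original sequence.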

4 · Hugging Face Blog · 28d ago

CPU Optimized Embeddings with Optimum Intel and fastRAG

Hugging Face and Intel demonstrate CPU-optimized embedding inference using Optimum Intel and fastRAG, targeting RAG pipeline acceleration without GPU hardware. The post covers quantization and optimization techniques that improve embedding throughput on Intel CPUs. This is relevant to inference economics and enterprise deployment patterns where GPU availability is constrained.
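Quantization is one reason CPU inference can get faster: integer arithmetic is cheaper than floating point. The sketch below shows symmetric per-tensor INT8 quantization and an integer-accumulated dot product, the operation behind embedding similarity scores. It is a generic illustration, not the Optimum Intel or fastRAG API; for simplicity it quantizes vectors at comparison time, whereas the techniques in the post target the embedding model's inference.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: q = round(x / scale),
    with scale chosen so the largest |x| maps to 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate float values."""
    return [x * scale for x in q]


def int8_dot(a: List[float], b: List[float]) -> float:
    """Dot product on INT8 codes with integer accumulation,
    rescaled to float once at the end."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # pure integer accumulate
    return acc * sa * sb


a = [0.5, -1.0, 0.25]
b = [1.0, 0.5, -0.5]
print(int8_dot(a, b))  # close to the exact value -0.125
```

Accumulating in integers and applying both scales once at the end is what lets hardware use fast integer instructions for the bulk of the work; the rounding error per element is at most half a quantization step.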

5 · GitHub Trending · 15d ago

ruvnet/ruflo: Agent Meta-Harness for Claude with Multi-Agent Swarm Coordination

Ruflo is an open-source TypeScript framework positioning itself as a meta-harness for Claude-based multi-agent systems. It features adaptive memory, swarm intelligence coordination, RAG integration, and native Claude Code/Codex integration. The project has accumulated 57,231 stars with 354 added today, indicating significant community traction.