RAG
rag-f3a6387b · 4 events · first seen 1mo ago
Aliases: RAG
Recent events (4)
Coverage Illusion: Post-Retrieval Cascade Design Reduces LLM Augmentation Overhead in Production RAG
A case study on the Danish National Encyclopedia's RAG system evaluates five retrieval workflows across 20,000 query-workflow pairs. It reveals a 'Coverage Illusion': synthetic queries suggest that more than 90% of queries need LLM augmentation, whereas only 27.8% of real production traffic does. Pre-retrieval routing cannot detect this gap, because whether a query needs augmentation is only revealed after the index has been searched. A post-retrieval cascade that runs workflows cheapest-first and escalates to LLM augmentation only on empty results improves quality by 0.140 Composite Overall points over Always-HyDE, reduces latency by 31.8%, and eliminates LLM augmentation for 72.2% of real queries. The work highlights a structural mismatch between synthetic and real query distributions that undermines common RAG system design assumptions.
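The cascade idea in the summary above can be illustrated with a minimal Python sketch. It is not the case study's implementation: the `Workflow` type, the `cascade` function, the toy `INDEX`, and both stand-in retrievers are hypothetical names invented for illustration, and the "LLM-augmented" stage is simulated with a fixed rewrite table rather than a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A retriever maps a query string to a list of document IDs (assumed interface).
Retriever = Callable[[str], List[str]]


@dataclass
class Workflow:
    name: str          # hypothetical label for the workflow
    cost: float        # relative cost; LLM-augmented workflows cost more
    retrieve: Retriever


def cascade(query: str, workflows: Sequence[Workflow]) -> Tuple[Optional[str], List[str]]:
    """Run workflows cheapest-first and escalate only on empty results."""
    for wf in sorted(workflows, key=lambda w: w.cost):
        hits = wf.retrieve(query)
        if hits:
            # Non-empty result: stop here and skip all costlier stages.
            return wf.name, hits
    # Every stage, including the most expensive one, returned nothing.
    return None, []


# Toy index standing in for a real search backend (hypothetical data).
INDEX = {"copenhagen": ["doc-1"], "capital of denmark": ["doc-1"]}


def keyword_search(query: str) -> List[str]:
    # Cheap stage: exact lookup against the index, no LLM involved.
    return INDEX.get(query.lower(), [])


def llm_augmented_search(query: str) -> List[str]:
    # Expensive stage: a fixed rewrite table simulates LLM query augmentation.
    rewrites = {"danish capital city": "capital of denmark"}
    return INDEX.get(rewrites.get(query.lower(), ""), [])


WORKFLOWS = [
    Workflow("llm_augmented", cost=10.0, retrieve=llm_augmented_search),
    Workflow("keyword", cost=1.0, retrieve=keyword_search),
]

if __name__ == "__main__":
    for q in ["Copenhagen", "Danish capital city", "unknown topic"]:
        print(q, "->", cascade(q, WORKFLOWS))
```

In this sketch the decision to escalate is made after the cheap search has run, which mirrors the summary's point that augmentation necessity is only visible post-retrieval. A query the cheap stage can answer never reaches the simulated LLM stage.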
Generalizing an LLM from 8k to 1M Context using Qwen-Agent
Alibaba's Qwen team describes an agent built on Qwen2 (8k native context) that processes documents up to 1M tokens by decomposing retrieval and reasoning tasks, reportedly outperforming both RAG pipelines and native long-context models. The agent framework was also used to generate synthetic training data for fine-tuning new long-context Qwen models, creating a self-improvement loop. This positions agent-based context extension as a practical alternative to architectural long-context training.
CPU Optimized Embeddings with Optimum Intel and fastRAG
Hugging Face and Intel demonstrate CPU-optimized embedding inference using Optimum Intel and fastRAG, targeting RAG pipeline acceleration without GPU hardware. The post covers quantization and optimization techniques that improve embedding throughput on Intel CPUs. This is relevant to inference economics and enterprise deployment patterns where GPU availability is constrained.
ruvnet/ruflo: Agent Meta-Harness for Claude with Multi-Agent Swarm Coordination
Ruflo is an open-source TypeScript framework positioning itself as a meta-harness for Claude-based multi-agent systems. It features adaptive memory, swarm intelligence coordination, RAG integration, and native Claude Code/Codex integration. The project has accumulated 57,231 stars with 354 added today, indicating significant community traction.