5 · arXiv cs.AI (Artificial Intelligence) · 9d ago

TAHOE: Error-driven hint learning system substantially improves Text-to-SQL on Spider 2.0

TAHOE is a Text-to-SQL system that treats prompt optimization as a dynamic data management problem: it builds a structured Hint Bank from compiler, execution, and user feedback without updating model parameters. On the Spider 2.0-Snow benchmark with GPT-5.5, it raises the pass rate from 61.95% to 79.42% (a 17.47 percentage-point gain) and achieves 100% Snowflake syntax compliance while cutting compiler-feedback rounds from 2.79 to 0.12. The learned Hint Bank transfers to weaker models, yielding a 19.7 percentage-point gain on Doubao-2.0-lite. The approach targets the deployment gap between Text-to-SQL prototypes and production database environments with strict dialects and large schemas.
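The idea of learning from feedback without touching model weights can be pictured as a small data store that sits beside the prompt. The sketch below is a toy illustration under stated assumptions, not TAHOE's implementation: the `Hint` and `HintBank` classes, the substring-matching retrieval rule, the prompt layout, and the example error text are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hint:
    source: str    # where the hint came from: "compiler", "execution", or "user"
    trigger: str   # substring of an error message that this hint addresses
    advice: str    # text added to the prompt when the hint is retrieved
    hits: int = 0  # how many times the hint has been retrieved


@dataclass
class HintBank:
    hints: list = field(default_factory=list)

    def learn(self, source, trigger, advice):
        """Store a hint unless the same trigger/advice pair already exists."""
        for hint in self.hints:
            if hint.trigger == trigger and hint.advice == advice:
                return hint
        hint = Hint(source, trigger, advice)
        self.hints.append(hint)
        return hint

    def retrieve(self, error_message, limit=3):
        """Return hints whose trigger occurs in the error message, most-used first."""
        lowered = error_message.lower()
        matches = [h for h in self.hints if h.trigger.lower() in lowered]
        for hint in matches:
            hint.hits += 1
        matches.sort(key=lambda h: h.hits, reverse=True)
        return matches[:limit]

    def build_prompt(self, question, error_message):
        """Assemble a prompt that carries the retrieved hints (layout is assumed)."""
        lines = [f"- {h.advice}" for h in self.retrieve(error_message)]
        hint_text = "\n".join(lines) if lines else "- (no hints yet)"
        return f"Question: {question}\nHints:\n{hint_text}\nWrite Snowflake SQL."


bank = HintBank()
# Illustrative hint; the error wording is an assumed example.
bank.learn(
    "compiler",
    "invalid identifier",
    "Double-quote identifiers that were created in lower or mixed case.",
)
prompt = bank.build_prompt(
    "Total orders per month?",
    "SQL compilation error: invalid identifier 'order_date'",
)
print(prompt)
```

In this sketch, "learning" only appends rows to the bank, so the same store could be handed to a different model unchanged, which is the property the transfer result depends on.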

Related events (8)

4 · Hugging Face Blog · 1mo ago

Efficient Table Pre-training without Real Data: An Introduction to TAPEX

TAPEX is a table pre-training approach that avoids reliance on real tabular data by instead training a language model to simulate SQL query execution over synthetic tables. The method achieves strong performance on table-based question answering and fact verification benchmarks. This Hugging Face blog post introduces the technique and its integration into the Hugging Face ecosystem.

4 · Hugging Face Blog · 1mo ago

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

This Hugging Face blog post demonstrates a Text-to-SQL pipeline combining the Hugging Face Dataset Viewer API with MotherDuck's DuckDB-NSQL-7B model, a 7-billion parameter model fine-tuned for natural language to SQL translation. The post walks through using the model to query datasets stored on Hugging Face via DuckDB. It represents a practical integration of a domain-specialized open-weights model with a data infrastructure tool.

6 · arXiv · cs.LG · 22d ago

HullFT: Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching

HullFT is a new method for test-time finetuning (TTFT) of language models that addresses the dual bottlenecks of retrieval quality and per-query finetuning cost. It represents query embeddings as sparse convex combinations of training sequences using Frank-Wolfe optimization, yielding diverse and relevant support sets without expensive diversity-aware search. A geometric integerization step converts fractional weights into integer multiplicities, enabling a Gradient Reuse scheme that amortizes forward-backward computation across repeated examples. Experiments show improved quality-efficiency tradeoffs over prior TTFT methods, measured in bits-per-byte at lower total runtime.
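The convex-reconstruction step can be illustrated with a generic Frank-Wolfe loop over the probability simplex. This is a minimal sketch, not HullFT's code: the squared-Euclidean objective, the exact line search, the function name `frank_wolfe_simplex`, and the tiny hand-made embeddings are assumptions chosen for illustration. Each iteration moves weight toward a single training vector, which is why the resulting weights tend to be sparse.

```python
def frank_wolfe_simplex(train, query, iters=50, tol=1e-12):
    """Approximate `query` as a convex combination of the rows of `train`.

    Minimizes 0.5 * ||sum_i w_i * train[i] - query||^2 subject to
    w_i >= 0 and sum_i w_i == 1, using Frank-Wolfe with exact line search.
    """
    n, dim = len(train), len(query)
    weights = [0.0] * n
    weights[0] = 1.0  # start at a vertex of the simplex

    for _ in range(iters):
        recon = [sum(weights[i] * train[i][k] for i in range(n)) for k in range(dim)]
        residual = [recon[k] - query[k] for k in range(dim)]
        # Gradient with respect to weight i is the dot product <train[i], residual>.
        grad = [sum(train[i][k] * residual[k] for k in range(dim)) for i in range(n)]
        best = min(range(n), key=lambda i: grad[i])
        # Frank-Wolfe duality gap; a small gap means we are close to optimal.
        gap = sum(weights[i] * grad[i] for i in range(n)) - grad[best]
        if gap <= tol:
            break
        # Moving toward vertex `best` shifts the reconstruction along this direction.
        direction = [train[best][k] - recon[k] for k in range(dim)]
        denom = sum(v * v for v in direction)
        if denom == 0.0:
            break
        # Exact line search for a quadratic objective, clipped to [0, 1].
        gamma = -sum(residual[k] * direction[k] for k in range(dim)) / denom
        gamma = max(0.0, min(1.0, gamma))
        weights = [(1.0 - gamma) * w for w in weights]
        weights[best] += gamma
    return weights


# Toy embeddings: three orthogonal "training sequences".
train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
midpoint = frank_wolfe_simplex(train, [0.5, 0.5, 0.0])
print(midpoint)
```

The fractional weights returned here are what a later integerization step would have to round into repeat counts; that step is not shown.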

6 · Hugging Face Blog · 1mo ago

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

TII UAE has released Falcon-H1, a new family of hybrid-head language models combining attention and state-space mechanisms to improve efficiency and performance. The models are published on Hugging Face and represent TII's latest iteration in the Falcon series. The hybrid architecture targets better inference economics and competitive benchmark results relative to model size.

3 · GitHub Trending · 28d ago

prompt-optimizer: Open-Source TypeScript Prompt Optimization Tool

prompt-optimizer is an open-source TypeScript tool designed to help users write better prompts and improve AI outputs. The repository has accumulated 29,603 total stars with 76 new stars today, indicating sustained community interest. It represents a category of tooling focused on prompt engineering automation and optimization.

5 · Hugging Face Blog · 1mo ago

Faster Assisted Generation with Dynamic Speculation

Hugging Face introduces dynamic speculation lookahead for assisted (speculative) decoding, a technique that adaptively adjusts the number of candidate tokens generated by a draft model before verification by the main model. This approach aims to improve throughput and reduce latency compared to fixed-lookahead speculative decoding by tuning the speculation depth at runtime. The blog post describes the method and its integration into the Hugging Face Transformers library.
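One way to picture a runtime-adjusted speculation depth is a confidence cutoff: the draft model keeps proposing tokens only while it stays confident, up to a cap. The sketch below is a toy illustration and not the Transformers implementation; the function name `draft_length`, the default threshold, and the list of per-token draft confidences are assumptions made for this example.

```python
def draft_length(draft_confidences, threshold=0.4, max_lookahead=8):
    """Count how many draft tokens to send to the main model for verification.

    Drafting stops at the first token whose confidence falls below
    `threshold` (that token is dropped) or once `max_lookahead` is reached.
    """
    kept = 0
    for confidence in draft_confidences[:max_lookahead]:
        if confidence < threshold:
            break
        kept += 1
    return kept


# Confident start, then a shaky third token: only two tokens are drafted.
print(draft_length([0.9, 0.8, 0.3, 0.95]))
```

A fixed lookahead would draft the same number of tokens in both easy and hard stretches of text; a rule like this one spends fewer draft steps where the draft model is likely to be rejected.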

5 · Hugging Face Blog · 1mo ago

Faster Text Generation with Self-Speculative Decoding via LayerSkip

This Hugging Face blog post covers LayerSkip, a self-speculative decoding technique that accelerates text generation by using early exit from transformer layers to draft tokens, then verifying them with the full model. Unlike standard speculative decoding, LayerSkip requires no separate draft model, reducing memory overhead while still achieving inference speedups. The post likely covers integration with the Hugging Face ecosystem and practical performance benchmarks.

5 · arXiv · cs.CL · 11d ago

DocTrace: Structure-Aware On-Demand Hypergraph Memory for Long-Document QA

Researchers introduce DocTrace, a multi-agent RAG framework for long-document question answering that uses query-triggered knowledge organization rather than costly query-agnostic preprocessing. The system combines a lightweight document structural tree index, on-demand hypergraph working memory, and a graph-structured experience memory that stores successful reasoning plans for reuse. Evaluated on four long-document QA datasets, DocTrace outperforms the strongest baseline (ComoRAG) by up to 8.85% F1 and 4.40% EM while reducing computational cost by 53.32%.