4 · Hugging Face Blog · 1mo ago

Efficient Table Pre-training without Real Data: An Introduction to TAPEX

TAPEX is a table pre-training approach that avoids reliance on real tabular data by instead training a language model to simulate SQL query execution over synthetic tables. The method achieves strong performance on table-based question answering and fact verification benchmarks. This Hugging Face blog post introduces the technique and its integration into the Hugging Face ecosystem.

Evaluation and Benchmarking · Agent and Tool Ecosystem · TAPEX · Hugging Face · SQL · TableQA
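To make the summary's core idea concrete, here is a minimal sketch of how one synthetic pre-training pair could be built: a SQL query plus a flattened table as the model input, and the query's execution result as the target. This is an illustration under stated assumptions, not the authors' pipeline: the table schema, the query template, the `col : ... row i : ...` linearization, and all function names are hypothetical, and SQLite stands in for whatever executor the original work used.

```python
import random
import sqlite3


def make_synthetic_table(rng, n_rows=4):
    """Build a small random table; the schema here is illustrative only."""
    cities = ["athens", "paris", "london", "tokyo", "sydney", "rome"]
    rows = [(1896 + 4 * i, rng.choice(cities), rng.randint(10, 60))
            for i in range(n_rows)]
    return ["year", "city", "nations"], rows


def flatten_table(header, rows):
    """Linearize a table into one string (assumed 'col : ... row i : ...' layout)."""
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(v) for v in row))
    return " ".join(parts)


def execute_sql(header, rows, query):
    """Run the query in an in-memory SQLite database to get the true result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TABLE t ({', '.join(header)})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO t VALUES ({placeholders})", rows)
        return [tuple(r) for r in conn.execute(query).fetchall()]
    finally:
        conn.close()


def make_example(rng):
    """Return (source, target): query + flattened table, and the execution result."""
    header, rows = make_synthetic_table(rng)
    threshold = rng.randint(10, 60)
    query = f"SELECT city FROM t WHERE nations > {threshold} ORDER BY year"
    result = execute_sql(header, rows, query)
    source = query.lower() + " " + flatten_table(header, rows)
    target = ", ".join(str(v) for (v,) in result)
    return source, target


if __name__ == "__main__":
    src, tgt = make_example(random.Random(0))
    print("source:", src)
    print("target:", tgt)
```

A sequence-to-sequence model trained on many such pairs has to learn to produce the target from the source alone, which is the sense in which it simulates SQL execution without seeing any real tables.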

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI


Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

5 · arXiv · cs.AI · 9d ago

TAHOE: Error-driven hint learning system substantially improves Text-to-SQL on Spider 2.0

TAHOE is a Text-to-SQL system that treats prompt optimization as a dynamic data management problem, building a structured Hint Bank from compiler, execution, and user feedback without updating model parameters. On the Spider 2.0-Snow benchmark using GPT-5.5, it raises pass rate from 61.95% to 79.42% and achieves 100% Snowflake syntax compliance while reducing compiler-feedback rounds from 2.79 to 0.12. The learned Hint Bank transfers to weaker models, yielding a 19.7 percentage-point gain on Doubao-2.0-lite. The approach targets the production deployment gap between Text-to-SQL prototypes and real-world database environments with strict dialects and large schemas.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Spider 2.0 · TAHOE · Snowflake

3 · Hugging Face Blog · 1mo ago

Training a Language Model with Hugging Face Transformers Using TensorFlow and TPUs

This Hugging Face blog post provides a technical walkthrough for training a language model using TensorFlow and Google TPUs via the Transformers library. It covers the practical setup, data pipeline, and training configuration required to leverage TPU hardware with the TF ecosystem. The post serves as a tutorial bridging Hugging Face tooling with TPU-based infrastructure.

Training Infrastructure · Agent and Tool Ecosystem · Google TPU · Hugging Face Transformers · Hugging Face

4 · arXiv · cs.CL · 18d ago

ODTQA-FoRe: Open-Domain Tabular QA Dataset for Future Data Forecasting and Reasoning

The paper introduces ODTQA-FoRe, a new benchmark dataset for open-domain tabular question answering focused on time-series forecasting and forecast-based reasoning using real estate data. The authors also propose TimeFore, an LLM agent framework that decomposes the task into three roles: a SQL-generating Retriever, a Forecaster that calls external time-series models, and an Analyzer that synthesizes results. The work targets a gap in existing tabular QA systems, which typically cannot perform future-oriented numerical prediction. Experiments demonstrate TimeFore's effectiveness on the new benchmark.

Evaluation and Benchmarking · Agent and Tool Ecosystem · SQL generation · TimeFore · time-series forecasting

3 · Hugging Face Blog · 1mo ago

Pre-Train BERT with Hugging Face Transformers and Habana Gaudi

This Hugging Face blog post from August 2022 describes how to pre-train a BERT model from scratch using the Hugging Face Transformers library on Habana Gaudi hardware accelerators. It covers the full pipeline, including data preparation, tokenizer training, and masked language modeling pre-training. The post serves as both a technical tutorial and a demonstration of Habana Gaudi's viability as an alternative AI training accelerator.

Training Infrastructure · Habana Gaudi · Hugging Face Transformers · Hugging Face

4 · arXiv · cs.CL · 11d ago

TABVERSE benchmark isolates table representation effects across formats in LLMs and VLMs

TABVERSE is a new controlled multimodal benchmark that evaluates LLMs and VLMs on table understanding by holding table content fixed while varying representation format (HTML, Markdown, LaTeX, rendered images). Evaluation across three tasks—Question Answering, Structural Understanding, and Structure Reconstruction—shows that representation choice substantially affects performance, with structured text generally outperforming rendered images and HTML being the most robust text format. The benchmark addresses a gap in existing evaluations where content, format, and modality vary simultaneously, making it impossible to isolate representation effects.

Evaluation and Benchmarking · Multimodal Progress · TABVERSE
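The controlled setup described in the TABVERSE summary, the same table content with only the representation varying, can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the function names and the sample table are hypothetical, the rendered-image format is omitted, and LaTeX special characters are not escaped.

```python
from html import escape


def to_markdown(header, rows):
    """Serialize the table as a pipe-delimited Markdown table."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)


def to_html(header, rows):
    """Serialize the table as a compact HTML <table>, escaping cell text."""
    head = "".join(f"<th>{escape(h)}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def to_latex(header, rows):
    """Serialize the table as a LaTeX tabular (no escaping of special characters)."""
    spec = "l" * len(header)
    lines = [f"\\begin{{tabular}}{{{spec}}}",
             " & ".join(header) + " \\\\",
             "\\hline"]
    lines += [" & ".join(str(c) for c in row) + " \\\\" for row in rows]
    lines.append("\\end{tabular}")
    return "\n".join(lines)


if __name__ == "__main__":
    header = ["model", "score"]           # hypothetical content, held fixed
    rows = [["A", 1], ["B", 2]]
    for render in (to_markdown, to_html, to_latex):
        print(render(header, rows), end="\n\n")
```

Feeding each serialization to the same model with the same question isolates the effect of format, because content and task are unchanged across the three prompts.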

5 · arXiv · cs.AI · 1mo ago

Distilling Tabular Foundation Models for Structured Health Data

This paper investigates knowledge distillation from tabular foundation models (TFMs) to lightweight student models for healthcare applications. The authors address context leakage in in-context TFMs via stratified out-of-fold teacher labeling, evaluating across 19 healthcare datasets, 6 TFM teachers, and 4 student families. Distilled students retain at least 90% of teacher AUC while running 26× faster on CPU, with preserved calibration and fairness properties. Multi-teacher ensembles do not consistently outperform the best single teacher.

Evaluation and Benchmarking · Inference Economics · knowledge distillation · Stratified Out-of-Fold Teacher Labeling · AUC
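One plausible reading of the stratified out-of-fold teacher labeling mentioned in the distillation summary is sketched below: each sample's soft label comes from a teacher whose context excludes that sample's fold, so the teacher never sees the label it is asked to predict. This is a sketch under assumptions, not the paper's implementation: the fold-assignment scheme, the function names, and the `teacher_predict(context_X, context_y, query_X)` interface are all hypothetical, and a trivial class-prior function stands in for a real tabular foundation model.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each index to one of k folds, spreading every class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [None] * len(labels)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            fold_of[i] = pos % k
    return fold_of


def out_of_fold_soft_labels(X, y, k, teacher_predict):
    """Label every sample with a teacher whose context excludes that sample's fold."""
    fold_of = stratified_folds(y, k)
    soft = [None] * len(y)
    for f in range(k):
        ctx = [i for i in range(len(y)) if fold_of[i] != f]
        held = [i for i in range(len(y)) if fold_of[i] == f]
        preds = teacher_predict([X[i] for i in ctx],
                                [y[i] for i in ctx],
                                [X[i] for i in held])
        for i, p in zip(held, preds):
            soft[i] = p
    return soft


def prior_teacher(context_X, context_y, query_X):
    """Stand-in teacher: predicts the positive rate of its context for every query."""
    rate = sum(context_y) / len(context_y)
    return [rate] * len(query_X)


if __name__ == "__main__":
    X = list(range(10))
    y = [0, 1] * 5
    print(out_of_fold_soft_labels(X, y, k=5, teacher_predict=prior_teacher))
```

The resulting soft labels could then serve as training targets for a lightweight student model; swapping `prior_teacher` for a real in-context teacher would leave the fold logic unchanged.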

6 · The Batch · 19d ago

Test-Time Training End-to-End (TTT-E2E) Retrains Model Weights to Handle Long Inputs

Researchers from Astera Institute, Nvidia, Stanford, UC Berkeley, and UC San Diego introduced TTT-E2E, a method that compresses long context into transformer weights by training the model during inference via meta-learning. The approach uses sliding-window attention restricted to 8,000 tokens and updates only the fully connected layers of the last quarter of the network on each 1,000-token chunk at inference time, keeping per-token generation latency roughly constant as context scales to 128,000 tokens. TTT-E2E slightly outperforms vanilla transformers on next-token prediction loss across long contexts and matches efficient architectures like Mamba 2 and Gated DeltaNet on inference speed, but fails dramatically on Needle-in-a-Haystack retrieval beyond 8,000 tokens and incurs substantially higher training latency. The work reframes long-context handling as a training-inference trade-off rather than an architectural design problem.

Training Infrastructure · Long Context Evolution · University of California San Diego · Mamba · Stanford University

4 · Hugging Face Blog · 1mo ago

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

This Hugging Face blog post demonstrates a Text-to-SQL pipeline combining the Hugging Face Dataset Viewer API with MotherDuck's DuckDB-NSQL-7B model, a 7-billion-parameter model fine-tuned for natural-language-to-SQL translation. The post walks through using the model to query datasets stored on Hugging Face via DuckDB. It represents a practical integration of a domain-specialized open-weights model with a data infrastructure tool.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · DuckDB · Hugging Face · DuckDB-NSQL-7B