DuckDB
duckdb-a7b639f8·3 events·first seen 28d agoAliases: DuckDB
Co-occurring entities
More like this (12)
Recent events (3)
DuckDB Integration for Analyzing 50,000+ Datasets on Hugging Face Hub
Hugging Face announced a DuckDB integration enabling direct SQL-based analysis of over 50,000 datasets hosted on the Hub without downloading them. The integration allows users to query dataset metadata, statistics, and contents using DuckDB's in-process analytical engine. This lowers the barrier to dataset discovery and exploration at scale across the Hugging Face ecosystem.
MLSkip: Data skipping for ML filter predicates using Parquet metadata and neural network verification
MLSkip introduces data skipping techniques for ML-based filter predicates in databases, a problem not addressed by traditional min-max pruning methods. The approach leverages Parquet's existing min-max metadata combined with neural network verification techniques to prune non-qualifying row groups. On TPC-H and TPC-DS benchmarks with ReLU architectures, the method achieves 27.4% average pruning effectiveness for low-selectivity filters, improving to 38.31% with a proposed 2D convex hull metadata structure, yielding a 1.07× end-to-end speedup in DuckDB over PyTorch.
Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B
This Hugging Face blog post demonstrates a Text-to-SQL pipeline combining the Hugging Face Dataset Viewer API with MotherDuck's DuckDB-NSQL-7B model, a 7-billion parameter model fine-tuned for natural language to SQL translation. The post walks through using the model to query datasets stored on Hugging Face via DuckDB. It represents a practical integration of a domain-specialized open-weights model with a data infrastructure tool.