Almanac
product

DuckDB

product · active · duckdb-a7b639f8 · 3 events · first seen 28d ago

Aliases: DuckDB

Recent events (3)

4 · Hugging Face Blog · 28d ago

DuckDB Integration for Analyzing 50,000+ Datasets on Hugging Face Hub

Hugging Face announced a DuckDB integration enabling direct SQL-based analysis of over 50,000 datasets hosted on the Hub without downloading them. The integration allows users to query dataset metadata, statistics, and contents using DuckDB's in-process analytical engine. This lowers the barrier to dataset discovery and exploration at scale across the Hugging Face ecosystem.
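A minimal sketch of what querying a Hub dataset in place could look like, assuming DuckDB's `hf://` path scheme (read through the httpfs extension in recent DuckDB releases). The repository name `some-user/some-dataset` and the helper names are hypothetical and chosen only for illustration; they do not come from the announcement.

```python
from typing import Optional


def hf_parquet_path(repo_id: str,
                    file_glob: str = "**/*.parquet",
                    revision: Optional[str] = None) -> str:
    """Build an hf:// path for a dataset repo (hypothetical helper).

    A revision such as "~parquet" selects the auto-converted Parquet branch.
    """
    repo = repo_id if revision is None else f"{repo_id}@{revision}"
    return f"hf://datasets/{repo}/{file_glob}"


def count_rows_sql(path: str) -> str:
    """Return a SQL statement that counts rows across the matched files."""
    escaped = path.replace("'", "''")  # escape single quotes for the SQL literal
    return f"SELECT count(*) AS n FROM read_parquet('{escaped}')"


def run(sql: str):
    """Execute the query remotely; needs the third-party duckdb package and network access."""
    import duckdb  # imported lazily so the helpers above work without it

    return duckdb.sql(sql).fetchall()


if __name__ == "__main__":
    # Assumption: "some-user/some-dataset" is a placeholder, not a real repository.
    path = hf_parquet_path("some-user/some-dataset", revision="~parquet")
    print(count_rows_sql(path))
```

Calling `run(count_rows_sql(path))` with a real repository name would execute the count against the remote Parquet files rather than a local copy.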

4 · arXiv · cs.LG · 14d ago

MLSkip: Data skipping for ML filter predicates using Parquet metadata and neural network verification

MLSkip introduces data skipping for ML-based filter predicates in databases, a case that traditional min-max pruning methods do not address. The approach combines Parquet's existing min-max metadata with neural network verification techniques to prune row groups that cannot satisfy the predicate. On TPC-H and TPC-DS benchmarks with ReLU architectures, the method achieves 27.4% average pruning effectiveness for low-selectivity filters, rising to 38.31% with a proposed 2D convex hull metadata structure, and yields a 1.07× end-to-end speedup in DuckDB over PyTorch.
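To illustrate the general idea of pruning row groups under a neural-network predicate, here is a small sketch that uses interval bound propagation, a standard verification technique swapped in for illustration. It is not the paper's method, and the network weights, the threshold, and all function names are assumptions. Given per-column min-max values for a row group, it bounds the output of a ReLU network and skips the row group when the upper bound cannot exceed the filter threshold.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (weights, bias)


def affine_bounds(weights, bias, box: List[Interval]) -> List[Interval]:
    """Propagate an input box through y = Wx + b using interval arithmetic."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (x_lo, x_hi) in zip(row, box):
            if w >= 0:
                lo += w * x_lo
                hi += w * x_hi
            else:  # a negative weight swaps which endpoint gives the bound
                lo += w * x_hi
                hi += w * x_lo
        out.append((lo, hi))
    return out


def relu_bounds(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each endpoint."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in box]


def output_bounds(layers: List[Layer], box: List[Interval]) -> Interval:
    """Bound the scalar output of a ReLU MLP over the given input box."""
    for i, (weights, bias) in enumerate(layers):
        box = affine_bounds(weights, bias, box)
        if i < len(layers) - 1:  # no activation after the final layer
            box = relu_bounds(box)
    return box[0]


def can_skip(layers: List[Layer], row_group_minmax: List[Interval],
             threshold: float) -> bool:
    """True if no row in the group can satisfy the predicate f(x) > threshold."""
    _, upper = output_bounds(layers, row_group_minmax)
    return upper <= threshold


# Hypothetical two-input network with one hidden layer of two ReLU units.
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]), ([[1.0, 1.0]], [0.0])]
print(can_skip(layers, [(0.0, 1.0), (2.0, 3.0)], threshold=1.5))
print(can_skip(layers, [(4.0, 5.0), (0.0, 1.0)], threshold=1.5))
```

The bounds are sound but loose: a row group is skipped only when the predicate provably fails for every point inside its min-max box, so any imprecision costs pruning opportunities and never correctness.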

4 · Hugging Face Blog · 28d ago

Text2SQL using Hugging Face Dataset Viewer API and MotherDuck DuckDB-NSQL-7B

This Hugging Face blog post demonstrates a Text-to-SQL pipeline that combines the Hugging Face Dataset Viewer API with MotherDuck's DuckDB-NSQL-7B, a 7-billion-parameter model fine-tuned to translate natural language into SQL. The post walks through using the model to query datasets stored on Hugging Face via DuckDB. It is a practical integration of a domain-specialized open-weights model with a data infrastructure tool.