5 · arXiv cs.LG (Machine Learning) · 3d ago

LOCUS: A large-scale corpus of U.S. local ordinances for legal AI research

Researchers introduce LOCUS, a comprehensive machine-readable corpus of U.S. municipal and county ordinance codes covering 9,239 jurisdictions, with a county-harmonized access layer spanning 2,309 of 3,144 U.S. counties. The corpus was assembled using OCR to handle diverse document formats previously locked in vendor platforms, and is released on HuggingFace alongside ModernBERT-based classifiers for analyzing local law along dimensions like opacity and paternalism. The work addresses a significant gap in legal AI training data, as local ordinances govern large swaths of everyday regulation but have been absent from existing corpora.

Evaluation and Benchmarking · ModernBERT · LocalLaws · LOCUS

Related guides (1)

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

5 · Hugging Face Blog · 1mo ago

Constitutional AI with Open LLMs

This Hugging Face blog post explores implementing Constitutional AI (CAI) techniques using open-weight language models. The post likely covers how to replicate Anthropic's CAI alignment methodology—using a set of principles to guide model self-critique and revision—without relying on proprietary systems. It represents a practical contribution to democratizing alignment research tooling.

Open Weights Progress · AI Safety Research · Constitutional AI · Hugging Face · Anthropic

4 · Hugging Face Blog · 1mo ago

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Hugging Face published a blog post surveying the open-source LLM ecosystem as of mid-2023, covering text generation models, tooling, and deployment patterns available on the platform. The post highlights the breadth of open-weight models and associated infrastructure for inference and fine-tuning. It serves as a reference overview of the state of open-source LLMs at that point in time.

Open Weights Progress · Inference Economics · Hugging Face

6 · Hugging Face Blog · 1mo ago

StarCoder: A State-of-the-Art LLM for Code

Hugging Face and ServiceNow released StarCoder, a large language model for code trained on permissively licensed data from The Stack dataset. The model targets code generation, completion, and understanding tasks and is positioned as an open-weights alternative to proprietary code models. The release includes model weights, training details, and an associated technical report.

Open Weights Progress · Agent and Tool Ecosystem · ServiceNow AI · BigCode · The Stack v2

8 · OpenAI Blog · 1mo ago

Evaluating Large Language Models Trained on Code

OpenAI published research on evaluating large language models trained on code, introducing the Codex model and the HumanEval benchmark for assessing code generation capabilities. The work established foundational methodology for measuring functional correctness of code produced by LLMs using a pass@k metric. This paper became a landmark reference for code-focused LLM evaluation and influenced subsequent code generation research across the field.
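
For reference, pass@k is usually computed with the unbiased estimator from the HumanEval paper: for a problem with n generated samples, of which c pass the unit tests, the per-problem value is 1 - C(n-c, k) / C(n, k), and the benchmark score is the mean over problems. The sketch below illustrates that formula in Python. The function names `pass_at_k` and `mean_pass_at_k` and the sample counts are illustrative assumptions, not code from the paper, and it uses exact integer binomials via `math.comb` rather than any particular reference implementation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed the unit tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset
    # contains at least one passing sample.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts: two problems, 10 samples each.
    print(pass_at_k(10, 5, 1))                      # 0.5
    print(mean_pass_at_k([(10, 5), (10, 10)], k=1))  # 0.75
```

Generating n > k samples per problem and applying this estimator gives a lower-variance score than drawing exactly k samples and checking whether any of them passes.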

Frontier Model Releases · Evaluation and Benchmarking · GPT-3 · pass@k · OpenAI

6 · arXiv · cs.CL · 5d ago

LOGOS: A unified autoregressive foundation model for natural science tasks across domains

Researchers introduce LOGOS (Language Of Generative Objects in Science), a generative language model that encodes heterogeneous scientific objects and spatial interactions as discrete token sequences within a single autoregressive framework, avoiding explicit coordinates or geometric neural networks. Models are trained at 1B, 3B, and 8B parameter scales and consistently match or outperform domain-specific baselines across diverse scientific tasks. The work argues that AI for Science should converge on shared architectures and training paradigms with LLMs rather than maintaining a separate technical stack. Model weights are released publicly.

Frontier Model Releases · Open Weights Progress · Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences · LOGOS

5 · arXiv · cs.LG · 4d ago

ReproRepo: Scalable LLM agent framework for reproducibility auditing using GitHub issues

ReproRepo is a new framework for evaluating LLM agents on reproducibility auditing of ML research, using naturally occurring GitHub issues as supervision signals rather than costly manual curation. The framework is instantiated on 1,149 recent ML papers from major conferences and benchmarks four frontier model-agent configurations. The best-performing agent (Codex with GPT-5.5) surfaces at least one semantically related human-reported reproduction blocker for ~90% of papers, though exact localization of issues remains a weakness. The work provides a reusable, scalable evaluation harness for this underexplored agentic task.

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenAI · ReproRepo · Codex

3 · GitHub Trending · 37h ago

OpenKB: Open-source LLM knowledge base library gains traction on GitHub

VectifyAI has released OpenKB, an open-source Python library for building LLM-powered knowledge bases. The repository is trending on GitHub with 2,389 total stars and 208 new stars in a single day, suggesting meaningful community interest. No detailed technical description is available from the source snippet.

Agent and Tool Ecosystem · OpenKB · VectifyAI

8 · Hugging Face Blog · 1mo ago

GGML and llama.cpp Join Hugging Face to Ensure Long-Term Progress of Local AI

GGML and llama.cpp, the foundational open-source libraries enabling efficient local inference of large language models, are joining Hugging Face. This move is intended to secure long-term development and sustainability of the projects that underpin much of the local/on-device AI ecosystem. The acquisition or integration represents a significant consolidation of key open-weights inference infrastructure under the Hugging Face umbrella.

Open Weights Progress · Inference Economics · Georgi Gerganov · llama.cpp · Hugging Face