2023, Year of Open LLMs
Hugging Face's year-in-review post surveys the major open-weight large language model releases and milestones of 2023. The piece covers the proliferation of open models from various labs and the ecosystem developments that made them accessible. It serves as a retrospective on how open-source LLMs matured and competed with proprietary systems throughout the year.
Related events (8)
Open-Source Text Generation & LLM Ecosystem at Hugging Face
Hugging Face published a blog post surveying the open-source LLM ecosystem as of mid-2023, covering text generation models, tooling, and deployment patterns available on the platform. The post highlights the breadth of open-weight models and associated infrastructure for inference and fine-tuning. It serves as a reference overview of the state of open-source LLMs at that point in time.
Welcome Llama 3 - Meta's new open LLM
Hugging Face published a blog post welcoming Meta's Llama 3 release, a new set of open-weight large language models. Llama 3 represents a significant update to Meta's open model family, with improved capabilities over Llama 2. The post covers integration and availability on the Hugging Face platform.
The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare
Hugging Face has launched the Open Medical-LLM Leaderboard, a public benchmark for evaluating large language models on healthcare and medical tasks. The leaderboard aggregates performance across multiple medical question-answering datasets to enable standardized comparison of open-weight models in clinical and biomedical domains. This initiative aims to accelerate progress in medical AI by providing transparent, reproducible evaluation infrastructure.
Introducing the Open Leaderboard for Japanese LLMs
Hugging Face has launched an open leaderboard specifically for evaluating large language models on Japanese language tasks. The leaderboard aims to provide standardized benchmarking for Japanese LLMs, filling a gap in multilingual evaluation infrastructure. This initiative supports the growing ecosystem of Japanese-language AI development and open evaluation practices.
The Open Arabic LLM Leaderboard 2
Hugging Face has launched the second version of the Open Arabic LLM Leaderboard, a benchmarking platform for evaluating large language models on Arabic language tasks. The updated leaderboard introduces revised evaluation protocols and benchmarks targeting Arabic-specific capabilities. This initiative supports the open research community in tracking progress on NLP for Arabic, a language historically underserved by LLM evaluation infrastructure.
Introducing the Open FinLLM Leaderboard
Hugging Face has launched the Open FinLLM Leaderboard, a benchmarking platform specifically designed to evaluate large language models on financial domain tasks. The leaderboard aims to provide standardized, open evaluation of LLMs across finance-specific capabilities such as financial reasoning, document understanding, and numerical analysis. This fills a gap in domain-specific evaluation infrastructure for the financial sector.
What's going on with the Open LLM Leaderboard?
Hugging Face published a commentary examining anomalies and issues observed in the Open LLM Leaderboard, focusing on MMLU benchmark results. The post investigates potential data contamination, evaluation inconsistencies, and scoring discrepancies across open-weight models. It raises concerns about the reliability of MMLU as a benchmark signal and the integrity of leaderboard rankings.
Introducing the Open Arabic LLM Leaderboard
Hugging Face has launched the Open Arabic LLM Leaderboard, a benchmarking platform specifically designed to evaluate large language models on Arabic language tasks. The leaderboard aims to fill a gap in multilingual evaluation infrastructure by providing standardized assessments for Arabic NLP capabilities. This initiative supports the open-source community in tracking progress on Arabic language understanding and generation.



