Almanac
model

Llama

model·active·llama-bbf40db3·15 events·first seen 1mo ago

Aliases: Llama, Llama 4, Llama-4

Recent events (15)

5·Hugging Face Blog·28d ago

StackLLaMA: A hands-on guide to train LLaMA with RLHF

Hugging Face published a detailed tutorial demonstrating how to fine-tune Meta's LLaMA model using Reinforcement Learning from Human Feedback (RLHF) on StackExchange data. The guide covers the full pipeline: supervised fine-tuning, reward model training, and PPO-based RL optimization. It serves as a practical reference for practitioners seeking to replicate RLHF workflows on open-weight models using the TRL library.
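
The reward model training stage named above is commonly built on a pairwise ranking loss: given two answers to the same question, the model is penalized when the preferred answer does not receive the higher score. The following is a minimal sketch of that loss in plain Python. It assumes the standard negative log-sigmoid formulation, and the function name and example scores are hypothetical, not taken from the guide or from TRL.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the chosen answer scores well above the
    rejected one and grows when the ordering is reversed.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scalar scores for a (chosen, rejected) answer pair.
loss_correct_order = pairwise_reward_loss(2.0, 0.5)
loss_wrong_order = pairwise_reward_loss(0.5, 2.0)
loss_tie = pairwise_reward_loss(1.0, 1.0)

print(f"correct order: {loss_correct_order:.4f}")
print(f"tie:           {loss_tie:.4f}")
print(f"wrong order:   {loss_wrong_order:.4f}")
```

In a real pipeline the scores would come from a model head over tokenized question-answer pairs and the loss would be averaged over a batch. The scalar version only illustrates the shape of the objective.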

5·Hugging Face Blog·28d ago

2023, Year of Open LLMs

Hugging Face's year-in-review post surveys the major open-weight large language model releases and milestones of 2023. The piece covers the proliferation of open models from various labs and the ecosystem developments that made them accessible. It serves as a retrospective on how open-source LLMs matured and competed with proprietary systems throughout the year.

5·arXiv · cs.CL·1h ago

Study of security and privacy prompts in the wild reveals LLM response quality gaps and inconsistency

Researchers analyzed 14,727 security and privacy (S&P) prompts drawn from WildChat's 3.2M real user-LLM conversations, categorizing them into nine topic areas and evaluating response quality across 270 advice-seeking prompts. Commercial models substantially outperformed open-weight models (GPT achieving 98% 'good enough' responses vs. Llama 4 at 47%), but even high-performing commercial models showed inconsistent responses across repeated runs of the same prompt. The study is the first to analyze real user S&P queries to LLMs rather than expert-authored test sets, surfacing both a capability gap and a reliability concern.

6·arXiv · cs.CL·7d ago

The Shibboleth Effect: Cross-lingual behavioral skew in frontier LLMs under adversarial geopolitical simulation

Researchers introduce the 'Shibboleth Effect' — systematic behavioral differences in LLMs when operating in different languages — and audit six frontier models (GPT-4o, Llama-4, Mistral-Large, Gemini-3.1-Pro, Qwen3.6-Plus, DeepSeek-R1) using a synthetic maritime territorial dispute wargame played in English versus Turkish. Results are heterogeneous: Llama-4 becomes significantly more coercive in Turkish while Gemini-3.1-Pro and DeepSeek-R1 become less so, and GPT-4o shows no detectable shift. The study identifies two candidate buffering mechanisms — chain-of-thought institutional anchoring and multilingual RLHF alignment — with direct implications for deploying LLMs in diplomatic or crisis-management contexts.

7·Anthropic News·15d ago

Anthropic Publishes Political Even-Handedness Evaluation for Claude, Open-Sources Methodology

Anthropic has released a detailed account of how it trains and evaluates Claude for political even-handedness, including character traits instilled via reinforcement learning since early 2024 and a new automated evaluation methodology. The evaluation tests thousands of prompts across hundreds of political stances and benchmarks Claude Sonnet 4.5 against GPT-5, Llama 4, Grok 4, and Gemini 2.5 Pro, finding Claude comparable to Grok 4 and Gemini 2.5 Pro and more even-handed than GPT-5 and Llama 4. Anthropic is open-sourcing the evaluation framework to encourage shared industry standards for measuring political bias. The post also discloses the specific system prompt language used on Claude.ai to enforce even-handed behavior.

5·arXiv · cs.CL·1h ago

Study identifies 'synthetic lived experience paradox' in peer-like AI caregiver support

Researchers examine how LLMs prompted to sound peer-like generate language implying lived experience they cannot authentically possess, studying this in the context of family caregivers of patients with Alzheimer's disease and related dementias (ADRD). Using caregiver support exchanges from online communities and responses from LLaMA, GPT-4o-mini, and MedGemma, the study finds a 'narrative authenticity gap': AI captures the emotional work of peer support but can fabricate experiential grounding. Psycholinguistic analysis shows that human peers use significantly more first-person and past-focused language than AI. The authors argue that caregiver-support AI needs mechanisms to distinguish supportive framing from fabricated lived experience.

7·Meta AI Blog·1mo ago

Meta Announces Four MTIA AI Chip Generations in Two Years: MTIA 300–500 Roadmap

Meta has detailed a rapid four-generation MTIA chip roadmap (300, 400, 450, 500) developed in partnership with Broadcom, spanning ranking and recommendation (R&R) inference and training through general GenAI workloads. Key advances include a 4.5x HBM bandwidth increase and a 25x compute FLOPS improvement from MTIA 300 to 500, with MTIA 450 and 500 targeting GenAI inference with doubled and further-increased HBM bandwidth versus leading commercial products. MTIA 300 is in production for R&R training and MTIA 400 is lab-tested and entering deployment, while MTIA 450 and 500 are scheduled for mass deployment in early 2027 and 2027, respectively. The strategy emphasizes modular chiplet design and short iteration cycles to keep hardware aligned with rapidly evolving AI model requirements.

5·Hugging Face Blog·28d ago

NVIDIA Llama Nemotron Nano VLM Released on Hugging Face Hub

NVIDIA has released the Llama Nemotron Nano VLM on Hugging Face Hub, a compact vision-language model built on the Llama architecture. The model is part of NVIDIA's Nemotron family targeting efficient multimodal inference. This release makes the model accessible to the broader research and developer community through Hugging Face's model hosting infrastructure.

6·Hugging Face Blog·28d ago

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

Hugging Face published a blog post detailing the integration of 4-bit quantization via bitsandbytes into the Transformers library, enabling large language models to run on consumer-grade hardware. The post covers NF4 (NormalFloat4) data type and double quantization techniques from the QLoRA paper, which together reduce memory footprint significantly while preserving model quality. It demonstrates how users can load models like LLaMA in 4-bit precision and fine-tune them using QLoRA with minimal code changes.
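
The memory saving from NF4 and double quantization can be estimated with simple arithmetic. The sketch below counts weight storage only. It assumes the block sizes described in the QLoRA paper (one 32-bit constant per 64 weights, with those constants re-quantized to 8 bits in groups of 256 when double quantization is on) and a hypothetical 7B-parameter model. Activations, KV cache, adapters, and optimizer state are deliberately left out, and the function names are illustrative.

```python
def bits_per_param(weight_bits: int = 4, block_size: int = 64,
                   double_quant: bool = True, const_bits: int = 32,
                   dq_const_bits: int = 8, dq_block_size: int = 256) -> float:
    """Average storage per weight, including quantization constants."""
    if double_quant:
        # First-level constants stored in 8 bits, plus one 32-bit
        # second-level constant per dq_block_size first-level constants.
        overhead = (dq_const_bits / block_size
                    + const_bits / (block_size * dq_block_size))
    else:
        # One 32-bit constant per block of weights.
        overhead = const_bits / block_size
    return weight_bits + overhead


def weight_memory_gib(n_params: int, bits: float) -> float:
    """Convert a parameter count and bits-per-parameter into GiB."""
    return n_params * bits / 8 / 2**30


N_PARAMS = 7_000_000_000  # hypothetical model size, for illustration only

fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, bits_per_param(double_quant=False))
nf4_dq_gib = weight_memory_gib(N_PARAMS, bits_per_param(double_quant=True))

print(f"fp16 weights:             {fp16_gib:.2f} GiB")
print(f"NF4 weights:              {nf4_gib:.2f} GiB")
print(f"NF4 + double quant:       {nf4_dq_gib:.2f} GiB")
```

Under these assumptions the per-weight overhead of the quantization constants drops from 0.5 bits to roughly 0.127 bits, which is where the extra saving from double quantization comes from.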

5·arXiv · cs.CL·13d ago

Knowledge editing via locate-then-edit transferred to masked diffusion language models, revealing multi-token failure mode

A new arXiv paper investigates whether locate-then-edit knowledge editing methods, developed for autoregressive models, transfer to masked diffusion language models (MDMs) such as LLaDA and Dream. The authors find that causal tracing identifies the same early-to-mid-layer MLP location in both paradigms, but MDMs degrade systematically on multi-token edits due to partially unmasked intermediate states that the edit was never optimized for. A correction targeting these intermediate states substantially restores multi-token editing performance. The work is the first systematic comparison of knowledge editing across autoregressive and diffusion-based language model paradigms.

6·Meta Llama·7d ago

Meta releases Llama Guard 4 12B multimodal safety classifier on Hugging Face

Meta released Llama Guard 4 12B, a multimodal (image-text-to-text) safety classification model built on the Llama 4 architecture, published to Hugging Face. The model is designed for conversational safety filtering and supports both text and image inputs. With 143K downloads and 102 likes shortly after release, it is seeing meaningful early adoption.

5·GitHub Trending·24d ago

OpenPipe ART: Agent Reinforcement Trainer for Multi-Step Agents via GRPO

OpenPipe has released ART (Agent Reinforcement Trainer), an open-source Python library for training multi-step agents on real-world tasks using GRPO (Group Relative Policy Optimization). The framework supports multiple model families including Qwen3, GPT-OSS, and Llama. With nearly 10k GitHub stars and 66 gained today, it is gaining notable community traction as a practical RL fine-tuning tool for agentic workflows.
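
The core idea behind GRPO is to score each sampled trajectory relative to the other trajectories drawn for the same task, rather than against a learned value function. The following is a minimal sketch of that group-relative advantage step in plain Python. It is not ART's API: the function name, the example rewards, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward by the mean and spread of its own group.

    `eps` keeps the division defined when every reward in the group
    is identical (standard deviation of zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of the same task.
group_rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(group_rewards)

for reward, advantage in zip(group_rewards, advantages):
    print(f"reward={reward:.1f} -> advantage={advantage:+.3f}")
```

Rollouts that beat their group's average get a positive advantage and are reinforced, while below-average rollouts are pushed down. A full trainer would then feed these advantages into a clipped policy-gradient update, which is omitted here.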

6·The Batch·14d ago

The Batch Issue 346: Nvidia Nemotron Super 120B, OpenAI-Amazon Deal, Regulatory Commentary

The Batch's weekly digest covers Nvidia's release of Nemotron 3 Super 120B-A12B, an open-weights hybrid Mamba-2/transformer/MoE model with a 1M-token context, trained on 25 trillion tokens and positioned as a speed leader in its size class for agentic applications. The issue also touches on OpenAI's Amazon deal and Grok video pricing cuts. Editor Andrew Ng's letter addresses the White House's proposed federal AI preemption framework and critiques what he characterizes as coordinated anti-AI messaging campaigns. Multiple significant industry developments are bundled in a single newsletter digest.

7·The Batch·14d ago

Nvidia releases Nemotron 3 Super 120B-A12B open-weights model with hybrid Mamba-2/MoE architecture

Nvidia released Nemotron 3 Super 120B-A12B, an open-weights LLM with a hybrid Mamba-2/transformer/MoE architecture that activates only 12B parameters per token and supports a context of up to 1 million tokens. The model is claimed to have the fastest inference speed in its size class, at 442 tokens/second, and leads open-weights models on the PinchBench agentic task evaluation, outperforming larger models including Kimi K2.5 (1T parameters). Nvidia is releasing weights, training data, and recipes under a permissive commercial license and plans a $26B five-year investment in open-weights models, framed partly as a strategic response to Chinese labs building capable open-weights models on non-Nvidia hardware.

5·GitHub Trending·4d ago

ms-swift: ModelScope framework for fine-tuning 600+ LLMs and 300+ MLLMs

ms-swift is an open-source Python framework from ModelScope supporting PEFT and full-parameter fine-tuning methods (CPT, SFT, DPO, GRPO) across 600+ LLMs and 300+ multimodal LLMs, including Qwen3, DeepSeek, Llama4, and others. The project has accumulated 14,487 GitHub stars and was accepted at AAAI 2025. It serves as a broad-coverage training harness for the current generation of open-weights frontier models.