Entity · model

Gemma 2 9B

model · active · gemma-2-9b-35821079 · 8 events · first seen Jun 1, 2026

Aliases: Gemma 2 9B, Gemma 2 27B, Gemma 2 27B IT, Gemma 3-27B-IT, gemma-2-9b-it, Gemma-2-9B, Gemma2 9B


More like this (12)

Gemma 3 12B Instruct, Gemma 2, Gemma-3-4B-IT, Gemma 3 270M, Gemma-4 E4B-it, Gemma 3, Gemma3-270M, Gemma 3n, Gemma 4, Gemma Scope 2, Gemma, T5Gemma

Recent events (8)

5 · arXiv · cs.CL · 3d ago

Closed-loop validation-repair achieves 99% schema compliance for clinical LLMs across healthcare standards

A new arXiv paper evaluates three open-source models (Qwen2.5 7B, Llama 3.1 8B, Gemma2 9B) on schema compliance with ICD-10, CPT, and HL7 FHIR standards across 960 clinical scenario-model pairs. Baseline compliance ranged from 85.9–91.6%, with 96% of failures being representation-level format violations rather than clinical reasoning errors. A closed-loop validation-repair framework raised overall compliance to 99.0%, with most errors resolving in one or two iterations, suggesting this system-level approach is a viable safeguard for healthcare EHR integration.

Evaluation and Benchmarking · Enterprise Deployment Patterns · HL7 FHIR R4 · Gemma 2 9B · Qwen2.5-7B · +3 more
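
The closed-loop validation-repair idea in the item above can be illustrated with a minimal sketch: generate a record, validate it, and feed any violations back to the generator until the record passes or an iteration budget runs out. Everything here is an assumption for illustration. The summary does not describe the paper's actual validators or prompts, so `REQUIRED_FIELDS`, `validate`, `validate_and_repair`, and `toy_generate` are hypothetical names, and the two-field schema is a toy, not real HL7 FHIR.

```python
from typing import Callable, Optional

# Hypothetical toy schema (NOT real FHIR): field name -> required Python type.
REQUIRED_FIELDS = {"resourceType": str, "code": str}


def validate(record: dict) -> list[str]:
    """Return a list of format violations; an empty list means compliant."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field} must be {expected_type.__name__}")
    return errors


def validate_and_repair(
    generate: Callable[[str, list[str]], dict],
    prompt: str,
    max_iterations: int = 3,
) -> tuple[Optional[dict], int]:
    """Run generate -> validate -> repair until compliant or out of budget.

    Returns (record, iterations_used); record is None if no attempt passed.
    """
    feedback: list[str] = []
    for iteration in range(1, max_iterations + 1):
        # The generator sees the previous violations so it can repair them.
        record = generate(prompt, feedback)
        feedback = validate(record)
        if not feedback:
            return record, iteration
    return None, max_iterations


def toy_generate(prompt: str, feedback: list[str]) -> dict:
    """Stand-in for an LLM call: emits a type error, then fixes it on feedback."""
    record = {"resourceType": "Condition", "code": 1234}  # wrong type at first
    if any("code must be str" in error for error in feedback):
        record["code"] = "E11.9"
    return record


result, iterations = validate_and_repair(
    toy_generate, "Type 2 diabetes without complications"
)
print(result, iterations)
```

In this toy run the first attempt fails on a representation-level error (a numeric code where a string is required) and the second attempt passes, which mirrors the summary's observation that most errors resolve in one or two iterations.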

5 · arXiv · cs.CL · Jul 22, 2026

MaLoRA and MaRA: Selective state-space adapters improve multi-hop reasoning over LoRA

A new arXiv preprint proposes two adapter families as improvements over standard LoRA for language model reasoning: MaLoRA (token-level dynamic scaling via Mamba recurrence) and MaRA (context-level segment retrieval via cross-segment state tracking). Evaluated on three frozen backbones (Qwen-2.5-7B, Llama-3.1-8B, Gemma-2-9B) and two multi-hop QA benchmarks (MuSiQue, 2WikiMultihopQA), the methods yield average gains of +6.8 F1 (+10.5% relative) over LoRA, with up to +18.2% relative improvement on the hardest configuration. Token-level gains also transfer to RULER QA-2 under length-stress conditions.

Long Context Evolution · Evaluation and Benchmarking · MaRA · Gemma 2 9B · MaLoRA · +5 more

4 · arXiv · cs.CL · Jul 14, 2026

Token probability measurements reveal production-perception asymmetry in LLMs

A new arXiv preprint investigates whether LLMs exhibit a functional analog to the psycholinguistic production-perception distinction, using direct token probability measurements rather than metalinguistic prompting. Across Llama-3.1-8B and four other open-weight models, spanning base and instruction-tuned variants, the authors find that production-perception prompt distances consistently exceed production-production distances by a ratio of ~1.8. Near-ceiling correlations in the production-production control indicate that the effect is specific to communicative framing. Temporal analysis shows that perception prompts exert their strongest influence at sequence beginnings. The findings suggest that prompt framing alone induces a production-perception distinction in decoder-only architectures.

Evaluation and Benchmarking · Gemma 2 9B · Qwen2.5-7B-Instruct-1M · Mistral 7B Instruct v0.2 · +2 more

5 · The Batch · Jun 1, 2026

Persona Generators: Evolutionary LLM Method for Diverse Synthetic Human Personas

Google researchers Davide Paglieri, Logan Cross, and colleagues propose Persona Generators, a system that uses the AlphaEvolve evolutionary algorithm to generate code that produces 25 diverse persona prompts covering a broad range of attitudes and opinions. The method iteratively optimizes persona prompt diversity using six metrics, outperforming Nemotron Personas (82% vs 76% coverage of possible responses) and a Concordia memory-based baseline (46%). The system uses Gemini 2.5 Pro for questionnaire generation and Gemma 3-27B-IT for persona simulation via the Concordia agent library. The approach reframes persona generation as a coverage optimization problem rather than a data-matching one, enabling more representative synthetic user populations for product research.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Gemma 2 9B · Persona Generators · Davide Paglieri · +6 more

7 · Mistral AI News · Jun 1, 2026

Mistral Small 3: 24B Latency-Optimized Open-Weight Model Released Under Apache 2.0

Mistral AI has released Mistral Small 3, a 24B-parameter instruction-tuned model optimized for low latency, achieving over 81% on MMLU at 150 tokens/s on a single GPU. The model is competitive with Llama 3.3 70B and Qwen 32B while being more than 3x faster on equivalent hardware, and is released under Apache 2.0 for both pretrained and instruction-tuned checkpoints. It is explicitly not trained with RL or synthetic data, positioning it as a base model for community fine-tuning and reasoning capability development. Deployment targets include local inference on consumer hardware (RTX 4090, MacBook 32GB RAM), agentic function calling, and domain-specific fine-tuning.

Frontier Model Releases · Open Weights Progress · Mistral AI · Mistral Small 4 · Ollama · +12 more

7 · Mistral AI News · Jun 1, 2026

Mistral NeMo: 12B Open-Weights Model with 128k Context, Built with NVIDIA

Mistral AI and NVIDIA have jointly released Mistral NeMo, a 12B-parameter model under the Apache 2.0 license featuring a 128k-token context window and a new tokenizer called Tekken, based on Tiktoken. The model is designed as a drop-in replacement for Mistral 7B, supports multilingual applications across 11+ languages, and was trained with quantization awareness, enabling FP8 inference without performance loss. Benchmark comparisons show competitive performance against Gemma 2 9B and Llama 3 8B. Weights are available on Hugging Face, and the model is also packaged as an NVIDIA NIM inference microservice.

Long Context Evolution · Frontier Model Releases · Mistral AI · Gemma 2 9B · Apache 2.0 · +9 more

6 · The Batch · Jun 1, 2026

Activation Capping Technique Stabilizes LLM Assistant Personas Against Drift and Jailbreaks

Researchers from MATS, Oxford, and Anthropic introduced the 'assistant axis,' a vector derived from LLM layer outputs that quantifies how closely a model adheres to its trained assistant persona. They developed 'activation capping,' an inference-time method that corrects deviations from this axis when similarity falls below a threshold. Testing on Gemma 2 27B, Qwen3 32B, and Llama 3.3 70B showed harmful response rates to jailbreak prompts dropped by roughly half (e.g., 83% to 41% for Qwen3 32B) without degrading benchmark performance. The technique targets character-based jailbreaks that bypass system prompts by manipulating a model's internal representational state.

Evaluation and Benchmarking · AI Safety Research · Gemma 2 9B · assistant axis · Llama 3.1 70B · +12 more
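
One way to picture the activation capping idea in the item above is as a floor on how far a hidden state may drift along a fixed direction. The sketch below is an assumption-laden illustration, not the researchers' implementation: it treats "similarity" as the scalar projection of a hidden state onto a unit-normalized axis, and when that projection falls below a threshold it shifts the state along the axis just enough to reach the threshold. The function name `cap_activation` and the toy vectors are hypothetical.

```python
import math


def cap_activation(
    hidden: list[float], axis: list[float], threshold: float
) -> list[float]:
    """Raise the projection of `hidden` onto `axis` to at least `threshold`.

    Assumption: similarity is measured as the scalar projection onto the
    unit-normalized axis. States already at or above the threshold are
    returned unchanged; components orthogonal to the axis are never altered.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    projection = sum(h * u for h, u in zip(hidden, unit))
    if projection >= threshold:
        return list(hidden)
    # Move along the axis by exactly the shortfall.
    shift = threshold - projection
    return [h + shift * u for h, u in zip(hidden, unit)]


# Toy 2-D example: the axis points along the first coordinate.
assistant_axis = [2.0, 0.0]
drifted = cap_activation([0.5, 3.0], assistant_axis, threshold=1.0)
on_axis = cap_activation([1.5, -2.0], assistant_axis, threshold=1.0)
print(drifted, on_axis)
```

In a real model the correction would be applied to layer outputs at inference time, with the axis and threshold estimated from data; those details are not given in the summary and are left out here.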

7 · Mistral AI News · Jun 1, 2026

Mistral AI Releases Ministral 3B and 8B Edge Models

Mistral AI has introduced two new small language models, Ministral 3B and Ministral 8B, targeting on-device and edge computing use cases. Both models support up to 128k context length and claim state-of-the-art performance in the sub-10B parameter category, outperforming comparable models from Google and Meta on internal benchmarks. Ministral 8B features an interleaved sliding-window attention mechanism for memory-efficient inference and is priced at $0.1/M tokens via API, while Ministral 3B is priced at $0.04/M tokens. Weights for Ministral 8B Instruct are available for research use, with commercial licensing available on request.

Long Context Evolution · Frontier Model Releases · Mistral AI · Gemma 2 9B · Ministral 8B · +12 more