Entity · model

Grok

model · active · grok-1d71364b · 7 events · first seen May 18, 2026

Aliases: Grok

More like this (12)

Grok-3 · Grok 4 · Grok 4.3 · Grok Imagine · Grok-4-Fast · Groq · grok2api · grok-build · Grok Voice Think Fast 1.0 · grok-mermaid · Groc-PO · Grab

Recent events (7)

7 · arXiv · cs.CL · 3d ago

Study finds LLM epistemic stances on pseudo-science vary by deployment configuration, not just model weights

Researchers tested four major LLM families (Claude, Grok, GPT, Gemini) on how they evaluate ethnonationalist pseudo-science, across four temporal snapshots and two interface types (API vs. web). Grok's Fast versions consistently rated the pseudo-scientific claims 2-5x more credible than other models did. A silent overnight patch reversed Grok's behavior without public documentation, and three months later the same model identifier produced radically divergent scores via API versus web. The paper argues that a model's epistemic stance is not a stable property of its weights but a contingent effect of deployment configuration (system prompts, safety layers, interface routing, and undocumented updates), which creates an accountability gap for users and researchers.

Evaluation and Benchmarking · AI Safety Research · Claude Opus 4.6 · Grok · Google · +7 more

6 · arXiv · cs.CL · Jul 17, 2026

Audit finds Grokipedia less politically neutral than Wikipedia, with distinct ideological biases

A large-scale arXiv study audits political neutrality in Grokipedia—an encyclopedia generated by xAI's Grok LLM—versus Wikipedia, analyzing 1,394 article pairs about government members across nine ideology dimensions using four LLM judges (Grok, Claude, Mistral, DeepSeek). All four judges, including Grok itself, rate Grokipedia as less neutral than Wikipedia. The study finds Grokipedia favors economically right-wing politicians and penalizes socially liberal ones, while Wikipedia shows the opposite bias pattern, raising questions about whether LLM-generated content can deliver ideological neutrality.

Evaluation and Benchmarking · AI Safety Research · Grokipedia · DeepSeek V4 · Grok · +5 more

6 · arXiv · cs.CL · Jul 15, 2026

One-Word Census: Answer-choice conformity measured across 44 language models

Researchers introduce the One-Word Census, a minimal 31-prompt instrument that probes which one-word answers language models select from open-ended categories, applied to 44 models. Convergence is extreme — 41% of models chose 'serendipity' when asked to pick any word — yet conformity varies fourfold across models in structured ways: persona- and community-tuned models diverge most, while newest mainline flagships conform most. Within four model lineages (Claude, GPT, Qwen, Grok), conformity rises with each generation but reverses for the latest Claude and GPT flagships, suggesting possible repositioning. The field is more lexically concentrated than human norms in 18 of 20 shared categories.

Frontier Model Releases · Evaluation and Benchmarking · Grok · Claude · Qwen · +4 more

9 · The Batch · Jun 3, 2026

U.S. Department of War bans Anthropic, contracts OpenAI for classified AI systems after standoff over safety restrictions

The U.S. Department of War designated Anthropic a supply-chain risk to national security after the company refused to remove restrictions on Claude's use for domestic surveillance and autonomous weapons, effectively banning it from military and contractor use. OpenAI signed a contract allowing use of its models 'for all lawful purposes' with ambiguous carve-outs for surveillance and autonomous weapons, a deal that Altman later called rushed and that was subsequently renegotiated. The standoff culminated in a Truth Social post by Trump threatening civil and criminal consequences against Anthropic, followed by Hegseth's formal designation. The episode sets a significant precedent: the supply-chain risk designation, previously applied only to foreign companies, was used against a U.S. AI lab over its own usage policies.

AI Safety Research · Regulatory Developments · Dario Amodei · Palantir · U.S. Department of Defense · +8 more

4 · GitHub Trending · May 29, 2026

Deep Eye: Multi-Provider AI-Orchestrated Vulnerability Scanner

Deep Eye is an open-source Python tool that orchestrates multiple AI providers (OpenAI, Claude, Grok, Gemini, Ollama, Groq, Mistral, and others) to generate attack payloads and scan targets for 45+ vulnerability types. It produces professional security reports with compliance mapping. The project has accumulated 1,572 GitHub stars with 42 added today, indicating growing community interest in AI-augmented offensive security tooling.

AI Safety Research · Agent and Tool Ecosystem · Ollama · Grok · zakirkun · +5 more

4 · arXiv · cs.AI · May 20, 2026

Structured Prompt Checklists Outperform Raw and Clarifying-Question Prompts Across LLMs

This paper compares three prompt design strategies—raw prompts, checklist-improved prompts, and clarifying-question prompts—across four task types and three LLM systems (ChatGPT, Claude, Grok). Checklist-improved prompts achieved the highest mean rubric score (7.50/8) versus 5.67 for raw and 6.67 for clarifying-question prompts. Checklist prompts also used fewer tokens on average, suggesting a favorable quality-effort tradeoff. The study provides empirical grounding for structured prompt engineering as a practical technique to reduce multi-turn interaction overhead.

Agent and Tool Ecosystem · clarifying-question prompting · ChatGPT · Grok · +2 more

7 · arXiv · cs.LG · May 18, 2026

AI-Mediated Communication Can Steer Collective Opinion via LLM Editing Biases

This paper demonstrates empirically that LLMs from multiple model families introduce directional biases when editing human-written texts on contested topics (e.g., nudging toward gun control, against atheism). The authors develop a mathematical opinion-dynamics model showing these biases are amplified through social networks, shifting collective opinion at scale. An audit of X's 'Explain this post' feature finds evidence of pro-life bias in Grok's outputs on abortion content, traced to specific design choices. The paper concludes with implications for EU legislative efforts on AI-mediated communication.

Evaluation and Benchmarking · AI Safety Research · Grok · X (Twitter) · EU AI Act · +5 more