Stanford University

organization·active·stanford-university-10763c80·10 events·first seen 1mo ago

Aliases: Stanford University

Recent events (10)

4·Hacker News·16d ago

AI Agent Guidelines for CS336 at Stanford

Stanford's CS336 (Language Models from Scratch) course has published explicit guidelines for AI agent behavior within its assignment repository, surfacing as a community discussion item on Hacker News. The guidelines take the form of a CLAUDE.md file that instructs AI coding assistants on how to interact with course materials, likely addressing academic integrity and appropriate use boundaries. This represents an early example of an educational institution codifying AI agent behavior policies at the course level.

6·arXiv · cs.AI·14h ago

Stanford EDGAR Filings Dataset: 152B-token open corpus of SEC filings for LLM pretraining

Stanford researchers introduce the Stanford EDGAR Filings Dataset (SEFD), an open reconstruction of SEC filings into layout-faithful MultiMarkdown. They release a 152B-token initial snapshot and describe a larger 550B-token archive. The dataset targets the growing scarcity of high-quality long-context pretraining data and has less than 0.1% overlap with Common Crawl-derived corpora. Two derived benchmarks are also introduced: EDGAR-Forecast for filing-grounded numerical forecasting and EDGAR-OCR for complex financial table transcription. The work addresses a real gap in open long-context training data outside narrow domains like code.

7·OpenAI Blog·28d ago

Concrete Problems in AI Safety

OpenAI, Google Brain, Berkeley, and Stanford researchers co-authored 'Concrete Problems in AI Safety,' a foundational paper exploring research challenges in ensuring modern ML systems operate as intended. The paper identifies and frames specific technical safety problems for the field. Published in June 2016, it became a landmark reference for AI safety research agendas.

6·Anthropic News·16d ago

How scientists are using Claude to accelerate research and discovery

Anthropic describes how researchers are deploying Claude-powered systems across scientific workflows, highlighting three case studies, including Biomni (a Stanford agentic platform integrating hundreds of biomedical tools) and the Cheeseman Lab (automating the interpretation of large-scale gene knockout experiments). The piece details Claude for Life Sciences and the AI for Science program, which provides free API credits to high-impact research projects. Specific results cited include compressing months-long GWAS analyses to 20 minutes and analyzing 336,000 single-cell datasets to identify novel transcription factors.

5·Google DeepMind Blog·29d ago

Uncovering repurposed medicines to fight liver fibrosis using Co-Scientist

A Stanford geneticist used Google DeepMind's Co-Scientist AI system to identify potential drug repurposing candidates for chronic liver disease and liver fibrosis. The work represents a real-world application of AI-assisted scientific discovery in a clinical domain. Co-Scientist is DeepMind's AI research assistant designed to accelerate hypothesis generation and experimental planning for scientists.

6·The Batch·25d ago

Agent Benchmarks Skew Toward Software Engineering, Missing Most Economically Valuable Labor

Researchers from Carnegie Mellon University and Stanford University mapped over 10,000 examples from 43 agent benchmarks to U.S. labor statistics using O*NET occupational taxonomies, finding that current benchmarks heavily over-represent software engineering relative to its share of employment and wages. Office and administrative support (18.2M workers, $869.8B wages) and management (11M workers, $1,326.3B wages) are vastly under-represented compared to computer and mathematical occupations (5.2M workers, $563.6B wages). No single benchmark covered more than 50% of work activities, and all 43 benchmarks combined covered only 56.5% of work activities. The study identifies a systematic gap between where agentic AI is being evaluated and where the largest economic opportunity lies.

6·The Batch·16d ago

Test-Time Training End-to-End (TTT-E2E) Retrains Model Weights to Handle Long Inputs

Researchers from Astera Institute, Nvidia, Stanford, UC Berkeley, and UC San Diego introduced TTT-E2E, a method that compresses long context into transformer weights by training the model during inference via meta-learning. The approach uses sliding-window attention restricted to 8,000 tokens and updates only the fully connected layers of the last quarter of the network on each 1,000-token chunk at inference time, keeping per-token generation latency roughly constant as context scales to 128,000 tokens. TTT-E2E slightly outperforms vanilla transformers on next-token prediction loss across long contexts and matches efficient architectures like Mamba 2 and Gated DeltaNet on inference speed, but fails dramatically on Needle-in-a-Haystack retrieval beyond 8,000 tokens and incurs substantially higher training latency. The work reframes long-context handling as a training-inference trade-off rather than an architectural design problem.
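
The numbers in this summary can be turned into a small bookkeeping sketch. It is not the authors' implementation and performs no training; it only works out which layers would be updated, how a long input splits into update chunks, and which tokens a sliding window exposes. The function names, the 24-layer example depth, and the rounding rule for "last quarter" are assumptions made for illustration.

```python
# Illustrative bookkeeping only: no model, no gradients.
# WINDOW and CHUNK come from the summary above; everything else is assumed.

WINDOW = 8_000   # sliding-window attention span, in tokens
CHUNK = 1_000    # tokens consumed per inference-time weight update


def trainable_layers(num_layers: int) -> list[int]:
    """Indices of the last quarter of layers (assumed rounding: floor)."""
    start = num_layers - num_layers // 4
    return list(range(start, num_layers))


def chunk_bounds(context_len: int, chunk: int = CHUNK) -> list[tuple[int, int]]:
    """Half-open (start, end) token ranges, one per update step."""
    return [(s, min(s + chunk, context_len)) for s in range(0, context_len, chunk)]


def attention_span(position: int, window: int = WINDOW) -> tuple[int, int]:
    """Half-open range of positions visible to `position` under a causal window."""
    return (max(0, position + 1 - window), position + 1)


if __name__ == "__main__":
    layers = trainable_layers(24)           # 24 layers is a hypothetical depth
    steps = chunk_bounds(128_000)
    print("updated layers:", layers)
    print("update steps for 128,000 tokens:", len(steps))
    print("tokens visible at the final position:", attention_span(127_999))
```

Under these assumptions, attention cost per token is bounded by the window while the number of weight updates grows linearly with input length, which is consistent with the roughly constant per-token latency the summary reports.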

6·The Batch·7d ago

Data Points: Apple/Google Siri overhaul, Gemma 4 12B, Kimi Code CLI, OpenJarvis, and U.S. OpenAI stake talks

A multi-item digest covers several significant AI developments: Apple is expected to announce a revamped Siri at WWDC that uses Google Gemini models distilled for on-device use alongside cloud routing, marking a notable Apple-Google AI partnership. Google released Gemma 4 12B, an encoder-free multimodal open-weights model designed for consumer laptops under Apache 2.0. Moonshot AI released Kimi Code CLI, an open-source terminal coding agent with native subagent orchestration and conversational MCP configuration. Stanford and Lambda Labs released OpenJarvis, an on-device agent framework claiming near-cloud accuracy at 800× lower API cost. The White House and OpenAI are reportedly negotiating a government equity stake in OpenAI as part of a proposed Public Wealth Fund.

7·Anthropic News·1mo ago

Anthropic Launches Claude for Healthcare and Expands Life Sciences Capabilities

Anthropic is expanding its healthcare and life sciences offerings with Claude for Healthcare, a HIPAA-ready product suite for providers, payers, and health tech companies, alongside new connectors to CMS databases, ICD-10, NPI Registry, and FHIR development tools. The announcement also highlights Claude Opus 4.5's improved performance on medical benchmarks including MedCalc and MedAgentBench, with extended thinking (64k tokens) and native tool use. New life sciences capabilities include connections to additional scientific platforms and support for clinical trial management and regulatory operations. The release positions Claude as an agentic research and administrative partner across healthcare workflows including prior authorization, claims appeals, and patient care coordination.

4·The Batch·1mo ago

Gallup Poll Shows AI Boosts Productivity, but Many Workers Haven't Tried It

A Gallup survey of 23,700 U.S. employees found that half used AI at work at least a few times in the past year, with daily use rising from 4% in 2023 to 13% in 2025. Among workers in AI-using organizations, 65% reported productivity improvements, though only 31% said it changed their workflows. Managerial support and organizational strategy were key predictors of adoption. The broader employment impact remains contested, with conflicting signals from macroeconomic data and labor market research.