Entity · model

BERT

model · active · bert-5022c47f · 13 events · first seen May 19, 2026

Aliases: BERT


More like this (12)

ModernBERT · NeoBERT · BioBERT · mmBERT · ClinicalBERT · PubMedBERT · BERTopic · w2v-BERT 2.0 · IndicBERT · Romanian BERT · mBERT · wav2vec2-BERT

Recent events (13)

5 · arXiv · cs.AI · 3d ago

MDTransformer: Mode-division photonic transformer accelerator with inverse-designed coherent crossbar

Researchers propose MDTransformer, a hardware-software co-design for photonic transformer acceleration using mode-division optical dataflow instead of multi-wavelength approaches. The system uses inverse-designed multi-mode couplers and Mach-Zehnder IQ modulators to form a compact photonic tensor core, with each guided mode acting as an independent computational lane for four-fold parallelism per waveguide. Evaluated against DeiT and BERT workloads, MDTransformer achieves 40.4% area reduction, 63.6% power saving, and 40.6% energy saving over prior photonic transformer accelerators while maintaining comparable latency.

Training Infrastructure · Inference Economics · MDTransformer · DeiT · BERT

3 · arXiv · cs.CL · Jul 23, 2026

Maskability Index: A metric for predicting prompt-objective alignment in pretrained language models

Researchers introduce the Maskability Index (MI), a quantitative metric that estimates whether a knowledge relation is better suited to masked-style or prefix-style prompting in few-shot generation. MI is derived from differences in DepthRank scores between masked and unmasked templates and is evaluated on the ATOMIC2020 knowledge base completion benchmark. Results show MI correlates positively with downstream generation performance, suggesting it can guide template selection for relational knowledge extraction, particularly in low-resource settings.

Evaluation and Benchmarking · Maskability Index · ATOMIC2020 · T5

3 · arXiv · cs.CL · Jul 20, 2026

BERT-based candidate-attended dialogue state tracking with zero-shot generalization

A preprint introduces a scalable multi-domain dialogue state tracking (DST) framework that uses pretrained BERT to achieve zero-shot generalization across domains without additional training. Evaluated on the Schema-Guided Dialogue (SGD) dataset, it improves over prior baselines. The work targets scalability challenges in task-oriented dialogue systems such as voice assistants.

Evaluation and Benchmarking · Google Assistant · Schema-Guided Dialogue · Alexa+

3 · arXiv · cs.CL · Jul 15, 2026

Translation-based fine-tuning of English BERT compared to native-language models across six NLP tasks and five languages

A new arXiv preprint systematically compares translation-based fine-tuning of English BERT against native-language BERT models across six NLP tasks using datasets from Bulgarian, Chinese, Dutch, Italian, and Russian. The translation approach was comparable or superior in 53.3% of cases, with the strongest gains in Question Answering, POS Tagging, and NLI, but weaker performance on Named Entity Recognition and Hate Speech Detection. The results suggest translation-based fine-tuning is most effective for syntactic tasks and for languages typologically close to English, such as Dutch, offering a resource-efficient path for low-resource NLP.

Evaluation and Benchmarking · Translation as a Computationally Efficient Bridge: Feasibility of English BERT for Low-Resource Languages · BERT

5 · arXiv · cs.AI · Jul 15, 2026

Pythia: Multi-agent system for fine-tuning-free clinical symptom extraction from notes

Researchers present Pythia, a multi-agent system that autonomously writes and optimizes extraction prompts for clinical concepts in medical notes without manual prompt engineering or model fine-tuning. Running on locally hosted open-weights models, Pythia achieved a mean sensitivity of 0.76 and specificity of 0.95 across 72 symptoms from 400 clinical notes, outperforming a curated lexicon on specificity and a per-concept BERT classifier on both metrics. The system's key advantage is recovering high specificity (0.97) for concepts where lexicon-based approaches over-trigger, while remaining deployable on local infrastructure for data privacy. Sensitivity degrades for symptoms with prevalence below 5%, a noted limitation for rare findings.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · A Multi-Agent System for Autonomous, Fine-Tuning-Free Clinical Symptom Detection: Development and Validation Study · Pythia · BERT
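The two metrics reported for Pythia are the standard confusion-matrix ratios computed per concept. A minimal Python sketch, assuming binary per-note labels for a single symptom (1 = present); the labels below are invented for illustration and are not data from the study:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = symptom present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: share of notes that mention the symptom and were flagged.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    # Specificity: share of notes without the symptom that were correctly left unflagged.
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical example: 10 notes, 2 of which mention the symptom.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))  # (0.5, 0.875)
```

With only a handful of positive notes, each missed mention moves sensitivity by a large step, so per-concept estimates for rare findings are also coarse.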

4 · arXiv · cs.CL · Jul 10, 2026

Procrustes-conditioned Joint SAE extracts cross-seed universal features from BERT models

Researchers introduce a Procrustes-conditioned Joint End-to-end Top-K Sparse Autoencoder (SAE) to address cross-seed feature universality in mechanistic interpretability of BERT models. By applying an orthogonal Procrustes rotation between independently trained models' activation spaces before joint SAE training, the method produces more consistent features (Pearson r ≥ 0.70) than post-hoc alignment baselines across three NLP benchmarks. The work targets a fundamental challenge in dictionary learning: non-convex optimization causes independently trained networks to learn misaligned feature spaces, making it difficult to identify truly universal features. High-universality features are shown to encode interpretable sociolinguistic patterns.

Evaluation and Benchmarking · AI Safety Research · SST-2 · Sparse Autoencoder · Cross-seed explainability using Procrustes-conditioned Joint End-to-end Top-K Sparse Autoencoders
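The alignment step in the Procrustes-conditioned SAE entry is the classic orthogonal Procrustes problem: find the rotation that best maps one model's activations onto another's for the same inputs. A minimal NumPy sketch, using synthetic activations in place of real BERT hidden states (the shapes and the synthetic "seed B = rotated seed A" setup are assumptions for illustration; the paper's joint Top-K SAE training is not shown):

```python
import numpy as np


def procrustes_rotation(source, target):
    """Orthogonal R minimizing ||source @ R - target||_F.

    source, target: (n_samples, d) activations for the same inputs from two models.
    """
    # SVD of the cross-covariance matrix; the optimal orthogonal map is U @ Vt.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
acts_a = rng.normal(size=(500, 16))               # stand-in activations, "seed A"
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))    # hidden orthogonal map between seeds
acts_b = acts_a @ q                               # "seed B" = rotated copy of seed A
r = procrustes_rotation(acts_a, acts_b)
print(np.allclose(r, q))  # True
```

Because `R` is constrained to be orthogonal, the rotation preserves distances and angles in activation space, so it only changes the basis in which features are expressed before the joint dictionary is learned.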

4 · arXiv · cs.CL · Jul 2, 2026

Training-free graph-based framework for reading order inference in complex document layouts

A new arXiv preprint presents a training-free, graph-based method for inferring reading order in complex historical document layouts, including the challenging Glossa Ordinaria manuscript format, where text and commentary are spatially interleaved. The approach scores edges in a directed candidate-transition graph using lightweight language model signals (causal LM likelihood and BERT NSP) and recovers global reading order via a degree-constrained directed path cover with a max-regret inference rule. On wrap-around Glossa layouts the method achieves 95% edge accuracy versus 50% for XY-cut; on OmniDocBench multi-column pages it achieves 88%, versus 75% for XY-cut and 25% for LayoutReader. The work is relevant to document digitization pipelines and OCR post-processing for historical archives.

Evaluation and Benchmarking · OmniDocBench · PaddleOCR PP-StructureV3 · LayoutReader
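The decoding step in the reading-order entry can be illustrated with a small greedy heuristic: each text block gets at most one successor and one predecessor, no cycles are allowed, and at each step the block whose best successor most clearly beats its runner-up (the largest "regret") commits first. This is a hypothetical Python sketch of that general idea, not the authors' algorithm; the score matrix stands in for the LM-derived edge scores, and `min_score` is an assumed cut-off:

```python
def max_regret_path_cover(scores, min_score=0.0):
    """Greedy, degree-constrained path cover of a directed block graph (hypothetical).

    scores[i][j] is a score for reading block j right after block i.
    Each block gets at most one successor and one predecessor, and no cycles form.
    Returns the resulting paths as lists of block indices.
    """
    n = len(scores)
    succ, pred = {}, {}

    def reaches(start, goal):
        # Follow committed successor links from `start`; True if `goal` is on the way.
        node = start
        while True:
            if node == goal:
                return True
            if node not in succ:
                return False
            node = succ[node]

    while True:
        best = None  # (regret, source, target)
        for i in range(n):
            if i in succ:
                continue  # out-degree already used
            options = sorted(
                ((scores[i][j], j) for j in range(n)
                 if j not in pred                  # in-degree of j still free
                 and scores[i][j] > min_score      # edge is plausible at all
                 and not reaches(j, i)),           # no cycle (also rules out j == i)
                reverse=True,
            )
            if not options:
                continue
            runner_up = options[1][0] if len(options) > 1 else min_score
            regret = options[0][0] - runner_up     # loss if i misses its best edge
            if best is None or regret > best[0]:
                best = (regret, i, options[0][1])
        if best is None:
            break
        _, i, j = best
        succ[i], pred[j] = j, i

    # Read each path off from its head (a block with no predecessor).
    paths = []
    for head in range(n):
        if head in pred:
            continue
        path = [head]
        while path[-1] in succ:
            path.append(succ[path[-1]])
        paths.append(path)
    return paths


scores = [
    [0.0, 0.9, 0.2, 0.1],
    [0.1, 0.0, 0.8, 0.3],
    [0.2, 0.1, 0.0, 0.7],
    [0.3, 0.2, 0.1, 0.0],
]
print(max_regret_path_cover(scores))  # [[0, 1, 2, 3]]
```

Committing the least ambiguous decision first is the usual motivation for regret-based greedy rules: a block with one clearly dominant successor is resolved before blocks whose choices are close calls.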

3 · arXiv · cs.CL · Jul 1, 2026

SpikeLogBERT: Spiking transformer network for energy-efficient log parsing via knowledge distillation

SpikeLogBERT is a spiking neural network framework for log parsing that combines a spiking transformer architecture with knowledge distillation from a BERT teacher model. Evaluated on the HDFS dataset, it achieves a parsing accuracy of 0.99997 while reducing estimated theoretical energy consumption by up to 62.6% compared to standard ANN-based approaches. The work targets the inference efficiency bottleneck in neural log analysis pipelines used for anomaly detection and system monitoring.

Inference Economics · SpikeLogBERT · HDFS · BERT
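The SpikeLogBERT summary does not spell out its distillation objective. For orientation, the widely used soft-target formulation trains the student to match the teacher's temperature-softened output distribution. A NumPy sketch, assuming per-token classification logits from a BERT teacher and a student network; the temperature of 2.0 and the example logits are arbitrary choices, not values from the paper:

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target KD loss: T^2 * KL(teacher_T || student_T), averaged over rows."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl.mean()


teacher = np.array([[2.0, 0.5, -1.0]])  # hypothetical teacher logits for one token
student = np.array([[1.0, 1.0, 0.0]])   # hypothetical student logits
print(distillation_loss(student, teacher))
```

In practice this term is usually mixed with an ordinary cross-entropy loss on the hard labels; how SpikeLogBERT weights the two, if it uses both, is not stated in the summary.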

5 · arXiv · cs.CL · Jun 23, 2026

Roofline-inspired scaling model predicts Transformer fine-tuning energy consumption across GPU configurations

A new arXiv preprint presents a framework for modeling energy consumption during Transformer training on multiple GPUs, using BERT architectural sweeps to relate measured energy to proxies for compute, memory traffic, and hardware efficiency. The approach adapts roofline modeling with a speedup-based hardware-efficiency factor that accounts for tensor parallelism and fully sharded data parallelism. The resulting scaling law accurately predicts training energy across heterogeneous configurations, targeting sustainable and cost-aware system design.

Training Infrastructure · Inference Economics · The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model · BERT
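The fitted scaling law from the roofline-inspired energy paper is not reproduced here. As background, the classic roofline bound it adapts caps attainable throughput at the lower of peak compute and memory bandwidth × arithmetic intensity. A minimal Python sketch, assuming energy is approximated as average power × runtime and using a single hypothetical `efficiency` factor in place of the paper's speedup-based term (all hardware numbers are illustrative, not measured):

```python
def roofline_time(flops, bytes_moved, peak_flops, mem_bandwidth):
    """Lower-bound runtime in seconds from the classic roofline model."""
    intensity = flops / bytes_moved                 # arithmetic intensity, FLOPs per byte
    # Attainable throughput is capped either by compute or by memory traffic.
    attainable = min(peak_flops, mem_bandwidth * intensity)
    return flops / attainable


def energy_estimate(flops, bytes_moved, peak_flops, mem_bandwidth,
                    avg_power_w, efficiency=1.0):
    """Energy in joules = average power * runtime (hypothetical efficiency in (0, 1])."""
    runtime = roofline_time(flops, bytes_moved, peak_flops, mem_bandwidth) / efficiency
    return avg_power_w * runtime


# Memory-bound example: 10 FLOPs/byte on a device with 1e14 FLOP/s and 1e12 B/s.
print(roofline_time(1e12, 1e11, 1e14, 1e12))
print(energy_estimate(1e12, 1e11, 1e14, 1e12, avg_power_w=300.0, efficiency=0.5))
```

In the example the workload sits below the ridge point (100 FLOPs/byte for these numbers), so memory bandwidth, not peak compute, sets the runtime.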

4 · arXiv · cs.CL · May 26, 2026

Forgotten Words: Benchmarking NeoBERT for Dementia Detection in Low-Resource Conversational Filipino and English Speech

This paper presents the first NLP-based dementia detection study for Filipino speech, constructing a parallel bilingual dataset of 4,000 DementiaBank-derived transcripts with manual Filipino translations. Five model families are evaluated across monolingual, zero-shot cross-lingual, and bilingual fine-tuning settings. English-trained BERT degrades sharply on Filipino (Macro-F1 = 0.455), but bilingual fine-tuning recovers performance to Macro-F1 = 0.969–0.973 across all transformer models. The key finding is that multilingual clinical NLP performance is driven by linguistic coverage during training rather than model scale or architecture.

Evaluation and Benchmarking · TF-IDF + Logistic Regression · NeoBERT · DementiaBank
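Macro-F1, the metric behind the 0.455 and 0.969 to 0.973 figures in the dementia-detection entry, averages per-class F1 without weighting by class frequency, so a model that collapses onto one class is penalized even when its accuracy looks acceptable. A minimal Python sketch with hypothetical labels; scikit-learn's `f1_score(..., average="macro")` should compute the same quantity:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical degenerate classifier: predicts "control" for every transcript.
y_true = ["dementia", "dementia", "control", "control"]
y_pred = ["control"] * 4
print(macro_f1(y_true, y_pred))  # 0.3333333333333333 (accuracy would be 0.5)
```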

3 · Hugging Face Blog · May 19, 2026

Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia

This Hugging Face blog post describes how to deploy BERT models on AWS Inferentia chips using the Hugging Face Transformers library and Amazon SageMaker. It covers the workflow for compiling models with AWS Neuron SDK and running optimized inference on Inferentia hardware. The post targets practitioners looking to reduce inference costs and latency for transformer-based NLP workloads.

Inference Economics · Enterprise Deployment Patterns · Amazon SageMaker · AWS Inferentia2 · Hugging Face Transformers

3 · Hugging Face Blog · May 19, 2026

Pre-Train BERT with Hugging Face Transformers and Habana Gaudi

This Hugging Face blog post from August 2022 describes how to pre-train a BERT model from scratch using the Hugging Face Transformers library on Habana Gaudi hardware accelerators. It covers the full pipeline including data preparation, tokenizer training, and masked language modeling pretraining. The post serves as both a technical tutorial and a demonstration of Habana Gaudi's viability as an alternative AI training accelerator.

Training Infrastructure · Habana Gaudi · Hugging Face Transformers · Hugging Face
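The masked-language-modeling pretraining in the Habana Gaudi entry follows BERT's corruption recipe: about 15% of token positions become prediction targets, and of those roughly 80% are replaced by `[MASK]`, 10% by a random token, and 10% left unchanged. A dependency-free Python sketch of that recipe (in Transformers this is normally handled by `DataCollatorForLanguageModeling`). The token ids, `[MASK]` id, and vocabulary size in the usage line are assumptions for illustration, and special tokens such as `[CLS]`/`[SEP]` are not excluded here, although a real pipeline should exclude them:

```python
import random

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_tokens(token_ids, vocab_size, mask_id, mlm_probability=0.15, rng=None):
    """BERT-style masking. Returns (inputs, labels); labels are IGNORE_INDEX
    except at selected positions, where they hold the original token."""
    rng = rng or random.Random()
    inputs, labels = list(token_ids), [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mlm_probability:
            continue                        # position not selected for prediction
        labels[i] = tok                     # model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id             # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return inputs, labels


# Hypothetical ids; 103 / 30522 mirror bert-base-uncased's [MASK] id and vocab size.
inputs, labels = mask_tokens([2023, 2003, 1037, 7099, 6251],
                             vocab_size=30522, mask_id=103, rng=random.Random(0))
print(inputs, labels)
```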

6 · Hugging Face Blog · May 19, 2026

Finally, a Replacement for BERT: Introducing ModernBERT

Hugging Face introduces ModernBERT, a modernized encoder-only transformer model designed as a successor to BERT. The model incorporates architectural improvements developed since BERT's 2018 release, targeting better performance on downstream NLP tasks. ModernBERT aims to fill the gap for efficient encoder models in retrieval, classification, and other discriminative tasks where decoder-only LLMs are often overkill.

Open Weights Progress · Inference Economics · ModernBERT · Hugging Face · BERT