LLaMA-7B
llama-7b-e0b996cb · 4 events · first seen 28d ago
Aliases: LLaMA-7B, LLaMA-2 7B, LLaMA-2-7B
Recent events (4)
OrpQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization
This paper introduces Orthogonal Residual Projection (ORP), an algorithm-hardware co-design framework for ultra-low-bit quantization of LLMs and Vision Transformers targeting edge deployment. ORP addresses the structural limitations of Power-of-Two (PoT) quantization by formulating quantization as a dual-basis geometric projection that synthesizes higher-resolution residual lattices using only shift-and-add operations, eliminating multipliers. At 3-bit (W3/A16), ORP achieves 6.10 perplexity on LLaMA-2-7B, competitive with MAC-intensive baselines like AWQ, while reducing full-model calibration time to ~15 minutes. RTL synthesis at 28nm confirms hardware efficiency by mitigating timing bottlenecks from dense multiplier trees.
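The summary above gives no details of ORP itself, so the following is only a generic illustration of the two ideas it mentions: rounding weights to signed powers of two, and recovering resolution by quantizing the residual with a second power-of-two term. The function names, the exponent range, and the fixed-point layout are all assumptions made for this sketch, not the paper's method.

```python
import numpy as np

def pot_quantize(w, min_exp=-8, max_exp=0):
    """Round each weight to the nearest signed power of two (in the log2 domain).

    Returns (sign, exponent, value). Magnitudes below 2**(min_exp - 1)
    are flushed to zero, which is encoded as sign == 0.
    """
    w = np.asarray(w, dtype=np.float64)
    mag = np.abs(w)
    keep = mag >= 2.0 ** (min_exp - 1)
    sign = np.where(keep, np.sign(w), 0.0)
    safe = np.where(keep, mag, 1.0)  # placeholder magnitude avoids log2(0)
    exp = np.clip(np.round(np.log2(safe)), min_exp, max_exp).astype(np.int64)
    return sign, exp, sign * np.exp2(exp)

def two_term_pot(w, **kw):
    """Approximate w by a power-of-two term plus a power-of-two residual term."""
    w = np.asarray(w, dtype=np.float64)
    _, _, first = pot_quantize(w, **kw)
    _, _, second = pot_quantize(w - first, **kw)
    return first + second

def shift_add_dot(x_int, sign, exp, frac_bits=8):
    """Dot product of integer activations with power-of-two weights.

    Each product x * 2**exp becomes a left shift by (exp + frac_bits), so the
    result is a fixed-point integer scaled by 2**frac_bits. Requires
    exp >= -frac_bits. The sign factor is -1, 0, or +1, which hardware would
    implement as negate or zero rather than as a true multiply.
    """
    shifts = np.asarray(exp, dtype=np.int64) + frac_bits
    terms = np.left_shift(np.asarray(x_int, dtype=np.int64), shifts)
    return int(np.sum(np.asarray(sign).astype(np.int64) * terms))
```

For example, the weights 0.5 and -0.25 map to exponents -1 and -2, and `shift_add_dot([3, 5], sign, exp)` returns 64, which is 0.25 in 8-bit fixed point and matches 3 * 0.5 - 5 * 0.25.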
Conditional Scale Entropy: A Wavelet-Derived Tool for Mechanistic Interpretability of Metaphor Processing in Transformers
This paper introduces Conditional Scale Entropy (CSE), a wavelet-derived measure of how transformer computation engages across frequency scales at each layer, and applies it to study metaphor processing in decoder-only language models. The authors prove CSE is invariant to update magnitude, isolating structural computation patterns from intensity. Across architectures ranging from GPT-2 (124M) to LLaMA-2 7B and GPT-oss 20B, metaphorical tokens consistently produce higher spectral breadth than literal tokens in early-to-mid layers, with the effect surviving permutation correction and specificity controls. The work establishes multi-scale coordination as a consistent mechanistic signature of metaphorical language processing and positions CSE as a general interpretability tool for cross-depth structure in transformers.
SETA: Sparse Subspace-to-Expert Sharing for Continual Learning in LLMs
Researchers introduce SETA (Mixture of Sparse Experts for Task Agnostic Continual Learning), a framework addressing catastrophic forgetting in LLMs via adaptive sparse subspace decomposition into task-specific and shared expert modules. The approach uses adaptive elastic anchoring and routing-aware regularization to protect shared knowledge at both weight and routing levels. Experiments on LLaMA-2 7B and Qwen3-4B show competitive or superior performance versus continual learning baselines, with strong retention of early-task knowledge.
GaLore: Advancing Large Model Training on Consumer-grade Hardware
GaLore (Gradient Low-Rank Projection) is a memory-efficient training technique that reduces optimizer state memory by projecting gradients into a low-rank subspace during training, enabling large model training on consumer-grade hardware. The Hugging Face blog post covers integration of GaLore into the transformers and peft ecosystems. Unlike LoRA, which constrains weight updates to low-rank adapter matrices, GaLore applies the low-rank projection to the gradients, allowing full-parameter learning with a reduced memory footprint. This makes training models like LLaMA-7B feasible on a single consumer GPU.
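The gradient-projection idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the Hugging Face integration or the reference implementation: the class name `LowRankAdam`, its parameters, and the refresh interval `update_gap` are assumptions made for the example. The point it shows is that the Adam moments are stored at shape (rank, n) instead of (m, n), while the weight matrix itself stays full-rank.

```python
import numpy as np

def make_projector(grad, rank):
    """Top-`rank` left singular vectors of the gradient, shape (m, rank)."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]

class LowRankAdam:
    """Adam whose moment estimates live in a low-rank gradient subspace (hypothetical sketch)."""

    def __init__(self, shape, rank, lr=1e-2, betas=(0.9, 0.999), eps=1e-8, update_gap=200):
        self.rank, self.lr, self.betas, self.eps = rank, lr, betas, eps
        self.update_gap = update_gap
        # Optimizer state is (rank, n) rather than the full (m, n).
        self.m = np.zeros((rank, shape[1]))
        self.v = np.zeros((rank, shape[1]))
        self.P = None
        self.t = 0

    def step(self, weight, grad):
        # Recompute the projector from the current gradient every `update_gap` steps.
        if self.P is None or self.t % self.update_gap == 0:
            self.P = make_projector(grad, self.rank)
        self.t += 1
        b1, b2 = self.betas
        low = self.P.T @ grad                      # project gradient to (rank, n)
        self.m = b1 * self.m + (1 - b1) * low
        self.v = b2 * self.v + (1 - b2) * low ** 2
        m_hat = self.m / (1 - b1 ** self.t)        # bias correction
        v_hat = self.v / (1 - b2 ** self.t)
        update = self.P @ (m_hat / (np.sqrt(v_hat) + self.eps))  # back to (m, n)
        return weight - self.lr * update
```

A toy usage would minimize `0.5 * ||W - T||^2` by calling `W = opt.step(W, W - T)` in a loop. Every entry of `W` remains trainable, which is the contrast with adapter-based methods drawn in the summary above.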