Almanac
company

TNG Technology Consulting

company · active · tng-technology-consulting-ea950960 · 4 events · first seen 28d ago

Aliases: TNG Technology Consulting

Recent events (4)

4 · Hugging Face Blog · 28d ago

Finetuning olmOCR to be a faithful OCR-Engine

TNG Technology Consulting describes a fine-tuning approach applied to olmOCR, a vision-language model designed for document OCR tasks, to improve its faithfulness and reduce hallucinations. The post covers dataset construction, training methodology, and evaluation results showing improved accuracy on document extraction benchmarks. This represents a practical community contribution to the open-weights document-understanding ecosystem.

4 · Hugging Face Blog · 28d ago

How Long Prompts Block Other Requests - Optimizing LLM Performance

This Hugging Face blog post from TNG Technology Consulting examines how long prompts cause head-of-line blocking in LLM serving systems, increasing latency for concurrent requests. The post analyzes how prompts are processed in inference pipelines and discusses optimization strategies that mitigate the bottlenecks caused by lengthy context inputs. It is framed as a practical guide for teams deploying LLMs in production environments where workloads mixing short and long prompts are common.
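To make the blocking effect concrete, here is a toy simulation that is not taken from the post. It assumes a single GPU that processes prompt tokens for one request at a time at a fixed rate, with no batching, and compares running each prompt to completion in arrival order against splitting prompts into fixed-size chunks served round-robin (a simplified form of the chunked-prefill idea that some serving engines use). The function names and the `chunk` parameter are hypothetical.

```python
from collections import deque


def fifo_prefill(prompt_lens: list[int], rate: float = 1.0) -> list[float]:
    """Prefill finish time per prompt when each prompt runs to completion in arrival order."""
    t = 0.0
    done = []
    for n in prompt_lens:
        t += n / rate  # the whole prompt occupies the GPU before the next one starts
        done.append(t)
    return done


def chunked_prefill(prompt_lens: list[int], chunk: int, rate: float = 1.0) -> list[float]:
    """Prefill finish time per prompt when prompts are processed round-robin in chunks."""
    t = 0.0
    done = [0.0] * len(prompt_lens)
    queue = deque(enumerate(prompt_lens))  # (request index, tokens still to prefill)
    while queue:
        i, remaining = queue.popleft()
        step = min(chunk, remaining)
        t += step / rate
        remaining -= step
        if remaining:
            queue.append((i, remaining))  # unfinished prompt goes to the back of the line
        else:
            done[i] = t
    return done


if __name__ == "__main__":
    prompts = [1000, 10, 10]  # one long prompt arrives just before two short ones
    print("FIFO   :", fifo_prefill(prompts))  # FIFO   : [1000.0, 1010.0, 1020.0]
    print("chunked:", chunked_prefill(prompts, chunk=100))  # chunked: [1020.0, 110.0, 120.0]
```

In this toy model the two short prompts finish prefill at 110 and 120 time units instead of 1010 and 1020, while the long prompt is delayed only from 1000 to 1020.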

4 · Hugging Face Blog · 28d ago

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

This Hugging Face blog post from TNG Technology Consulting examines how prefill and decode phases interact under concurrent request loads in LLM serving systems. It analyzes performance bottlenecks that arise when multiple requests share GPU resources, covering throughput-latency tradeoffs and optimization strategies. The piece targets practitioners deploying LLMs at scale who need to understand scheduling and batching behavior.
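The interaction between the two phases can be illustrated with a deliberately simple cost model, which is an assumption of this sketch rather than a measurement from the post: every engine step pays a fixed overhead (standing in for reading the model weights) plus a small cost per token processed. Decode adds one token per running sequence, while a prefill adds the entire prompt. Batching decodes therefore raises throughput cheaply, but scheduling a long prefill into the same step stretches the inter-token latency of every running sequence. The constants `base_ms` and `per_token_ms` are made-up illustrative values.

```python
def step_time_ms(num_tokens: int, base_ms: float = 20.0, per_token_ms: float = 0.05) -> float:
    """Hypothetical cost of one engine step: fixed overhead plus a per-token compute cost."""
    return base_ms + per_token_ms * num_tokens


def decode_throughput(batch_size: int) -> float:
    """Tokens generated per second when `batch_size` sequences each decode one token per step."""
    return batch_size / (step_time_ms(batch_size) / 1000.0)


def inter_token_latency_ms(batch_size: int, prefill_tokens: int = 0) -> float:
    """Step time seen by each running sequence, optionally sharing the step with a prefill."""
    return step_time_ms(batch_size + prefill_tokens)


if __name__ == "__main__":
    for b in (1, 8, 32):
        print(f"batch={b:>2}: {decode_throughput(b):7.1f} tokens/s")
    print(f"decode-only step: {inter_token_latency_ms(32):.1f} ms")
    print(f"step with a 4000-token prefill: {inter_token_latency_ms(32, prefill_tokens=4000):.1f} ms")
```

Under these assumed constants, throughput rises from roughly 50 to roughly 1,480 tokens/s as the batch grows from 1 to 32, while adding a 4000-token prefill to the step raises the inter-token latency from 21.6 ms to 221.6 ms for all 32 running sequences.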

4 · Hugging Face Blog · 28d ago

Efficient Request Queueing – Optimizing LLM Performance

This TNG Technology Consulting post on the Hugging Face blog examines request queueing strategies for improving LLM inference throughput and latency. It addresses how queueing policies and batching decisions affect performance under varying load conditions. The piece is aimed at practitioners deploying LLM inference infrastructure at scale.
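As one illustration of how a queueing policy changes waiting time, the sketch below assumes that all requests arrive at once and that each occupies one of a fixed number of slots (standing in for a concurrency or batch limit) for a known service time. It then compares first-in-first-out with shortest-first ordering. This is a textbook scheduling comparison, not necessarily a policy the post recommends; `mean_wait`, `slots`, and the policy names are hypothetical.

```python
import heapq


def mean_wait(service_times: list[float], slots: int, policy: str = "fifo") -> float:
    """Mean time requests spend queued before a slot frees up (all requests arrive at t=0)."""
    order = list(service_times)
    if policy == "shortest_first":
        order.sort()  # serve the cheapest requests first
    elif policy != "fifo":
        raise ValueError(f"unknown policy: {policy}")
    free_at = [0.0] * slots  # min-heap of times at which each slot becomes free
    waits = []
    for s in order:
        start = heapq.heappop(free_at)  # earliest available slot
        waits.append(start)  # arrival time is 0, so the start time equals the wait
        heapq.heappush(free_at, start + s)
    return sum(waits) / len(waits)


if __name__ == "__main__":
    jobs = [8, 1, 1, 1]  # one expensive request queued ahead of three cheap ones
    print("FIFO          :", mean_wait(jobs, slots=1))  # FIFO          : 6.75
    print("shortest-first:", mean_wait(jobs, slots=1, policy="shortest_first"))  # shortest-first: 1.5
```

Shortest-first cuts the mean wait here, but it can starve long requests under sustained load, and real servers rarely know service times in advance because output length is unpredictable.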