benchmark

PolkitBench

benchmark · active · provisional · polkitbench-7a0601f6 · 1 event · first seen 6d ago

Aliases: PolkitBench

Co-occurring entities

DeepSeek-V4-Flash · Context-Aware Distillation and Ablation for Text2DSL · GigaChat-10B-A1.8B

More like this (12)

ProgramBench · RepoBench · SorryBench · SPBench · KernelBench · TokenBench · TriggerBench · PortBench · CursorBench · BixBench · MemBench · RepoPeftBench

Recent events (1)

4 · arXiv · cs.CL · 6d ago

Context-aware distillation and ablation study for Text2DSL Polkit rule generation

Researchers extend a Text2DSL system that generates Polkit domain-specific language rules from natural language. They replace prompt-only synthetic data generation with context-aware distillation, in which DeepSeek-V4-Flash serves as the teacher model and is given structured context (a BNF grammar, an API spec, and a closed vocabulary). The approach scales a verified corpus from 4,204 to 10,073 NL-to-Polkit-rule pairs at near-perfect validity rates. A factorial ablation across eight context conditions on GigaChat-10B-A1.8B finds that structured context is load-bearing rather than cosmetic, and a Shapley decomposition attributes the largest semantic-quality gains to the vocabulary.
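To make the attribution step concrete, here is a minimal sketch of a Shapley decomposition over a three-factor ablation. It assumes the eight context conditions are the on/off combinations of the three context components (grammar, API spec, vocabulary), which the summary does not state explicitly. The scores are made-up placeholders, not results from the paper, and the names `FACTORS`, `scores`, and `shapley_values` are hypothetical.

```python
from itertools import permutations

# Assumed factors: the three structured-context components named in the summary.
FACTORS = ("grammar", "api_spec", "vocabulary")

# Placeholder quality scores for each of the 2**3 = 8 conditions.
# These numbers are invented for illustration and are NOT from the paper.
scores = {
    frozenset(): 0.40,
    frozenset({"grammar"}): 0.50,
    frozenset({"api_spec"}): 0.46,
    frozenset({"vocabulary"}): 0.60,
    frozenset({"grammar", "api_spec"}): 0.54,
    frozenset({"grammar", "vocabulary"}): 0.72,
    frozenset({"api_spec", "vocabulary"}): 0.66,
    frozenset(FACTORS): 0.76,
}


def shapley_values(factors, value):
    """Average each factor's marginal gain over every ordering of the factors."""
    totals = {f: 0.0 for f in factors}
    orderings = list(permutations(factors))
    for order in orderings:
        present = frozenset()
        for f in order:
            with_f = present | {f}
            # Marginal contribution of f given the factors already switched on.
            totals[f] += value[with_f] - value[present]
            present = with_f
    return {f: totals[f] / len(orderings) for f in factors}


phi = shapley_values(FACTORS, scores)
for factor, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{factor}: {contribution:.3f}")
```

By construction the per-factor values sum to the gap between the full-context and no-context scores, so each value reads as that component's share of the total gain. Enumerating all orderings is only practical for a handful of factors, which is the case here.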

Evaluation and Benchmarking · Agent and Tool Ecosystem · DeepSeek-V4-Flash · PolkitBench · Context-Aware Distillation and Ablation for Text2DSL