benchmark
MosaicLeaks
benchmark · active · provisional
mosaicleaks-5cf08717 · 1 event · first seen 2d ago
Aliases: MosaicLeaks
Recent events (1)
MosaicLeaks: Benchmark for evaluating secret-keeping in research agents
ServiceNow published a post on Hugging Face introducing MosaicLeaks, a benchmark that evaluates whether research agents can keep sensitive information confidential while carrying out tasks. The work targets a specific safety and alignment concern for agentic systems: information leakage during multi-step research workflows. The benchmark is relevant to the growing body of work on agent safety and trustworthiness in enterprise settings.