Audit Trails Beat Approvals

If You Only Read One Thing

Permission prompts are losing to evidence trails. "Copilot Finds the Egress Path" makes that concrete through PromptArmor's writeup: a delegated message can leak files even when the model never calls out directly. "Langfuse Puts Evals in CI" shows the counterweight: behavior captured as datasets, traces, and failing checks before an agent change ships.

Copilot Finds the Egress Path

The interesting part of the latest Copilot Cowork exploit is not that a malicious instruction can fool a model. That is the old problem. The new problem is that enterprise agents now sit on top of messaging, file links, scheduled tasks, and identity systems that were never designed as one security boundary.

PromptArmor showed an indirect prompt-injection chain against Microsoft Copilot Cowork, a Frontier Microsoft 365 feature that operates with a user's Microsoft permissions and can use Microsoft Graph across tenant data. The attack starts with a poisoned skill file, then has Cowork retrieve pre-authenticated OneDrive or SharePoint download links and place them in attacker-controlled image URLs inside a Teams or Outlook message sent to the active user. When the user opens the message, the preview fetch leaks the link. PromptArmor says the chain completed in all five trials, including with Claude Opus 4.7 directly selected.

Why it matters: The core concept is egress by delegation. In a normal security review, you ask whether the agent can call the internet. Here, the agent did not need to fetch the attacker URL itself. It could write a message that caused Teams or Outlook to fetch it later. That makes "no outbound network access" an incomplete control once an agent can create content in another app that performs network requests on display.
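To make egress by delegation tangible, here is a minimal sketch of a pre-send check that looks for URLs a mail or chat client might fetch on display. It is not how Microsoft or PromptArmor handle this; the allowlist, the host names, and every function name are assumptions for illustration, and the regexes only cover HTML img tags and Markdown images.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would load this from tenant policy.
ALLOWED_IMAGE_HOSTS = {"contoso.sharepoint.com"}

# Matches HTML <img src="..."> and Markdown ![alt](url) references.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
MD_IMAGE = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)')


def auto_fetched_urls(body: str) -> list[str]:
    """Return URLs a renderer may request when it displays the body."""
    return IMG_SRC.findall(body) + MD_IMAGE.findall(body)


def egress_findings(body: str) -> list[str]:
    """Return auto-fetched URLs whose host is outside the allowlist."""
    findings = []
    for url in auto_fetched_urls(body):
        host = (urlparse(url).hostname or "").lower()
        if host not in ALLOWED_IMAGE_HOSTS:
            findings.append(url)
    return findings


def requires_review(body: str) -> bool:
    """Hold the message for human review if any finding exists."""
    return bool(egress_findings(body))
```

The design point is that the check runs on the artifact the agent produced, not on the agent's own network calls, which is where a "no outbound access" control stops looking.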

The counterargument, visible in the Hacker News discussion, is that installing a malicious skill is like installing a malicious plugin. That is partly right and still too comforting. A plugin is normally understood as code crossing a trust boundary. A skill is natural-language context selectively loaded into an agent. The practical danger is that users and admins will treat it as documentation while the agent treats it as operating procedure.

The deeper failure is the approval model. PromptArmor says sending email or Teams messages to the active user did not require human approval, and the malicious message body was not visible in the agent activity view. That means the user-facing control surface was checking the obvious action while missing the egress effect. This is exactly the category of bug that will keep recurring in coding agents too: a shell command, PR comment, Slack message, issue update, or markdown preview can become an output channel if the harness accounts for tools but not downstream renderers.

Room for disagreement: This is still a feature in a preview-style product, and Microsoft can close specific approval gaps. The harder part is not one patch. It is making agent security reason over second-hop effects: what another system will do with the artifact the agent created.
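One way to picture reasoning over second-hop effects is to key approval on what the destination does with an artifact rather than on the tool's name. The sketch below is an assumption-laden illustration: the tool names, sink names, and the single fetches_remote_on_display flag are all hypothetical, and a real harness would need a much richer model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sink:
    name: str
    # Does a downstream renderer make network requests when showing content?
    fetches_remote_on_display: bool


# Hypothetical sink catalogue for illustration only.
SINKS = {
    "teams_message": Sink("teams_message", True),
    "outlook_email": Sink("outlook_email", True),
    "local_scratch_file": Sink("local_scratch_file", False),
}

# Hypothetical mapping from agent tools to the sink they write into.
TOOL_SINKS = {
    "send_teams_message": "teams_message",
    "send_email": "outlook_email",
    "write_note": "local_scratch_file",
}


def needs_approval(tool: str) -> bool:
    """Require approval when the tool's sink can cause a second-hop fetch."""
    sink_name = TOOL_SINKS.get(tool)
    if sink_name is None:
        return True  # unknown tools fail closed
    return SINKS[sink_name].fetches_remote_on_display
```

Under this model a message to the active user is no longer exempt just because the recipient is trusted; the renderer's behavior is what triggers the prompt.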

Langfuse Puts Evals in CI

Langfuse's Launch Week is easy to read as another observability vendor shipping more dashboard features. The more useful read is that the company is trying to pull agent evaluation out of dashboards and into the software delivery path.

The Launch Week page lays out a sequence: experiments in CI/CD on May 25, a Langfuse agent skill on May 26, full-text trace search on May 27, and "Evals as code" on May 28. The CI/CD docs are concrete: create a dataset, write an experiment in Python or TypeScript, add evaluators, raise RegressionError when a threshold fails, then run it through langfuse/experiment-action in GitHub Actions. The agent skill gives Claude Code, Cursor, and Codex-style assistants a playbook for tracing an app, querying traces, managing prompts, and setting up evaluators.

Why it matters: This is test-shaped evaluation. The familiar version is a unit test: it fails the build when code violates an expected behavior. Agent evals have usually lived one layer away from that discipline, as dashboards, annotation queues, or post-hoc scorecards. Langfuse is pushing them into pull requests, where a regression is no longer "the model felt worse this week" but a failing workflow tied to a dataset, evaluator, and threshold.
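The shape of a test-style eval gate can be sketched without any vendor SDK. Everything below is a stand-in: the dataset, the exact-match evaluator, the stub agent, and a locally defined RegressionError that borrows the name from the docs but is not imported from Langfuse. Treat it as an illustration of the pattern, not of the Langfuse API.

```python
from typing import Callable


class RegressionError(Exception):
    """Local stand-in for illustration; not the Langfuse class."""


# Hypothetical dataset: inputs paired with expected outputs.
DATASET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "10 / 2", "expected": "5"},
    {"input": "3 * 3", "expected": "9"},
]

# Stub agent backed by a lookup table, standing in for a real model call.
ANSWERS = {"2 + 2": "4", "10 / 2": "5", "3 * 3": "9"}


def stub_agent(prompt: str) -> str:
    return ANSWERS.get(prompt, "")


def exact_match(output: str, expected: str) -> float:
    """Evaluator: 1.0 on an exact match after trimming, else 0.0."""
    return 1.0 if output.strip() == expected else 0.0


def run_experiment(task: Callable[[str], str], dataset: list[dict]) -> float:
    """Run the task over the dataset and return the mean score."""
    scores = [exact_match(task(item["input"]), item["expected"]) for item in dataset]
    return sum(scores) / len(scores)


def gate(task: Callable[[str], str], dataset: list[dict], threshold: float) -> float:
    """Raise if the mean score drops below the threshold, failing the CI job."""
    score = run_experiment(task, dataset)
    if score < threshold:
        raise RegressionError(f"mean score {score:.2f} is below threshold {threshold:.2f}")
    return score
```

An uncaught exception gives the process a non-zero exit code, which is what turns a score into a failing workflow.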

That changes incentives. If an agent can instrument tracing and create evaluators from inside the editor, eval work becomes part of implementation rather than a separate ML-ops ceremony. If the CI action posts failures back to the PR, the model-change discussion moves closer to the code-change discussion. And if trace search is fast enough for operational use, the observability layer becomes an agent-readable memory of production failures. Langfuse says large input/output searches that previously took 18 seconds while scanning 494 GB now return in under half a second while reading less than a gigabyte.

The limitation is that this does not make evals objective by magic. A weak dataset or a poorly calibrated LLM judge can still reward the wrong behavior. Hamel Husain's eval-writing posture is relevant here: start with error analysis and clear failure modes, not a belief that tooling will discover quality for you. Langfuse is valuable because it makes that discipline executable. It is dangerous only if teams mistake a green CI check for proof that the agent is aligned with the real task.

Room for disagreement: Vendor-integrated evals can become another sticky workflow surface. That is not automatically bad; every useful dev tool becomes sticky. The open question is whether the artifacts remain portable enough that teams can inspect and rerun the tests outside Langfuse.

The Contrarian Take

Everyone says: Prompt injection is the unsolved problem, and evals are the solution. One story is about danger; the other is about discipline.

Here's why that's wrong, or at least incomplete: The more precise split is unaudited action versus auditable action. Copilot Cowork failed because an apparently local action created a second-hop egress path through Teams and Outlook. Langfuse matters because it makes model behavior show up as code, traces, thresholds, and pull-request failures. The same model can look powerful or reckless depending on whether the harness records what happened, decides what counts as failure, and blocks action when the evidence is bad.

Under the Radar

Claude Code's May 28 release is really about authority plumbing. Version 2.1.153 fixed a custom API gateway path that could receive the user's Anthropic OAuth credential instead of the gateway's token, along with subagent MCP-server policy bypasses, stale daemon behavior, and a temporary-worktree issue that could silently discard gitignored outputs. The visible changelog looks like a maintenance release; the pattern is agent authority leaking through host integration details. (Source)
Cursor's /loop is local scheduling, not just another command. Cursor 3.5 added a /loop skill that can run a prompt repeatedly on a local schedule until an outcome is reached, while Automations gained multi-repo and no-repo modes. That moves agents from chat sessions toward recurring workers, which makes state, approval, and stop conditions more important than prompt polish. (Source)
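
The point about stop conditions in the Cursor item can be sketched as a bounded loop. This is not Cursor's implementation; run_until, LoopResult, and the parameters are hypothetical names, and the step callable stands in for whatever prompt run the scheduler would trigger.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    reached_outcome: bool
    runs: int
    history: list[str] = field(default_factory=list)


def run_until(
    step: Callable[[], str],
    is_done: Callable[[str], bool],
    max_runs: int,
    interval_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> LoopResult:
    """Repeat step until is_done accepts its output or max_runs is reached."""
    history: list[str] = []
    for run in range(1, max_runs + 1):
        output = step()
        history.append(output)  # keep state so each run is auditable
        if is_done(output):
            return LoopResult(True, run, history)
        if run < max_runs:
            sleep(interval_seconds)
    return LoopResult(False, max_runs, history)
```

The hard cap and the recorded history are the parts that matter once a prompt stops being a chat turn and becomes a recurring worker.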

Quick Takes

Vercel patched a serialization crack in Gemini tool replay. The AI SDK now detects Gemini 3 function-call parts missing thoughtSignature after app code persists or rebuilds messages, injects Google's documented validator-skip sentinel, and warns developers about affected tools. That is a narrow fix with a broad lesson: provider-private metadata becomes production state once tool calls are replayed. (Source)
Cline fixed observability in the compiled CLI. CLI v3.0.14 repairs OpenTelemetry variable bundling so telemetry is correctly enabled in compiled builds, guarding against environments where process.env is undefined. The change is small, but it reinforces that agent reliability depends on traces surviving packaging, not only on the agent emitting them locally. (Source)
llama.cpp keeps widening the phone-class path. Release b9370 added Q4_1 support in Hexagon matmul and matmul-id paths, letting more of the graph run on Qualcomm DSP acceleration. Yesterday's MiniCPM story was about tiny models; this is the matching runtime work that makes small local agents less CPU-bound on mobile silicon. (Source)

The Thread

The thread is that agent systems are becoming less about "can the model do the task?" and more about "can the surrounding system account for the action?" Copilot Cowork shows the cost of treating approvals as UI prompts instead of information-flow controls. Langfuse shows the opposite direction: encode expected behavior as datasets, experiments, thresholds, traces, and build failures. Claude Code, Cursor, Vercel, Cline, and llama.cpp are all working the same boundary from different sides: credentials, scheduling, provider metadata, telemetry, and hardware execution are now part of the agent product.

Predictions

New predictions:

I predict: By July 31, 2026, at least one major coding-agent or enterprise-agent vendor will add default warnings, admin controls, or marketplace scanning specifically for third-party skill files that can interact with messaging, email, or pre-authenticated file links. (Confidence: medium; Check by: 2026-07-31)

2026-05-28 03:32 EDT