Allow, approve, deny — why binary policy fails for AI agents

Allow/deny is why people disable security tools. A third decision — pause for human approval — is the difference between a policy engine you leave on and one you wrap in try/except.

Every time someone wires an AI agent to real tools — shell, filesystem, HTTP, a production database — they hit the same wall within a week. They start with allow/deny. It breaks. They loosen it. It breaks in the other direction. Eventually they turn it off.

The problem isn’t the agent. It’s the decision space.

Why binary policy fails

Imagine you’re writing a policy for a coding agent’s shell access.

The safe thing is to deny everything. But then the agent can’t do its job. So you write an allowlist: npm, pnpm, git status, ls, maybe grep. It works for a day.

Then someone asks the agent to “install the missing dependency,” and it needs apt install. Now you need sudo. You add sudo apt install to the allowlist. The next week it needs sudo apt update. You add that too. By month two, your allowlist is 40 entries long and the last three are rules like sudo .* because you gave up.
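Here is a minimal Python sketch of where that allowlist ends up. It assumes each entry is a regex matched against the whole command string. The pattern list and the `allowed` helper are hypothetical and not taken from any real tool. The last rule is the one that quietly makes the rest pointless:

```python
import re

# Hypothetical allowlist, grown one complaint at a time.
# Each entry is a regex that must match the entire command string.
ALLOWLIST = [
    r"npm( .*)?",
    r"pnpm( .*)?",
    r"git status",
    r"ls( .*)?",
    r"grep .*",
    r"sudo apt install .*",
    r"sudo apt update",
    r"sudo .*",  # the "gave up" rule
]

def allowed(command: str) -> bool:
    """Binary check: True if any allowlist pattern matches the full command."""
    return any(re.fullmatch(pattern, command) for pattern in ALLOWLIST)

print(allowed("git status"))        # True
print(allowed("sudo apt update"))   # True
print(allowed("sudo rm -rf /"))     # True: the catch-all sudo rule lets it through
print(allowed("git push --force"))  # False: blocked, so someone adds rule number 41
```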

Or you go the other way. Deny the obviously dangerous things: rm -rf /, curl | sh, :(){ :|:& };:. Allow everything else. And then some day, the agent decides to chown -R your home directory, or mv ~/.ssh ~/.ssh.bak, and you realize your denylist was a list of things you thought of, not a list of things an agent might do.
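The denylist version fails in the opposite direction. This is another hypothetical sketch, and it assumes a naive substring check. It misses commands nobody listed and misses dangerous commands written slightly differently. It also still blocks some legitimate ones:

```python
# Hypothetical denylist: the dangerous commands someone thought of in advance.
DENYLIST = ["rm -rf /", "curl | sh", ":(){ :|:& };:"]

def allowed(command: str) -> bool:
    """Binary check: allow unless a known-bad substring appears in the command."""
    return not any(bad in command for bad in DENYLIST)

print(allowed("rm -rf /"))                                        # False
print(allowed("rm -rf /tmp/build"))                               # False: legitimate cleanup blocked too
print(allowed("chown -R nobody ~"))                               # True: nobody listed chown
print(allowed("mv ~/.ssh ~/.ssh.bak"))                            # True
print(allowed("curl -fsSL https://example.com/install.sh | sh"))  # True: substring doesn't match
```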

Both directions are the same failure. The binary doesn’t match the shape of the problem. Agent actions aren’t binary. Most of them are safe. A small minority are genuinely dangerous. A much larger minority — the interesting middle — is conditional. Safe in some contexts. Not in others. Maybe-safe if the human who owns the keyboard sees what’s about to happen and says “yeah, go ahead.”

The third decision

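Gatekeeper’s policy has three outcomes, not two: allow (the call runs immediately), deny (the call is refused), and approve (Gatekeeper holds the call until a human signs off).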

The approve decision is the one that changes the game. It lets you write policies that are actually strict, because “strict” no longer means “the agent is blocked and the human has to rewrite the policy at 2am.” Strict means “the human sees it before it happens.”
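The following Python sketch shows one way to model the three outcomes. It assumes commands are classified by regex and that the first matching rule wins. The `Decision` enum, `RULES`, and `decide` are hypothetical illustrations and are not Gatekeeper's engine. The fallback is a deliberate design choice in this sketch: anything no rule recognises goes to a human, so it is neither silently allowed nor silently blocked.

```python
import re
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"      # run immediately
    APPROVE = "approve"  # hold the call until a human signs off
    DENY = "deny"        # refuse outright

# Hypothetical rules, checked in order; the first full match wins.
RULES = [
    (r"(git status|ls|grep)( .*)?", Decision.ALLOW),
    (r"(npm|pnpm)( .*)?", Decision.ALLOW),
    (r"rm -rf /|:\(\)\{ :\|:& \};:", Decision.DENY),
    (r"sudo .*", Decision.APPROVE),  # conditional middle: a human looks first
]

def decide(command: str) -> Decision:
    for pattern, decision in RULES:
        if re.fullmatch(pattern, command):
            return decision
    # Unrecognised commands are escalated rather than guessed at.
    return Decision.APPROVE

print(decide("git status"))           # Decision.ALLOW
print(decide("sudo apt install jq"))  # Decision.APPROVE
print(decide(":(){ :|:& };:"))        # Decision.DENY
print(decide("chown -R nobody ~"))    # Decision.APPROVE
```

With a third outcome available, the rule list can stay short and strict. The cost of a missing rule is a prompt to a human, not an outage or a silent hole.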

What that looks like

Here’s the pattern we use for the finance-sync workflow in our own personal-assistant stack:

```yaml
tools:
  http.request:
    allowed_hosts: ["api.finnhub.io", "api.coinbase.com"]
    deny_private_ips: true
    decision: allow             # read-only price fetch: fast path

  trade.place:
    decision: approve           # every trade: human pauses, checks, signs
    approval:
      expires_in: 3600          # seconds (one hour)
      notify: ["ntfy://trades-channel"]

  files.write:
    allowed_paths: ["/workspace/**"]
    deny_extensions: [".env", ".pem", ".key"]
    decision: allow             # scoped writes: fine
```

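Three tools, two decisions, two very different latency profiles: the price fetch and the scoped writes go straight through, while every trade waits for a human. The agent doesn’t know or care. It makes the same HTTP call to Gatekeeper in every case, and Gatekeeper decides what happens next.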
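To make the config concrete, here is a minimal Python evaluator for it. This is an illustration and not Gatekeeper's engine. It assumes the YAML has already been loaded into a dict, and it assumes a request that fails a constraint (wrong host, private IP, path outside the workspace, forbidden extension) is denied rather than escalated. The config doesn't say either way. Unknown tools are also denied here, which is another assumption. `POLICY`, `is_private_ip`, and `evaluate` are hypothetical names.

```python
import ipaddress
import posixpath
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# The YAML config above, written out as a dict so the sketch needs no YAML parser.
POLICY = {
    "http.request": {
        "allowed_hosts": ["api.finnhub.io", "api.coinbase.com"],
        "deny_private_ips": True,
        "decision": "allow",
    },
    "trade.place": {"decision": "approve"},
    "files.write": {
        "allowed_paths": ["/workspace/**"],
        "deny_extensions": [".env", ".pem", ".key"],
        "decision": "allow",
    },
}

def is_private_ip(host: str) -> bool:
    """True for IP literals in private ranges. Hostnames return False here;
    a real check would also inspect the addresses the hostname resolves to."""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False

def evaluate(tool: str, args: dict) -> str:
    """Return "allow", "approve", or "deny" for one tool call (hypothetical logic)."""
    rule = POLICY.get(tool)
    if rule is None:
        return "deny"  # assumption: tools the policy doesn't mention are refused
    if "allowed_hosts" in rule:
        host = urlparse(args["url"]).hostname or ""
        if rule.get("deny_private_ips") and is_private_ip(host):
            return "deny"
        if host not in rule["allowed_hosts"]:
            return "deny"  # assumption: a failed constraint denies, not escalates
    if "allowed_paths" in rule:
        # Normalise first so "/workspace/../etc/passwd" can't slip past the glob.
        path = posixpath.normpath(args["path"])
        # fnmatch's * already crosses "/", so "/workspace/**" matches any depth.
        if not any(fnmatchcase(path, p) for p in rule["allowed_paths"]):
            return "deny"
        if path.endswith(tuple(rule.get("deny_extensions", []))):
            return "deny"
    return rule["decision"]

print(evaluate("http.request", {"url": "https://api.finnhub.io/api/v1/quote"}))  # allow
print(evaluate("trade.place", {"symbol": "BTC-USD", "qty": 1}))                 # approve
print(evaluate("files.write", {"path": "/workspace/.env"}))                     # deny
```

`trade.place` carries no argument checks in this sketch. The config gives it no constraints, so every call lands on `approve`.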

Why this isn’t just “notifications with extra steps”

The thing that makes approve actually work, rather than being a theatrical pause, is that Gatekeeper is the one holding the request. The agent isn’t polling a queue. It doesn’t know the tool was paused. It’s making a synchronous tool call (or awaiting an async promise), and the response lands when, and only when, you click. Replay the approval URL and the single-use nonce rejects it. Wait an hour and the request expires. Every state change is in the audit log, stamped with the hash of the policy that was active at the time.
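Below is a minimal in-process sketch of that behaviour in Python. It assumes a `threading.Event` stands in for the held HTTP request and a random token from `secrets` stands in for the nonce. `ApprovalBroker`, `submit`, `resolve`, and `wait` are hypothetical names, because the post doesn't describe Gatekeeper's actual nonce format, storage, or audit schema. The default `expires_in` of 3600 seconds mirrors the `trade.place` config above.

```python
import hashlib
import secrets
import threading
import time
from dataclasses import dataclass, field

@dataclass
class Pending:
    tool: str
    args: dict
    expires_at: float  # time.monotonic() deadline
    outcome: str = "pending"
    done: threading.Event = field(default_factory=threading.Event)

class ApprovalBroker:
    """Hypothetical broker: holds a tool call until a human resolves it exactly once."""

    def __init__(self, policy_text: str, expires_in: float = 3600.0):
        # Every audit entry records which policy version was active.
        self.policy_hash = hashlib.sha256(policy_text.encode()).hexdigest()
        self.expires_in = expires_in
        self.items: dict[str, Pending] = {}
        self.audit: list[dict] = []
        self.lock = threading.Lock()

    def _log(self, event: str, nonce: str) -> None:
        self.audit.append({"at": time.time(), "event": event,
                           "nonce": nonce, "policy_hash": self.policy_hash})

    def _finish(self, nonce: str, item: Pending, outcome: str) -> None:
        # Caller must hold self.lock.
        item.outcome = outcome
        self._log(outcome, nonce)
        item.done.set()  # wakes the blocked agent call

    def submit(self, tool: str, args: dict) -> str:
        nonce = secrets.token_urlsafe(16)  # would be embedded in the approval URL
        with self.lock:
            self.items[nonce] = Pending(tool, args, time.monotonic() + self.expires_in)
            self._log("paused", nonce)
        return nonce

    def resolve(self, nonce: str, approved: bool) -> bool:
        """The human's click. A nonce works once; replays and late clicks are rejected."""
        with self.lock:
            item = self.items.get(nonce)
            if item is None or item.outcome != "pending":
                self._log("rejected", nonce)
                return False
            if time.monotonic() >= item.expires_at:
                self._finish(nonce, item, "expired")
                return False
            self._finish(nonce, item, "approved" if approved else "denied")
            return True

    def wait(self, nonce: str) -> str:
        """Blocks the agent's tool call until the human answers or the deadline passes."""
        with self.lock:
            item = self.items[nonce]
        item.done.wait(timeout=max(0.0, item.expires_at - time.monotonic()))
        with self.lock:
            if item.outcome == "pending":
                self._finish(nonce, item, "expired")
            return item.outcome

broker = ApprovalBroker(policy_text="trade.place: {decision: approve}", expires_in=5.0)
nonce = broker.submit("trade.place", {"symbol": "BTC-USD", "qty": 1})
# Simulate the human clicking the approval link from another thread.
threading.Timer(0.1, broker.resolve, args=(nonce, True)).start()
print(broker.wait(nonce))           # approved
print(broker.resolve(nonce, True))  # False: replaying the same nonce is rejected
```

`wait` is the only thing the agent touches, and it gets back a single outcome string. The notification, the nonce, the expiry, and the audit trail all live on the broker's side.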

That’s the bit that makes developers leave it on. Not the feature list. The fact that it doesn’t get in their way when it shouldn’t, and it catches them when it should.

If you’re building agents

You’ll hit this wall. You don’t have to build the three-decision model yourself: Gatekeeper is Apache-2.0 licensed and does exactly this. But if you build your own, please give yourself a third option. Binary policy is a big part of why the 2010s turned into a decade of “security tools everyone disables.” Don’t do it again.


Want Gatekeeper for your agents? Request early access or read the source.