Good guardrails are concrete and enforceable.
Prompt rules help, but real guardrails are code-level constraints: allowlisted tools, permission checks, and sandboxed execution.
Agents should operate with the minimum permissions required. This reduces blast radius if the system behaves incorrectly.
You can define which actions require approval (e.g., production changes) and enforce that in code. That’s observable and testable.
If an agent can act, require: tool allowlists, least-privilege credentials, sandboxing for execution, and human approval for high-impact actions.