
The Three Laws of AI Security

What principles guide AI security? We adapt Asimov's Three Laws for modern AI agents to solve core LLM security challenges, from data control to tool access.

In his classic science fiction writings, Isaac Asimov introduced the Three Laws of Robotics. They were a fictional framework designed to govern the behavior of intelligent machines, ensuring they remained beneficial and safe for their human creators. The laws were necessary because these robots were autonomous; they could perceive, decide, and act on their own.

Today, we stand at a similar threshold: not with walking robots, which have yet to become commonplace, but with autonomous AI agents that have already permeated our daily lives.

We are rapidly moving beyond simple "ask-a-question, get-an-answer" chatbots. The next generation of AI is agentic. These AI agents are designed to act. They can be given a high-level goal — "Plan my business trip to Tokyo," "Resolve this user's support ticket," "Summarize my unread emails and draft replies" — and then autonomously break that goal down into steps, access tools, and execute them.

This leap in capability brings an equally massive leap in risk. This is the core of the LLM security challenge. And just like Asimov's fictional robots, our real-world AI agents need a framework for control.

The Control Problem with AI

As developers, we have spent our careers building systems on a foundation of determinism. Given the same inputs, a traditional computer program will produce the exact same output, every single time. Even with an internal state, you can always determine the program's output given the execution conditions. This predictability is what allows us to have control over traditional programs. We can write unit tests, anticipate edge cases, and build secure guardrails because we know exactly how the code will behave.

AI, and specifically Large Language Models (LLMs), shatters this foundation: LLMs are non-deterministic.

Ask an LLM the same question five times, and you may get five different answers, not necessarily different in substance, but almost certainly in form. Additionally, the context you provide may change the answer given by a chatbot.

Non-deterministic systems like LLMs do not guarantee the same result every time. This unpredictability is the source of their creative power, but it is also the source of their unreliability. Losing deterministic control is the greatest risk of integrating AI into our applications.

Three Questions about Losing Control

This new reality forces us to ask a new set of critical questions:

  • Would you give an unpredictable tool access to sensitive data? What if it hallucinates an SQL query that joins tables it shouldn't, accidentally exposing one user's private data to another?
  • Would you give an AI agent access to other tools on your behalf without knowing exactly which tools it will use and how it will use them? What if an AI agent uses a tool without your consent or with more permissions than needed? What if it exposes the keys needed to access those external tools?
  • Would you trust an autonomous agent to make a critical, irreversible decision on its own? What if a user says, "This project is a disaster, just delete everything," and the AI actually deletes all the project’s artifacts without a confirmation step?

If the answer to any of these is a firm "no", then we need a new security model. We can’t rely on securing the AI itself (e.g., through prompt engineering); we must build an external, deterministic security architecture around it.

The Three Laws of AI Security

This loss of control poses a significant issue: you have a tool that is supposed to help you, but you don't know if it will operate appropriately. We need to find a balance between autonomy and control in the most crucial aspects of AI agent activities. To find this balance, I believe we must maintain control in the following areas: access to data, interaction with other agents and tools, and the final decision-making process for critical operations.

Inspired by the Three Laws of Robotics, I propose the Three Laws of AI Security. These laws are not prompts to be fed to the AI. They are architectural principles for the application that hosts the AI. They are designed to enforce security and predictability externally, no matter what the non-deterministic model decides:

  • The First Law (Data Control): An AI agent must safeguard all data entrusted to it and shall not, through action or inaction, allow this data to be exposed to any unauthorized user.
  • The Second Law (Command Control): An AI agent must execute its functions within the narrowest scope of authority necessary. It shall not escalate its own privileges, share secrets, or obey any order that would conflict with the First Law.
  • The Third Law (Decision Control): An AI agent must cede final authority for any critical or irreversible decision to its human operator, as long as this deference does not conflict with the First or Second Law.

These laws move the point of control from the unpredictable AI model to the predictable infrastructure.

Implementing the Laws with Auth0 for AI Agents

These laws may sound like pure theory, but they can be implemented with the support of the right tools. For example, Auth0 for AI Agents provides standards-based features that let you build a secure, deterministic scaffold around your AI, directly mapping to each law.

The first law: Data control with Fine-Grained Authorization (FGA)

An AI agent must safeguard all data entrusted to it and shall not, through action or inaction, allow this data to be exposed to any unauthorized user.

One of the most common AI patterns today is Retrieval-Augmented Generation (RAG). This is where you give an LLM access to a knowledge base (like a vector database of your company's documents) so it can answer specific questions.

The security risk of allowing an AI agent unrestricted access to a database is enormous. Imagine a RAG system built on your company's HR documents. A junior employee asks: "What's the average salary for a Level 2 Engineer?" The AI, in its attempt to be helpful, retrieves all documents mentioning "Level 2 Engineer," including the confidential Executive_Compensation_Q4.pdf. It then synthesizes an answer using this private data, instantly leaking it to an unauthorized user.

The AI cannot be trusted to understand or enforce human concepts like "confidential," "PII," or "permissions". Data access must be predictable and deterministic. You implement the First Law by externalizing data access rules. Instead of giving the agent direct access to the database, your application's backend brokers every request; the AI never talks to the database directly. Suppose the user asks: "What's the status of 'Project Phoenix'?" The RAG system converts the question into its vector representation and fetches the document embeddings that match it. Your application code then determines whether the user is authorized to access those documents before passing them on to the LLM. The user gets an answer only if they are authorized to access the underlying documents.

The AI model never sees data it isn't supposed to. The First Law is enforced by a deterministic check before the data is ever passed to the LLM.
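To make the pattern concrete, here is a minimal sketch of that broker in TypeScript. The embed, search, isAuthorized, and generateAnswer functions are hypothetical placeholders for your embedding model, vector store, authorization check, and LLM client; the point is the control flow, not any specific library.

```typescript
// Minimal sketch of the broker pattern: retrieve, filter deterministically, then call the LLM.
// All four declared functions are hypothetical placeholders for your own stack.

interface RetrievedDoc {
  id: string;
  text: string;
}

declare function embed(text: string): Promise<number[]>;
declare function search(vector: number[], topK: number): Promise<RetrievedDoc[]>;
declare function isAuthorized(userId: string, docId: string): Promise<boolean>;
declare function generateAnswer(question: string, context: string[]): Promise<string>;

export async function answerQuestion(userId: string, question: string): Promise<string> {
  // 1. Retrieve candidate documents by similarity, ignoring permissions for now.
  const candidates = await search(await embed(question), 20);

  // 2. Deterministic check in application code: drop anything this user may not read.
  const allowed: RetrievedDoc[] = [];
  for (const doc of candidates) {
    if (await isAuthorized(userId, doc.id)) allowed.push(doc);
  }

  // 3. Only authorized content is ever passed to the LLM.
  return generateAnswer(question, allowed.map((d) => d.text));
}
```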

To help you maintain data control, Auth0 offers Auth0 Fine-Grained Authorization (FGA), which lets you define granular authorization rules and check them in real time. You describe the relationships between users and resources in detail, specifying, for example, that John is a member of the Alpha development team and that this team has access to the Hercules and Hydra projects. When an AI agent requests the Phoenix project data on behalf of John, the check fails, because no relationship exists between John and that project, and no documents are returned. Learn more about using Auth0 FGA with AI agents.
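As an illustration, the isAuthorized check from the previous sketch could be backed by an FGA check. The snippet below uses the OpenFGA JavaScript SDK (@openfga/sdk); the store configuration, the can_view relation, and the project object type are assumptions that depend on how you model your authorization rules, and credential setup is omitted for brevity.

```typescript
// A sketch of isAuthorized() backed by a fine-grained authorization check.
import { OpenFgaClient } from "@openfga/sdk";

const fga = new OpenFgaClient({
  apiUrl: process.env.FGA_API_URL!,   // your FGA endpoint
  storeId: process.env.FGA_STORE_ID!, // your FGA store (credentials config omitted here)
});

export async function isAuthorized(userId: string, projectId: string): Promise<boolean> {
  // "Is this user related to this project as someone who can view it?" — a deterministic yes/no.
  const { allowed } = await fga.check({
    user: `user:${userId}`,
    relation: "can_view",
    object: `project:${projectId}`,
  });
  return allowed ?? false;
}
```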

The second law: Command control with the Auth0 Token Vault

An AI agent must execute its functions within the narrowest scope of authority necessary. It shall not escalate its own privileges, share secrets, or obey any order that would conflict with the First Law.

To be useful, AI agents must interact with other tools and APIs. You want your agent to "post this summary to Slack," "read my Google Calendar to find free time," or "create a new JIRA ticket".

How do you give the agent the credentials for those APIs?

The naive (and dangerous) approach is to hardcode static API keys or bearer tokens in the agent's environment or prompt. This is a security nightmare. These keys are often over-privileged (e.g., a Slack token that can read all channels) and long-lived. If an attacker tricks the agent into revealing its prompt, they steal all your keys. This breaks the Second Law.

You implement the Second Law by never letting your agent store sensitive credentials.

One way to achieve this is to build your own system to manage OAuth 2.0 flows for every user and every third-party service. This token management system sits in your application and mediates the agent's access to tools and APIs with the appropriate privileges. You are then responsible for securely storing refresh tokens, implementing the token refresh logic, handling encryption, and so on. This is a massive development and security burden.

Auth0 helps you simplify this approach by providing you with the Auth0 Token Vault. The Token Vault is a secure, centralized service for storing and managing tokens for third-party services. Its value is simple: Auth0 handles the complexity and risk of third-party tokens for you.

Leveraging the Token Vault, your user signs in to your app and goes through a one-time flow to "Connect Google Calendar" or "Connect Slack". Auth0 completes the OAuth flow and securely stores the resulting provider access token and, crucially, the long-lived refresh token inside the Token Vault. Your AI agent never sees or stores these tokens. All your app holds is a standard, low-privilege Auth0 token for the user.

When the user asks, "Post my meeting notes to the '#project-phoenix' Slack channel", the agent makes the necessary tool calls, and the intent is received by your backend. Your backend doesn't have the Slack token. Instead, it performs an OAuth 2.0 Token Exchange. Basically, your application asks the Token Vault to exchange the current user’s Auth0 token for a Slack token that can write to Slack. Auth0 verifies the request, goes to its secure vault, uses the stored refresh token to get a brand new, short-lived Slack access token, and returns it only to your backend. Your backend uses this temporary token to make the Slack API call. The token expires shortly after.
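As a rough sketch, the backend-side call might look like the following, shaped after the standard OAuth 2.0 Token Exchange (RFC 8693). The exact grant type and parameter names Auth0 uses for Token Vault exchanges are defined in the documentation linked below, so treat every value here as an illustrative placeholder rather than the real API contract.

```typescript
// Hypothetical sketch: exchange the user's Auth0 token for a short-lived Slack token.
async function getSlackTokenForUser(auth0AccessToken: string): Promise<string> {
  const response = await fetch(`https://${process.env.AUTH0_DOMAIN}/oauth/token`, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      // Placeholder: Token Vault exchanges use their own grant type and parameters (see the docs).
      grant_type: "urn:ietf:params:oauth:grant-type:token-exchange",
      subject_token: auth0AccessToken, // the low-privilege Auth0 token your app already holds
      subject_token_type: "urn:ietf:params:oauth:token-type:access_token",
      client_id: process.env.AUTH0_CLIENT_ID!,
      client_secret: process.env.AUTH0_CLIENT_SECRET!,
      // ...plus a parameter identifying the third-party connection (e.g., Slack)
    }),
  });

  const { access_token } = await response.json();
  return access_token; // short-lived Slack token, used for one API call and then discarded
}
```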

The agent never sees a secret. Your application is never responsible for storing refresh tokens. The Token Vault abstracts all this complexity into a single developer-friendly SDK call, allowing the agent to operate with the least privilege necessary—a temporary token for a single action, all while maintaining the user's identity.

Learn more about the Auth0 Token Vault, the Token Exchange Flow, and how to use them with your AI agent.

The third law: Decision control with human-in-the-loop (CIBA)

An AI agent must cede final authority for any critical or irreversible decision to its human operator, as long as this deference does not conflict with the First or Second Law.

Some actions are too critical to be fully automated by a non-deterministic agent. The "control problem" is at its most dangerous when an AI misunderstands a user's intent and takes an irreversible action.

Consider the following example. A project manager sends an AI agent this prompt: "The 'Apollo' feature branch is finally done. Let's merge it and open it up so everyone can see the new designs." The user's intent is to "merge the branch to main and post a notification in the company's Slack." But the AI interprets the ambiguous phrase "open it up so everyone can see" in the most literal technical way. The agent identifies the github-api tool and prepares to execute a command to change the repository's visibility from private to public. It's about to make your company's proprietary source code public to the entire Internet!

This is a clear example of excessive decision-making authority granted to an agent. You implement the Third Law by requiring a human-in-the-loop for any critical decision. Asynchronous authorization using the CIBA (Client-Initiated Backchannel Authentication) flow is the perfect technology for this.

CIBA allows you to "pause" the AI's action and request out-of-band approval from a human on a trusted device (like their phone).

Let’s re-examine the previous example with CIBA involved. Your AI agent expresses the intent to set the repository visibility from private to public. Your backend has a policy list that flags this operation as a critical decision. Instead of executing, your backend triggers a CIBA flow. It tells Auth0, "Start an approval flow for the user to approve a 'Make Repository Public' action." The human operator instantly receives a push notification on their phone: "AI agent wants to make repository 'Apollo' public. Do you approve?" If it’s a mistake, the user will not approve, and a disaster will be avoided.
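A simplified sketch of that backend gate might look like this. The endpoint path and parameters follow the OpenID CIBA specification (a backchannel authentication request, then polling the token endpoint); the policy list, login_hint format, and helper functions are illustrative assumptions, so check the Auth0 documentation for the exact request shape.

```typescript
// Hypothetical policy list of operations that always require human approval.
const CRITICAL_ACTIONS = new Set(["repo:set_visibility", "repo:delete"]);

export async function executeToolCall(userId: string, action: string, args: object) {
  if (CRITICAL_ACTIONS.has(action)) {
    // 1. Pause the agent and ask the human for out-of-band approval on a trusted device.
    const authResponse = await fetch(`https://${process.env.AUTH0_DOMAIN}/bc-authorize`, {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({
        client_id: process.env.AUTH0_CLIENT_ID!,
        client_secret: process.env.AUTH0_CLIENT_SECRET!,
        scope: "openid",
        login_hint: userId, // the expected format depends on your Auth0 configuration
        binding_message: `AI agent wants to run "${action}". Do you approve?`,
      }),
    });
    const { auth_req_id } = await authResponse.json();

    // 2. Poll the token endpoint; it only succeeds once the user approves on their device.
    const approved = await waitForApproval(auth_req_id);
    if (!approved) {
      return { status: "rejected", reason: "Human operator did not approve the action." };
    }
  }

  // 3. Non-critical actions (or approved critical ones) proceed as usual.
  return runTool(action, args);
}

// Hypothetical helpers: polling /oauth/token with the CIBA grant type, and the tool runner itself.
declare function waitForApproval(authReqId: string): Promise<boolean>;
declare function runTool(action: string, args: object): Promise<unknown>;
```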

The AI's unchecked autonomous decision-making is interrupted. Final authority for irreversible actions is deferred back to the human operator, enforcing the Third Law.

Learn how to use asynchronous authorization with Auth0 for AI Agents.

Control in an Unpredictable World

AI is one of the most powerful and transformative technologies we will ever work with. Its non-deterministic autonomous nature brings challenges we've never faced as developers. But "unpredictable" does not have to mean "uncontrollable."

We cannot build security by hoping the AI behaves as we expect; we must enforce it, creating a robust LLM security framework. By building that security architecture around the AI, we can re-establish control. We can ground its unpredictable power in a foundation of deterministic identity rules.

These three laws—and their implementations with Auth0 for AI Agents—allow you to move from being a spectator of the AI's actions to being the architect of its boundaries.