
Securing AI Document Agents with LlamaIndex and Auth0

Learn how to build secure AI document agents using LlamaIndex Workflows and Auth0 FGA. Implement fine-grained, relationship-based access control for RAG.

One of the hardest parts of building AI agents for documents is making sure your users (and their agent proxies) only see documents they are authorized to access. Since AI is in the mix, authorization mistakes stop being annoying internal bugs and become serious security vulnerabilities. Given that AI systems are susceptible to a new class of natural-language based attacks, it could allow a threat actor to query their way to sensitive user data and documents.

In this post, we will walk through a real-world example of how you can integrate LlamaIndex Document Agents (powered by LlamaParse and LlamaAgents) and Auth0 FGA (Fine-Grained Authorization) in a Python application to cleanly solve authorization problems.

The demo application is a paycheck insights API: employees can ask natural-language questions about their own pay history, while managers can ask questions across their entire team. The authorization rules enforce that boundary automatically, at every layer of the stack.

The Problem: AI Makes Authorization Harder

Traditional authorization in web apps is coarse-grained: you check whether a user has a role, and either allow or deny the whole request. That falls apart in Retrieval-Augmented Generation (RAG) systems for two reasons:

  1. Documents are the unit of access, not endpoints. A single /search endpoint might retrieve from thousands of documents, each with its own owner. You cannot grant or deny access to "the search endpoint"; you have to decide document by document.
  2. LLMs synthesize across multiple documents. Even if you filter the retrieval results correctly, a subtle bug could let one unauthorized document slip into the prompt context, and the model might inadvertently include it in the final response.

The solution is to make authorization a first-class concern of the retrieval pipeline, not an afterthought bolted on at the API layer.

Relationship-Based Access Control with Auth0 FGA

Auth0 FGA is inspired by Zanzibar, Google's globally distributed authorization system, which powers Google Drive, Google Docs, YouTube, Google Cloud, and more. Rather than assigning permissions to roles, FGA models authorization as a graph of relationships between objects.

For our paycheck app, the model looks like this:

model
  schema 1.1

type user
  relations
    define department: [department]
    define manager: manager from department

type department
  relations
    define manager: [user]

type paycheck
  relations
    define owner: [user]
    define can_view: owner or manager from owner

This captures a real organizational policy in FGA's easy-to-read modeling language:

  • A user can view a paycheck if they are the owner (the employee it belongs to), or
  • If they are a manager of the department that the owner belongs to.

The power here is in the indirect relation manager from owner, which traverses the graph automatically. When Mary is set as manager of the Developer Relations department and John belongs to Developer Relations, FGA automatically derives that Mary can view John's paychecks, without any application code explicitly checking for that.
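To make the derivation concrete, here is a tiny pure-Python sketch of the traversal. This is illustrative only, not how FGA evaluates queries internally; FGA resolves this from stored relationship tuples, and the dicts below simply stand in for those tuples:

```python
# Illustrative only: a hand-rolled version of the graph traversal behind
# "owner or manager from owner". The example data mirrors Mary and John above.
departments = {"devrel": {"manager": "mary", "members": {"john"}}}
paychecks = {"abc123": {"owner": "john"}}

def can_view(user: str, paycheck_id: str) -> bool:
    owner = paychecks[paycheck_id]["owner"]
    if user == owner:  # direct relation: the employee owns the paycheck
        return True
    # Indirect relation: manager of a department the owner belongs to
    return any(
        owner in dept["members"] and dept["manager"] == user
        for dept in departments.values()
    )

print(can_view("john", "abc123"))  # True: John is the owner
print(can_view("mary", "abc123"))  # True: derived through the department
print(can_view("eve", "abc123"))   # False: no relationship path
```

The point is that only the data changes as the org chart evolves; the policy itself stays fixed in the model.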

When a paycheck is uploaded, we write a single tuple with the following structure:

// user:john → owner → paycheck:abc123
{
    "user": "user:john",
    "relation": "owner",
    "object": "paycheck:abc123"
}

And that is it. Now, Mary's manager access flows through the pre-existing department relations, the authorization model does the rest, and no additional code is required on the application side.
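For completeness, the department relationships themselves are also just tuples, written once when the org chart is set up. The IDs department:devrel and user:mary below are assumptions for this example:

```
// user:mary → manager → department:devrel
{ "user": "user:mary", "relation": "manager", "object": "department:devrel" }

// department:devrel → department → user:john
{ "user": "department:devrel", "relation": "department", "object": "user:john" }
```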

Structured Extraction from Messy PDFs with LlamaParse

Beyond authorization, another challenge arises when working with document agents: most knowledge is locked inside unstructured documents, and in our case, the paychecks are PDFs. PDFs are notoriously machine-unfriendly: they are designed for rendering, not for data extraction. We cannot upload them as-is to our pipeline, since that would make querying paychecks for insights with natural language basically impossible. We need a text extraction layer, but not a naive one: paychecks often contain tables, mixed fonts, and varied alignments, and a simple parsing strategy can lose significant structured information.

LlamaParse solves this. It is a document parsing service purpose-built for use in RAG pipelines and handles:

  • Table extraction that tries to preserve row/column relationships
  • Layout-aware parsing that understands multi-column and multi-section documents
  • Markdown output that downstream LLMs can consume directly
  • Async processing with polling, so your API is not blocked waiting on parse jobs

In our upload flow, the sequence looks like this:

# 1. Upload the raw PDF to LlamaCloud
file = llama.files.create(file=(filename, content, "application/pdf"))

# 2. Parse with LlamaParse — returns structured markdown
job = llama.parsing.parse(file_id=file.id)
# ... poll until complete ...
markdown = llama.parsing.result_markdown(job.id)

The result is clean, structured markdown that represents the paycheck's layout. Pay period, gross wages, deductions, net pay, and year-to-date summaries are all preserved in a format the LLM can reason over accurately.

After parsing, we immediately write the FGA tuple:

fga_client.write(
    body=ClientWriteRequest(
        writes=[
            ClientTuple(user=f"user:{user_id}", relation="owner", object=f"paycheck:{file.id}")
        ]
    )
)

Authorization is established at upload time, before any query can ever reach the document.
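The ordering matters enough to make it explicit in code. Here is a sketch of the upload wiring with the two clients injected as plain callables (parse_pdf and write_tuple are stand-ins for the LlamaParse and FGA client calls shown above, not the demo's exact helpers):

```python
def handle_upload(filename, content, user_id, parse_pdf, write_tuple):
    """Ingest a paycheck PDF and authorize it in one sequence.

    The ownership tuple is written as part of ingestion, so a document is
    never queryable before it is authorized.
    """
    doc_id, markdown = parse_pdf(filename, content)  # upload + parse + poll
    write_tuple(user=f"user:{user_id}", relation="owner", object=f"paycheck:{doc_id}")
    return doc_id, markdown
```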

Authorization-Aware RAG Orchestration with LlamaIndex Workflows

LlamaIndex Workflows provide a typed, async-first framework for building multi-step AI pipelines. This makes the authorization logic explicit and auditable: it is not hidden in middleware but is a named step in the pipeline.

Our workflow has two steps: retrieve and synthesize.
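The events passed between the steps carry the query, the user, and the retrieved state. As a rough sketch of their shape (the demo defines these as LlamaIndex Workflow Event subclasses; plain dataclasses are used here to keep the example self-contained):

```python
from dataclasses import dataclass, field

@dataclass
class InputEvent:
    query: str
    user_id: str

@dataclass
class RetrievedEvent:
    query: str
    user_id: str
    nodes: list = field(default_factory=list)    # FGA-filtered document nodes
    members: list = field(default_factory=list)  # department members, if manager
```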

Step 1: Retrieve

@step
async def retrieve(self, ev: InputEvent) -> RetrievedEvent:
    retriever = LlamaCloudRetriever(user_id=ev.user_id)
    fga_retriever = FGARetriever(retriever, ...)
    
    nodes = await fga_retriever.aretrieve(ev.query)
    members = await get_department_members(ev.user_id)
    
    return RetrievedEvent(
        query=ev.query,
        user_id=ev.user_id,
        nodes=nodes,
        members=members,
    )

The retrieval step uses two layers of FGA authorization:

Layer 1 — list_objects: Ask FGA "Which paychecks can this user view?" and get back a list of authorized document IDs. In this demo, we're using list_objects, but this could be any database query or API call that returns a set of documents to work with.

Layer 2 — batch_check (via FGARetriever from the auth0-ai-llamaindex SDK): For each document in the retrieved set, verify that the user has can_view access before it is sent to the LLM.

Strictly speaking, this second check is redundant here — list_objects already guarantees the results are scoped to the user. But it's good practice in any RAG pipeline: if you ever swap out the list_objects call for a different data source or API, the per-document authorization check stays in place and your authorization logic doesn't silently break.
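Stripped of the SDK types, the two layers amount to the following. This is a sketch with stand-in callables: retrieve, list_objects, check, and the node shape are placeholders for the retriever and FGA client, not the SDK's actual signatures:

```python
def authorized_nodes(user_id, retrieve, list_objects, check):
    # Layer 1: scope the candidate set to documents FGA says the user can view
    allowed = set(list_objects(user_id, "can_view"))
    candidates = [n for n in retrieve(user_id) if n["id"] in allowed]
    # Layer 2 (defense-in-depth): re-verify each document individually
    # before it can reach the LLM's context window
    return [n for n in candidates if check(user_id, "can_view", n["id"])]
```

Swapping out layer 1 for another data source leaves layer 2 untouched, which is exactly the property described above.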

The FGARetriever is a drop-in wrapper from the auth0-ai-llamaindex SDK:

from auth0_ai_llamaindex.fga import FGARetriever

fga_retriever = FGARetriever(
    retriever=my_retriever,
    build_query=lambda node: {"user": user_id, "relation": "can_view", "object": f"paycheck:{node.id_}"}
)

It intercepts every document before it reaches downstream steps and filters out any unauthorized content.

Step 2: Synthesize

@step
async def synthesize(self, ev: RetrievedEvent) -> StopEvent:
    context = build_prompt(ev.nodes, ev.members, ev.user_id)
    response = await self.llm.acomplete(context + ev.query)
    return StopEvent(result=response.text)

By the time we reach synthesis, every document in ev.nodes has passed two independent FGA checks. The LLM is only ever given authorized content. We can prompt Claude confidently, knowing the context window is clean.

Putting It All Together

Here's the end-to-end flow for an insights query:

  1. API call to the /pay/insights endpoint with a payload like:

         POST /pay/insights
         Authorization: Bearer <jwt>

         { "query": "How has my gross pay changed this year?" }
  2. Extract user_id from JWT sub claim
  3. Start LlamaIndex Workflow

    • [Retrieve Step]

         FGA list_objects  → get authorized paycheck IDs  
         LlamaCloud fetch  → get parsed markdown for each ID  
         FGA batch_check   → verify each document (defense-in-depth)  
         FGA list_objects  → get department members (if manager)
    • [Synthesize Step]

         Build prompt with authorized documents  
         Add department context (if manager)  
         Claude synthesizes the answer
  4. Return the final answer to the user

The API endpoint’s access token (JWT) is the only trust boundary with the outside world. Within the system, every access decision is made by Auth0 FGA, and the Workflow structure prevents skipping the authorization steps.
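Step 2 of the flow, extracting user_id from the JWT sub claim, can be sketched as below. This is illustrative only: in a real API the token's signature must be verified first (for example, against the issuer's JWKS) before any claim is trusted.

```python
import base64
import json

def sub_from_jwt(token: str) -> str:
    """Read the `sub` claim from a JWT payload.

    Sketch only: verify the signature before trusting any claim in production.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["sub"]
```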

Why This Combination Works

LlamaParse solves the document ingestion problem. PDFs become structured, LLM-ready text without losing the semantic content of tables, headers, and multi-section layouts.

LlamaIndex Workflows solve the orchestration problem. The retrieval and synthesis steps are explicit, typed, and async-first, making it natural to insert authorization checks as named steps rather than cross-cutting concerns.

Auth0 FGA solves the authorization problem. Complex organizational policies (an employee owns their paycheck; a manager sees their team's) are expressed as a relationship model, not as application code. The rules live in one place and are enforced consistently everywhere.

Together, they form a stack where security is structural. You cannot "forget" to check authorization because the check is baked into the retrieval step. You cannot accidentally expose a document because it has to pass two independent FGA checks before the LLM sees it.

Getting Started

The full source code for this demo is available in this GitHub repository. To run it yourself:

  1. Clone the repo and copy .env.example to .env
  2. Fill in your credentials
  3. Run uv run scripts/setup_fga.py to initialize the authorization model
  4. Run uv run run-server to start the API

The patterns (LlamaParse for ingestion, FGA for relationship-based authorization, and Workflows for typed pipeline orchestration) apply directly to any domain where documents have owners and access rules: HR records, financial reports, legal documents, and customer support tickets all fit the same architecture.

Learn More About Auth0 and LlamaIndex

If you want to explore more of the LlamaIndex ecosystem and how you can use it to build document agents, you can always refer to the official documentation, and you can find more technical insights and deep dives from the LlamaIndex blog page.

To learn more about Auth0 FGA and how to implement relationship-based access control to secure your AI agents, visit the official Auth0 FGA page. The complete source code for this demo is available in this GitHub repository, where you can also report any setup issues or bugs.