Fine-Grained Machine-to-Machine Token Quotas (Early Access)

The Client Credentials Flow enables Machine-to-Machine (M2M) applications, such as CLIs, daemons, or backend services, to obtain access tokens on their own behalf without any user interaction. 

Implementing access token caching best practices can minimize round trips to Auth0 and control the volume of token issuance. However, some applications, such as third-party systems or those in large, complex deployments, might not implement proper caching and exhibit bursty behavior. This can lead to Auth0 generating a large volume of M2M tokens in short periods of time, impacting your tenant's overall M2M token quota and causing unexpected service behavior.  

Auth0's Fine-grained M2M token quotas enable you to set hourly and daily M2M access token limits for applications and Auth0 Organizations. This provides granular operational control over token issuance to prevent excessive token consumption. 

Use cases

  • Control runaway applications: Apply application-level quotas to stop M2M applications from requesting tokens too frequently (for example, because of bugs or missing caching) without impacting other services or your tenant’s overall M2M token quota.

  • Ensure fair multi-tenant usage: Assign organization-level quotas to different customers (Auth0 Organizations) to prevent one customer from exhausting quota or degrading service for the others.

  • Set tenant-wide safeguards: Use tenant-level default quotas as a baseline limit against excessive M2M token requests from applications and organizations.

  • Monitor before enforcement: Deploy quotas with enforce: false to observe token consumption via Consumption warning logs and Auth0 quota headers. Then, set informed limits before active blocking.

How it works

You can apply fine-grained M2M token quotas to the following entities:

  • Application: Limits tokens generated for a client to control individual application behavior.

  • Organization: Limits tokens generated for requests associated with an organization (org_id), across all applications that request tokens for that organization. This helps you manage M2M activity for a SaaS customer or business unit.

Auth0 supports two types of fine-grained M2M token quotas:

  • Hourly: The maximum number of tokens that can be obtained within a single UTC clock hour. The quota resets at the beginning of each UTC hour.

  • Daily: The maximum number of tokens that can be obtained within a single UTC calendar day (not a rolling 24-hour period). The quota resets at the beginning of each UTC day.
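Because both quotas reset at UTC clock boundaries rather than relative to the first request, the time left in the current window can be estimated from the clock alone. The following is a minimal sketch of that arithmetic; secondsUntilReset is a hypothetical helper name and is not part of any Auth0 SDK.

```javascript
// Hypothetical helper (not an Auth0 API): seconds until the current
// UTC hour or UTC day ends, which is when the matching quota resets.
function secondsUntilReset(bucket, now = new Date()) {
    const next = new Date(now.getTime());
    if (bucket === 'per_hour') {
        // Move to the start of the next UTC hour.
        next.setUTCHours(now.getUTCHours() + 1, 0, 0, 0);
    } else if (bucket === 'per_day') {
        // Move to 00:00:00.000 of the next UTC day.
        next.setUTCDate(now.getUTCDate() + 1);
        next.setUTCHours(0, 0, 0, 0);
    } else {
        throw new Error(`Unknown bucket: ${bucket}`);
    }
    return Math.ceil((next.getTime() - now.getTime()) / 1000);
}
```

In practice, the t value in the Auth0 quota headers is the authoritative source for the reset time; a local calculation like this is only useful for planning or testing.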

Quota evaluation and enforcement rules

Auth0 determines which fine-grained M2M token quotas apply to a token request as follows:

  1. Client-Level Quota:

    • If you have configured an Application-Specific Quota for the client_id, Auth0 applies the quota to the client.

    • Otherwise, Auth0 applies the Tenant-Level Default Quota for Clients to the client. If you have not configured a tenant-level default for clients, Auth0 does not apply a fine-grained quota to the client.

  2. Organization-Level Quota:

    • If you have configured an Organization-Specific Quota for the org_id associated with the token request, Auth0 applies the quota to the organization.

    • Otherwise, Auth0 applies the Tenant-Level Default Quota for Organizations to the org_id associated with the token request. If you have not configured a tenant-level default for Organizations, Auth0 does not apply a fine-grained quota for the org_id associated with the token request.

  3. Depending on whether the token request has an associated organization, Auth0 checks for an applicable quota:

    • If the token request has an associated organization defined explicitly or via default organization settings, Auth0 checks the client-level and organization-level quotas concurrently.

    • If the token request does not have an associated organization, Auth0 only checks the client-level quota.
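The resolution order above can be modeled in a few lines. This is only an illustrative sketch of the documented rules, not Auth0's implementation; the function name and the input shape (clientQuota, orgQuota, tenantDefaults, hasOrganization) are assumptions made for the example.

```javascript
// Hypothetical model of the documented resolution rules.
// Each quota is an object such as { per_hour, per_day, enforce } or undefined.
function resolveApplicableQuotas({ clientQuota, orgQuota, tenantDefaults = {}, hasOrganization }) {
    const applicable = {};

    // 1. Client level: the application-specific quota wins,
    //    otherwise the tenant-level default for clients (if any).
    const client = clientQuota ?? tenantDefaults.clients;
    if (client) applicable.client = client;

    // 2 and 3. The organization level is only checked when the
    //    token request has an associated organization.
    if (hasOrganization) {
        const organization = orgQuota ?? tenantDefaults.organizations;
        if (organization) applicable.organization = organization;
    }

    return applicable;
}
```

An empty result means that no fine-grained quota applies to the request.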

Auth0 enforces quotas if you set the enforce flag to true in the quota’s configuration:

  • When a quota is enforced (enforce: true): If an enforced quota is exceeded, the token request is rejected with an HTTP 429 Too Many Requests error. To learn more, read Error responses for exceeded quotas.

  • When a quota is not enforced (enforce: false): Token requests are not rejected by this quota, even if its limit is exceeded. Auth0 still counts tokens and generates Consumption warning logs at thresholds, allowing for monitoring before enabling enforcement.
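The effect of the enforce flag can be illustrated with a small decision function. This is a simplified sketch under stated assumptions: evaluateTokenRequest and its input fields (name, limit, consumed) are hypothetical, consumed is taken to be the number of tokens already counted in the bucket, and a missing enforce value is treated as true because that is the documented default.

```javascript
// Hypothetical model: reject only when an enforced quota is exhausted.
function evaluateTokenRequest(quotas) {
    for (const { name, limit, consumed, enforce = true } of quotas) {
        if (consumed >= limit && enforce) {
            // An enforced quota is exhausted: the request is rejected.
            return { allowed: false, status: 429, rejectedBy: name };
        }
    }
    // Quotas with enforce: false never reject; they are only counted and logged.
    return { allowed: true };
}
```

For example, a request checked against a monitored-only client quota and an exhausted, enforced organization quota would be rejected because of the organization quota.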

Observability

To provide visibility into quota consumption, Auth0 generates Consumption warning logs. These logs are triggered when a quota reaches 60%, 80%, and 100% of its limit. You can use these logs to proactively monitor token usage and identify potential issues.
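To make the thresholds concrete, the sketch below computes which of the 60%, 80%, and 100% marks are crossed when a bucket's count increases. It is a local model for reasoning about when to expect warnings; crossedThresholds is a hypothetical name, and how often Auth0 actually emits a log per threshold is not specified here.

```javascript
const WARNING_THRESHOLDS = [60, 80, 100];

// Hypothetical helper: thresholds crossed when the count for a bucket
// moves from previousCount to newCount, given the configured quota.
function crossedThresholds(previousCount, newCount, quota) {
    // Integer arithmetic avoids floating-point rounding around exact thresholds.
    return WARNING_THRESHOLDS.filter(
        (threshold) => previousCount * 100 < threshold * quota && newCount * 100 >= threshold * quota
    );
}
```

With a quota of 10, the sixth token crosses the 60% mark, the eighth crosses 80%, and the tenth crosses 100%.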

Applications can also programmatically determine their remaining quota using Auth0 quota headers. These HTTP headers are included in both successful and error responses to Client Credentials Flow requests, providing real-time information on quota usage.

Configure M2M token quotas

You can configure Fine-Grained M2M Token Quotas using the Management API.

Tenant-level default quotas

Tenant-level quotas serve as the default settings for all applications and organizations within your Auth0 tenant. You can use tenant-level quotas to establish baseline limits for M2M token usage. These quotas are overridden by application-specific and organization-specific quotas.

You can specify default token quotas for applications and/or organizations associated with your tenant. To configure tenant-level quotas, use the Update a tenant endpoint:

curl --request PATCH 'https://YOUR_DOMAIN/api/v2/tenants/settings' \
  --header 'Authorization: Bearer YOUR_MANAGEMENT_API_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "default_token_quota": {
      "clients": {
        "client_credentials": {
          "per_hour": 100,
          "per_day": 1000,
          "enforce": true
        }
      },
      "organizations": {
        "client_credentials": {
          "per_hour": 500,
          "per_day": 5000,
          "enforce": false
        }
      }
    }
  }'

The default_token_quota object contains the following application-level and organization-level token quota configurations:

  • clients.client_credentials: (Optional) Defines M2M token quotas for individual applications.

  • organizations.client_credentials: (Optional) Defines M2M token quotas for organizations.

  • per_hour: (Optional) Sets the hourly token quota. Use when per_day is not specified.

  • per_day: (Optional) Sets the daily token quota. Use when per_hour is not specified.

  • enforce: (Optional) Determines whether token requests are rejected when the quota is exceeded. The default is true.

In the example, all applications in the tenant are limited by default to 100 M2M tokens per hour and 1000 per day. For Organizations, the default is 500 tokens per hour and 5000 per day, but enforcement is currently disabled (enforce: false). This allows you to monitor organization token usage before enforcing limits. 
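The same request can be sent from Node.js. The sketch below assumes Node.js 18 or later (for the global fetch) and a Management API token that is allowed to update tenant settings; buildTenantQuotaRequest and updateTenantDefaultQuota are hypothetical helper names, while the endpoint path and the default_token_quota payload come from the curl example above.

```javascript
// Hypothetical helper: builds the PATCH request for tenant-level default quotas.
function buildTenantQuotaRequest(domain, managementToken, defaultTokenQuota) {
    return {
        url: `https://${domain}/api/v2/tenants/settings`,
        options: {
            method: 'PATCH',
            headers: {
                Authorization: `Bearer ${managementToken}`,
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({ default_token_quota: defaultTokenQuota }),
        },
    };
}

// Hypothetical helper: sends the request and returns the parsed response body.
async function updateTenantDefaultQuota(domain, managementToken, defaultTokenQuota) {
    const { url, options } = buildTenantQuotaRequest(domain, managementToken, defaultTokenQuota);
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`Tenant settings update failed with status ${response.status}`);
    }
    return response.json();
}
```

Separating request construction from sending keeps the payload easy to inspect or unit test before it reaches the tenant.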

Application-specific quotas

You can configure quotas for specific applications, overriding the tenant-level defaults. This is useful for fine-grained control over individual application behavior.

You can set application-specific quotas when creating or updating an application.

To create an application with a token quota, use the Create a client endpoint:

curl --location 'https://YOUR_DOMAIN/api/v2/clients' \
--header 'Authorization: Bearer YOUR_MANAGEMENT_API_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
    "name": "APP_NAME",
    "app_type": "non_interactive",
    "token_quota": {
      "client_credentials": {
        "per_hour": 10,
        "per_day": 50,
        "enforce": true
      }
    }
}'

To update an application's token quota, use the Update a client endpoint:

curl --location --request PATCH 'https://YOUR_DOMAIN/api/v2/clients/YOUR_CLIENT_ID' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_MANAGEMENT_API_TOKEN' \
--data '{
    "token_quota": {
        "client_credentials": {
            "per_hour": 5,
            "enforce": true
        }
    }
}'

The token_quota object contains the token quota configuration for the application:

  • client_credentials: Defines M2M token quotas.

  • per_hour: Sets the hourly token quota.

  • per_day: Sets the daily token quota.

  • enforce: Determines whether token requests are rejected when the quota is exceeded. The default is true.

In the create example, the new application is limited to 10 M2M tokens per hour and 50 per day, overriding any tenant-level defaults. The update example sets the hourly quota of an existing application to 5 tokens.

You can retrieve an application's token quota configuration using the Get a client by ID endpoint.
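As an illustration, the following sketch reads an application's quota from Node.js (18 or later, for the global fetch). The helper names are hypothetical. It assumes that the client object returned by the endpoint includes the token_quota attribute and that the fields and include_fields query parameters narrow the response as they do on other Management API read endpoints; remove them if your tenant behaves differently.

```javascript
// Hypothetical helper: builds the GET request for a single client.
function buildGetClientQuotaRequest(domain, managementToken, clientId) {
    const url = new URL(`https://${domain}/api/v2/clients/${encodeURIComponent(clientId)}`);
    // Assumption: limit the response to the token_quota attribute.
    url.searchParams.set('fields', 'token_quota');
    url.searchParams.set('include_fields', 'true');
    return {
        url: url.toString(),
        options: {
            method: 'GET',
            headers: { Authorization: `Bearer ${managementToken}` },
        },
    };
}

// Hypothetical helper: returns the token_quota object, or undefined if none is set.
async function getClientTokenQuota(domain, managementToken, clientId) {
    const { url, options } = buildGetClientQuotaRequest(domain, managementToken, clientId);
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`Get client failed with status ${response.status}`);
    }
    const client = await response.json();
    return client.token_quota;
}
```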

Organization-specific quotas

You can also configure quotas for specific organizations, overriding tenant-level defaults. This allows you to manage M2M token usage for individual SaaS customers or partners.

You can set organization-specific quotas when creating or updating an organization.

To create an organization with a token quota, use the Create an organization endpoint:

curl --location 'https://YOUR_DOMAIN/api/v2/organizations' \
--header 'Authorization: Bearer YOUR_MANAGEMENT_API_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
    "name": "acme",
    "display_name": "Acme", 
    "token_quota": {
      "client_credentials": {
        "per_hour": 50,
        "per_day": 250,
        "enforce": true
      }
    }
}'

To update an organization's token quota, use the Update an organization endpoint:

curl --request PATCH 'https://YOUR_DOMAIN/api/v2/organizations/YOUR_ORG_ID' \
--header 'Authorization: Bearer YOUR_MANAGEMENT_API_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
    "token_quota": {
        "client_credentials": {
            "per_hour": 50,
            "per_day": 250,
            "enforce": false
        }
    }
}'

The token_quota object contains the token quota configuration for the organization:

  • client_credentials: Defines M2M token quotas.

  • per_hour: Sets the hourly token quota.

  • per_day: Sets the daily token quota.

  • enforce: Determines whether token requests are rejected when the quota is exceeded. The default is true.

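In the update example, the organization with YOUR_ORG_ID has a quota of 50 M2M tokens per hour and 250 per day. Because enforce is set to false, Auth0 counts tokens and generates warning logs for this organization but does not reject its token requests.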

You can retrieve an organization's token quota configuration using the Get an organization endpoint.

Monitor token quota consumption

Auth0 quota headers

Auth0 includes HTTP headers in Client Credentials Flow responses (both successful and 429 error responses) to provide real-time information about quota consumption:

  • Auth0-Client-Quota-Limit: Provides quota information for the application.

  • Auth0-Organization-Quota-Limit: Provides quota information for the organization.

The response only includes the headers for the quotas that apply to the token request.

The following is an example of both Auth0 quota headers:

Auth0-Client-Quota-Limit: b=per_hour;q=10;r=7;t=3540,b=per_day;q=50;r=47;t=43200
Auth0-Organization-Quota-Limit: b=per_hour;q=50;r=47;t=3540,b=per_day;q=250;r=247;t=43200

The header values are comma-separated lists of quota buckets. Each bucket is represented as a semicolon-separated list of key-value pairs:

  • b (bucket_name): The name of the quota bucket (per_hour or per_day).

  • q (quota): The configured quota limit for the bucket.

  • r (remaining): The number of remaining tokens in the bucket.

  • t (time): The number of seconds until the bucket resets.

In the Auth0-Client-Quota-Limit example:

  • The application has an hourly quota (b=per_hour) of 10 tokens (q=10). It has 7 tokens remaining (r=7), and the quota resets in 3540 seconds (t=3540).

  • The application also has a daily quota (b=per_day) of 50 tokens (q=50). It has 47 tokens remaining (r=47), and the quota resets in 43200 seconds (t=43200).
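If you are not using an Auth0 SDK, the format described above is simple to parse by hand. The following is a minimal sketch; parseQuotaHeader and the output property names (quota, remaining, resetAfter) are choices made for this example, and the parser assumes well-formed header values like the ones shown.

```javascript
// Hypothetical helper: parses an Auth0-Client-Quota-Limit or
// Auth0-Organization-Quota-Limit header value into an object keyed by bucket name.
function parseQuotaHeader(headerValue) {
    const buckets = {};
    // Buckets are separated by commas.
    for (const bucketText of headerValue.split(',')) {
        const pairs = {};
        // Key-value pairs inside a bucket are separated by semicolons.
        for (const pair of bucketText.split(';')) {
            const [key, value] = pair.split('=');
            if (value !== undefined) pairs[key.trim()] = value.trim();
        }
        if (!pairs.b) continue; // Skip fragments without a bucket name.
        buckets[pairs.b] = {
            quota: Number(pairs.q),
            remaining: Number(pairs.r),
            resetAfter: Number(pairs.t), // Seconds until the bucket resets.
        };
    }
    return buckets;
}
```

Applied to the Auth0-Client-Quota-Limit example, the result has a per_hour entry with quota 10, remaining 7, and resetAfter 3540, and a per_day entry with quota 50, remaining 47, and resetAfter 43200.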

Auth0 SDKs provide built-in utilities for reading and parsing quota headers, enabling apps to utilize real-time quota information for debugging, implementing slow-down strategies, and more. The following is an example using Node.js and the node-auth0 SDK:

const { AuthenticationClient, HttpResponseHeadersUtils } = require("auth0");

const auth0 = new AuthenticationClient({
    domain: '{{YOUR_DOMAIN}}',
    clientId: '{{YOUR_CLIENT_ID}}',
    clientSecret: '{{YOUR_SECRET}}'
});

async function getAccessToken() {
    try {
        // Request an M2M access token using the Client Credentials Flow
        const response = await auth0.oauth.clientCredentialsGrant({
            audience: '{{YOUR_AUDIENCE}}'
        });

        // Parse the client quota header so the application can monitor its token quota
        const clientQuota = HttpResponseHeadersUtils.getClientQuotaLimit(response.headers);
        console.log("clientQuota ", clientQuota);
        console.log("clientQuota per day - quota:", clientQuota?.perDay?.quota);
        console.log("clientQuota per day - remaining:", clientQuota?.perDay?.remaining);
        console.log("clientQuota per day - reset after:", clientQuota?.perDay?.resetAfter);
        console.log("clientQuota per hour - quota:", clientQuota?.perHour?.quota);
        console.log("clientQuota per hour - remaining:", clientQuota?.perHour?.remaining);
        console.log("clientQuota per hour - reset after:", clientQuota?.perHour?.resetAfter);

        return response.data.access_token;

    } catch (error) {
        // Handle the error here; at a minimum, log it instead of failing silently
        console.error('Error fetching access token:', error.message);
    }
}

getAccessToken();

Error responses for exceeded quotas

When an enforced quota is exceeded, the Auth0 Authentication API returns an HTTP 429 Too Many Requests error. In the response body, Auth0 returns the error code with a more detailed description. Auth0 also issues an event log of type feccft (for Failed exchange of Access Token for a Client Credentials Grant).

The following code sample is an example error response for an exceeded quota:

{
  "error": "too_many_requests",
  "error_description": "Client quota exceeded"
}

In addition to the response body, Auth0 returns the following headers:

  • Auth0-Client-Quota-Limit or Auth0-Organization-Quota-Limit: The Auth0 quota header corresponding to the consumed quota for the application or the organization. To learn more about the header format, read Auth0 quota headers.

  • X-RateLimit-Limit: The configured limit for the quota that has been consumed.

  • X-RateLimit-Remaining: Set to zero, indicating that the quota has been fully consumed.

  • X-RateLimit-Reset: A UNIX timestamp (in seconds) representing the time when the quota is expected to reset and further requests will be allowed.

  • Retry-After: The number of seconds until the quota resets and further requests will succeed. 

To learn more about rate-limiting headers, read Predict when requests to a tenant will be rate-limited.

Use the Retry-After header to determine how long to wait before requesting a new M2M token, so your application can decide whether to retry or raise an error. Waiting instead of retrying immediately also avoids consuming your tenant's rate limit.
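As a sketch of how the two time-related headers relate, the helper below prefers Retry-After (a number of seconds) and falls back to X-RateLimit-Reset (a UNIX timestamp in seconds). The name secondsToWait is hypothetical, and the function accepts any object with a get(name) method, such as a fetch Headers instance; with a plain Map, the keys must already be lowercase.

```javascript
// Hypothetical helper: returns the number of seconds to wait, or null if unknown.
function secondsToWait(headers, nowMs = Date.now()) {
    // Retry-After is expressed directly in seconds.
    const retryAfter = Number.parseInt(headers.get('retry-after') ?? '', 10);
    if (Number.isInteger(retryAfter) && retryAfter >= 0) return retryAfter;

    // X-RateLimit-Reset is a UNIX timestamp in seconds; convert it to a delay.
    const resetAt = Number.parseInt(headers.get('x-ratelimit-reset') ?? '', 10);
    if (Number.isInteger(resetAt)) {
        return Math.max(0, resetAt - Math.floor(nowMs / 1000));
    }

    return null;
}
```

Returning null when neither header is usable lets the caller fall back to its own backoff policy instead of retrying immediately.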

The following Node.js example requests a new M2M token and, if it receives a 429 error, retries once when the wait time is less than 60 seconds. Otherwise, it throws the error:

const { AuthenticationClient } = require("auth0");

const auth0 = new AuthenticationClient({
    domain: '{{YOUR_DOMAIN}}',
    clientId: '{{YOUR_CLIENT_ID}}',
    clientSecret: '{{YOUR_SECRET}}'
});

// Maximum time (in seconds) the application is willing to wait before retrying
const MAX_RETRY_AFTER = 60;

function sleep(seconds) {
    return new Promise(resolve => setTimeout(resolve, seconds * 1000));
}

async function getAccessToken(retry = true) {
    try {
        // Request an M2M access token using the Client Credentials Flow
        const response = await auth0.oauth.clientCredentialsGrant({
            audience: '{{YOUR_AUDIENCE}}'
        });

        return response.data.access_token;

    } catch (error) {

        if (error.statusCode === 429 && retry) {
            // On a 429 error, read how long to wait before requesting a new token
            const retryAfter = parseInt(error.headers?.get('retry-after'), 10);
            if (retryAfter < MAX_RETRY_AFTER) {
                // The wait is short enough: sleep, then retry
                console.warn(`Rate limited. Retrying in ${retryAfter} seconds...`);
                await sleep(retryAfter);
                return getAccessToken(false); // Retry only once
            }
        }

        // Not retryable (or the wait is too long): log and rethrow for the caller
        console.error('Error fetching access token:', error.message);
        throw error;
    }
}

getAccessToken().catch(() => {
    process.exitCode = 1;
});

Consumption warning logs

Auth0 generates token_quota_consumption_warning log events when the consumption for a quota reaches 60%, 80%, and 100%. You can analyze these logs to monitor token usage patterns, identify potential issues before limits are strictly enforced, and help you decide on appropriate quota values.

The following code sample is an example consumption warning log as a JSON log entry:

{
  "date": "2025-05-08T08:39:10.838Z",
  "type": "token_quota_consumption_warning",
  "description": "60% of client per_day quota consumed",
  "connection_id": "",
  "client_id": "QAxE5Z8LrvmQ2jxlzzEACeo39hO3xjFV",
  "client_name": "My_M2M_App",
  "ip": "xxxx",
  "client_ip": "xxxxx",
  "user_agent": "Other 0.0.0 / Other 0.0.0",
  "details": {
    "bucket": "per_day",
    "entity_type": "client",
    "entity_id": "QAxE5Z8LrvmQ2jxlzzEACeo39hO3xjFV",
    "quota": 15,
    "quota_consumption_percentage": 60,
    "quota_consumption": 11
  },
  "hostname": "xxxxx",
  "user_id": "",
  "user_name": "",
  "$event_schema": {
    "version": "1.0.0"
  },
  "log_id": "90020250508083910862978000000000000001223372036854918178",
  "tenant_name": "lozano",
  "_id": "90020250508083910862978000000000000001223372036854918178",
  "isMobile": false,
  "id": "90020250508083910862978000000000000001223372036854918178"
}

The example consumption warning log contains the following fields: 

  • type: token_quota_consumption_warning. Identifies the warning for M2M token quota consumption that’s reaching a threshold.

  • description: (String) Human-readable summary in the format <percent>% of <entity_type> <bucket> quota consumed. For example, "60% of client per_day quota consumed".

  • details: (Object) Describes specifics about the quota and its consumption:

    • bucket: (String) per_day or per_hour.

    • entity_type: (String) client or organization.

    • entity_id: (String) ID of the client or organization.

    • quota: (Integer) Configured limit for this bucket.

    • quota_consumption_percentage: (Integer) Percentage consumed (e.g., 60, 80, 100).

    • quota_consumption: (Integer) Actual tokens counted.

Other fields like date, client_name, ip, and log_id are standard Auth0 log fields.
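Once these logs are exported (for example, through a log stream), the fields above are enough to build simple alerts. The sketch below filters an array of already-parsed log entries; summarizeQuotaWarnings, its default threshold of 80, and the output property names are assumptions made for this example.

```javascript
// Hypothetical helper: keeps consumption warnings at or above a percentage
// and reduces each one to the fields needed for an alert.
function summarizeQuotaWarnings(logs, minPercentage = 80) {
    return logs
        .filter((log) => log.type === 'token_quota_consumption_warning')
        .filter((log) => log.details.quota_consumption_percentage >= minPercentage)
        .map((log) => ({
            date: log.date,
            entityType: log.details.entity_type,
            entityId: log.details.entity_id,
            bucket: log.details.bucket,
            percentage: log.details.quota_consumption_percentage,
        }));
}
```

Passing a lower minPercentage, such as 60, surfaces early warnings while you are still observing traffic with enforce: false.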

Best practices

  • Prioritize Token Caching: Emphasize proper access token caching in M2M applications. Quotas are safeguards, not replacements for efficient token management.

  • Appropriate Quota Levels: Set quotas based on legitimate traffic patterns and reasonable burst capacity.

  • Monitor Before Enforcing: Start with enforce: false to observe usage via logs and headers, then set informed limits before enabling enforce: true.

  • Client-Side Handling: Applications should handle 429 errors gracefully using Retry-After for backoff strategies. 

  • Proactively Check Quota Headers: Monitor token usage to identify potential issues.

  • Review Logs Regularly: Monitor token_quota_consumption_warning logs to adjust quotas proactively.
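To illustrate the first practice, the following is a minimal in-memory token cache. It is a sketch, not an Auth0 SDK feature: createTokenCache, fetchToken, and skewSeconds are hypothetical names, and fetchToken is assumed to resolve to a standard OAuth token response with access_token and expires_in (in seconds).

```javascript
// Hypothetical helper: wraps a token-fetching function with an in-memory cache.
function createTokenCache(fetchToken, { skewSeconds = 60, now = () => Date.now() } = {}) {
    let cached = null;   // { token, expiresAt } for the last token obtained
    let inFlight = null; // Shared promise so concurrent callers trigger one request

    return async function getToken() {
        // Reuse the cached token until shortly before it expires.
        if (cached && now() < cached.expiresAt) return cached.token;

        if (!inFlight) {
            inFlight = fetchToken()
                .then(({ access_token, expires_in }) => {
                    cached = {
                        token: access_token,
                        expiresAt: now() + (expires_in - skewSeconds) * 1000,
                    };
                    return access_token;
                })
                .finally(() => {
                    inFlight = null;
                });
        }
        return inFlight;
    };
}
```

With the node-auth0 SDK, fetchToken could be a function that calls clientCredentialsGrant and returns the data property of the response. A process-local cache like this does not help when many short-lived processes each request their own token; in that case, a shared cache is a better fit.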

Limitations

  • Not API Call Limits: Fine-Grained M2M Token Quotas apply to the number of access tokens obtained via Client Credentials Flow, not calls made to your APIs using those tokens.

  • Eventual Consistency: Token counting is eventually consistent; minor, brief overruns are possible before enforcement on subsequent requests.

Troubleshoot

  • Legitimate Traffic Blocked:

    • Consider setting enforce: false.

    • Check Auth0 tenant logs for token_quota_consumption_warning and failed exchange events (feccft).

    • Examine Auth0-...-Quota-Limit headers in responses.

    • Review application token request patterns and caching.

    • Consider if the quota limit needs adjustment.

  • Interpret 429 Errors: Use X-RateLimit-*, Retry-After, and the specific Auth0-...-Quota-Limit headers to identify the consumed limit and reset time.