BYOK · per-tenant keys · v2.0

Bring your own key (BYOK)

Multi-tenant SaaS apps can charge AI traffic to each tenant’s own provider account — OpenAI, Anthropic, Azure OpenAI — instead of routing everything through a shared host key. The built-in IAiKeyVault abstraction is the seam.

How it works

On every AI request, the resolver consults IAiKeyVault.GetKeyAsync with an AiKeyRequestContext that carries the active HttpContext, provider name, and AI mode. Your vault implementation extracts the tenant id (typically from a claim) and returns the right AiKeyMaterial — an ApiKey plus an optional KeyId for audit attribution.

The default OptionsBackedAiKeyVault returns Empty, so resolvers fall back to AiResolverOptions.ApiKey. Non-BYOK customers see zero behaviour change.
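
The fallback order described above can be sketched as a small decision function. This is only an illustration, written in TypeScript to match the browser-side examples on this page; the library's resolver is C# and is not shown here. The names resolveApiKey, VaultResult, and Resolution are hypothetical, and an empty apiKey string stands in for AiKeyMaterial.Empty.

```typescript
// Hypothetical model of a vault result; apiKey === "" stands in for AiKeyMaterial.Empty.
interface VaultResult {
    apiKey: string;
    keyId?: string;
}

type Resolution =
    | { source: "vault"; apiKey: string; keyId: string | undefined }
    | { source: "options"; apiKey: string }
    | { source: "none" }; // the case the client sees as "AI not configured"

function resolveApiKey(vault: VaultResult, fallbackApiKey: string | undefined): Resolution {
    // 1. A per-tenant key returned by the vault is used as-is.
    if (vault.apiKey !== "") {
        return { source: "vault", apiKey: vault.apiKey, keyId: vault.keyId };
    }
    // 2. Otherwise fall back to the host-wide key (AiResolverOptions.ApiKey).
    if (fallbackApiKey) {
        return { source: "options", apiKey: fallbackApiKey };
    }
    // 3. Neither is available: a vault miss with no fallback.
    return { source: "none" };
}
```

A host with no vault entries and a configured AiResolverOptions.ApiKey always lands in the second branch, which is why non-BYOK deployments are unaffected.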

The interface

public interface IAiKeyVault
{
    Task<AiKeyMaterial> GetKeyAsync(AiKeyRequestContext context, CancellationToken ct = default);
}

public sealed class AiKeyMaterial
{
    public string ApiKey { get; init; }       // the secret (never logged)
    public string? KeyId { get; init; }       // non-sensitive id for audit

    // Azure-only per-tenant overrides:
    public string? AzureEndpoint { get; init; }
    public string? AzureDeploymentName { get; init; }
    public string? AzureApiVersion { get; init; }
}

Wiring up the in-memory reference vault

InMemoryAiKeyVault implements both IAiKeyVault and IAiKeyVaultAdmin. State vanishes on restart; not for production.

builder.Services.AddRichTextBox();

builder.Services.AddSingleton<InMemoryAiKeyVault>();
builder.Services.AddSingleton<IAiKeyVault>(sp => sp.GetRequiredService<InMemoryAiKeyVault>());
builder.Services.AddSingleton<IAiKeyVaultAdmin>(sp => sp.GetRequiredService<InMemoryAiKeyVault>());

builder.Services.AddRichTextBoxOpenAiResolver(opts =>
{
    // No baked-in ApiKey — the vault provides per-tenant keys.
    opts.AllowEmptyApiKey = true;
    opts.Model = "gpt-4o-mini";
});

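var app = builder.Build();

// Admin endpoints are off by default; map them explicitly and protect them.
// "AdminOnly" is an authorization policy the host defines itself.
app.MapRichTextBoxUploads();
app.MapRichTextBoxAiKeyVaultAdmin().RequireAuthorization("AdminOnly");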

Admin REST endpoints

Route | Body / Query | Returns
POST /richtextbox/ai/vault/keys | { tenantId, provider, apiKey, keyId?, azureEndpoint?, azureDeploymentName?, azureApiVersion? } | 200 + entry metadata (no secret)
GET /richtextbox/ai/vault/keys?tenant=<id> | Optional filter | 200 + array of entries
DELETE /richtextbox/ai/vault/keys/{keyId} | (none) | 204 / 404

Browser-side: typed admin client

The npm package ships a typed TS client at @richscripts2/richtexteditor/admin — tenant-settings UIs (where customers paste their keys) call into it instead of hand-rolling fetch.

import { createAdminClient } from "@richscripts2/richtexteditor/admin";

const admin = createAdminClient({
    baseUrl: "/richtextbox/ai/vault/keys",
    fetch: (url, init) => fetch(url, {
        ...init,
        headers: { ...init?.headers, "Authorization": "Bearer " + adminJwt },
    }),
});

await admin.upsert({ tenantId: "acme", provider: "OpenAI", apiKey: "sk-..." });
const keys = await admin.list("acme");
await admin.delete(keys[0].keyId);

Auth. The admin endpoints are off by default. The host must call app.MapRichTextBoxAiKeyVaultAdmin() explicitly and protect the route group with its own auth middleware. The library does not assume an auth model.

Audit logging

Vault hits and misses emit structured EventIds under category RichTextBox.Audit:

EventId | Name | Fires when
8204 | AiKeyVaultMiss | Vault returned Empty + no fallback ApiKey → client sees friendly “AI not configured”.
8205 | AiKeyVaultHit | Vault returned a key. KeyId is logged; the secret never is.

Never log the secret. The library logs only KeyId on hit. Custom vault implementations should follow the same discipline.

Per-call cost attribution

BYOK pairs naturally with the per-call cost ledger. Implement IRichTextBoxAiCostSink and you’ll get an AiUsageRecord per AI call — provider, model, mode, input/output/total tokens, latency, and the KeyId from the vault hit. Forward to your billing system for chargeback:

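public sealed class BillingAiCostSink : IRichTextBoxAiCostSink
{
    // IBillingQueue is a placeholder for your own billing client; it is not part of the library.
    private readonly IBillingQueue _billing;

    public BillingAiCostSink(IBillingQueue billing) => _billing = billing;

    public Task RecordAsync(AiUsageRecord record, CancellationToken ct = default)
    {
        // Forward only what chargeback needs. KeyId is the non-sensitive id, never the secret.
        return _billing.EnqueueAsync(new
        {
            keyId  = record.KeyId,
            model  = record.Model,
            tokens = record.TotalTokens,
            mode   = record.Mode,
            ts     = record.TimestampUtc,
        }, ct);
    }
}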

Production: Redis-backed vault

For multi-instance deployments where the in-memory reference doesn’t fit and Azure Key Vault is overkill, a Redis-backed implementation gives you persistence + cross-instance consistency in ~50 lines. After dotnet add package StackExchange.Redis:

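using System.Security.Claims;   // FindFirstValue extension
using System.Text.Json;
using StackExchange.Redis;

public sealed class RedisAiKeyVault : IAiKeyVault, IAiKeyVaultAdmin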
{
    private readonly IConnectionMultiplexer _redis;
    private readonly IHttpContextAccessor _http;
    private const string KeyPrefix = "rtb:vault:";

    public RedisAiKeyVault(IConnectionMultiplexer redis, IHttpContextAccessor http)
    {
        _redis = redis; _http = http;
    }

    public async Task<AiKeyMaterial> GetKeyAsync(AiKeyRequestContext ctx, CancellationToken ct = default)
    {
        var tenant = _http.HttpContext?.User.FindFirstValue("tenant_id");
        if (string.IsNullOrEmpty(tenant)) return AiKeyMaterial.Empty;

        var raw = await _redis.GetDatabase().StringGetAsync(KeyPrefix + tenant + ":" + ctx.Provider);
        if (raw.IsNullOrEmpty) return AiKeyMaterial.Empty;
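        // Convert the RedisValue to a string explicitly so the string overload of Deserialize is chosen.
        return JsonSerializer.Deserialize<AiKeyMaterial>(raw.ToString()) ?? AiKeyMaterial.Empty;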
    }
    // UpsertAsync / ListAsync / DeleteAsync omitted for brevity. See
    // richtextbox.com/ByokVault for the full implementation.
}

Encryption at rest. The snippet above stores plaintext keys in Redis. For PCI / SOC2 / HIPAA workloads, layer IDataProtectionProvider on top: encrypt before StringSetAsync and decrypt after StringGetAsync. The shipped FileBackedAiKeyVault is a good reference for the encrypt/decrypt envelope.

Production: Azure Key Vault

For HIPAA / PCI / SOC2 compliance, persist keys in Azure Key Vault and let the platform handle KMS-backed encryption, access policies, audit logging, and rotation. Add SDK packages Azure.Security.KeyVault.Secrets + Azure.Identity:

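using System.Security.Claims;            // FindFirstValue extension
using Azure;                             // RequestFailedException
using Azure.Identity;                    // DefaultAzureCredential
using Azure.Security.KeyVault.Secrets;   // SecretClient

public sealed class AzureKeyVaultAiKeyVault : IAiKeyVault, IAiKeyVaultAdmin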
{
    private readonly SecretClient _client;
    private readonly IHttpContextAccessor _http;

    public AzureKeyVaultAiKeyVault(SecretClient client, IHttpContextAccessor http)
    {
        _client = client; _http = http;
    }

    public async Task<AiKeyMaterial> GetKeyAsync(AiKeyRequestContext ctx, CancellationToken ct = default)
    {
        var tenantId = _http.HttpContext?.User.FindFirstValue("tenant_id");
        if (string.IsNullOrEmpty(tenantId)) return AiKeyMaterial.Empty;

        try
        {
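            // Key Vault secret names allow only letters, digits, and hyphens,
            // so tenant ids containing other characters need to be encoded first.
            var secret = await _client.GetSecretAsync(
                $"rtb-{ctx.Provider.ToLowerInvariant()}-{tenantId}", cancellationToken: ct);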
            return new AiKeyMaterial
            {
                ApiKey = secret.Value.Value,
                KeyId  = secret.Value.Properties.Version,
            };
        }
        catch (RequestFailedException ex) when (ex.Status == 404)
        {
            return AiKeyMaterial.Empty;
        }
    }
    // UpsertAsync / ListAsync / DeleteAsync — see richtextbox.com/ByokVault.
}

builder.Services.AddSingleton(_ => new SecretClient(
    new Uri(builder.Configuration["AzureKeyVault:Uri"]!),
    new DefaultAzureCredential()));

Cost & latency. Every AI request triggers a Key Vault read. For high-traffic tenants, layer an IMemoryCache with a short TTL (~60 s) on top: a key rotation still propagates within a minute or two, and most calls skip the Key Vault round trip.
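
The trade-off in the note above (bounded staleness in exchange for fewer vault reads) can be illustrated with a minimal TTL cache. This is a language-neutral sketch written in TypeScript, not the IMemoryCache API; the TtlCache class and its injectable clock are hypothetical, and the loader is synchronous for brevity, whereas a real Key Vault read is asynchronous.

```typescript
// Minimal TTL cache sketch. The clock is injectable so expiry is easy to reason about.
class TtlCache<V> {
    private readonly entries = new Map<string, { value: V; expiresAt: number }>();
    private readonly ttlMs: number;
    private readonly now: () => number;

    constructor(ttlMs: number, now: () => number = () => Date.now()) {
        this.ttlMs = ttlMs;
        this.now = now;
    }

    // Returns the cached value while it is fresh; otherwise calls `load`
    // (standing in for the vault read) and caches the result for ttlMs.
    get(key: string, load: () => V): V {
        const hit = this.entries.get(key);
        if (hit !== undefined && hit.expiresAt > this.now()) {
            return hit.value;
        }
        const value = load();
        this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
    }
}
```

With a 60-second TTL, a rotated key is picked up on the first request after the cached entry expires. The cache key should include both tenant and provider (for example "acme:OpenAI") so tenants never share an entry.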

Cost-ledger batching

For high-traffic tenants, a per-call IRichTextBoxAiCostSink can flood a downstream billing API with many small writes. Wrap it with BufferingAiCostSink, which accumulates records in a bounded queue and flushes them in batches from a background drainer:

builder.Services.AddSingleton<IRichTextBoxAiCostSink>(sp =>
    new BufferingAiCostSink(
        inner:         new MyBillingApiCostSink(sp.GetRequiredService<HttpClient>()),
        flushInterval: TimeSpan.FromSeconds(30),
        maxBatchSize:  500,
        logger:        sp.GetService<ILogger<BufferingAiCostSink>>()));

RecordAsync never blocks — queue overflow drops records and bumps an observable DroppedCount metric. Inner-sink failures are caught per record so one bad write doesn’t poison the batch. DisposeAsync drains pending records before returning.
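
The queueing behaviour described above can be modelled in a few lines. This is a TypeScript sketch of the semantics only, not the library's BufferingAiCostSink (which is C# and not shown here). The BufferingSink class, the UsageRecord shape, and the explicit capacity parameter are assumptions made for illustration, and the inner sink is modelled as a synchronous callback to keep the example short.

```typescript
// Hypothetical, trimmed-down usage record.
interface UsageRecord {
    keyId: string;
    totalTokens: number;
}

class BufferingSink {
    private readonly queue: UsageRecord[] = [];
    private readonly capacity: number;
    private readonly maxBatchSize: number;
    private readonly write: (record: UsageRecord) => void;
    droppedCount = 0;

    constructor(capacity: number, maxBatchSize: number, write: (record: UsageRecord) => void) {
        this.capacity = capacity;
        this.maxBatchSize = maxBatchSize;
        this.write = write;
    }

    // Never waits for space: when the queue is full, the record is dropped and counted.
    record(r: UsageRecord): void {
        if (this.queue.length >= this.capacity) {
            this.droppedCount++;
            return;
        }
        this.queue.push(r);
    }

    // Flushes up to maxBatchSize records and returns how many were written.
    // Each write is isolated, so one failing record does not block the rest.
    flush(): number {
        const batch = this.queue.splice(0, this.maxBatchSize);
        let written = 0;
        for (const r of batch) {
            try {
                this.write(r);
                written++;
            } catch {
                // A real sink would log the failure here.
            }
        }
        return written;
    }
}
```

In this model a timer would call flush() at the configured interval, and a final flush loop on shutdown corresponds to the drain that DisposeAsync performs.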

Health probe for the AI dependency

/richtextbox/health reports license + service-registration status by default. To also assert the configured AI provider is actually reachable, register a resolver implementing IRichTextBoxAiResolverProbe and enable the probe:

builder.Services.AddRichTextBox(opts =>
{
    opts.AiResolverHealthProbeEnabled = true;
    opts.AiResolverHealthProbeTimeout = TimeSpan.FromSeconds(3);
});

The endpoint reports 503 with { status: "ai_unreachable", aiResolverProbe: { ... } } on probe failure. Built-in resolvers don’t implement the probe by default — spending tokens on health checks is an explicit ops decision — but a custom resolver can wrap one (typically by hitting a cheap reachability endpoint like OpenAI’s GET /v1/models).

Output filtering: PII redaction, link rewriting, content blocking

IRichTextBoxAiResponseFilter runs after the resolver but before the response is sent to the client. Hosts can redact PII, mask secrets, rewrite links, or block policy violations without re-implementing every resolver. The reference RegexAiResponseFilter covers ~80% of practical PII redaction:

var emailMask = new Regex(@"[\w.+-]+@[\w-]+\.[\w.-]+", RegexOptions.IgnoreCase);
var ssnMask   = new Regex(@"\b\d{3}-\d{2}-\d{4}\b");

builder.Services.AddSingleton<IRichTextBoxAiResponseFilter>(
    new RegexAiResponseFilter(
        (emailMask, "[email-redacted]"),
        (ssnMask,   "[ssn-redacted]")));

Filters compose linearly — each receives the previous filter’s output. Throwing filters are caught and logged but don’t take AI traffic down. EventId 8206 AiResponseFilterMatched fires when a filter mutated the response (the filter type is logged, never the body).
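
The composition rule above can be modelled as a simple pipeline. This TypeScript sketch illustrates the semantics only and is not the library's implementation; ResponseFilter, regexFilter, and applyFilters are hypothetical names. Passing the unfiltered text through when a filter throws is an assumption, since this page does not say what the library returns in that case.

```typescript
type ResponseFilter = (text: string) => string;

// Builds one filter from (pattern, replacement) pairs, applied in order.
function regexFilter(rules: Array<[RegExp, string]>): ResponseFilter {
    return (text) =>
        rules.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), text);
}

// Runs filters in sequence; each receives the previous filter's output.
// Assumption: a filter that throws is reported and skipped, and its input passes through.
function applyFilters(
    text: string,
    filters: ResponseFilter[],
    onError: (e: unknown) => void = () => {},
): string {
    let current = text;
    for (const filter of filters) {
        try {
            current = filter(current);
        } catch (e) {
            onError(e);
        }
    }
    return current;
}

// Same patterns as the C# registration above; the "g" flag replaces every match.
const emailMask = /[\w.+-]+@[\w-]+\.[\w.-]+/gi;
const ssnMask = /\b\d{3}-\d{2}-\d{4}\b/g;
const redact = regexFilter([[emailMask, "[email-redacted]"], [ssnMask, "[ssn-redacted]"]]);
```

For example, applyFilters("Contact jane.doe@example.com, SSN 123-45-6789.", [redact]) returns "Contact [email-redacted], SSN [ssn-redacted].". Note that \w and \d match only ASCII characters in JavaScript, while .NET matches Unicode letters and digits by default, so results can differ on non-ASCII input.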

Companion docs

See AI providers & streaming for the resolver wiring, and the Configuration doc for the full options surface.