How I Secured an iOS AI App Without Shipping API Keys

15 min read · by Ufuk Ozen
Tags: iOS · AI · Cloudflare Workers · Durable Objects · App Attest · API Security · Indie Hackers · Mobile Backend

A production-minded guide for vibe coders and indie hackers on how to secure API keys in an iOS AI app using Cloudflare Workers, Durable Objects, and Apple App Attest.

Shipping provider keys in a mobile AI app means exposing your bill to the internet.

The usual first version of an AI feature looks like this:

client -> provider API

That is fast to ship, easy to demo, and expensive to defend. It may look like product velocity, but it is really architectural debt with a nice UI on top. client -> provider API is not a product; it is an expensive demo.

When teams hit bad output, latency spikes, or runaway invoices, they often blame the model. In most mobile AI apps, that diagnosis is wrong. The real issue is that the request path was never designed for production. The problem is not the model; the problem is the architecture.

This is the difference between a toy wrapper and a system you can actually grow:

  • A toy app sends user input straight to the model provider and hopes abuse stays low.
  • A production app inserts a policy layer between the client and the provider.
  • A durable business treats identity, rate limits, retries, quotas, and billing exposure as architecture, not as follow-up tasks.

That is the layer I built for an iOS AI app: Cloudflare Workers at the edge, Durable Objects for stateful control, and Apple App Attest to make the request path harder to fake and more expensive to abuse.

Why Direct Provider Access Fails in Production

There is a specific reason mobile AI apps keep getting into trouble: the mobile client is not a trusted place to store secrets or enforce policy.

You can obfuscate a key, split it across files, hide it behind build flags, or rename variables until the code looks unrecognizable. None of that changes the threat model. If the app can recover the secret at runtime, a motivated attacker can recover it too.

Once the provider key lives in the app bundle or device memory, several problems appear immediately:

  • Your cost controls are weak because the provider sees a valid key, not your product rules.
  • Your abuse surface expands because requests can be replayed, scripted, or repackaged.
  • Your release cycle slows down because architecture mistakes now require app updates.
  • Your observability is incomplete because provider logs do not explain product intent, user tier, or app state.

This is why so many indie AI apps feel fine at ten users and chaotic at one thousand. The architecture scales the blast radius before it scales the product.

The Anti-Pattern

Before talking about the fix, it helps to name the failure mode clearly:

  • secret in client
  • control in client
  • cost risk in client
  • abuse surface in client

If all four live in the same place, you do not have an AI backend. You have a public spending endpoint.

What I Built Instead

I moved provider access out of the app entirely and treated the backend as a request-control layer, not a relay.

The high-level flow looks like this:

```text
iOS app
  -> signed request + App Attest assertion
  -> Cloudflare Worker
  -> input validation and policy checks
  -> Durable Object for quota, replay, and idempotency state
  -> provider adapter with server-side secret
  -> AI provider
```

That change does three things at once:

  1. Provider keys never ship to the client.
  2. Every expensive call passes through a policy layer I control.
  3. The app can prove more about where a request came from before money leaves the system.

The result is not "perfect security." Mobile does not give you that. The result is a much better asymmetry: legitimate users have a smooth path, while abuse gets slower, noisier, and easier to shut down.

1. Cloudflare Worker as a Policy Edge, Not a Thin Proxy

The Worker is the real boundary. I did not build a generic pass-through that forwards arbitrary JSON to a provider. That would only move the API key from the client to the edge without fixing the underlying architecture.

Instead, the Worker acts as a policy engine.

Before it calls any model, it can:

  • authenticate the session
  • validate App Attest material
  • enforce allowed models and tools
  • clamp token budgets and context size
  • normalize request shape
  • inject server-side provider credentials
  • attach app version, user tier, and internal request identifiers
  • reject malformed, oversized, or suspicious traffic

That last point matters. Many AI incidents are not "hacks" in the dramatic sense. They are boring request-shaping failures:

  • giant prompts that explode token cost
  • uncontrolled retry storms
  • attachments that should have been blocked earlier
  • repeated background calls from broken clients
  • model/tool combinations the product never intended to allow
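Request shaping like this is a small, boring function that runs before any provider call. Here is a minimal sketch of such a pre-flight gate; the request shape, model names, and limits are hypothetical placeholders, not a fixed API:

```typescript
// Hypothetical request contract; adjust to your product's actual shape.
interface GenerateRequest {
  model: string
  prompt: string
  maxTokens: number
}

// Illustrative policy constants.
const ALLOWED_MODELS = new Set(['small-fast', 'large-quality'])
const MAX_PROMPT_CHARS = 8_000
const MAX_OUTPUT_TOKENS = 1_024

type PolicyResult =
  | { ok: true; request: GenerateRequest }
  | { ok: false; reason: string }

// Reject what should never run; clamp what is merely oversized.
export function applyPolicy(req: GenerateRequest): PolicyResult {
  if (!ALLOWED_MODELS.has(req.model)) {
    return { ok: false, reason: 'model_not_allowed' }
  }
  if (req.prompt.length > MAX_PROMPT_CHARS) {
    return { ok: false, reason: 'prompt_too_large' }
  }
  return {
    ok: true,
    request: { ...req, maxTokens: Math.min(req.maxTokens, MAX_OUTPUT_TOKENS) },
  }
}
```

The key design choice is that clamping and rejection happen on the way in, so a broken client or oversized prompt never reaches the billing meter.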

Prompt engineering does not win. Request engineering does.

That sentence sounds small, but it changes where the work goes. The main leverage in a production AI app is not clever prompt wording. It is the system that controls identity, context, retries, budgets, routing, and tool access around the prompt.

2. Durable Objects for Idempotency, Quotas, and Replay Control

Stateless edges are great until you need to coordinate expensive requests. AI workloads almost always do.

I used Durable Objects for the state that decides whether a request should run at all. That includes:

  • per-user or per-install request windows
  • quota counters
  • soft budget tracking
  • temporary blocks for suspicious behavior
  • idempotency keys for safe retries
  • replay detection for signed requests

Idempotency is especially important on mobile. Connections drop. Users retry. The OS resumes work. A weak client can easily send the same expensive call multiple times.

The pattern here is simple:

  1. The app generates an idempotency key per user action.
  2. The Worker routes that request to the correct Durable Object shard.
  3. The Durable Object checks whether the request is new, in flight, completed, or conflicting.
  4. Only one provider call is allowed to proceed for that key.

If the same key arrives again with the same request fingerprint, I can return the existing result or the in-flight state. If the same key arrives with different input, that is not a retry; that is a conflict and should be rejected.

This sounds operational, but it is directly tied to cost control. Duplicate inference calls are not a UX bug. They are a billing bug.
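The retry-versus-conflict decision above comes down to a stable request fingerprint. A minimal sketch, with illustrative names (the stored-state shape here is an assumption, not the exact Durable Object schema):

```typescript
// State the Durable Object might hold per idempotency key.
type StoredRequest = { fingerprint: string; status: 'in_progress' | 'completed' }

export type Classification = 'new' | 'retry' | 'conflict'

// A stable hash of the request body lets the server tell retries
// (same key, same body) from conflicts (same key, different body).
export async function fingerprint(body: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(body))
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('')
}

export function classify(existing: StoredRequest | undefined, fp: string): Classification {
  if (!existing) return 'new'
  return existing.fingerprint === fp ? 'retry' : 'conflict'
}
```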

3. Apple App Attest to Raise the Cost of Abuse

Apple App Attest is not magic and it is not a replacement for backend controls. What it does well is make it harder to impersonate a legitimate app install at scale.

That matters because without something like App Attest, the backend is forced to trust whatever client metadata shows up in a JSON body. Attackers can script that cheaply.

The shape of the flow is:

  1. The app creates or registers an App Attest key for the install.
  2. The backend issues a challenge for registration or request verification.
  3. The app returns attestation or assertion data bound to that challenge.
  4. The Worker verifies freshness, key identity, challenge binding, and install mapping before allowing a high-value request through.

In practice, this gives you a stronger signal that:

  • the request came from a real app instance, not just a copied HTTP call
  • the request is tied to a known install record
  • the signed material is fresh enough to reduce replay value

This does not eliminate abuse. A determined adversary can still spend time and money attacking the system. That is fine. Good backend security is often about moving abuse from "cheap and repeatable" to "costly and detectable."

For indie apps, that trade is huge. You usually do not need military-grade guarantees. You need to stop casual extraction, scripted replay, and trivial repackaging from turning into a provider invoice.
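Before any cryptographic verification, a handful of cheap state checks catch most replay attempts. A sketch of that pre-flight, assuming hypothetical record shapes (the actual signature verification is out of scope here):

```typescript
// Illustrative server-side records; field names are assumptions.
interface ChallengeRecord { id: string; issuedAtMs: number; used: boolean }
interface InstallRecord { keyId: string; lastCounter: number }

const CHALLENGE_TTL_MS = 60_000 // example freshness window

export function preflightAssertion(
  challenge: ChallengeRecord | undefined,
  install: InstallRecord | undefined,
  assertedKeyId: string,
  assertedCounter: number,
  nowMs: number,
): { ok: boolean; reason?: string } {
  if (!challenge) return { ok: false, reason: 'unknown_challenge' }
  if (challenge.used) return { ok: false, reason: 'challenge_replayed' }
  if (nowMs - challenge.issuedAtMs > CHALLENGE_TTL_MS) {
    return { ok: false, reason: 'challenge_expired' }
  }
  if (!install || install.keyId !== assertedKeyId) {
    return { ok: false, reason: 'unknown_install' }
  }
  // App Attest assertion counters must move forward;
  // a stale counter suggests a replayed assertion.
  if (assertedCounter <= install.lastCounter) {
    return { ok: false, reason: 'counter_not_advanced' }
  }
  return { ok: true }
}
```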

4. Provider Access Becomes a Controlled Capability

Once the Worker owns the provider key, the key stops being the product boundary. Your backend becomes the boundary.

That lets you turn AI access into a controlled capability:

  • allow only specific models for specific user tiers
  • route cheap requests to cheaper models first
  • cap output size for low-value actions
  • disable risky tool use unless a feature explicitly needs it
  • strip unsupported parameters before they reach the provider
  • redact or trim context before it becomes expensive
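Tier-based routing from the list above can be a tiny lookup. A minimal sketch, with a hypothetical model catalog and tier names:

```typescript
// Illustrative tiers and model catalog; not real model identifiers.
type Tier = 'free' | 'pro'

const TIER_MODELS: Record<Tier, string[]> = {
  free: ['small-fast'],
  pro: ['small-fast', 'large-quality'],
}

// Clients may express a preference, but only get it if their tier
// allows it; otherwise they fall back to the cheapest allowed model.
export function chooseModel(tier: Tier, preferred?: string): string {
  const allowed = TIER_MODELS[tier]
  if (preferred && allowed.includes(preferred)) return preferred
  return allowed[0]
}
```

Because this lookup lives at the edge, changing the catalog or the fallback order is a deploy, not an app release.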

This is also where provider portability becomes real. If the client only talks to your edge, you can swap providers, compare vendors, or add fallback routing without forcing users to update the app.

That is another reason the model is rarely the core problem. Teams think they need a "better model" when what they really need is the ability to route requests properly, shape payloads correctly, and fail gracefully.

5. Cost Control Has to Happen Before the Model Call

If cost control starts after the provider responds, it is already too late.

The backend layer should decide whether a request deserves to exist before the expensive operation starts. In my setup, that means checking things like:

  • plan entitlements
  • concurrency limits
  • per-user daily budget
  • feature-level token ceilings
  • attachment size and count
  • recent failure and retry patterns
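Those checks reduce to a single admission decision made before the provider call. A sketch under assumed limit and usage shapes (all names here are illustrative):

```typescript
// Hypothetical per-user usage window and plan limits.
interface UsageWindow { spentTokensToday: number; inFlight: number }
interface Limits { dailyTokenBudget: number; maxConcurrent: number }

// Decide whether a request deserves to exist before spending anything.
export function admit(
  usage: UsageWindow,
  limits: Limits,
  estimatedTokens: number,
): { allow: boolean; reason?: string } {
  if (usage.inFlight >= limits.maxConcurrent) {
    return { allow: false, reason: 'too_many_concurrent' }
  }
  if (usage.spentTokensToday + estimatedTokens > limits.dailyTokenBudget) {
    return { allow: false, reason: 'daily_budget_exceeded' }
  }
  return { allow: true }
}
```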

This is where architecture turns into margin.

A direct-to-provider mobile app usually has no serious answer to questions like:

  • What happens when one user generates 500 retries in ten minutes?
  • What happens when a bug loops background requests after resume?
  • What happens when a copied key gets shared in a Discord server?
  • What happens when the app sends large contexts for a feature that only needed a short completion?

With a real edge layer, those are controllable incidents. Without one, they are provider charges.

6. Observability Gets Better When the Request Path Belongs to You

Provider dashboards are useful, but they are not product observability.

I want to know:

  • which feature triggered the request
  • which app version sent it
  • which user tier it belonged to
  • whether it was a retry or a fresh action
  • whether the request was blocked by policy before hitting the provider
  • whether a spike came from growth, bugs, or abuse

That information does not appear by accident. It appears when your backend defines the request contract and assigns identifiers before the provider ever sees the call.

This becomes critical when you start iterating on prompts, tools, or model routing. If you cannot separate product traffic from bad traffic, you will make bad decisions about model quality too.

7. The Architecture Changes How You Build the Product

Once this layer exists, the product roadmap gets better:

  • You can launch AI features without permanently exposing provider credentials.
  • You can test model routing or prompt formats without app releases.
  • You can throttle or disable one feature without taking the whole app down.
  • You can add per-plan usage rules instead of blanket rate limits.
  • You can reason about unit economics at the feature level.

That is why I keep coming back to the same thesis: the problem is not the model; the problem is the architecture.

A better model can improve output quality. It cannot fix a request path that leaks secrets, trusts the client too much, and has no reliable control over spend.

A Minimal Tutorial You Can Actually Build

If you want the shortest practical version of this architecture, build it in four moves:

  1. The iOS app asks your Worker for a challenge.
  2. The app returns App Attest material tied to that challenge.
  3. The Worker verifies the proof and stores install state in a Durable Object.
  4. Every expensive AI request reserves ownership in the Durable Object before the provider call starts.

That is the smallest version of the pattern that still behaves like production architecture instead of demo architecture.

Step 1: Issue a challenge from the Worker

The app should not invent trust locally. It should ask your edge for a fresh challenge before registration or generation.

```typescript
export async function handleChallenge(request: Request, env: Env): Promise<Response> {
  const body = await request.json() as {
    purpose: 'register' | 'generate'
    clientInstanceId: string
    keyId?: string
    requestHash?: string
  }

  const id = env.CLIENT_STATE.idFromName(body.clientInstanceId)
  const stub = env.CLIENT_STATE.get(id)

  const result = await stub.issueChallenge({
    purpose: body.purpose,
    clientInstanceId: body.clientInstanceId,
    keyId: body.keyId ?? null,
    requestHash: body.requestHash ?? null,
  })

  return Response.json(result, { status: 200 })
}
```

The point is not just freshness. The point is binding. Registration and generation should be tied to a server-issued challenge the client cannot predict in advance.

Step 2: Verify App Attest during registration

Registration is where you bind a device-backed key to your server-side install record.

```typescript
export async function handleRegister(request: Request, env: Env): Promise<Response> {
  const body = await request.json() as {
    clientInstanceId: string
    challengeId: string
    keyId: string
    attestation: string
  }

  const id = env.CLIENT_STATE.idFromName(body.clientInstanceId)
  const stub = env.CLIENT_STATE.get(id)

  const challenge = await stub.getRegisterChallenge({ challengeId: body.challengeId })

  const verified = await verifyRegistrationProof(
    {
      kind: 'apple',
      keyId: body.keyId,
      attestation: body.attestation,
    },
    challenge,
    env.RUNTIME_CONFIG,
  )

  await stub.completeRegister({
    challengeId: body.challengeId,
    keyId: verified.keyId,
    publicKey: verified.publicKey,
    attestationEnvironment: verified.attestationEnvironment,
  })

  return Response.json({ ok: true }, { status: 200 })
}
```

One runtime caveat matters here: do not assume every Node-flavored X.509 path will behave the same way on Cloudflare Workers. In practice, it is safer to keep certificate-chain validation and signature verification aligned with Workers-compatible tooling and WebCrypto.

Step 3: Reserve the request before calling the model

This is where most mobile AI apps are too naive. They call the provider first and think about retries later. Reverse that order.

```typescript
export class ClientStateDO extends DurableObject {
  async reserveGenerate(input: {
    requestKey: string
    requestHash: string
    challengeId: string
    keyId: string
    counter: number
    nowMs: number
  }) {
    return this.ctx.storage.transaction(async (tx) => {
      const existing = await tx.get<RequestState>(`req:${input.requestKey}`)

      if (existing?.status === 'completed') {
        return { kind: 'completed_replay', response: existing.cachedResponse }
      }

      if (existing?.status === 'in_progress' && existing.leaseUntil > input.nowMs) {
        return { kind: 'in_progress_existing', retryAfterSeconds: 3 }
      }

      const attemptId = crypto.randomUUID()
      const leaseUntil = input.nowMs + 110_000

      await tx.put(`req:${input.requestKey}`, {
        status: 'in_progress',
        attemptId,
        requestHash: input.requestHash,
        leaseUntil,
        updatedAt: input.nowMs,
      })

      return {
        kind: existing ? 'accepted_takeover' : 'accepted_new',
        attemptId,
      }
    })
  }
}
```

That one transaction gives you three important controls:

  • duplicate retries do not automatically become duplicate provider calls
  • stalled requests can be taken over safely
  • request ownership becomes explicit before money leaves the system

Step 4: Finalize only if the current attempt still owns the request

The provider response should not blindly overwrite state. Only the active attempt should be allowed to close the request.

```typescript
async finalizeGenerate(input: {
  requestKey: string
  attemptId: string
  outcome: 'completed' | 'failed_transient' | 'failed_final'
  responseBody?: unknown
}) {
  return this.ctx.storage.transaction(async (tx) => {
    const existing = await tx.get<RequestState>(`req:${input.requestKey}`)

    if (!existing || existing.attemptId !== input.attemptId) {
      return { ok: true, stale: true }
    }

    await tx.put(`req:${input.requestKey}`, {
      ...existing,
      status: input.outcome,
      cachedResponse: input.responseBody ?? null,
      updatedAt: Date.now(),
    })

    return { ok: true, stale: false }
  })
}
```

This prevents an old attempt from overwriting a newer one after a timeout, retry, or takeover. In other words, it protects correctness and cost at the same time.

Step 5: Make the iOS client understand in-flight duplication

When the same expensive request is already running, the client should not treat that as a fatal backend error.

```swift
if httpResponse.statusCode == 409,
   let apiError = try? JSONDecoder().decode(APIErrorEnvelope.self, from: data),
   apiError.error.code == "request_in_progress" {
    let retryAfter = apiError.error.retryAfterSeconds ?? 3
    try await Task.sleep(nanoseconds: UInt64(retryAfter) * 1_000_000_000)
    return .waitingForExistingWork
}
```

That small branch matters because mobile clients retry for many boring reasons: dropped networks, app lifecycle transitions, and user impatience. Your system should turn those into predictable control flow, not duplicate bills.

Step 6: Keep the client pointed only at your edge

This is the entire rule in one tiny contrast:

```swift
// Wrong:
let providerKey = "sk-live-..."

// Correct:
let workerBaseURL = URL(string: "https://your-worker.example.workers.dev")!
```

That may look obvious, but it is the architectural line that many teams refuse to draw. Once you draw it, model routing, quota policy, provider switching, and abuse controls become much easier to manage.

A Practical Rule for Vibe Coders and Indie Hackers

If you are shipping an iOS AI app today, the minimum responsible move is not "find a better way to hide the key." It is "stop shipping the key."

You do not need a giant platform team to do that. You need a simple, production-minded control layer:

  • keep provider secrets server-side
  • make the client prove more before expensive calls
  • centralize quota and retry state
  • treat idempotency as mandatory
  • log product intent around every request

That is enough to move from demo architecture to production architecture.

If you're still calling the provider directly from the client, the real problem is not model quality. It's architectural debt. I help teams design that layer properly, fix what is fragile, and harden it for growth.

Shipping AI on iOS?

Need help fixing the architecture before it becomes a billing problem?

I help teams design secure AI request flows, fix fragile backend edges, and harden mobile AI systems for scale and growth.
