Case Study · SaaS

Mini SaaS: RAG for Teams

Multi-tenant AI product with users/orgs, test billing flow, per-tenant limits, audit logs, and a minimal admin panel.

Problem: Most AI demos ignore platform concerns needed for real SaaS adoption.

Solution: Add tenancy, usage controls, and governance around a RAG core.

Impact: Demonstrates end-to-end product ownership beyond model calls.

Architecture

Product Systems

Designed as a tenant-governed runtime: identity-scoped authorization, plan entitlements, usage metering, and auditable control operations.

access token -> claims verify -> org/user binding -> feature + incident policy
      |                |                |                    |
      v                v                v                    v
 auth context     role + scope     tenant isolation     safety guardrails
      |                                                        |
      v                                                        v
rate + quota control -> RAG answer -> usage meter -> billing line -> audit/event log

Why this matters: every query is treated as a governed platform transaction, not just an LLM call.

Identity + Tenant Isolation

Token claims are bound to org and role context before any retrieval, billing, or admin action is allowed.
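
As a minimal sketch of this binding step, the snippet below turns already-verified claims into an immutable context and gates actions on the bound role. The claim keys (`sub`, `org_id`, `role`), the role names, and the helpers `bind_claims` and `require_role` are assumptions for illustration, not the project's actual API.

```python
from dataclasses import dataclass

# Hypothetical role names; the source does not list the real ones.
ROLES = {"member", "admin"}


class AuthError(Exception):
    """Raised when claims cannot be bound to a tenant context."""


@dataclass(frozen=True)
class AuthContext:
    user_id: str
    org_id: str
    role: str


def bind_claims(claims: dict) -> AuthContext:
    """Bind verified token claims to an org and role.

    Signature verification is assumed to have happened upstream; this step
    only rejects claims that lack an org or carry an unknown role.
    """
    missing = [key for key in ("sub", "org_id", "role") if not claims.get(key)]
    if missing:
        raise AuthError(f"missing claims: {', '.join(missing)}")
    if claims["role"] not in ROLES:
        raise AuthError(f"unknown role: {claims['role']}")
    return AuthContext(
        user_id=claims["sub"], org_id=claims["org_id"], role=claims["role"]
    )


def require_role(ctx: AuthContext, allowed: set[str]) -> None:
    """Gate an action (retrieval, billing, admin) on the bound role."""
    if ctx.role not in allowed:
        raise AuthError(f"role {ctx.role!r} may not perform this action")
```

Because the context is frozen, downstream code cannot swap the org after authorization; every later check reads the same `org_id`.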

Usage + Billing Control Plane

Per-plan rate limits, token quota checks, and invoice line generation are executed in the request path.
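
The quota check and invoice line can be sketched as two small pure functions. The plan fields (`monthly_token_quota`, `cents_per_1k_tokens`) and the round-up pricing rule are assumptions chosen for the example; the source does not describe the real pricing model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanPolicy:
    # Hypothetical plan fields; real plan definitions are not shown in the source.
    monthly_token_quota: int
    cents_per_1k_tokens: int


@dataclass(frozen=True)
class InvoiceLine:
    org_id: str
    tokens: int
    amount_cents: int


class QuotaExceeded(Exception):
    pass


def check_quota(used_tokens: int, requested_tokens: int, policy: PlanPolicy) -> int:
    """Return the quota left after this request, or raise if it would overshoot."""
    remaining = policy.monthly_token_quota - used_tokens - requested_tokens
    if remaining < 0:
        raise QuotaExceeded(f"request exceeds quota by {-remaining} tokens")
    return remaining


def invoice_line(org_id: str, tokens: int, policy: PlanPolicy) -> InvoiceLine:
    """Price usage in integer cents, rounding up to the next whole cent."""
    amount = -(-tokens * policy.cents_per_1k_tokens // 1000)  # ceiling division
    return InvoiceLine(org_id=org_id, tokens=tokens, amount_cents=amount)
```

Keeping amounts in integer cents avoids floating-point drift when many small lines are summed into one invoice.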

Audit + Incident Governance

Every decision path writes a structured audit event, including a trace of any incident-policy override, for operations review.

Evidence

Platform Readiness

Tenant Isolation

Tests ensure no cross-org retrieval leakage.

Rate Limit Control

Per-plan throttling measured under burst traffic.

Audit Coverage

Audit logs record auth events, retrieval calls, and admin actions.

Delivery Stack

FastAPI · Postgres · Stripe Test · RBAC · Admin Panel

Presentation path: projects/rag-for-teams-saas/presentations/upcoming/

Technical Peek

Governed Tenant Runtime

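import time
from typing import Any

# `platform`, `plans`, `rag`, `incident`, `clamp_top_k`, and `RequestEnvelope`
# are assumed to be defined elsewhere in the project.


def process_team_query(env: RequestEnvelope) -> dict[str, Any]:
    started = time.perf_counter()
    # Resolve the token into an org/user context and load the plan policy.
    ctx = platform.authorize(env.token, env.request_id)
    policy = plans.get(ctx.org.tier)
    top_k = clamp_top_k(env.top_k, policy)

    # Governance checks run before any retrieval work.
    platform.assert_org_state(ctx.org)
    platform.assert_feature_gate(ctx, env.endpoint, policy)
    platform.assert_incident_policy(ctx=ctx, endpoint=env.endpoint, incident=incident)
    platform.assert_safety(env.query, ctx.org)

    # Throttle, then answer with retrieval scoped to the caller's org.
    rate = platform.enforce_rate_limit(ctx, env.endpoint, policy)
    answer = rag.answer(org_id=ctx.org.org_id, query=env.query, top_k=top_k, mode=env.mode)
    latency_ms = int((time.perf_counter() - started) * 1000)

    # Meter usage, produce the billing line, then write audit events.
    quota, billed_cents = platform.record_usage_and_billing(
        ctx=ctx, env=env, answer=answer, latency_ms=latency_ms, policy=policy
    )
    platform.write_query_audit(...)
    platform.write_admin_event(...)
    # Response assembly is not shown in this excerpt.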

Why this matters: tenant isolation, monetization, and compliance are enforced directly in runtime orchestration instead of being bolted on later.

Advanced Breakdown

Most Important Engineering Decisions

1. Hard Tenant Isolation Across Layers

Auth context is bound to org IDs at API, data, and retrieval layers so vector queries, database reads, and audit trails all enforce the same tenant boundary.

Why this matters: cross-organization data leakage is blocked by architecture, not convention.

Security
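
One way to make the boundary architectural at the retrieval layer is to make `org_id` a required argument on every read. The sketch below uses a toy in-memory store with keyword overlap standing in for vector similarity; `TenantScopedStore` and `Chunk` are hypothetical names, not the project's classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    org_id: str
    text: str


class TenantScopedStore:
    """Toy in-memory store standing in for a vector index.

    Every read requires an org_id, so no code path can search across
    tenants by omitting a filter.
    """

    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def add(self, org_id: str, text: str) -> None:
        self._chunks.append(Chunk(org_id=org_id, text=text))

    def search(self, org_id: str, query: str, top_k: int = 3) -> list[Chunk]:
        # Filter by tenant first, then rank. Keyword overlap replaces
        # embedding similarity only to keep the sketch dependency-free.
        terms = set(query.lower().split())
        scoped = [c for c in self._chunks if c.org_id == org_id]
        ranked = sorted(
            scoped,
            key=lambda c: len(terms & set(c.text.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]
```

A leakage test against a store like this seeds two orgs with overlapping vocabulary and asserts that every returned chunk carries the caller's `org_id`.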

2. Plan Entitlements + Feature Gates

Product capabilities are controlled by explicit entitlement checks per plan, separating permission logic from UI and preventing unauthorized backend access.

Why this matters: monetization rules are enforceable and testable server-side.

Monetization
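
A server-side entitlement check can be as small as a lookup table with deny-by-default semantics. The plan names, feature keys, and the `check_entitlement` helper below are illustrative assumptions; the project's actual gates are not shown in the source.

```python
# Hypothetical plan names and feature keys, used only for illustration.
ENTITLEMENTS: dict[str, frozenset[str]] = {
    "free": frozenset({"query"}),
    "team": frozenset({"query", "upload"}),
    "enterprise": frozenset({"query", "upload", "audit_export"}),
}


class FeatureNotEntitled(Exception):
    pass


def check_entitlement(tier: str, feature: str) -> None:
    """Raise unless the plan includes the feature; unknown tiers get nothing."""
    allowed = ENTITLEMENTS.get(tier, frozenset())
    if feature not in allowed:
        raise FeatureNotEntitled(f"plan {tier!r} does not include {feature!r}")
```

Because the table is plain data, a unit test can enumerate every plan and feature pair without touching the UI.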

3. Quota + Burst Rate-Limit Enforcement

Per-tenant request quotas and burst throttles protect shared infrastructure while keeping performance fair across organizations with different usage profiles.

Why this matters: a few heavy tenants cannot degrade service for everyone else.

Platform Stability
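
Burst throttling of this kind is commonly built on a token bucket. The source does not say which algorithm the project uses, so the sketch below is one plausible approach with hypothetical class names. It keeps one bucket per org and accepts an injectable clock so burst behaviour can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows short bursts up to `burst`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_sec = rate_per_sec
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill for the elapsed time, capped at the burst size.
        refill = (now - self._last) * self.rate_per_sec
        self._tokens = min(float(self.burst), self._tokens + refill)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class TenantLimiter:
    """One bucket per org, so a noisy tenant only drains its own allowance."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_sec
        self._burst = burst
        self._clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, org_id: str) -> bool:
        bucket = self._buckets.get(org_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, self._clock)
            self._buckets[org_id] = bucket
        return bucket.allow()
```

This in-process version only illustrates the fairness property; a multi-instance deployment would need shared state, which is outside the scope of the sketch.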

4. Auditable Platform Event Model

Auth actions, retrieval operations, billing state changes, and admin interventions are captured as structured events for compliance and debugging.

Why this matters: enterprise trust depends on traceable behavior and governance.

Governance
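
A structured event of the kind described above might look like the following. The field names and the JSON-lines serialization are assumptions for illustration; the project's actual schema is not shown in the source.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    # Field names are illustrative assumptions, not the project's schema.
    org_id: str
    actor_id: str
    category: str   # e.g. "auth", "retrieval", "billing", "admin"
    action: str
    outcome: str    # e.g. "allowed" or "denied"
    details: dict = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(event: AuditEvent) -> str:
    """Serialize one event as a single JSON line for an append-only log."""
    return json.dumps(asdict(event), sort_keys=True)
```

Using one event shape for auth, retrieval, billing, and admin categories means a single query can reconstruct everything that happened to a tenant in a time window.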

5. Admin Controls + Safety Switches

The minimal admin panel supports tenant suspension, rate-limit overrides, and incident-level switches that disable requests, for fast operational response.

Why this matters: incidents can be contained without emergency code deployments.

Operations
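
The three switches can be modelled as runtime state that the request path consults on every call. This is a sketch under stated assumptions: `IncidentState`, `check_switches`, and `effective_rate_limit` are hypothetical names, and a real system would persist the flags rather than hold them in memory.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentState:
    """Runtime flags an admin can flip without a deploy (hypothetical shape)."""
    suspended_orgs: set[str] = field(default_factory=set)
    disabled_endpoints: set[str] = field(default_factory=set)
    rate_overrides: dict[str, int] = field(default_factory=dict)  # org_id -> req/min


class RequestBlocked(Exception):
    pass


def check_switches(state: IncidentState, org_id: str, endpoint: str) -> None:
    """Reject the request if the tenant is suspended or the endpoint is disabled."""
    if org_id in state.suspended_orgs:
        raise RequestBlocked(f"org {org_id!r} is suspended")
    if endpoint in state.disabled_endpoints:
        raise RequestBlocked(f"endpoint {endpoint!r} is disabled during an incident")


def effective_rate_limit(state: IncidentState, org_id: str, plan_default: int) -> int:
    """An admin override, when present, replaces the plan's default limit."""
    return state.rate_overrides.get(org_id, plan_default)
```

Flipping a flag takes effect on the next request, which is what lets an operator contain an incident without shipping code.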