# Developer Portal — Implementation Plan

> Meta-platform that lets developers configure, deploy, monitor, and manage multiple instances of the VertexAI RAG application. Each instance is a separate Cloud Run service with its own persona, LLM/RAG config, branding, and domain.

---

## 1. Goals

| # | Goal | Why |
|---|------|-----|
| G1 | Spin up a new RAG app from a form, no code edits | Today, a new persona (e.g. "Sales", "Field Tech") requires changing constants and redeploying. We want config-driven deploys. |
| G2 | Each app is an independent Cloud Run service | True isolation: per-app scaling, billing, secrets, custom domain, blast-radius containment. |
| G3 | Single pane of glass: health, logs, crashes, AI traces, cost | Replace tab-juggling across GCP console + Langfuse + Sentry. |
| G4 | Reversible deploys (rollback, traffic split) | Per-app revisions on Cloud Run. Roll back in one click. |
| G5 | Sight (visibility) toggle per app | A "dark launch" / kill-switch flag stored in config, read by the app on each request. |

---

## 2. Non-Goals (v1)

- **No custom code editing inside the portal.** All variation is configuration; if a customer needs custom code, they fork the repo.
- **No CI/CD replacement.** We trigger Cloud Build, we don't reimplement it.
- **No billing/quota engine.** Deferred beyond v1; for now we only surface GCP cost via the Cloud Billing API and the BigQuery billing export (see Phase 5).
- **No marketplace / self-serve signup for end users.** Portal users are internal developers only.

---

## 3. Core Concepts

| Term | Meaning |
|------|---------|
| **App Template** | The base codebase (`vertexai-rag/`) versioned by Git tag (e.g. `v1.4.2`). |
| **App Instance** | A deployed Cloud Run service derived from a template version + a config record. Identified by `app_id` (slug). |
| **Persona** | A named bundle of `{system_prompt, first_message, document_locked_message, voice_settings}`. Bound to an app at deploy time. |
| **Deployment** | One immutable Cloud Run revision tied to `{app_id, template_version, config_snapshot_id}`. |
| **Sight Toggle** | Boolean flag in app config that gates whether the app is reachable. When off, all routes return 503. |
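
The terms above can be read as a small data model. The following is a minimal sketch, not the portal's actual schema: the class names, the `sight_enabled` default, and the `deployments` list are illustrative assumptions, while the field names and the `rag-<app_id>` service name come from this plan.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Persona:
    """Named bundle of prompt and voice settings, bound to an app at deploy time."""
    system_prompt: str
    first_message: str
    document_locked_message: str
    voice_settings: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Deployment:
    """One immutable Cloud Run revision."""
    app_id: str
    template_version: str    # Git tag of the App Template, e.g. "v1.4.2"
    config_snapshot_id: str  # config as it was at deploy time


@dataclass
class AppInstance:
    """A deployed service: template version + config record, keyed by slug."""
    app_id: str
    persona: Persona
    sight_enabled: bool = True  # Sight Toggle (assumed default: visible)
    deployments: List[Deployment] = field(default_factory=list)  # oldest first

    @property
    def service_name(self) -> str:
        # Cloud Run service naming used by the deploy step in this plan.
        return f"rag-{self.app_id}"

    def current_deployment(self) -> Optional[Deployment]:
        return self.deployments[-1] if self.deployments else None
```

Freezing `Persona` and `Deployment` mirrors the "immutable revision" wording: a new deploy appends a record instead of mutating an old one.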

---

## 4. Phased Delivery

### Phase 0 — Foundations (Week 1)

- New repo: `vertexai-rag-portal/` (backend + frontend, separate from RAG app).
- New GCP project OR reuse existing — recommend separate project: `vertexai-rag-portal-prod`.
- Bootstrap: FastAPI + React (Vite) + shadcn/ui, mirroring stack of main app for team familiarity.
- Auth: Firebase Auth with a dedicated `portal-developers` tenant. Only invited developers; no signup.
- Firestore database for portal state (collections referenced in this plan: `app_configs`, `personas`, `config_snapshots`, `deployment_events`).
- Decide branch strategy in the RAG repo: portal deploys from tagged releases only (never `master` HEAD).

### Phase 1 — Configuration & Manual Deploy (Week 2–3)

- Portal UI: create app form (name, slug, persona, LLM config, RAG config, branding).
- Portal API: CRUD on `app_configs` collection.
- "Deploy" button stub that calls a Cloud Build trigger via REST.
- Cloud Build pipeline (`cloudbuild.yaml` in RAG repo) parameterized by `_APP_ID`, `_TEMPLATE_VERSION`, `_CONFIG_URL`:
  1. Pull config JSON from portal API.
  2. Build Docker image, tag `<region>-docker.pkg.dev/.../rag-app:<app_id>-<short_sha>` (Artifact Registry, per Section 5).
  3. `gcloud run deploy rag-<app_id>` with config as env vars + Secret Manager refs.
  4. Map custom domain via Cloud Run domain mappings (`<slug>.apps.yourdomain.com`).
- Wildcard DNS `*.apps.yourdomain.com` → Cloud Run load balancer (one-time setup).
- Validate end-to-end: form submit → 5–8 minutes later, the app is live on its subdomain.
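
The "Deploy" button's REST call could be assembled roughly as follows. This is a sketch under assumptions: it targets the Cloud Build `projects.triggers.run` method, whose body is a `RepoSource` carrying `tagName` and `substitutions`. The function name, the slug rule, and the example config URL are hypothetical, and the slug length cap is a deliberately conservative guess, not a documented limit.

```python
import re

CLOUD_BUILD_API = "https://cloudbuild.googleapis.com/v1"

# Assumed slug rule: lowercase letters, digits, hyphens; starts with a letter,
# does not end with a hyphen. The slug becomes part of the service name
# "rag-<app_id>" and of the subdomain, so it is kept short.
SLUG_RE = re.compile(r"^[a-z]([a-z0-9-]{0,40}[a-z0-9])?$")


def build_trigger_request(project_id, trigger_id, app_id, template_version, config_url):
    """Return (url, json_body) for running the parameterized Cloud Build trigger."""
    if not SLUG_RE.match(app_id):
        raise ValueError(f"invalid app slug: {app_id!r}")
    url = f"{CLOUD_BUILD_API}/projects/{project_id}/triggers/{trigger_id}:run"
    body = {
        # Deploy from a tagged release only, never a branch HEAD.
        "tagName": template_version,
        "substitutions": {
            "_APP_ID": app_id,
            "_TEMPLATE_VERSION": template_version,
            "_CONFIG_URL": config_url,
        },
    }
    return url, body
```

The portal backend would then POST `body` as JSON to `url` with an OAuth bearer token for its service account. Keeping request construction separate from the HTTP call makes the slug validation and substitutions easy to unit-test.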

### Phase 2 — Personas & Runtime Config (Week 3–4)

- Persona library: CRUD `personas` collection. Each persona has `{name, system_prompt, first_message, document_locked_message, voice_settings, default_llm}`.
- App config references a persona by `persona_id` + allows per-app overrides.
- Modify RAG app (`backend/core/config.py` + `service/rag_service.py`):
  - Read `APP_ID` env var on boot.
  - Fetch app config from portal API on startup, cache in memory.
  - Hot-reload on SIGHUP or via `/admin/reload-config` endpoint (auth: portal SA only).
- Sight toggle: middleware in RAG app that checks `config.sight_enabled`; returns 503 with branded page if off.
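
One way to sketch the sight-toggle middleware is as a plain ASGI wrapper, which FastAPI accepts through `app.add_middleware(...)`. The class name, the `is_enabled` callable (assumed to read the in-memory cached config), the `exempt_paths` option, and the placeholder HTML are illustrative assumptions, not existing code in the RAG app.

```python
class SightToggleMiddleware:
    """Return 503 for HTTP requests while the app's sight flag is off."""

    def __init__(self, app, is_enabled, exempt_paths=()):
        self.app = app
        self.is_enabled = is_enabled  # zero-arg callable, e.g. reads cached config
        # Hypothetical escape hatch, e.g. for the admin reload endpoint.
        self.exempt_paths = frozenset(exempt_paths)

    async def __call__(self, scope, receive, send):
        passthrough = (
            scope["type"] != "http"  # lifespan events etc. are never blocked here
            or scope.get("path") in self.exempt_paths
            or self.is_enabled()
        )
        if passthrough:
            await self.app(scope, receive, send)
            return

        body = b"<h1>This app is temporarily unavailable.</h1>"  # branded page goes here
        await send({
            "type": "http.response.start",
            "status": 503,
            "headers": [
                (b"content-type", b"text/html; charset=utf-8"),
                (b"content-length", str(len(body)).encode("ascii")),
            ],
        })
        await send({"type": "http.response.body", "body": body})
```

Usage would look like `app.add_middleware(SightToggleMiddleware, is_enabled=lambda: config.sight_enabled)`. Because the flag is read from memory on each request, a config reload takes effect without a redeploy. With the default empty `exempt_paths`, every HTTP route is blocked, including `/admin/reload-config`; whether that endpoint should stay reachable while the toggle is off is a design choice this sketch leaves open.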

### Phase 3 — Observability Aggregation (Week 4–5)

- Cloud Logging: structured JSON logs already in place; ensure every log line carries `app_id` label (set via Cloud Run env var → log enrichment via `RequestIdFormatter`).
- Cloud Error Reporting: enabled by default on Cloud Run; portal queries Error Reporting API filtered by `service_name=rag-<app_id>`.
- Cloud Monitoring: portal queries Metrics API for `request_count`, `request_latencies`, `instance_count` per app.
- Langfuse integration in RAG app:
  - Add `langfuse` SDK to `backend/requirements.txt`.
  - Wrap LLM calls in `service/rag_service.py` and `service/openai_text_client.py` with Langfuse `@observe` decorator.
  - Each trace tagged with `app_id`, `user_id` (hashed), `conversation_id`.
  - Langfuse project per app OR shared project with `app_id` tag — recommend **shared project, tag-filtered** for cost.
- Portal "Observability" page per app: tabs for Health / Logs / Errors / Traces / Cost.
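
The `app_id` log label and the hashed `user_id` could look roughly like this. It is a sketch, not the existing `RequestIdFormatter`: `AppIdJsonFormatter` and `hash_user_id` are hypothetical names, the salt handling is an assumption, and the `logging.googleapis.com/labels` key relies on Cloud Logging's special-field handling for structured JSON written to stdout.

```python
import hashlib
import json
import logging
import os


class AppIdJsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, labeled with the app's id."""

    def __init__(self, app_id=None):
        super().__init__()
        # APP_ID is the env var the RAG app reads on boot.
        self.app_id = app_id or os.environ.get("APP_ID", "unknown")

    def format(self, record):
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "logging.googleapis.com/labels": {"app_id": self.app_id},
        }
        if record.exc_info:
            payload["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def hash_user_id(user_id, salt):
    """Stable pseudonymous id for trace tags; the raw user id is never emitted."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return digest[:16]
```

Attaching the formatter to a `logging.StreamHandler()` on the root logger is enough for Cloud Run to pick up the JSON lines. The same `hash_user_id` output can be reused as the `user_id` tag on Langfuse traces, so logs and traces join on one identifier.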

### Phase 4 — Deployment Operations (Week 5–6)

- Revisions list per app (queried from Cloud Run Admin API).
- One-click rollback (set 100% traffic to previous revision).
- Traffic split UI (canary: 90/10).
- Deployment audit log (Firestore `deployment_events` with who/when/what).
- Webhook on Cloud Build failure → notifies portal → status update.
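
Rollback and canary both come down to rewriting a service's traffic list. The helpers below are a sketch that builds that list in the shape of the Cloud Run Admin API v2 `TrafficTarget` (`type`, `revision`, `percent`). The function names are hypothetical, and the update call that sends the list to the API is left out.

```python
REVISION_TYPE = "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"


def rollback_traffic(previous_revision):
    """Send 100% of traffic to a single, previously deployed revision."""
    return [{"type": REVISION_TYPE, "revision": previous_revision, "percent": 100}]


def canary_traffic(stable_revision, canary_revision, canary_percent=10):
    """Split traffic between a stable and a canary revision (default 90/10)."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")
    return [
        {"type": REVISION_TYPE, "revision": stable_revision, "percent": 100 - canary_percent},
        {"type": REVISION_TYPE, "revision": canary_revision, "percent": canary_percent},
    ]


def revision_before(revisions_newest_first, serving):
    """Pick the rollback target: the revision deployed just before `serving`."""
    idx = revisions_newest_first.index(serving)
    if idx + 1 >= len(revisions_newest_first):
        raise LookupError(f"no revision older than {serving!r}")
    return revisions_newest_first[idx + 1]
```

A one-click rollback would chain `revision_before(...)` into `rollback_traffic(...)` and write both the old and the new traffic list to `deployment_events` for the audit log.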

### Phase 5 — Polish (Week 6+)

- Secret rotation UI (writes to Secret Manager, triggers redeploy).
- Per-app cost dashboard (Cloud Billing API + BigQuery billing export).
- Domain mapping UI (custom domains beyond `*.apps.yourdomain.com`).
- App archival (graceful tear-down: drain traffic, delete Cloud Run service, archive Firestore data, keep config record).

---

## 5. Build vs Reuse

| Concern | Approach |
|---------|----------|
| Container build | Cloud Build (managed, no Jenkins/self-hosted runners). |
| Image registry | Artifact Registry (newer than GCR, regional). |
| Secrets | Secret Manager, referenced via Cloud Run `--set-secrets`. |
| DNS | Cloud DNS, wildcard A record pre-provisioned. |
| Domain mapping | Cloud Run domain mappings (managed certs via Google). |
| Auth (portal) | Firebase Auth (existing infra). |
| Auth (each RAG app) | Already there — per-app Firebase tenant configured in app config. |
| Observability | Cloud Logging / Error Reporting / Monitoring + Langfuse for LLM traces. |

---

## 6. Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| Cloud Run service quota (default ~1000 services per region per project) | Use a dedicated GCP project per environment; request quota increase early. |
| Build queue contention (concurrent deploys) | Cloud Build allows 10 concurrent builds by default; serialize deploys per app and show queue position in the UI. |
| Cost explosion (idle instances per app) | Set `min-instances=0` by default; only pin for production apps. |
| Config drift between portal record and deployed env vars | Snapshot config at deploy time → `config_snapshots/{snapshot_id}`. Revision points to snapshot, not the live config. |
| Secret leakage in build logs | Use Secret Manager references, never `--set-env-vars` for secrets. |
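| Portal becomes single point of failure | Portal outage must NOT take down deployed apps. Apps cache config on boot and run independently; because `min-instances=0` makes cold starts routine, a failed boot-time fetch should fall back to the config captured at deploy time (the env vars set by `gcloud run deploy`) rather than block startup. Portal is for deploy/observe, not runtime serving. |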
| Wildcard cert renewal | Google-managed certs auto-renew, but only after domain mapping is live. Monitor cert status. |

---

## 7. Open Questions

- **Per-app database isolation?** Currently the RAG app uses one Firestore DB with `tenant_id`. Per-app deploys could share that or get their own Firestore DB. Recommend **shared Firestore with `app_id` partitioning** for v1 to avoid migration complexity; revisit if a customer needs hard data isolation.
- **Authentication model for end users of deployed apps.** Each app's users still go through Firebase Identity Platform tenants. Portal needs to either (a) auto-create a new tenant per app, or (b) reuse a shared tenant and partition by `app_id`. Recommend (a) for clean isolation — Identity Platform supports many tenants per project.
- **How is the RAG app's source code updated across deployed apps?** When the template ships `v1.5.0`, do we auto-upgrade all apps or require a manual click-to-upgrade per app? Recommend **manual upgrade with version pinning**; portal shows "Update available: v1.4.2 → v1.5.0".
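
If manual upgrades with version pinning are adopted, the "Update available" banner could be computed as below. This is a sketch that assumes template tags always follow the `vMAJOR.MINOR.PATCH` form. The function names are hypothetical, and pre-release tags are not handled.

```python
import re

TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Turn 'v1.4.2' into (1, 4, 2) so tags compare numerically, not as text."""
    match = TAG_RE.match(tag)
    if not match:
        raise ValueError(f"unsupported template tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def update_notice(pinned, available_tags):
    """Return the banner text, or None when the app is already on the newest tag."""
    if not available_tags:
        return None
    latest = max(available_tags, key=parse_tag)
    if parse_tag(latest) > parse_tag(pinned):
        return f"Update available: {pinned} → {latest}"
    return None
```

Comparing parsed tuples rather than raw strings matters once a component reaches two digits: as text, `v1.4.10` sorts before `v1.4.2`.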

---

## 8. Success Criteria

- A developer can go from "I need a Sales persona app" to a live, branded URL in **< 10 minutes** without touching code.
- The portal shows the health of all 20+ deployed apps on one page with **< 2s load**.
- An app can be rolled back to a prior revision in **< 30 seconds**.
- An LLM trace from a deployed app is visible in the portal **within 60 seconds** of the request.
