
Scaling Strategy

GospeLib's infrastructure evolves across three phases. Decisions in Phase 1 must not block Phase 2 or Phase 3. The code stays the same — only the deployment topology changes.

Phase Summary

graph LR
P1["Phase 1: Solo/Seed<br/>0-10K users<br/>$150-300/mo"] --> P2["Phase 2: Growth<br/>10K-100K users<br/>$1.5-3K/mo"]
P2 --> P3["Phase 3: Enterprise<br/>100K+ users<br/>$5-10K/mo"]

Phase 1: Solo/Seed (0–10K users)

Timeline: Months 0–18
Team: 1 engineer
Infrastructure: Single-node k3s (or Docker Compose on a $40/month VPS)

Topology

graph TB
subgraph Single Node
GW["Gateway"] --> Content["Content"]
GW --> Auth["Auth"]
GW --> Billing["Billing"]
GW --> AI["AI"]
GW --> Notif["Notifications"]
Content --> FDB["FalkorDB"]
Content --> Redis["Redis"]
Content --> TS["Typesense"]
Auth --> PG["PostgreSQL"]
Billing --> PG
end

Key Characteristics

  • Gateway and Content may run as a single process, with a chi router delegating requests by path prefix
  • The Auth service consists only of Clerk webhook handlers
  • Billing is a single webhook handler backed by PostgreSQL
  • The monorepo structure is identical: code is separated at the module level even when it runs in the same binary
  • All nx affected CI runs finish in under 5 minutes
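A minimal sketch of the single-process setup, assuming each module exposes an http.Handler. To stay dependency-free it uses the standard library's http.ServeMux with http.StripPrefix in place of chi; with chi, the equivalent is mounting a sub-router via r.Mount("/content", contentRouter). The handler constructors and routes (newContentHandler, newGatewayHandler, /passages, /healthz) are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newContentHandler stands in for the Content module's routes.
// Hypothetical: the real module would export its own handler.
func newContentHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/passages", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "content: passages")
	})
	return mux
}

// newGatewayHandler stands in for gateway-only routes (hypothetical).
func newGatewayHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	return mux
}

// newRouter delegates by path prefix: /content/* goes to the Content module
// with the prefix stripped; everything else goes to the Gateway.
func newRouter() http.Handler {
	root := http.NewServeMux()
	root.Handle("/content/", http.StripPrefix("/content", newContentHandler()))
	root.Handle("/", newGatewayHandler())
	return root
}

// get issues an in-memory request so the sketch runs without opening a port.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	router := newRouter()
	for _, path := range []string{"/healthz", "/content/passages", "/content/missing"} {
		code, body := get(router, path)
		fmt.Printf("%s -> %d %q\n", path, code, body)
	}
}
```

In a deployment the same router would be passed to http.ListenAndServe. Splitting Content into its own service in Phase 2 would then mean moving one mount into a separate binary, leaving the module code untouched.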

Cost Estimate

Resource | Provider | Monthly Cost
--- | --- | ---
k3s node (or VPS) | EC2 t3.micro / DigitalOcean | $0–40
PostgreSQL | RDS db.t3.micro (free tier) | $0
Redis cache | ElastiCache cache.t3.micro (free tier) | $0
FalkorDB | Docker on the same node | $0
Container registry | ECR (500 MB free) | $0
DNS | Route 53 | $0.50
Secrets | AWS Secrets Manager | ~$2
Total | | ~$2.50–42/mo

Phase 2: Growth (10K–100K users)

Timeline: Months 18–48
Team: 3–12 engineers
Infrastructure: EKS cluster, RDS Multi-AZ, ElastiCache

Topology

graph TB
subgraph EKS Cluster
Ingress["Nginx Ingress<br/>(ALB)"]
GW["Gateway<br/>3 pods (HPA)"]
Content["Content<br/>2 pods (HPA)"]
Auth["Auth<br/>2 pods"]
Billing["Billing<br/>2 pods"]
AI["AI<br/>2 pods"]
Notif["Notifications<br/>2 pods"]
FDB["FalkorDB<br/>(StatefulSet)"]
TS["Typesense<br/>(StatefulSet)"]
end

RDS["RDS PostgreSQL<br/>(Multi-AZ)"]
EC["ElastiCache Redis"]
CDN["CloudFront CDN"]

Ingress --> GW
GW --> Content & Auth & Billing & AI & Notif
Content --> FDB & TS
Content & Auth & Billing & AI & Notif --> RDS & EC

Scaling Triggers

Metric | Threshold | Action
--- | --- | ---
Gateway CPU utilization | >70% | HPA adds pods
Content CPU utilization | >60% | HPA adds pods
FalkorDB memory | >80% | Scale vertically to r6i.2xlarge
PostgreSQL connections | >80% of the connection limit | Add read replicas
Typesense queries/sec | Latency degrades | Move to a dedicated node
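The two HPA rows follow the Kubernetes autoscaling formula, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), which is skipped when the ratio is within the default 0.1 tolerance. In a HorizontalPodAutoscaler manifest these targets would be Resource metrics with averageUtilization 70 and 60. The sketch below applies the formula to the Gateway and Content targets; the minimum replica counts come from the topology diagram, while the maximums are hypothetical. It takes average utilization across pods as an input and omits the controller's scale-down stabilization window.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance mirrors the Kubernetes HPA default: if current/target is within
// 0.1 of 1.0, the replica count is left unchanged.
const tolerance = 0.1

// Rule describes one HPA-managed service from the scaling-triggers table.
type Rule struct {
	Service     string
	TargetCPU   float64 // target average CPU utilization, % of requested CPU
	MinReplicas int     // from the Phase 2 topology diagram
	MaxReplicas int     // hypothetical ceiling, not specified in the table
}

// desiredReplicas applies ceil(current * currentCPU / targetCPU) and clamps
// the result to [MinReplicas, MaxReplicas]. currentCPU is the average
// utilization across the service's pods.
func desiredReplicas(r Rule, current int, currentCPU float64) int {
	ratio := currentCPU / r.TargetCPU
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < r.MinReplicas {
		desired = r.MinReplicas
	}
	if desired > r.MaxReplicas {
		desired = r.MaxReplicas
	}
	return desired
}

func main() {
	gateway := Rule{Service: "gateway", TargetCPU: 70, MinReplicas: 3, MaxReplicas: 10}
	content := Rule{Service: "content", TargetCPU: 60, MinReplicas: 2, MaxReplicas: 8}

	fmt.Println(desiredReplicas(gateway, 3, 105)) // 3 * 105/70 = 4.5, rounded up to 5
	fmt.Println(desiredReplicas(content, 2, 63))  // ratio 1.05 is within tolerance: stays at 2
	fmt.Println(desiredReplicas(gateway, 3, 20))  // would drop to 1, clamped to the minimum of 3
}
```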

Key Challenges

  • FalkorDB does not scale horizontally: scale it vertically and put a Redis cache layer in front of the graph (already designed in Phase 1)
  • Typesense: a single node is sufficient up to 10M documents; run a cluster for redundancy only
  • PostgreSQL: route study-data queries to read replicas; auth and billing writes go to the primary
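The Redis layer in front of FalkorDB is a cache-aside read path: serve from the cache on a hit, query the graph on a miss, then store the result with a TTL. A minimal sketch, assuming a hypothetical GraphQuerier interface over the FalkorDB client; an in-memory map stands in for Redis so the example runs on its own, and the TTL and key format are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// GraphQuerier is the read path into FalkorDB. Hypothetical interface: a real
// implementation would wrap whichever FalkorDB client the Content service uses.
type GraphQuerier interface {
	Query(key string) (string, error)
}

type cacheEntry struct {
	value   string
	expires time.Time
}

// CachedGraph is a cache-aside layer. The map stands in for Redis so the
// sketch runs without external services.
type CachedGraph struct {
	graph GraphQuerier
	ttl   time.Duration

	mu    sync.Mutex
	cache map[string]cacheEntry
}

func NewCachedGraph(g GraphQuerier, ttl time.Duration) *CachedGraph {
	return &CachedGraph{graph: g, ttl: ttl, cache: make(map[string]cacheEntry)}
}

// Get returns a cached result while it is fresh; otherwise it queries the
// graph and caches the result for ttl. Errors are returned, never cached.
func (c *CachedGraph) Get(key string) (string, error) {
	c.mu.Lock()
	e, ok := c.cache[key]
	c.mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		return e.value, nil // hit: FalkorDB is not touched
	}

	v, err := c.graph.Query(key)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.cache[key] = cacheEntry{value: v, expires: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return v, nil
}

// countingGraph is a fake graph that records how often it is queried.
type countingGraph struct{ calls int }

func (g *countingGraph) Query(key string) (string, error) {
	g.calls++
	return "result for " + key, nil
}

func main() {
	g := &countingGraph{}
	c := NewCachedGraph(g, 5*time.Minute) // illustrative TTL
	for i := 0; i < 3; i++ {
		v, _ := c.Get("passage:42:related")
		fmt.Println(v)
	}
	fmt.Println("graph queries:", g.calls) // 1: the second and third reads are cache hits
}
```

Failed graph queries are not cached, so a transient FalkorDB error is not pinned for the whole TTL. Against Redis, the map reads and writes would become GET and SET key value EX seconds.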

Cost Estimate

Resource | Provider | Monthly Cost
--- | --- | ---
EKS cluster (3 t3.medium) | AWS EKS | ~$200
Compute (pods) | EC2 | ~$300–600
PostgreSQL | RDS db.r6g.large (Multi-AZ) | ~$350
Redis | ElastiCache cache.r6g.large | ~$200
FalkorDB | Dedicated node | ~$150
CDN | CloudFront | ~$50
Monitoring | Grafana stack | ~$100
Total | | ~$1,500–3,000/mo

Phase 3: Enterprise (100K+ users)

Timeline: Month 48+
Team: 12+ engineers
Infrastructure: Multi-AZ EKS, Aurora, service mesh

Additions Over Phase 2

Addition | Purpose
--- | ---
Service mesh (Istio) | mTLS between services, traffic management
HashiCorp Vault | Dynamic DB credentials, PKI, audit trail
Aurora Serverless v2 | Auto-scaling PostgreSQL
ElastiCache Cluster Mode | Redis high availability
FalkorDB Enterprise | Graph clustering + HA
Multi-region | Data residency, latency reduction
SIEM | Security audit trail
SOC 2 Type II | Compliance certification

What Does NOT Change

Even at enterprise scale, these remain constant:

  • API contracts — versioned from day 1
  • Service boundaries — drawn correctly from day 1
  • Database schemas — migrate forward, never backward
  • Frontend code — scales independently via CDN
  • Ingest pipeline — runs as a job, not a server
  • The codebase — same monorepo, same module structure
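One way to enforce "migrate forward, never backward" in code is a runner that applies only migrations newer than the recorded version and refuses to start if the database is ahead of the code. The sketch below is an assumption about how that could look, not GospeLib's actual migration tooling; the Migration type, the pending function, and the SQL statements are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Migration is one forward-only schema change. There is intentionally no
// Down step: a mistake is fixed by a new, higher-numbered migration.
type Migration struct {
	Version int
	Name    string
	SQL     string
}

// pending returns the migrations newer than the applied version, oldest
// first. It refuses to proceed if versions are duplicated or if the database
// is ahead of the code, because rolling the schema back is not supported.
func pending(all []Migration, applied int) ([]Migration, error) {
	sorted := append([]Migration(nil), all...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Version < sorted[j].Version })

	latest := 0
	for i, m := range sorted {
		if i > 0 && m.Version == sorted[i-1].Version {
			return nil, fmt.Errorf("duplicate migration version %d", m.Version)
		}
		latest = m.Version
	}
	if applied > latest {
		return nil, fmt.Errorf("database is at version %d but code only knows %d; refusing to migrate backward", applied, latest)
	}

	var out []Migration
	for _, m := range sorted {
		if m.Version > applied {
			out = append(out, m)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical migrations; names and SQL are illustrative only.
	migrations := []Migration{
		{Version: 2, Name: "add_user_locale", SQL: "ALTER TABLE users ADD COLUMN locale text"},
		{Version: 1, Name: "create_users", SQL: "CREATE TABLE users (id bigint PRIMARY KEY)"},
		{Version: 3, Name: "index_user_locale", SQL: "CREATE INDEX users_locale_idx ON users (locale)"},
	}

	todo, err := pending(migrations, 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range todo {
		fmt.Printf("apply %d %s\n", m.Version, m.Name) // a real runner would execute m.SQL in a transaction
	}

	if _, err := pending(migrations, 5); err != nil {
		fmt.Println("error:", err)
	}
}
```

A bad migration is then corrected by shipping a new, higher-numbered migration rather than running a down step, which keeps every environment on the same linear schema history.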