15 May 2024 · 14 min read

E-Commerce Scalability: Handling 10x Traffic Spikes

E-Commerce · Scalability · Retail · Architecture

Lessons from building retail platforms that handle holiday traffic surges. Caching strategies, database optimization, and capacity planning.



Retail systems face unique scalability challenges—traffic can spike 10x or more during sales events, holidays, and flash promotions. At Interflora, Valentine's Day and Mother's Day meant preparing for traffic surges that dwarfed our baseline. Here's how we achieved 99.95% uptime during peak shopping periods.

Understanding Retail Traffic Patterns

The Reality of Spikes

| Event                               | Traffic Multiplier | Duration      |
|-------------------------------------|--------------------|---------------|
| Flash sale announcement             | 5-10x              | 30-60 minutes |
| Holiday (Valentine's, Mother's Day) | 8-15x              | 2-3 days      |
| Black Friday/Cyber Monday           | 10-20x             | 4-5 days      |
| TV advertisement                    | 3-5x               | 15-30 minutes |

The Cascade Effect

When one component slows, everything suffers:

Normal: User → CDN → App → DB → Response (200ms)

Under load:
User → CDN → App (waiting) → DB (saturated) → Timeout
              ↓
      Connection pool exhausted
              ↓
      New requests queued
              ↓
      Cascade failure
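One way to break this chain is to fail fast at the connection pool instead of letting requests queue without bound. The sketch below is a minimal, hypothetical in-memory pool (the `BoundedPool` name and its polling loop are illustrative, not a real library API): once a caller has waited longer than `maxWaitMs`, it gets an error it can handle, rather than adding to the pile-up.

```typescript
// Minimal sketch: bound the time a request may wait for a connection.
// Rejecting quickly lets upstream layers shed load instead of cascading.
class BoundedPool<T> {
  private idle: T[];

  constructor(resources: T[], private maxWaitMs: number) {
    this.idle = [...resources];
  }

  async acquire(): Promise<T> {
    const deadline = Date.now() + this.maxWaitMs;
    while (this.idle.length === 0) {
      if (Date.now() >= deadline) {
        // Fail fast: a quick error beats a slow timeout for every caller behind us
        throw new Error('pool exhausted: failing fast instead of cascading');
      }
      await new Promise((resolve) => setTimeout(resolve, 10)); // brief backoff, retry
    }
    return this.idle.pop() as T;
  }

  release(resource: T): void {
    this.idle.push(resource);
  }
}
```

Real pools (pg's `Pool`, HikariCP, etc.) expose the same idea through an acquisition-timeout setting; the point is that the timeout must exist and be short.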

Capacity Planning

Baseline Measurement

Before you can plan for 10x, you need to know your 1x:

Key baseline metrics:
- Average requests per second (RPS)
- Peak RPS (daily, weekly patterns)
- Database queries per request
- Cache hit ratio
- Average response time by endpoint
- Error rate baseline
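The metrics above mostly fall out of a few raw counters sampled over a fixed window. A small sketch of the arithmetic (the `WindowCounters` shape and field names are illustrative, assuming you already collect these counters somewhere):

```typescript
// Raw counters gathered over one measurement window (hypothetical shape)
interface WindowCounters {
  windowSeconds: number;
  requests: number;
  dbQueries: number;
  cacheHits: number;
  cacheLookups: number;
  errors: number;
}

// Derive the baseline numbers the capacity model needs
function baseline(c: WindowCounters) {
  return {
    rps: c.requests / c.windowSeconds,
    dbQueriesPerRequest: c.dbQueries / c.requests,
    cacheHitRatio: c.cacheHits / c.cacheLookups,
    errorRate: c.errors / c.requests,
  };
}
```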

Capacity Model

Peak Planning Formula:
Required capacity = Baseline peak × Expected multiplier × Safety margin

Example:
- Normal peak: 500 RPS
- Black Friday multiplier: 15x
- Safety margin: 1.5x
- Required capacity: 500 × 15 × 1.5 = 11,250 RPS
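The formula is trivial, but encoding it keeps the planning spreadsheet honest. A one-liner matching the Black Friday example above:

```typescript
// Required capacity = baseline peak × expected multiplier × safety margin
function requiredCapacity(
  baselinePeakRps: number,
  multiplier: number,
  safetyMargin: number
): number {
  return baselinePeakRps * multiplier * safetyMargin;
}
```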

Load Testing Strategy

// k6 load test script example (stages go in the exported options object)
export const options = {
  stages: [
    { duration: '2m', target: 100 },    // Warm up
    { duration: '5m', target: 500 },    // Normal load
    { duration: '2m', target: 2500 },   // Ramp to 5x
    { duration: '5m', target: 2500 },   // Hold at 5x
    { duration: '2m', target: 5000 },   // Ramp to 10x
    { duration: '10m', target: 5000 },  // Hold at 10x
    { duration: '2m', target: 7500 },   // Push to 15x
    { duration: '5m', target: 7500 },   // Breaking point test
  ],
};

Caching Architecture

Multi-Layer Caching

Layer 1: CDN (Cloudflare/CloudFront)
├── Static assets (images, CSS, JS)
├── Product images
└── API responses (with proper cache headers)

Layer 2: Application Cache (Redis)
├── Session data
├── User cart state
├── Product catalog
└── Inventory counts (with short TTL)

Layer 3: Database Query Cache
├── Prepared statement cache
└── Query result cache

Cache-First Architecture

async function getProduct(productId: string): Promise<Product> {
  // Layer 1: Memory cache (hot items)
  const memCached = memoryCache.get(productId);
  if (memCached) return memCached;

  // Layer 2: Redis
  const redisCached = await redis.get(`product:${productId}`);
  if (redisCached) {
    const product = JSON.parse(redisCached);
    memoryCache.set(productId, product, 60); // 60 second local cache
    return product;
  }

  // Layer 3: Database (with cache population)
  const product = await db.products.findById(productId);
  if (product) {
    await redis.setex(`product:${productId}`, 300, JSON.stringify(product));
    memoryCache.set(productId, product, 60);
  }
  return product;
}

Cache Invalidation for E-Commerce

// Inventory updates need careful invalidation
async function updateInventory(productId: string, delta: number): Promise<void> {
  // Look up the product first so we know which listing caches to invalidate
  const product = await db.products.findById(productId);

  // Update database
  await db.inventory.decrement(productId, delta);

  // Invalidate product cache
  await redis.del(`product:${productId}`);

  // Publish event for CDN purge
  await events.publish('inventory-change', { productId, requiresCdnPurge: true });

  // For flash sales: invalidate listing caches
  await redis.del('featured-products');
  await redis.del(`category:${product.categoryId}:products`);
}

Database Optimization

Read Replica Strategy

// Route reads to replicas, writes to primary
const readPool = new Pool({
  host: 'replica.db.example.com',
  max: 100,
  idleTimeoutMillis: 30000
});

const writePool = new Pool({
  host: 'primary.db.example.com',
  max: 20,
  idleTimeoutMillis: 30000
});

async function getProducts(categoryId: string): Promise<Product[]> {
  // Read from replica
  return readPool.query('SELECT * FROM products WHERE category_id = $1', [categoryId]);
}

async function createOrder(order: Order): Promise<Order> {
  // Write to primary
  return writePool.query(
    'INSERT INTO orders (user_id, items, total) VALUES ($1, $2, $3) RETURNING *',
    [order.userId, order.items, order.total]
  );
}

Connection Pool Tuning

Connection pool sizing:
- Too small: Requests wait for connections
- Too large: Database overwhelmed

Formula:
connections = (core_count * 2) + effective_spindle_count

For cloud databases:
- Start with 20 connections per application instance
- Monitor wait time and adjust
- Consider PgBouncer for connection pooling at scale
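The formula above comes from the well-known PostgreSQL sizing heuristic; as a sketch (treating SSD-backed storage as a single effective spindle is a common simplification, not a hard rule):

```typescript
// connections = (core_count * 2) + effective_spindle_count
// For SSD-backed cloud databases, effective_spindle_count is often taken as 1.
function poolSize(coreCount: number, effectiveSpindleCount: number): number {
  return coreCount * 2 + effectiveSpindleCount;
}
```

So an 8-core database server with SSD storage lands around 17 total connections across all clients, which is why per-instance pools should stay small and a proxy like PgBouncer handles the fan-in.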

Query Optimization for Spikes

-- BEFORE: Full table scan during traffic spike
SELECT * FROM products
WHERE category_id = $1
ORDER BY created_at DESC
LIMIT 20;

-- AFTER: Indexed query with covering index
CREATE INDEX idx_products_category_created
ON products (category_id, created_at DESC)
INCLUDE (name, price, image_url);

-- Result: Query time 200ms → 2ms

Auto-Scaling Configuration

Kubernetes HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ecommerce-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ecommerce-api
  minReplicas: 10
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Pods
      pods:
        metric:
          name: requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60

Pre-Scaling for Known Events

# Scale up before Valentine's Day traffic
kubectl scale deployment ecommerce-api --replicas=50

# Or use scheduled scaling. Note: CronHorizontalPodAutoscaler is not a built-in
# Kubernetes resource — it requires a cron-HPA controller, and the exact
# apiVersion/schema depend on which controller you install.
apiVersion: autoscaling.k8s.io/v1
kind: CronHorizontalPodAutoscaler
spec:
  schedule: "0 6 14 2 *"  # 6 AM on Feb 14
  targetReplicas: 100

Graceful Degradation

Feature Flags for Load Shedding

const loadSheddingConfig = {
  level0: { // Normal
    recommendations: true,
    reviews: true,
    relatedProducts: true,
    searchSuggestions: true
  },
  level1: { // High load
    recommendations: true,
    reviews: true,
    relatedProducts: false,  // Disable
    searchSuggestions: true
  },
  level2: { // Very high load
    recommendations: false,  // Disable
    reviews: false,          // Disable
    relatedProducts: false,
    searchSuggestions: false
  },
  level3: { // Critical: essential checkout flow only
    recommendations: false,
    reviews: false,
    relatedProducts: false,
    searchSuggestions: false,
    guestCheckout: true,      // Force guest checkout
    paymentMethods: ['card']  // Reduce payment options
  }
};

async function getLoadLevel(): Promise<number> {
  const metrics = await getSystemMetrics();
  if (metrics.errorRate > 5 || metrics.p99Latency > 5000) return 3;
  if (metrics.errorRate > 2 || metrics.p99Latency > 2000) return 2;
  if (metrics.cpuUsage > 80 || metrics.p99Latency > 1000) return 1;
  return 0;
}

Queue-Based Checkout

During extreme load, queue checkout requests:

async function initiateCheckout(cart: Cart): Promise<CheckoutResponse> {
  if (await isSystemOverloaded()) {
    // Queue the checkout
    const ticketId = await checkoutQueue.add(cart);
    return {
      status: 'queued',
      ticketId,
      estimatedWaitSeconds: await checkoutQueue.getEstimatedWait(),
      message: 'High demand! Your order is queued and will be processed shortly.'
    };
  }

  // Normal checkout flow
  return processCheckout(cart);
}

Static Fallback Pages

// Serve cached product pages when database is overwhelmed
app.get('/products/:id', async (req, res, next) => {
  try {
    const product = await getProduct(req.params.id);
    res.json(product);
  } catch (error) {
    if (error.code === 'ECONNREFUSED' || error.code === 'ETIMEDOUT') {
      // Serve static fallback
      const fallback = await cdn.get(`/static/products/${req.params.id}.json`);
      if (fallback) {
        res.set('X-Served-From', 'fallback');
        return res.json(fallback);
      }
    }
    next(error);
  }
});

Monitoring During Spikes

Real-Time Dashboard Metrics

Critical metrics during peak:
├── Requests per second (by endpoint)
├── Error rate (4xx, 5xx)
├── P50, P95, P99 latency
├── Database connections (active, waiting)
├── Redis memory and hit rate
├── Pod count and CPU utilization
└── Cart and checkout conversion rate

Automated Alerting

# Prometheus alerting rules
groups:
  - name: ecommerce-peak
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[1m]) > 0.01
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Error rate above 1%
      - alert: CheckoutLatency
        expr: histogram_quantile(0.99, rate(checkout_duration_seconds_bucket[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Checkout P99 latency above 10 seconds

Key Takeaways

  1. Know your baseline: You can't plan for 10x if you don't know 1x
  2. Cache aggressively: Multi-layer caching keeps the vast majority of reads off the database
  3. Read replicas scale reads: Most e-commerce traffic is read-heavy
  4. Pre-scale for known events: Auto-scaling alone isn't fast enough for flash sales
  5. Plan graceful degradation: Know which features to disable and in what order
  6. Queue don't reject: A queued checkout is better than a failed one
  7. Monitor in real-time: Have dashboards ready and teams on standby during peaks

Retail scalability isn't about handling average load—it's about surviving the moments that make or break your year. Prepare for the spike, not the baseline.
