15 July 2024 · 15 min read

Docker Best Practices: Building Production-Ready Containers

Docker · Containers · DevOps · Security

Practical guidelines for building secure, efficient Docker containers. Multi-stage builds, security hardening, and image optimization.



After years of shipping containers to production, I've learned that containers are easy to build badly. The difference between a development container and a production-ready one involves security, size, reliability, and observability considerations that are easy to overlook.

Multi-Stage Builds

Multi-stage builds are essential for production containers. They separate build-time dependencies from runtime, dramatically reducing image size and attack surface.

Basic Multi-Stage Pattern

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the build step usually needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Strip devDependencies before copying node_modules into the production stage
RUN npm prune --omit=dev

# Production stage
FROM node:20-alpine AS production
WORKDIR /app

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001 -G nodejs

# Copy only what's needed
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nextjs:nodejs /app/package.json ./

USER nextjs
EXPOSE 3000
CMD ["node", "dist/server.js"]

Go Application Example

# Build stage
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/server

# Production stage - distroless for minimal attack surface
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /
USER nonroot:nonroot
ENTRYPOINT ["/server"]

The Go example produces a ~10MB image, versus the ~800MB you'd ship if the full golang build image were also the runtime.

Base Image Selection

Your base image choice affects size, security, and compatibility.

Image Comparison

Base Image               Size     Use Case
ubuntu:22.04             ~77MB    Full Linux environment, debugging
debian:bookworm-slim     ~74MB    Standard Linux, smaller than Ubuntu
alpine:3.19              ~7MB     Minimal, musl libc (check compatibility)
distroless/static        ~2MB     Compiled binaries only
scratch                  ~0MB     Absolute minimum (static binaries only)

My Recommendations

  • Node.js: node:20-alpine for most cases
  • Go: gcr.io/distroless/static for production, golang:alpine for development
  • Python: python:3.12-slim (avoid alpine: many wheels have no musl builds and must compile from source; see the sketch after this list)
  • Java: eclipse-temurin:21-jre-alpine
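
As a concrete sketch of the Python recommendation (the module name, port, and requirements file are illustrative placeholders, not part of any specific project):

FROM python:3.12-slim
WORKDIR /app

# Non-root user, Debian syntax (see the hardening section below)
RUN groupadd -g 1001 appgroup && \
    useradd -u 1001 -g appgroup appuser

# Dependencies first so this layer caches across code changes
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
USER appuser
EXPOSE 8000

# Assumes gunicorn is listed in requirements.txt; "app:app" is a placeholder entry point
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]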

Security Hardening

Security isn't optional—it's the default expectation in production.

Run as Non-Root User

Never run containers as root. Create a dedicated user:

# Alpine
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup

# Debian/Ubuntu
RUN groupadd -g 1001 appgroup && \
    useradd -u 1001 -g appgroup -s /bin/sh appuser

USER appuser

Don't Include Secrets

Secrets should never be in your image. They leak through:

  • Environment variables in Dockerfile
  • COPY commands
  • Image layer history
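
You can verify the last point directly: docker history prints the instruction that created each layer, ENV values included, so anyone who can pull the image can read a baked-in key.

# The CREATED BY column shows each layer's original instruction
docker history --no-trunc myimage:latest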

Bad:

# DON'T DO THIS
ENV API_KEY=sk_live_abc123
COPY .env /app/

Good:

# Pass secrets at runtime:
# docker run -e API_KEY=$API_KEY myimage
# Or use Docker secrets, Kubernetes secrets, or Vault
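
When a secret is genuinely needed at build time (a private registry token, say), BuildKit secret mounts expose it to a single RUN step without persisting it in any layer. A minimal sketch, assuming BuildKit is enabled and your npm auth token lives in an .npmrc file:

# syntax=docker/dockerfile:1
# The .npmrc (with its auth token) exists only for the duration of this RUN step
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci

# Build with:
# docker build --secret id=npmrc,src=$HOME/.npmrc .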

Read-Only Filesystem

Run with read-only root filesystem when possible:

# docker-compose.yml
services:
  app:
    read_only: true
    tmpfs:
      - /tmp
      - /var/run
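
The same hardening from the plain CLI, with flags docker run supports directly:

docker run --read-only --tmpfs /tmp --tmpfs /var/run myimage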

Drop Capabilities

Remove unnecessary Linux capabilities:

# docker-compose.yml
services:
  app:
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if needed
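
And the docker run equivalent (NET_BIND_SERVICE is only needed when a non-root process binds a port below 1024):

docker run --cap-drop ALL --cap-add NET_BIND_SERVICE myimage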

Scan for Vulnerabilities

Integrate scanning into your CI/CD:

# Trivy (recommended)
trivy image myapp:latest

# Docker Scout
docker scout cves myapp:latest

# Snyk
snyk container test myapp:latest
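
To gate a pipeline instead of just reporting, trivy can fail the build on serious findings (the same effect the GitHub Actions example below achieves through the action's inputs):

# Exit non-zero when CRITICAL or HIGH vulnerabilities are found
trivy image --exit-code 1 --severity CRITICAL,HIGH myapp:latest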

Image Optimization

Smaller images are faster to pull, use less storage, and have smaller attack surfaces.

Layer Caching Strategy

Order instructions from least to most frequently changed:

# GOOD: Dependencies first, code last
FROM node:20-alpine
WORKDIR /app

# These rarely change - cached layers
COPY package*.json ./
RUN npm ci

# This changes often - invalidates only this layer
COPY . .
RUN npm run build

Combine RUN Commands

Each RUN creates a layer. Combine related operations:

# BAD: Multiple layers, larger image
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean

# GOOD: Single layer, cleanup included
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Use .dockerignore

Exclude unnecessary files from the build context:

# .dockerignore
.git
.gitignore
node_modules
npm-debug.log
Dockerfile*
docker-compose*
.env*
*.md
.vscode
.idea
coverage
tests
__pycache__
*.pyc

Pin Versions

Always pin versions for reproducible builds:

# BAD: Unpredictable builds
FROM node:latest
RUN npm install express

# GOOD: Reproducible builds
FROM node:20.11.0-alpine3.19
# npm ci requires both package.json and package-lock.json
COPY package*.json ./
RUN npm ci
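
If you want builds that survive even a re-pushed tag, pin the digest as well; the hash below is a placeholder, not a real digest:

# A digest pin ignores any later re-tag of 20.11.0-alpine3.19
FROM node:20.11.0-alpine3.19@sha256:<digest>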

Runtime Configuration

Health Checks

Health checks enable orchestrators to detect and replace unhealthy containers:

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

The check runs inside the container, so curl must actually be present in the image; on BusyBox-based images like alpine, wget -qO- http://localhost:3000/health works as a substitute.

For non-HTTP services:

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD pg_isready -U postgres || exit 1

Resource Limits

Always set memory and CPU limits in production:

# docker-compose.yml
services:
  app:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
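
Outside Compose, the same limits map onto docker run flags:

docker run --cpus 1.0 --memory 512m --memory-reservation 256m myimage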

Graceful Shutdown

Handle SIGTERM for graceful shutdown:

// Node.js example
process.on('SIGTERM', async () => {
  console.log('SIGTERM received, shutting down gracefully');
  // http.Server#close takes a callback, not a promise, so wrap it
  await new Promise((resolve) => server.close(resolve));
  await database.disconnect();
  process.exit(0);
});

Use exec form in CMD to receive signals properly:

# GOOD: Receives signals
CMD ["node", "server.js"]

# BAD: Signals go to shell, not app
CMD node server.js
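
If the process can't act as PID 1 at all (no signal handling, no zombie reaping), Docker can inject a minimal init that forwards signals for you:

# --init makes tini PID 1; it forwards signals to your app and reaps zombies
docker run --init myimage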

Logging Best Practices

Log to stdout/stderr for container log collection:

// Node.js - use console or a structured logger
console.log(JSON.stringify({
  level: 'info',
  message: 'Server started',
  port: 3000,
  timestamp: new Date().toISOString()
}));

Don't log to files inside containers—use log drivers:

# docker-compose.yml
services:
  app:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

Common Mistakes to Avoid

1. Running as Root

Problem: Security vulnerability, privilege escalation risk

Solution: Always use USER directive

2. Using Latest Tag

Problem: Unpredictable builds, silent breaking changes

Solution: Pin specific versions

3. Large Images

Problem: Slow deployments, wasted resources

Solution: Multi-stage builds, minimal base images

4. Secrets in Images

Problem: Credentials exposed in image layers

Solution: Runtime injection, secrets management

5. No Health Checks

Problem: Dead containers stay in rotation

Solution: Add HEALTHCHECK instruction

6. Ignoring Signal Handling

Problem: Data loss on shutdown, stuck containers

Solution: Handle SIGTERM, use exec form CMD

CI/CD Integration

GitHub Actions Example

name: Build and Push

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Scan for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          exit-code: '1'
          severity: 'CRITICAL,HIGH'

      - name: Push to registry
        run: |
          docker tag myapp:${{ github.sha }} registry/myapp:${{ github.sha }}
          docker push registry/myapp:${{ github.sha }}

Key Takeaways

  1. Multi-stage builds are non-negotiable: Separate build and runtime environments
  2. Security by default: Non-root users, minimal base images, no secrets in images
  3. Pin everything: Base images and package versions, for reproducible builds
  4. Health checks enable resilience: Let orchestrators detect and recover from failures
  5. Handle signals: Graceful shutdown prevents data loss and connection issues
  6. Log to stdout: Let the platform handle log aggregation
  7. Scan regularly: Vulnerabilities emerge constantly; automate scanning in CI/CD

Production containers aren't just "it works on my machine" wrapped in Docker. They're secure, efficient, observable, and resilient. Every Dockerfile decision should consider what happens when this runs at scale with real traffic.
