Friday, 1 May 2026

TASK H - GUIDE - Dockerize the App

 

Containerization & Image Management (Docker) — Step-by-Step Guide

Overview

In this task, you containerize the HealthPulse Portal using Docker. You will build a production-ready Docker image, scan it for vulnerabilities, push it to a registry, and deploy it to your k3s cluster manually. Every step here is something the CI pipeline will eventually automate in Task F — you are doing it by hand first so you understand what the automation is actually doing.

Why now? You just finished Task G (bare-metal deployment) where you manually copied files to an Nginx server, reloaded services, and had no versioning, no rollback, and no environment parity. Docker solves all of those problems by packaging the application and its server into a single, immutable, versioned artifact.

What you'll do:

  1. Understand the multi-stage Dockerfile
  2. Build the Docker image locally
  3. Run and test the container
  4. Use Docker Compose for consistent configuration
  5. Scan the image for security vulnerabilities (manually)
  6. Tag and push to a container registry
  7. Deploy to k3s manually using kubectl
  8. Compare bare-metal vs container deployment
  9. Document in MkDocs
  10. Clean up

Timing: This is a Week 5 task, typically completed after Task G.


Prerequisites

Before starting Task H, ensure you have completed:

  •  Task G — Bare-metal deployment done (you have felt the pain of manual SCP, Nginx config, no versioning)
  •  Docker installed (docker --version → v24 or newer)
  •  Docker Compose installed (docker compose version → v2.x)
  •  The project builds locally (pnpm build produces dist/)

Why Task G first? If you skip straight to Docker, you won't appreciate what it solves. Task G is intentionally painful — it is the "before" picture. Task H is the "after."


Step 1: Understand the Dockerfile

1.1 — Open and Read the Dockerfile

Open docker/Dockerfile and study it line by line:

# ============================================
# Stage 1: Build the application
# ============================================
FROM node:20-alpine AS build

WORKDIR /app

# Copy dependency files first for layer caching
COPY package.json package-lock.json ./
RUN npm ci

# Copy source code
COPY . .

# Build arguments for environment-specific builds
ARG VITE_API_URL=http://localhost:3000/api
ARG VITE_ENV=production
ARG VITE_APP_VERSION=1.0.0

# Build the application
RUN npm run build

# ============================================
# Stage 2: Serve with Nginx
# ============================================
FROM nginx:1.27-alpine AS production

# Remove default nginx config
RUN rm /etc/nginx/conf.d/default.conf

# Copy custom nginx config
COPY docker/nginx.conf /etc/nginx/conf.d/default.conf

# Copy built application from build stage
COPY --from=build /app/dist /usr/share/nginx/html

# Add healthcheck
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:80/ || exit 1

# Expose port 80
EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]

1.2 — What Is a Multi-Stage Build?

A multi-stage build uses multiple FROM statements. Each FROM starts a new stage. Only the final stage becomes the image you ship.

Stage 1: "build" (node:20-alpine)          Stage 2: "production" (nginx:1.27-alpine)
┌────────────────────────────────┐          ┌────────────────────────────────┐
│  Node.js 20                    │          │  Nginx 1.27                    │
│  npm                           │          │                                │
│  package.json + package-lock   │          │  nginx.conf (custom)           │
│  All source code (src/, etc.)  │          │  dist/ (copied from Stage 1)   │
│  node_modules/ (hundreds of MB)│          │                                │
│  dist/ (build output)          │  ──────> │  HEALTHCHECK configured        │
│                                │  COPY    │                                │
│  ~ 400-800 MB                  │ --from=  │  ~ 40-60 MB                    │
│                                │  build   │                                │
│  DISCARDED after build         │          │  THIS IS YOUR FINAL IMAGE      │
└────────────────────────────────┘          └────────────────────────────────┘

What stays: Only the dist/ folder (your compiled HTML/CSS/JS) and the Nginx server.

What is discarded: Node.js, npm, node_modules, source code, TypeScript files — everything needed to build but not to run. This is why the final image is ~50 MB instead of ~800 MB.
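
You can measure the difference yourself by building the first stage on its own with --target (the build-stage tag name below is just for illustration):

# Build and tag only the first stage
docker build --target build -t healthpulse-portal:build-stage -f docker/Dockerfile .

# Compare it with the final image once you have built it in Step 2
docker images | grep healthpulse-portal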

1.3 — Why Alpine?

Both stages use Alpine Linux variants (node:20-alpine and nginx:1.27-alpine).

| Base Image | Size | Use Case |
|---|---|---|
| node:20 (Debian) | ~1 GB | Full OS, every tool included, good for development |
| node:20-alpine | ~130 MB | Minimal OS, only what's needed, good for builds |
| nginx:1.27 (Debian) | ~190 MB | Full Nginx with extras |
| nginx:1.27-alpine | ~45 MB | Minimal Nginx, perfect for serving static files |

Alpine uses musl libc instead of glibc and apk instead of apt. It strips out everything you don't need — man pages, shell utilities, package caches. For a production image that just serves static files, this is ideal.

1.4 — How the nginx.conf Works

The custom Nginx config (docker/nginx.conf) is copied into the image at build time:

server {
    listen 80;
    server_name _;
    root /usr/share/nginx/html;
    index index.html;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    ...

    # Gzip compression
    gzip on;
    ...

    # Cache static assets aggressively
    location /assets/ {
        expires 1y;
        add_header Cache-Control "public, immutable";
    }

    # SPA routing — serve index.html for all routes
    location / {
        try_files $uri $uri/ /index.html;
    }

    # Health check endpoint
    location /health {
        access_log off;
        return 200 '{"status":"healthy"}';
        add_header Content-Type application/json;
    }
}

Key points:

  • try_files — This is critical for React Router. Without it, navigating to /dashboard and refreshing gives a 404 because there is no dashboard file on disk. try_files tells Nginx: "If the file doesn't exist, serve index.html and let React handle the route."
  • /health — A synthetic health endpoint. Docker's HEALTHCHECK and Kubernetes probes will hit this to determine if the container is alive.
  • Security headers — Prevent clickjacking (X-Frame-Options), MIME sniffing (X-Content-Type-Options), and XSS (X-XSS-Protection).
  • Gzip — Compresses text-based assets before sending to the browser, reducing transfer size by 60-80%.
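
Once the container is running (Step 3), you can verify both from the command line. This assumes the port mapping used in Step 3 (localhost:8080):

# Security headers should be present on every response
curl -sI http://localhost:8080/ | grep -iE 'x-frame-options|x-content-type-options'

# Gzip: request compressed content and check the Content-Encoding header
curl -s -H 'Accept-Encoding: gzip' -o /dev/null -D - http://localhost:8080/ | grep -i content-encoding
# → content-encoding: gzip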

1.5 — Layer Caching Strategy

Notice the order of COPY instructions in Stage 1:

COPY package.json package-lock.json ./  # Step A — changes rarely
RUN npm ci                              # Step B — expensive (30-60s)
COPY . .                                # Step C — changes every commit
RUN npm run build                       # Step D — depends on source code

Docker caches each layer. If a layer's input hasn't changed, Docker reuses the cached version. By copying dependency files before source code:

  • If you only changed source code (Step C), Steps A and B are cached — npm ci is skipped entirely
  • A rebuild takes seconds instead of minutes

If you put COPY . . before npm ci, every code change would invalidate the dependency cache, forcing a full reinstall every build. This is the #1 Dockerfile performance mistake beginners make.


Step 2: Build the Docker Image Locally

2.1 — Run the Build

From the project root (not the docker/ directory):

docker build -t healthpulse-portal:local -f docker/Dockerfile .

| Flag | Meaning |
|---|---|
| docker build | Build a Docker image from a Dockerfile |
| -t healthpulse-portal:local | Tag the image with name healthpulse-portal and tag local |
| -f docker/Dockerfile | Use this specific Dockerfile (not the default ./Dockerfile) |
| . | Build context — the directory Docker sends to the build daemon. . means the project root, so COPY . . copies everything in the project |
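
The ARG values declared in Stage 1 can be overridden at build time with --build-arg, for example to bake a different API URL into the bundle (the URL below is just a placeholder):

docker build \
  -t healthpulse-portal:local \
  -f docker/Dockerfile \
  --build-arg VITE_API_URL=https://api.example.com/api \
  --build-arg VITE_ENV=production \
  .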

2.2 — Watch the Build Output

You will see Docker executing each instruction:

[+] Building 45.2s (15/15) FINISHED
 => [build 1/6] FROM node:20-alpine@sha256:...                    2.3s
 => [build 2/6] WORKDIR /app                                      0.0s
 => [build 3/6] COPY package.json package-lock.json ./             0.1s
 => [build 4/6] RUN npm ci                                        28.4s
 => [build 5/6] COPY . .                                          0.3s
 => [build 6/6] RUN npm run build                                  8.1s
 => [production 1/4] FROM nginx:1.27-alpine@sha256:...             1.5s
 => [production 2/4] RUN rm /etc/nginx/conf.d/default.conf         0.2s
 => [production 3/4] COPY docker/nginx.conf ...                    0.1s
 => [production 4/4] COPY --from=build /app/dist ...               0.1s
 => exporting to image                                             0.2s

The first build will be slow (~45s) because there are no cached layers. Subsequent builds with only source code changes will be much faster (~10s) thanks to layer caching.
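
You can see the cache at work yourself. A small experiment (edit files rather than just touching them, since Docker keys COPY layers on file content):

# Edit any file under src/, then rebuild:
docker build -t healthpulse-portal:local -f docker/Dockerfile .
# → "=> CACHED [build 4/6] RUN npm ci" (the dependency layer is reused)

# Edit package.json (e.g. bump the version field), then rebuild:
docker build -t healthpulse-portal:local -f docker/Dockerfile .
# → npm ci runs again because the input to that layer changed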

2.3 — Verify the Image

docker images healthpulse-portal
REPOSITORY            TAG       IMAGE ID       CREATED          SIZE
healthpulse-portal    local     a1b2c3d4e5f6   30 seconds ago   47.2MB

Compare this to bare-metal (Task G): On the EC2 server, you had Node.js installed, npm, build tools, source code — all living on the server. The Docker image contains only Nginx and the compiled static files. 47 MB vs a full Ubuntu server.
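
To see where those megabytes come from, list the layers; docker history shows each instruction's contribution to the final size:

docker history healthpulse-portal:local
# → mostly the nginx:1.27-alpine base layers, plus small layers for nginx.conf and your dist/ files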


Step 3: Run and Test Locally

3.1 — Start the Container

docker run -d --name healthpulse -p 8080:80 healthpulse-portal:local

| Flag | Meaning |
|---|---|
| -d | Detached mode — run in the background (don't lock your terminal) |
| --name healthpulse | Give the container a human-readable name |
| -p 8080:80 | Map port 8080 on your machine to port 80 inside the container |
| healthpulse-portal:local | The image to run |

Port mapping explained:

Your Machine                     Docker Container
┌──────────────┐                ┌──────────────────┐
│              │  -p 8080:80   │                  │
│  localhost   │──────────────>│  Nginx           │
│  :8080       │               │  :80             │
│              │               │                  │
│  Browser     │               │  /usr/share/     │
│  curl        │               │  nginx/html/     │
└──────────────┘               └──────────────────┘

3.2 — Test with curl

# Health check
curl http://localhost:8080/health
# → {"status":"healthy"}

# Home page (should return HTML)
curl -s http://localhost:8080/ | head -5
# → <!DOCTYPE html>
# → <html lang="en">
# → ...

3.3 — Test in the Browser

Open http://localhost:8080 in your browser. You should see the HealthPulse Portal. Navigate around — try /dashboard/appointments, then refresh the page. If the page loads correctly on refresh, the try_files SPA fallback is working.
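
You can script the same check: a deep link should return 200 (index.html served by the SPA fallback) rather than 404:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/dashboard/appointments
# → 200 (without try_files this would be a 404)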

3.4 — Inspect the Running Container

# See running containers
docker ps
CONTAINER ID   IMAGE                      COMMAND                  STATUS                    PORTS                  NAMES
a1b2c3d4e5f6   healthpulse-portal:local   "/docker-entrypoint.…"  Up 2 minutes (healthy)    0.0.0.0:8080->80/tcp   healthpulse

Note the (healthy) status — that is the HEALTHCHECK in the Dockerfile working.

# View container logs (Nginx access logs)
docker logs healthpulse

# Follow logs in real-time (Ctrl+C to stop)
docker logs -f healthpulse

# Inspect container details (image, network, mounts, etc.)
docker inspect healthpulse

# Execute a command inside the running container
docker exec -it healthpulse /bin/sh

# Once inside, explore:
ls /usr/share/nginx/html/      # Your built app
cat /etc/nginx/conf.d/default.conf  # Your Nginx config
nginx -t                       # Test Nginx config
exit

3.5 — Stop the Container

docker stop healthpulse

Checkpoint: You have built and run the HealthPulse Portal in a Docker container. The image contains everything needed to serve the app — Nginx, config, static files — in a single 47 MB artifact.


Step 4: Docker Compose

4.1 — Why Compose?

Running docker run with all its flags is error-prone. Docker Compose captures the entire runtime configuration in a YAML file so you can start the app with one command and get the same result every time.

| Without Compose | With Compose |
|---|---|
| docker build -t healthpulse-portal:local -f docker/Dockerfile . | docker compose -f docker/docker-compose.yml up -d |
| docker run -d --name healthpulse -p 8080:80 healthpulse-portal:local | (one command does both build + run) |
| Must remember every flag | Flags are in the YAML file |
| Multiple commands for multiple containers | One file, one command |

4.2 — Review the Compose File

Open docker/docker-compose.yml:

version: "3.8"

services:
  healthpulse:
    build:
      context: ..
      dockerfile: docker/Dockerfile
      args:
        VITE_API_URL: ${VITE_API_URL:-http://localhost:3000/api}
        VITE_ENV: ${VITE_ENV:-development}
        VITE_APP_VERSION: ${VITE_APP_VERSION:-1.0.0-dev}
    ports:
      - "${APP_PORT:-3000}:80"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:80/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

Key details:

  • context: .. — The build context is the parent directory (project root), not the docker/ folder
  • ${APP_PORT:-3000} — Uses the APP_PORT environment variable if set, defaults to 3000
  • restart: unless-stopped — Container restarts automatically if it crashes (unless you explicitly stop it)
  • healthcheck — Same concept as the Dockerfile HEALTHCHECK, but configured at the Compose level
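
To actually run it, a typical workflow looks like this (run from the project root; with the defaults above the app is served on http://localhost:3000):

# Build the image and start the container in one step
docker compose -f docker/docker-compose.yml up -d --build

# Check container status and health
docker compose -f docker/docker-compose.yml ps

# Tail the logs
docker compose -f docker/docker-compose.yml logs -f

# Stop and remove the container when done
docker compose -f docker/docker-compose.yml down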

Step 5: Manual Security Scanning

Before pushing your image to a registry, you should scan it for known vulnerabilities. In production, the CI pipeline (Task F) will automate this. Here, you do it by hand to understand what the automation does.

5.1 — What Is Vulnerability Scanning?

Docker images are built on base images (like nginx:1.27-alpine), which contain OS packages. Those packages may have known security vulnerabilities (CVEs). A scanner compares the packages in your image against public vulnerability databases and reports what it finds.

Your Image: healthpulse-portal:local
├── nginx:1.27-alpine (base)
│   ├── alpine 3.20 (OS)
│   │   ├── openssl 3.1.4  ← CVE-2024-XXXX (HIGH)
│   │   ├── curl 8.5.0     ← no known CVEs
│   │   ├── musl 1.2.5     ← no known CVEs
│   │   └── ...
│   └── nginx 1.27.0       ← CVE-2024-YYYY (MEDIUM)
├── Your static files (dist/)  ← not scanned (no executable code)
└── nginx.conf                 ← not scanned (config file)

5.2 — Option A: Trivy (Recommended — Open Source)

Trivy is the most widely used open-source container scanner. Install it, then scan:

# Install Trivy (macOS)
brew install trivy

# Install Trivy (Linux)
sudo apt-get install -y wget apt-transport-https gnupg lsb-release
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | sudo apt-key add -
echo "deb https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/trivy.list
sudo apt-get update && sudo apt-get install -y trivy

# Install Trivy (Windows — via scoop or download binary)
scoop install trivy

Run the scan:

trivy image healthpulse-portal:local

Example output:

healthpulse-portal:local (alpine 3.20.3)

Total: 5 (UNKNOWN: 0, LOW: 2, MEDIUM: 2, HIGH: 1, CRITICAL: 0)

┌──────────────┬────────────────┬──────────┬────────────────┬───────────────┬─────────────────────────────────┐
│   Library    │ Vulnerability  │ Severity │ Installed Ver  │  Fixed Ver    │            Title                │
├──────────────┼────────────────┼──────────┼────────────────┼───────────────┼─────────────────────────────────┤
│ libssl3      │ CVE-2024-XXXX  │ HIGH     │ 3.1.4-r0       │ 3.1.4-r1      │ openssl: buffer overflow in ... │
│ libcrypto3   │ CVE-2024-XXXX  │ HIGH     │ 3.1.4-r0       │ 3.1.4-r1      │ openssl: buffer overflow in ... │
│ curl         │ CVE-2024-YYYY  │ MEDIUM   │ 8.5.0-r0       │ 8.5.1-r0      │ curl: header injection via...   │
│ busybox      │ CVE-2024-ZZZZ  │ LOW      │ 1.36.1-r15     │ 1.36.1-r16    │ busybox: unsafe temp file...    │
│ musl         │ CVE-2024-WWWW  │ LOW      │ 1.2.5-r0       │               │ musl: minor memory leak...      │
└──────────────┴────────────────┴──────────┴────────────────┴───────────────┴─────────────────────────────────┘
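
Trivy can also filter by severity and fail with a non-zero exit code, which is essentially what the Task F pipeline gate will do. A hedged example of the flags involved:

# Report only HIGH and CRITICAL findings; exit 1 if any are present
trivy image --severity HIGH,CRITICAL --exit-code 1 healthpulse-portal:local

# Optionally skip findings that have no published fix yet
trivy image --severity HIGH,CRITICAL --ignore-unfixed --exit-code 1 healthpulse-portal:local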

5.3 — Option B: Docker Scout

Docker Scout is built into Docker Desktop (v4.17+):

docker scout cves healthpulse-portal:local

Example output:

    i New version 1.14.0 available (installed version is 1.13.0)
    ✓ Image stored for indexing
    ✓ Indexed 45 packages

  ## Overview

                      │ Analyzed Image
  ────────────────────┼──────────────────────────
    Target            │ healthpulse-portal:local
    digest            │ sha256:abc123...
    platform          │ linux/amd64
    vulnerabilities   │ 0C  1H  2M  2L
    size              │ 47 MB
    packages          │ 45

  ## Vulnerabilities

    1H  libssl3       3.1.4-r0   (fixed in 3.1.4-r1)
    1M  curl          8.5.0-r0   (fixed in 8.5.1-r0)
    ...

5.4 — Option C: Snyk CLI

If your organization uses Snyk:

# Install Snyk CLI
npm install -g snyk

# Authenticate
snyk auth

# Scan the image
snyk container test healthpulse-portal:local

5.5 — Understanding Severity Levels

| Severity | Meaning | Action |
|---|---|---|
| CRITICAL | Actively exploited, remote code execution possible | Fix immediately — do not deploy |
| HIGH | Serious vulnerability, exploit likely exists | Fix before production deployment |
| MEDIUM | Vulnerability exists, exploit requires specific conditions | Fix in next release cycle |
| LOW | Minor issue, theoretical risk | Track and fix when convenient |

5.6 — Common Fixes

| Problem | Fix |
|---|---|
| Vulnerabilities in base image | Pin a specific patched version: FROM nginx:1.27.1-alpine instead of FROM nginx:1.27-alpine |
| Unnecessary packages in image | Remove packages not needed at runtime: RUN apk del <package> |
| Image too large | Use multi-stage builds (you already do), use Alpine variants |
| Outdated base image | Update to the latest patch: check Docker Hub for newer tags |

5.7 — Document Your Findings

Record the scan results in your MkDocs wiki. Include:

  • Which scanner you used
  • How many vulnerabilities at each severity level
  • Whether fixes are available
  • What actions you would take for each finding

Key insight: This is what the CI pipeline will automate in Task F. In the pipeline, a scan runs on every build, and the build fails if CRITICAL or HIGH vulnerabilities are found. You are doing it by hand first so you understand what the pipeline is checking and why.


Step 6: Tag and Push to Registry

Your image only exists on your local machine. To deploy it elsewhere (k3s cluster, other servers, teammates), you need to push it to a container registry — a centralized store for Docker images.

6.1 — Image Tagging Strategy

Before pushing, tag your image with a version strategy:

# Tag with a specific version
docker tag healthpulse-portal:local <REGISTRY>/healthpulse-portal:1.0.0

# Also tag as latest
docker tag healthpulse-portal:local <REGISTRY>/healthpulse-portal:latest

Why two tags?

| Tag | Purpose |
|---|---|
| 1.0.0 | Immutable version — this exact build, forever. Used for rollback, auditing, and reproducibility. |
| latest | Floating tag — always points to the most recent build. Convenient but dangerous in production (you don't know exactly which version you're running). |

Best practice: Always deploy by version tag (1.0.0), never by latest. Use latest only for development convenience.
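
In practice the version usually comes from git rather than being typed by hand. A sketch of one common convention (not prescribed by this project):

# Derive a version from the latest git tag (falls back to the short commit hash)
VERSION=$(git describe --tags --always)
docker tag healthpulse-portal:local <REGISTRY>/healthpulse-portal:$VERSION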

6.2 — Option A: JFrog Artifactory (Enterprise Registry)

If your organization uses JFrog Artifactory:

# Set your registry URL (get this from your instructor or Artifactory admin)
REGISTRY="your-artifactory.jfrog.io/healthpulse-docker"

# Tag the image for Artifactory
docker tag healthpulse-portal:local $REGISTRY/healthpulse-portal:1.0.0
docker tag healthpulse-portal:local $REGISTRY/healthpulse-portal:latest

# Log in to Artifactory
docker login your-artifactory.jfrog.io
# → Username: your-username
# → Password: your-api-token (NOT your password — generate a token in Artifactory)

# Push both tags
docker push $REGISTRY/healthpulse-portal:1.0.0
docker push $REGISTRY/healthpulse-portal:latest

Verify in the Artifactory UI:

  1. Open your Artifactory URL in a browser
  2. Navigate to Artifacts → healthpulse-docker → healthpulse-portal
  3. You should see tags 1.0.0 and latest

6.3 — Option B: Docker Hub (Public Registry)

If you are using Docker Hub:

# Your Docker Hub username
DOCKERHUB_USER="your-dockerhub-username"

# Tag the image for Docker Hub
docker tag healthpulse-portal:local $DOCKERHUB_USER/healthpulse-portal:1.0.0
docker tag healthpulse-portal:local $DOCKERHUB_USER/healthpulse-portal:latest

# Log in to Docker Hub
docker login
# → Username: your-dockerhub-username
# → Password: your-access-token (generate at hub.docker.com → Account Settings → Security)

# Push both tags
docker push $DOCKERHUB_USER/healthpulse-portal:1.0.0
docker push $DOCKERHUB_USER/healthpulse-portal:latest

Verify at https://hub.docker.com/r/<your-username>/healthpulse-portal/tags.

This is what the CI pipeline will automate in Task F. On every successful build, the pipeline will: build the image, scan it, tag it with the build number, and push it to the registry. You are doing each step manually so you understand the full flow.


Step 7: Deploy the Application to k3s

7.1 — Review What Gets Created

The kubernetes/deployment.yml creates:

  • Deployment with 2 replicas (pods)
  • Each pod runs your Docker image (Nginx + dist/)
  • Health probes: liveness (is the app alive?) and readiness (can it serve traffic?)
  • Rolling update strategy: zero-downtime deploys
  • Resource limits: CPU and memory boundaries

The kubernetes/service.yml creates:

  • ClusterIP Service — internal to the cluster only
  •  External traffic reaches the app via Traefik Ingress (see Step 7.4)

Why ClusterIP and not LoadBalancer? k3s has a built-in load balancer (ServiceLB/Klipper) that makes LoadBalancer type services work. However, it assigns the node's IP as the EXTERNAL-IP and tries to bind port 80 on the node — which conflicts with Traefik, which already owns port 80 on every k3s node. The correct k3s pattern is:

ClusterIP Service  ←  Traefik reads Ingress rules and routes to it
     ↑
Traefik (port 80 on node)
     ↑
Browser request to your domain
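
You can confirm that Traefik owns port 80 before deploying. In a default k3s install it runs as a LoadBalancer service in kube-system:

kubectl get svc traefik -n kube-system
# → TYPE LoadBalancer, PORT(S) 80:3XXXX/TCP,443:3XXXX/TCP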

7.2 — Deploy to Dev

Using the management script:

MASTER_IP=<MASTER_IP> \
NAMESPACE=healthpulse-dev \
VERSION=1.0.0 \
DOCKER_REGISTRY=<ARTIFACTORY_URL> \
./scripts/k8s-manage.sh deploy

Or manually with kubectl:

# Replace the image placeholder and apply
sed "s|ARTIFACTORY_REGISTRY/healthpulse-portal:VERSION_TAG|<ARTIFACTORY_URL>/healthpulse-portal:1.0.0|g; s|namespace: healthpulse-prod|namespace: healthpulse-dev|g" \
  kubernetes/deployment.yml | kubectl apply -f -

sed "s|namespace: healthpulse-prod|namespace: healthpulse-dev|g" \
  kubernetes/service.yml | kubectl apply -f -

7.3 — Watch the Rollout

# Watch pods come up in real-time
kubectl get pods -n healthpulse-dev -w

# Wait for the rollout to complete
kubectl rollout status deployment/healthpulse-portal -n healthpulse-dev --timeout=120s

Expected:

NAME                                  READY   STATUS    RESTARTS   AGE
healthpulse-portal-7f8c9d6b4-abc12   1/1     Running   0          30s
healthpulse-portal-7f8c9d6b4-def34   1/1     Running   0          30s

If pods show ImagePullBackOff: The Artifactory secret is wrong or the image doesn't exist. Check:

kubectl describe pod <POD_NAME> -n healthpulse-dev
# Look at the Events section at the bottom

7.4 — Check the Service

kubectl get svc -n healthpulse-dev
NAME                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
healthpulse-service   ClusterIP   10.43.24.101   <none>        80/TCP    1m

EXTERNAL-IP shows <none> — that is correct. This is a ClusterIP service — it only has an internal cluster IP. External traffic reaches it through Traefik Ingress.

7.5 — Apply the Traefik Ingress

This is what exposes the app to the outside world on a proper hostname:

# Apply the ingress for the dev namespace
kubectl apply -f kubernetes/ingress-dev.yml

# Verify it was created
kubectl get ingress -n healthpulse-dev
NAME                 CLASS     HOSTS                         ADDRESS        PORTS   AGE
healthpulse-ingress  traefik   k8s.team-healthpulse.com      10.43.0.1      80      30s

Traffic flow:

Browser → http://k8s.team-healthpulse.com
    → DNS (Route 53) → k3s Master EIP
    → Traefik (port 80 on node)
    → Ingress rule (host match)
    → healthpulse-service (ClusterIP 10.43.x.x:80)
    → Pod (Nginx:80)

Test with curl once DNS is configured (Task E):

curl http://k8s.team-healthpulse.com/health
# → {"status":"healthy"}

If DNS is not yet configured, use one of these approaches:

How kubectl port-forward binds: Port-forward only listens on 127.0.0.1 (loopback) on whichever machine runs the command.

  • Run it on the EC2 → only 127.0.0.1:PORT works on that machine. <MASTER_IP>:PORT from your browser will not work.
  • Run it on your laptop → localhost:PORT in your browser works.
  • Run it on the EC2 with --address 0.0.0.0 → <MASTER_IP>:PORT works, but this is a debug shortcut only.

7.6 — Quick Test Without DNS

Option A — LoadBalancer on high port (recommended until DNS is ready). This is the cleanest option when DNS is not yet configured: switch the service temporarily to LoadBalancer on a high port — this avoids the Traefik port 80 conflict and makes the app reachable directly at <MASTER_IP>:<PORT>:

# Temporarily patch the service type and port for a quick test
kubectl patch svc healthpulse-service -n healthpulse-dev \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/type","value":"LoadBalancer"},{"op":"replace","path":"/spec/ports/0/port","value":3001}]'

kubectl get svc -n healthpulse-dev
NAME                  TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
healthpulse-service   LoadBalancer   10.43.24.101   <NODE_IP>      3001:3XXXX/TCP   1m
curl http://<MASTER_IP>:3001/health
# → {"status":"healthy"}

Then revert back to ClusterIP when done:

sed "s|namespace: healthpulse-prod|namespace: healthpulse-dev|g" \
  kubernetes/service.yml | kubectl apply -f -

This is for testing only. Use ClusterIP + Ingress for all real environments.

Option A.1 — kubectl port-forward on the EC2 master with --address 0.0.0.0 (debug shortcut only):

kubectl port-forward --address 0.0.0.0 svc/healthpulse-service 30001:80 -n healthpulse-dev
# Open: http://<MASTER_IP>:30001

Option B — kubectl port-forward from your laptop (if you want to avoid changing the service):

# 1. Copy kubeconfig from the EC2 master to your laptop
scp ubuntu@<MASTER_IP>:/etc/rancher/k3s/k3s.yaml ~/.kube/healthpulse-config

# 2. Fix the server address (kubeconfig has 127.0.0.1 — replace with public IP)
sed -i 's/127.0.0.1/<MASTER_IP>/g' ~/.kube/healthpulse-config

# 3. Use the config
export KUBECONFIG=~/.kube/healthpulse-config

# 4. Now run port-forward on your LAPTOP — browser can reach localhost
kubectl port-forward svc/healthpulse-service 8080:80 -n healthpulse-dev
# Open: http://localhost:8080

Setting up kubectl on your laptop (Option B) is worth doing for the rest of the capstone — it means you can run all kubectl commands from your machine without SSHing into the EC2 every time.
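
A quick sanity check once KUBECONFIG points at the copied file:

kubectl get nodes
# → all three nodes (master + workers) should show STATUS Ready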


Step 8: Deploy to Additional Namespaces

# Deploy to QA
MASTER_IP=<MASTER_IP> \
NAMESPACE=healthpulse-qa \
VERSION=1.0.0 \
DOCKER_REGISTRY=<ARTIFACTORY_URL> \
./scripts/k8s-manage.sh deploy

# Deploy to Prod
MASTER_IP=<MASTER_IP> \
NAMESPACE=healthpulse-prod \
VERSION=1.0.0 \
DOCKER_REGISTRY=<ARTIFACTORY_URL> \
./scripts/k8s-manage.sh deploy

Verify all environments:

kubectl get pods -A | grep healthpulse
healthpulse-dev    healthpulse-portal-xxx   1/1   Running   0   5m
healthpulse-dev    healthpulse-portal-xxx   1/1   Running   0   5m
healthpulse-qa     healthpulse-portal-xxx   1/1   Running   0   2m
healthpulse-qa     healthpulse-portal-xxx   1/1   Running   0   2m
healthpulse-prod   healthpulse-portal-xxx   1/1   Running   0   1m
healthpulse-prod   healthpulse-portal-xxx   1/1   Running   0   1m

Key insight: 3 isolated environments running on the same 3-node cluster. With bare-metal (Task G), you'd need 3 separate servers configured identically. With Docker (Task H), you'd need 3 separate hosts running Docker. With Kubernetes, you use namespaces — one cluster, multiple environments, fully isolated.


Step 9: Configure Auto-Scaling (HPA)

9.1 — What is HPA?

HPA (Horizontal Pod Autoscaler) automatically increases or decreases the number of pods (container copies) based on how busy they are.

Simple analogy — a restaurant:

  • Without HPA: You always have 2 waiters, whether it's Monday lunch (empty) or Saturday night (packed). Customers wait, or you're overpaying idle staff.
  • With HPA: You start with 2 waiters. When the restaurant fills up (CPU goes above 70%), a 3rd waiter automatically clocks in. When it's quiet again, the extra waiter goes home.

What it looks like in your cluster:

Normal load (2 pods):
┌─────────┐  ┌─────────┐
│  Pod 1  │  │  Pod 2  │   CPU: 30%  <- comfortably serving traffic
│  Nginx  │  │  Nginx  │
└─────────┘  └─────────┘

Traffic spike hits (HPA scales to 4 pods):
┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
│  Pod 1  │  │  Pod 2  │  │  Pod 3  │  │  Pod 4  │   CPU: 65% <- handling it
│  Nginx  │  │  Nginx  │  │  Nginx  │  │  Nginx  │
└─────────┘  └─────────┘  └─────────┘  └─────────┘
                                         ^ auto-created by HPA

Traffic drops (HPA scales back to 2 pods):
┌─────────┐  ┌─────────┐
│  Pod 1  │  │  Pod 2  │   CPU: 20%  <- extra pods removed
└─────────┘  └─────────┘

9.2 — Your HPA Configuration (kubernetes/hpa.yml, summarized)

minReplicas: 2          # Never go below 2 pods (high availability)
maxReplicas: 6          # Never go above 6 pods (cost control)
metrics:
  - cpu: 70%            # Scale up when average CPU exceeds 70%
  - memory: 80%         # Scale up when average memory exceeds 80%

The decision loop (runs every 15 seconds):

1. Metrics-server collects CPU/memory from all pods
2. HPA checks: is average CPU > 70% or memory > 80%?
   |-- YES --> add pods (up to max 6)
   |-- NO  --> is average CPU < 70% AND traffic low?
               |-- YES --> remove pods (down to min 2)
               |-- NO  --> do nothing
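
Under the hood, the controller computes the desired replica count from the ratio of current to target utilization (roughly):

desiredReplicas = ceil( currentReplicas * currentUtilization / targetUtilization )

# Example: 2 pods averaging 140% CPU against the 70% target
# ceil(2 * 140 / 70) = 4  -> HPA scales the Deployment to 4 pods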

9.3 — Use Case for HealthPulse

| Scenario | What Happens |
|---|---|
| Normal day | 2 pods handle all patient portal traffic |
| Monday 9 AM | Patients check appointments — traffic spikes — HPA scales to 4 pods |
| Lab results released | Hundreds of patients check at once — HPA scales to 6 pods |
| 2 AM | Nobody using the portal — HPA scales back to 2 pods |
| Pod crashes | Kubernetes restarts the pod AND HPA ensures minimum 2 are always running |

9.4 — Why This Matters (Compare with Task G and H)

| Scenario | Bare-Metal (Task G) | Docker (Task H) | Kubernetes + HPA (Task I) |
|---|---|---|---|
| Traffic spike | Server overloaded, users wait | Manually start more containers | Auto-scales in seconds |
| Traffic drops | Server idle, still paying | Manually stop containers | Auto-scales down, saves cost |
| Pod/container dies | App is down until you fix it | App is down until you restart | Auto-heals, no human needed |

Bottom line: HPA is Kubernetes doing what a human ops engineer would do (add servers when busy, remove when quiet) — but automatically, 24/7, in seconds instead of minutes.

9.5 — Apply HPA

# Apply HPA to dev
sed "s|namespace: healthpulse-prod|namespace: healthpulse-dev|g" \
  kubernetes/hpa.yml | kubectl apply -f -

# Check HPA status
kubectl get hpa -n healthpulse-dev
NAME              REFERENCE                       TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
healthpulse-hpa   Deployment/healthpulse-portal   5%/70%, 12%/80%   2         6         2          30s

Note: k3s ships with metrics-server pre-installed — HPA works out of the box with no extra setup.

9.6 — Test Auto-Scaling (Optional)

Generate some load to see HPA in action:

# In one terminal — watch HPA (it updates every 15 seconds)
kubectl get hpa -n healthpulse-dev -w

# In another terminal — generate load
kubectl run load-test --image=busybox -n healthpulse-dev --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://healthpulse-service/health; done"

# Watch the TARGETS column — CPU will climb, then REPLICAS will increase
# This may take 1-2 minutes

# After watching it scale up, delete the load generator
kubectl delete pod load-test -n healthpulse-dev

# Watch it scale back down (takes ~5 minutes — K8s is cautious about scaling down)

Step 10: Demonstrate Rollback

10.1 — Deploy a "Bad" Version

Make a small visible change to the app, rebuild, push to Artifactory as version 2.0.0, then deploy:

MASTER_IP=<MASTER_IP> \
NAMESPACE=healthpulse-dev \
VERSION=2.0.0 \
DOCKER_REGISTRY=<ARTIFACTORY_URL> \
./scripts/k8s-manage.sh deploy

Verify the new version is running:

kubectl get pods -n healthpulse-dev -o wide

10.2 — Rollback

# Undo the last deployment
kubectl rollout undo deployment/healthpulse-portal -n healthpulse-dev

# Watch the rollback
kubectl rollout status deployment/healthpulse-portal -n healthpulse-dev

# Verify we're back to the previous version
kubectl get pods -n healthpulse-dev -o jsonpath='{.items[0].spec.containers[0].image}'
# → should show version 1.0.0

10.3 — Check Rollout History

kubectl rollout history deployment/healthpulse-portal -n healthpulse-dev
REVISION  CHANGE-CAUSE
1         <none>         ← version 1.0.0
2         <none>         ← version 2.0.0
3         <none>         ← rollback to version 1.0.0

Compare with bare-metal rollback (Task G): Ansible had to find the latest tar backup, extract it, reload Nginx. Kubernetes rollback is instant — it just switches which ReplicaSet is active.


Step 11: Explore the Cluster

These commands help you understand what's running and why:

Pods and Deployments

# Detailed pod info — shows node placement, IP, status
kubectl get pods -n healthpulse-dev -o wide

# Why is this pod on this node? How much CPU/memory is it using?
kubectl describe pod <POD_NAME> -n healthpulse-dev

# Pod logs (like docker logs)
kubectl logs <POD_NAME> -n healthpulse-dev

# Exec into a pod (like docker exec)
kubectl exec -it <POD_NAME> -n healthpulse-dev -- /bin/sh

Services and Networking

# What services exist?
kubectl get svc -n healthpulse-dev

# What endpoints does the service route to?
kubectl get endpoints healthpulse-service -n healthpulse-dev

Resource Usage

# Node resource usage
kubectl top nodes

# Pod resource usage
kubectl top pods -n healthpulse-dev

Cluster-Wide View

# Everything in all healthpulse namespaces
kubectl get all -n healthpulse-dev
kubectl get all -n healthpulse-qa
kubectl get all -n healthpulse-prod

# All pods across all namespaces
kubectl get pods -A

Step 12: Explore with k9s

12.1 — What is k9s?

k9s is a terminal-based UI for Kubernetes. Think of it as a real-time dashboard that runs in your terminal — you can view pods, tail logs, shell into containers, and watch rollouts without typing kubectl commands over and over.

Once you try it, you'll wonder how you managed without it. Most Kubernetes engineers use k9s as their daily driver.

12.2 — Install k9s

You can run k9s in two places — pick one or do both:


Option A: On the k3s Master (Recommended First)

SSH into the master and install k9s directly there. It's a single binary, no package manager needed:

ssh -i ~/.ssh/healthpulse-key.pem ubuntu@<MASTER_IP>

# Download and install the binary
curl -sL https://github.com/derailed/k9s/releases/latest/download/k9s_Linux_amd64.tar.gz \
  | sudo tar xz -C /usr/local/bin k9s

# Verify
k9s version

k3s stores its kubeconfig at /etc/rancher/k3s/k3s.yaml — point k9s at it:

# For the current session only
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

# Or add to your shell profile so it persists across logins
echo 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml' >> ~/.bashrc
source ~/.bashrc

Then launch:

k9s

Why this works without any extra setup: You're running k9s on the same machine as the k3s API server. No kubeconfig copying, no IP changes — it just connects to the local cluster immediately.


Option B: On Your Laptop

Install k9s locally and connect to the cluster remotely. Requires kubeconfig to be copied from the EC2 master first (see Option B in Step 7.6).

Mac:

brew install derailed/k9s/k9s

Linux:

curl -sL https://github.com/derailed/k9s/releases/latest/download/k9s_Linux_amd64.tar.gz \
  | sudo tar xz -C /usr/local/bin k9s

Windows:

scoop install k9s
# or
choco install k9s

Then point it at your copied kubeconfig and launch:

export KUBECONFIG=~/.kube/healthpulse-config
k9s

12.3 — Launch k9s

k9s

You'll see a full-screen terminal UI showing your cluster resources. It refreshes in real-time — no need to re-run commands.

12.4 — Key Commands

Navigate k9s using keyboard shortcuts. Press : to open the command prompt, then type a resource name:

| Key / Command | What It Does |
|---|---|
| :pods | View all pods |
| :deploy | View deployments |
| :svc | View services |
| :ing | View ingresses (Traefik routing rules) |
| :ns | View / switch namespaces |
| :node | View cluster nodes and resource usage |
| 0 | Show resources across all namespaces |
| l | Tail logs for the selected pod |
| s | Shell into the selected pod |
| d | Describe the selected resource (like kubectl describe) |
| ctrl-d | Delete the selected resource |
| ? | Show all keyboard shortcuts |
| esc | Go back / close current view |
| ctrl-c | Quit k9s |

Tip: When you first open k9s, it shows pods in the default namespace. Type :ns to switch namespaces, then select healthpulse-prod to see your running pods.

12.5 — Exercises

Try these to get comfortable with k9s:

  1. Switch namespaces and view pods — Type :ns, select healthpulse-prod, then type :pods. You should see your running HealthPulse pods.

  2. Tail logs of a running pod — Navigate to a pod and press l. You'll see live Nginx access logs streaming in real-time. Press esc to go back.

  3. Shell into a pod — Select a pod and press s. You're now inside the container. Run nginx -v to confirm Nginx is running, then type exit to leave.

  4. Watch a rolling update in real-time — Open k9s with :pods in one terminal. In another terminal, deploy a new version. Watch k9s as old pods terminate and new pods spin up — you'll see the rolling update happen live.

Bottom line: k9s is your daily driver for Kubernetes — faster than typing kubectl commands, and you get a real-time view of everything happening in your cluster.


Step 13: Document — Docker vs Kubernetes

Add a page to your MkDocs wiki comparing Task H (Docker) and Task I (Kubernetes):

# Kubernetes Deployment — Lessons Learned

## Cluster Architecture
Describe the k3s cluster: master, workers, how they communicate.

## Docker vs Kubernetes Comparison

| Aspect | Docker (Task H) | Kubernetes (Task I) |
|--------|-----------------|---------------------|
| Where it runs | Your local machine | 3-node cluster on AWS |
| Scaling | Manual — start more containers | Automatic — HPA adds pods based on load |
| Self-healing | Container dies → stays dead | Pod dies → Kubernetes restarts it |
| Rolling updates | Stop old, start new (downtime) | Zero-downtime rolling update |
| Rollback | Pull old image, restart manually | `kubectl rollout undo` — instant |
| Load balancing | Not built-in | Built-in service load balancing |
| Multi-environment | Run on different ports/hosts | Namespaces on the same cluster |
| Networking | Port mapping (-p 8080:80) | Services, DNS, automatic discovery |
| Config management | Environment variables, files | ConfigMaps, Secrets |
| Storage | Docker volumes | Persistent Volume Claims |

## Key Kubernetes Concepts I Learned
- [ ] Pods, Deployments, ReplicaSets
- [ ] Services (ClusterIP, LoadBalancer, NodePort)
- [ ] Namespaces for environment isolation
- [ ] HPA for auto-scaling
- [ ] Rolling updates and rollback
- [ ] Resource requests and limits
- [ ] Health probes (liveness, readiness)

Step 14: Cleanup

When you're done with Task I:

cd terraform/k3s
terraform destroy \
  -var-file=dev.tfvars \
  -var="ssh_public_key=$(cat ~/.ssh/healthpulse-key.pub)"

This removes all 3 EC2 instances, the VPC, and all associated resources. Cost drops to $0.
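
Optionally, also clean up the local Docker artifacts created in the earlier steps on your own machine:

# Remove the local test container and image
docker rm -f healthpulse
docker image rm healthpulse-portal:local

# Reclaim space from dangling layers and build cache
docker system prune -f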


Acceptance Criteria Checklist

  •  3-node k3s cluster operational (kubectl get nodes — all Ready)
  •  Application deployed to all three namespaces (dev, qa, prod)
  •  Service accessible via browser
  •  HPA configured (kubectl get hpa shows targets)
  •  Rollback demonstrated with kubectl rollout undo
  •  Can SSH into master and explain the cluster architecture
  •  Docker vs Kubernetes comparison documented in MkDocs wiki

Instructor Verification

Be prepared to:

  1. Show kubectl get nodes and explain what each node does (master vs worker)
  2. Show pods running in all 3 namespaces and explain namespace isolation
  3. Deploy a new version while the instructor watches — show zero-downtime
  4. Rollback and prove the app reverted
  5. Explain HPA — what triggers scaling, what are the thresholds
  6. Show kubectl describe pod and explain what each section means
  7. Explain the difference: Why Kubernetes over just running Docker?

Troubleshooting

Workers not joining the cluster

# SSH into the worker
ssh -i ~/.ssh/healthpulse-key ubuntu@<WORKER_IP>

# Check the bootstrap log
sudo cat /var/log/cloud-init-output.log

# Check if k3s agent is running
sudo systemctl status k3s-agent

# Check k3s agent logs
sudo journalctl -u k3s-agent -f

Common causes:

  • Master not ready yet (worker tried to join too early) → restart k3s-agent: sudo systemctl restart k3s-agent
  • Security group blocks port 6443 between nodes → check intra-cluster rule

Pods stuck in ImagePullBackOff

kubectl describe pod <POD_NAME> -n healthpulse-dev
# Look at Events section

# Likely causes:
# 1. Image doesn't exist in Artifactory → check the tag
# 2. Pull secret is wrong → recreate it
# 3. Artifactory URL is wrong → check the image reference
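
If the pull secret is the cause, recreate it. A sketch only: the secret name (artifactory-cred) and credential placeholders below are illustrative and must match whatever imagePullSecrets name your deployment manifest actually references:

kubectl create secret docker-registry artifactory-cred \
  --docker-server=<ARTIFACTORY_URL> \
  --docker-username=<USERNAME> \
  --docker-password=<API_TOKEN> \
  -n healthpulse-dev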

Pods stuck in CrashLoopBackOff

# Check pod logs for the error
kubectl logs <POD_NAME> -n healthpulse-dev

# If the pod keeps restarting, check the previous container's logs
kubectl logs <POD_NAME> -n healthpulse-dev --previous

kubectl connection refused from local machine

# 1. Is the kubeconfig pointing to the right IP?
cat ~/.kube/healthpulse-config | grep server
# → should show https://<MASTER_IP>:6443

# 2. Is port 6443 open to your IP?
# Check your IP hasn't changed: curl ifconfig.me
# If it changed, re-apply Terraform with the new IP

# 3. Is k3s running on the master?
ssh -i ~/.ssh/healthpulse-key ubuntu@<MASTER_IP>
sudo systemctl status k3s

Service not accessible from browser

# Check the service
kubectl get svc -n healthpulse-dev

# Check endpoints (are pods connected to the service?)
kubectl get endpoints healthpulse-service -n healthpulse-dev
# Should show pod IPs, not <none>

# Check NodePort
kubectl get svc healthpulse-service -n healthpulse-dev -o jsonpath='{.spec.ports[0].nodePort}'

# Test from inside the cluster (SSH into master)
sudo k3s kubectl run curl-test --image=curlimages/curl --restart=Never -- \
  curl -s http://healthpulse-service.healthpulse-dev.svc.cluster.local/health
sudo k3s kubectl logs curl-test
sudo k3s kubectl delete pod curl-test

HPA shows "unknown" targets

# Check if metrics-server is running
kubectl get pods -n kube-system | grep metrics

# Check if metrics are available
kubectl top pods -n healthpulse-dev
# If this fails, metrics-server may need a minute to collect data

# Check HPA events
kubectl describe hpa healthpulse-hpa -n healthpulse-dev

Key Concepts Reference

| Concept | What It Means |
|---|---|
| Pod | Smallest deployable unit in K8s. One or more containers that share networking and storage. |
| Deployment | Manages a set of identical pods. Handles rolling updates and rollbacks. |
| ReplicaSet | Ensures N pods are running. Created by Deployments. You rarely interact with it directly. |
| Service | Stable network endpoint for pods. Pods come and go, but the service IP stays the same. |
| Namespace | Virtual cluster within a cluster. Isolates resources (pods, services, secrets) between environments. |
| HPA | Horizontal Pod Autoscaler. Watches metrics and adjusts replica count automatically. |
| k3s | Lightweight Kubernetes distribution. Single binary, certified conformant, built-in extras. |
| kubeconfig | File that tells kubectl where the cluster is and how to authenticate. |
| NodePort | Exposes a service on every node's IP at a specific port (30000–32767). |
| ServiceLB (Klipper) | k3s's built-in load balancer. Makes LoadBalancer type services work without cloud provider integration. Avoid port 80/443 — Traefik already owns those. |
| ClusterIP | Default service type. Internal cluster IP only — no external access. Use with Traefik Ingress for external routing in k3s. |
| Traefik Ingress | k3s's built-in ingress controller. Routes external HTTP traffic to ClusterIP services based on hostname rules. Owns port 80 (and 443) on every node. |
| Rolling Update | Gradually replaces old pods with new ones. At no point are zero pods running. |
| Liveness Probe | "Is this pod alive?" If it fails, Kubernetes kills and restarts the pod. |
| Readiness Probe | "Can this pod serve traffic?" If it fails, the pod is removed from the service until it recovers. |
