Sunday, 15 March 2026

# HealthPulse Portal — Complete Capstone Project

**HealthPulse Inc.** is a healthcare technology startup that has built a patient portal as a **React/TypeScript single-page application**. The application allows patients to view appointments, lab results, medications, and communicate with their care team.


Currently, the development team **manually builds and deploys** the application by:

1. Running `npm run build` on a developer's laptop

2. SCP-ing the `dist/` folder to a single Nginx server

3. SSHing into the server and restarting Nginx

The application is currently hosted on their private server; a reference demo is available at:

https://healthpulse-capstone.vercel.app/

This process takes **45 minutes per deployment**, is error-prone, and has caused **3 production outages** in the last quarter from misconfigurations. There is **no testing in the pipeline**, **no code quality checks**, **no security scanning**, and **no monitoring**.


**HealthPulse Inc. has hired your DevOps team** to design and implement a complete CI/CD pipeline, multi-environment infrastructure, container orchestration, and observability platform on **AWS**.


---

## Application Details


| Item | Detail |
|------|--------|
| **App Name** | HealthPulse Portal |
| **Tech Stack** | React 18, TypeScript, Vite, shadcn/ui, Tailwind CSS, Recharts |
| **Testing** | Vitest (unit), Playwright (e2e) |
| **Build Output** | Static files (`dist/`) served by Nginx |
| **Container** | Multi-stage Dockerfile (Node build → Nginx serve) |
| **Health Endpoint** | `GET /health` → `{"status":"healthy"}` |

## Repository Structure

healthpulse-capstone/
├── src/                        # Application source code
│   ├── components/ui/          # shadcn/ui components
│   ├── components/layout/      # Layout (Sidebar, Header)
│   ├── pages/                  # Login, Dashboard, Appointments, LabResults, etc.
│   ├── data/                   # Mock data
│   ├── types/                  # TypeScript types
│   ├── lib/                    # Utilities
│   └── test/                   # Unit tests
├── tests/e2e/                  # Playwright e2e tests

## Tools & Technologies

| Category | Tool | Purpose |
|----------|------|---------|
| **CI/CD** | Jenkins OR GitLab CI OR Azure DevOps | Pipeline automation (student chooses one) |
| **Cloud** | AWS (ECS Fargate, ALB, VPC, Route 53) | Infrastructure hosting |
| **IaC** | Terraform | Infrastructure provisioning |
| **Config Mgmt** | Ansible Tower | Application deployment & rollback |
| **Containers** | Docker | Application containerization |
| **Orchestration** | Kubernetes (EKS) | Container orchestration |
| **Artifact Repo** | JFrog Artifactory | Docker images + build artifacts |
| **Code Quality** | SonarQube | Static analysis + code coverage |
| **Security** | Snyk | Dependency vulnerability scanning |
| **Monitoring** | Datadog | Infrastructure + application monitoring |
| **Version Control** | Git (Bitbucket/GitHub/GitLab) | Source code management |

---






### TASK A: Documentation Platform (Docs-as-Code)

Set up a **MkDocs Material** documentation site using the docs-as-code approach. Documentation lives in the Git repository as Markdown files and is built/served via Docker.

#### Why Docs-as-Code?
This is how top DevOps teams (AWS, Kubernetes, Terraform) manage documentation — Markdown files in Git, built by CI, deployed as a static site. You'll use the same multi-stage Docker pattern as the main application.

| Requirement | Detail |
|-------------|--------|
| Tool | MkDocs with Material theme |
| Container Port | `84` |
| Build | Multi-stage Docker (mkdocs build → nginx serve) |
| Dev Mode | `mkdocs serve` with live reload on port `8084` |
| Location | `docs/` directory in the deployment repo |


| New File | Purpose |
|----------|---------|
| `docs/mkdocs.yml` | MkDocs config with Material theme, dark/light toggle, nav, extensions |
| `docs/Dockerfile` | Multi-stage build (mkdocs-material → nginx:alpine) |
| `docs/docker-compose.yml` | Prod on port 84 + live-reload dev mode on port 8084 |
| `docs/docs/index.md` | Home page with project overview, team roster template |
| `docs/docs/architecture.md` | ADR templates (CI/CD platform + container orchestration) |
| `docs/docs/environments.md` | Environment matrix table (Dev/UAT/QA/Prod) |
| `docs/docs/runbooks.md` | 4 runbook templates (deploy, rollback, scale, incident) |
| `docs/docs/pipeline.md` | CI/CD pipeline stage docs with diagrams |
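The `docs/Dockerfile` row above can be sketched as a minimal multi-stage build. This is an illustrative version only — base images, pins, and the `--strict` flag are assumptions; the provided file is the source of truth:

```dockerfile
# Sketch of docs/Dockerfile (illustrative — pin versions to taste)

# Stage 1: build the static site with MkDocs Material
FROM python:3.12-alpine AS build
WORKDIR /docs
RUN pip install --no-cache-dir mkdocs-material
COPY . .
RUN mkdocs build --site-dir /site

# Stage 2: serve the built site with Nginx; the build toolchain is discarded
FROM nginx:alpine
COPY --from=build /site /usr/share/nginx/html
EXPOSE 80
```

Note the same pattern the main application uses: the Python/MkDocs toolchain exists only in Stage 1, and the final image is just Nginx plus static HTML.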




#### Required Documentation Pages

| Page | Content |
|------|---------|
| Home (`index.md`) | Project overview, team roster, quick links |
| Architecture Decisions (`architecture.md`) | ADR-001: CI/CD Platform choice, ADR-002: Container orchestration choice |
| Environment Matrix (`environments.md`) | Dev/UAT/QA/Prod table with IPs, URLs, instance sizes |
| Runbooks (`runbooks.md`) | Deploy, rollback, scale, incident response procedures |
| CI/CD Pipeline (`pipeline.md`) | Pipeline stages, tools, configuration notes |

#### Commands
```bash
# Build and serve docs (production)
cd docs && docker-compose up docs-prod
# → Docs at http://localhost:84

# Live reload dev mode
cd docs && docker-compose up docs-dev
# → Docs at http://localhost:8084 (auto-refreshes on file save)
```

**Acceptance Criteria:**
- [ ] MkDocs site builds via multi-stage Dockerfile
- [ ] Docs served on port 84 via docker-compose
- [ ] Live reload dev mode working on port 8084
- [ ] All 5 documentation pages created with real content
- [ ] `mkdocs.yml` and all Markdown files committed to Git
- [ ] Docs auto-build in CI pipeline on changes to `docs/` folder


#### Task A Summary

1. Set up MkDocs with the Material theme inside the deployment repo
2. Create a `docker-compose.yml` to serve docs on port 84
3. Write initial documentation pages:
   - Team roster and roles
   - ADR: "Why we chose [Jenkins/GitLab/Azure DevOps]"
   - Environment matrix (Dev/UAT/QA/Prod)
   - Runbook template
4. Build docs via Docker (multi-stage: mkdocs build → nginx serve)
5. CI pipeline auto-builds the docs site on pushes to the `docs/` folder

Acceptance criteria:

- Docs served on port 84 via Docker
- `mkdocs.yml` and all Markdown files committed to Git
- Multi-stage Dockerfile builds and serves the docs
- All 5 documentation pages created with real content

#### More About MkDocs

**1. Live Reload Dev Mode**

When writing documentation (editing the Markdown files), you need to see how your changes look in real time. That's what dev mode does:

You edit `runbooks.md` → save the file → the browser auto-refreshes → you see the updated page instantly.

Without dev mode: edit Markdown → rebuild the Docker image → restart the container → refresh the browser → check the result. That's painful and slow.

With dev mode: MkDocs watches the files. The second you hit save, the browser updates automatically. It's the same concept as `npm run dev` for the React app — hot reload for docs.

In the `docker-compose.yml`, there are two services:

| Service | Port | Purpose |
|---------|------|---------|
| `docs-prod` | 84 | Built static site served by Nginx (what users/team see) |
| `docs-dev` | 8084 | Live preview with auto-refresh (only used while writing docs) |

Students use 8084 while writing, then build and deploy to 84 for production. It's a workflow thing — not two permanent servers.
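A minimal `docker-compose.yml` matching the two services might look like this. The image name, internal ports, and volume layout are assumptions — the provided file in the repo may differ:

```yaml
# Sketch of docs/docker-compose.yml (illustrative)
services:
  docs-prod:
    build: .                 # multi-stage Dockerfile: mkdocs build → nginx
    ports:
      - "84:80"              # host 84 → Nginx inside the container

  docs-dev:
    image: squidfunk/mkdocs-material:latest
    command: serve --dev-addr=0.0.0.0:8000
    ports:
      - "8084:8000"          # live reload while editing
    volumes:
      - .:/docs              # mount the docs source so edits are picked up
```

With this layout, `docker compose up docs-dev` gives the live-reload workflow, and `docker compose up docs-prod` builds and serves the production site.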

**2. Runbook Template**

A runbook is an operational instruction manual — step-by-step procedures for when things happen in production. Think of it like a recipe book, but for servers.

Every real DevOps team has them. When it's 2 AM and production is down, you don't want the on-call engineer guessing — you want them following a tested checklist.

Here's an example of what you would fill in as you complete the project:

RUNBOOK: Deploy New Version
═══════════════════════════
When to use:  New release ready for production
Who can run:  DevOps team lead

Steps:
  1. Verify build passed in Jenkins → check #healthpulse-builds Slack
  2. Confirm SonarQube quality gate passed
  3. Approve deployment in pipeline (manual gate)
  4. Monitor Datadog dashboard during rollout
  5. Verify /health endpoint returns 200
  6. If health check fails → pipeline auto-rolls back via Ansible

───────────────────────────

RUNBOOK: Rollback Production
════════════════════════════
When to use:  Production deployment caused errors
Who can run:  Any DevOps team member

Steps:
  1. Run: ./scripts/k8s-manage.sh rollback
     OR: Trigger Ansible Tower rollback job
  2. Verify previous version is serving traffic
  3. Check Datadog for error rate returning to normal
  4. Post incident summary in wiki

───────────────────────────

RUNBOOK: Scale Application
══════════════════════════
When to use:  High traffic / slow response times
Who can run:  Any DevOps team member

Steps:
  1. Check Datadog → confirm CPU/memory is the bottleneck
  2. Run: REPLICAS=6 ./scripts/k8s-manage.sh scale
  3. Monitor HPA: kubectl get hpa -n healthpulse-prod
  4. Scale back down after traffic normalizes

Tip: starter repo at https://github.com/princexav/mkdocs

Note: the starter repo's default docs port is 100 — change it to 84 for this project.

healthpulse-docs/
├── mkdocs.yml                  # Site config + navigation
├── Dockerfile                  # Multi-stage build (mkdocs → nginx)
├── docker-compose.yml          # Prod (port 84) + dev (port 8084)
└── docs/
    ├── index.md                # Home — project overview, team roster, quick links
    ├── architecture.md         # ADR templates (CI/CD choice, orchestration choice)
    ├── environments.md         # Environment matrix (IPs, URLs, sizing)
    ├── pipeline.md             # CI/CD pipeline stages and config
    ├── setup-template.md       # Reusable template — copy for each tool install
    ├── runbooks.md             # Deploy, rollback, scale, health check procedures
    ├── incidents.md            # Incident log template — track issues + root causes
    └── changelog.md            # Weekly progress log — what was built, when, by whom

#### How Students Use It

| Page | When |
|------|------|
| Setup Template | Copy to `setup-jenkins.md`, `setup-sonarqube.md`, `setup-artifactory.md`, `setup-ansible-tower.md`, `setup-datadog.md` — one per tool installed. Documents every command run. |
| Runbooks | Fill in real commands and URLs as you complete Tasks F-H |
| Incident Log | Every time something breaks during the project, log it |
| Changelog | Weekly entries tracking progress across all tasks |
| Architecture / Environments / Pipeline | Fill in as you make decisions and provision infrastructure |

One template; create as many copies as needed. Keeps it simple.

The docs site is fully self-contained — it'll build and run independently:

https://github.com/princexav/mkdocs


```bash
cd healthpulse-docs
docker compose up docs-prod   # → port 84
docker compose up docs-dev    # → port 8084 (live reload)
```




### TASK B: Version Control & Code Security

**Plan & Code**

App Name: HealthPulse

- WorkStation A - Team Pipeline Pirates - 3.15.209.165
- WorkStation B - Team DevopsAvengers - 3.143.221.53
- WorkStation C - Team Devius - 3.142.240.0

The developer workstations are Windows machines. Your project supervisor will provide the IP/DNS and credentials you will use to log into the machine assigned to your group. You can connect with MobaXterm or Remote Desktop; the username is `Administrator`.

On the developer workstation assigned to your group, you will find the code base at:

This PC → Desktop → healthpulseapp

#### B.1 — Repository Setup

Create two repositories:

| Repository | Purpose | Access |
|------------|---------|--------|
| `HealthPulse_App` | Application source code | Developers |
| `HealthPulse_Deployment` | IaC, Ansible, pipelines, scripts | DevOps team |

#### B.2 — Branching Strategy

Implement GitFlow in the App repository:

main ─────────────────────────────────────────►
  └── develop ─────────────────────────────────►
        ├── feature/login-page ──► (merge to develop)
        ├── feature/dashboard ───► (merge to develop)
        └── release/1.0.0 ───────► (merge to main + develop)

#### B.3 — Repository Security (Layer 1 & Layer 3)

Repository security follows a defense-in-depth approach with 3 layers. In this task you set up Layer 1 (local hooks) and Layer 3 (branch protection). Layer 2 (gitleaks in the CI pipeline) comes later in Task F once the pipeline exists.

Layer 1 (this task):  Local hooks      → fast feedback for developers
Layer 2 (Task F):     CI pipeline scan  → server-side safety net
Layer 3 (this task):  Branch protection → platform-enforced rules

**Layer 1: Local Git Hooks (pre-commit + pre-push)**

Install pre-commit and pre-push hooks so developers get early feedback when they accidentally commit secrets. Understand that developers can bypass these with `--no-verify` — that's why Layer 3 exists.

| Hook | Tool | Purpose |
|------|------|---------|
| pre-commit | detect-secrets | Scans staged changes for secrets using entropy + pattern analysis |
| pre-push | custom script | Warns on direct push to main/develop |

Use the provided `.pre-commit-config.yaml` and `scripts/setup-git-hooks.sh`.
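For orientation, a minimal version of that config might look like this — the `rev` tag and baseline argument are assumptions; pin whatever the provided file uses:

```yaml
# Sketch of .pre-commit-config.yaml (illustrative — the provided file is authoritative)
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0                          # pin to a real release tag
    hooks:
      - id: detect-secrets
        args: ["--baseline", ".secrets.baseline"]
```

The baseline file lets you whitelist known false positives so the hook only blocks genuinely new secrets.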

```bash
# Step 1: Download the provided config and install the pre-commit framework
curl -O https://raw.githubusercontent.com/princexav/security/refs/heads/main/.pre-commit-config.yaml
pip install pre-commit

# Step 2: Install hooks into the repo
pre-commit install

# Step 3: Test it — this should be BLOCKED
echo "AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" >> test.txt
git add test.txt && git commit -m "test secret"
# Expected: detect-secrets blocks the commit

# Step 4: Clean up
git checkout -- test.txt

# Step 5: Test the pre-push hook
git checkout main
git push origin main
# Expected: Warning message about direct push to protected branch
```

Key lesson: Run `git commit --no-verify -m "test"` and notice the hook is skipped entirely. This is why local hooks alone are NOT enough — you need Layer 3.
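The "custom script" pre-push hook from the table above could be sketched like this. This is a hypothetical implementation, not the provided `scripts/setup-git-hooks.sh` — Git feeds the pre-push hook one line per pushed ref on stdin, in the format `<local_ref> <local_sha> <remote_ref> <remote_sha>`:

```shell
#!/bin/sh
# Hypothetical pre-push hook sketch: warn when pushing straight to a protected branch.

warn_on_protected() {
  # Read each pushed ref from stdin and check the remote branch name
  while read -r _local_ref _local_sha remote_ref _remote_sha; do
    case "$remote_ref" in
      refs/heads/main|refs/heads/develop)
        echo "WARNING: direct push to ${remote_ref#refs/heads/} detected; open a pull request instead."
        ;;
    esac
  done
}

# Demo: simulate what git would feed the hook on `git push origin main`
printf 'refs/heads/main 1111111 refs/heads/main 2222222\n' | warn_on_protected
```

Because this is a local hook, `--no-verify` skips it entirely — which is exactly the lesson above.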

**Layer 3: Branch Protection Rules (platform-level — cannot be bypassed)**

Configure these in your Git hosting platform (GitHub / GitLab / Bitbucket). Unlike hooks, these are enforced by the server — no developer can skip them.

| Rule | Setting |
|------|---------|
| Require pull request before merging | main and develop |
| Require at least 1 approval | main and develop |
| Do not allow bypassing the above | Even admins must follow the rules |

Note: The rule "Require CI status checks to pass" will be added in Task F once your pipeline is built. For now, configure the PR and approval requirements.

```bash
# Test it — this should be REJECTED by the platform
git checkout main
git commit --allow-empty -m "testing direct push"
git push origin main
# Expected: Rejected — branch protection requires a pull request
```

**Acceptance Criteria:**

- [ ] Both repos created with proper access controls
- [ ] GitFlow branching strategy demonstrated (main, develop, feature/, release/)
- [ ] SSH key authentication configured for repo access
- [ ] `pre-commit install` runs successfully and hooks are active
- [ ] Demonstrate: committing a fake AWS key is blocked by detect-secrets
- [ ] Demonstrate: `--no-verify` bypasses the hook (explain why this matters)
- [ ] Demonstrate: pre-push hook warns on direct push to main
- [ ] Branch protection rules configured on main and develop (screenshot required)
- [ ] PR requires at least 1 approval before merge
- [ ] Direct push to main is rejected by the platform (not just the hook)
- [ ] Security setup documented in your MkDocs wiki

### TASK C: Bare-Metal Deployment (Nginx on EC2)

Before containers, deploy the application the traditional way — built files served directly by Nginx on an EC2 instance. This teaches what containers replace and why they exist.

#### C.1 — Provision the Server (Terraform)

Use the provided `terraform/baremetal/` configuration to create a VPC, subnet, and EC2 instance with Nginx pre-installed.

See guides:

- IAM setup: https://www.devopstreams.com/2026/03/aws-credentials-setup-best-practices.html
- Step-by-step guide: https://www.devopstreams.com/2026/03/task-c-bare-metal-deployment-nginx-on.html
- Terraform files: https://github.com/princexav/mkdocs/tree/main/baremetal

```bash
cd terraform/baremetal
terraform init
terraform plan -var-file=dev.tfvars -var="ssh_public_key=$(cat ~/.ssh/healthpulse-key.pub)"
terraform apply -var-file=dev.tfvars -var="ssh_public_key=$(cat ~/.ssh/healthpulse-key.pub)"
```

What Terraform creates:

| Resource | Detail |
|----------|--------|
| VPC + Subnet | Isolated network with internet gateway and route table |
| EC2 Instance | Ubuntu 22.04, t2.micro |
| Nginx | Installed and configured via `user_data` bootstrap |
| Security Group | Ports 22 (SSH), 80 (HTTP), 443 (HTTPS) |
| Elastic IP | Static public IP |
| Nginx Config | SPA fallback, gzip, security headers, `/health` endpoint |
| Deploy Path | `/var/www/healthpulse` |
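For orientation, the heart of such a configuration might look like this — a condensed, illustrative sketch only (resource names, AMI lookup, and bootstrap script are assumptions; the provided `terraform/baremetal/` files are the source of truth):

```hcl
# Condensed sketch of the EC2 + Nginx portion (illustrative)
resource "aws_instance" "web" {
  ami                    = data.aws_ami.ubuntu_2204.id
  instance_type          = "t2.micro"
  subnet_id              = aws_subnet.public.id
  vpc_security_group_ids = [aws_security_group.web.id]
  key_name               = aws_key_pair.healthpulse.key_name

  # Bootstrap Nginx on first boot — this is the "pre-installed" part
  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx
    mkdir -p /var/www/healthpulse
    systemctl enable --now nginx
  EOF

  tags = { Name = "healthpulse-baremetal" }
}

# Static public IP so DNS and SSH targets never change
resource "aws_eip" "web" {
  instance = aws_instance.web.id
  domain   = "vpc"
}
```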

Detailed walkthrough: See guides/TASK-G-GUIDE.md for step-by-step instructions.

Manual deploy (for learning):

```bash
# SSH into the server
ssh -i ~/.ssh/healthpulse-key.pem ubuntu@<ELASTIC_IP>

# On the server — this is what Ansible automates
cd /var/www/healthpulse
# Copy dist/ files here
sudo systemctl reload nginx

# Verify
curl http://localhost/health
# → {"status":"healthy","deploy":"baremetal"}
```
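The Nginx config row in the table above (SPA fallback, gzip, security headers, `/health`) might look roughly like this — a sketch, assuming the `user_data` bootstrap drops a server block like it; the provided config is authoritative:

```nginx
# Sketch of the Nginx server block for the SPA (illustrative)
server {
    listen 80;
    root /var/www/healthpulse;
    index index.html;

    gzip on;
    gzip_types text/css application/javascript application/json image/svg+xml;

    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "DENY" always;

    # SPA fallback: unknown paths serve index.html so client-side routing works
    location / {
        try_files $uri $uri/ /index.html;
    }

    # Health endpoint for deploy checks and load balancers
    location /health {
        default_type application/json;
        return 200 '{"status":"healthy","deploy":"baremetal"}';
    }
}
```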

**Acceptance Criteria:**

- [ ] EC2 instance provisioned via Terraform with Nginx running
- [ ] Application accessible at `http://<ELASTIC_IP>`
- [ ] Health check returns 200 at `/health`
- [ ] Pain points documented in the MkDocs wiki
- [ ] Can SSH into the server and explain what Nginx is serving, and from where

### TASK D: Set Up Your Infrastructure

1. Create a 3-node k3s cluster (1 master + 2 workers) using Terraform
2. Provision the DevOps tools:

| Tool | Instance Type | Purpose |
|------|---------------|---------|
| Jenkins / GitLab / GitHub Actions / Azure DevOps | t2.large | CI/CD server |
| SonarQube | t2.xlarge | Code analysis |
| Ansible Tower | t2.2xlarge | Configuration management |
| JFrog Artifactory | t2.2xlarge | Artifact repository |
**Acceptance Criteria:**

- [ ] k3s cluster provisioned with 3 nodes (1 master + 2 workers)
- [ ] `kubectl get nodes` shows all nodes Ready
- [ ] Infrastructure tagged properly (namespaces created for Dev, QA, Prod)
- [ ] Can `terraform destroy` and re-create cleanly
- [ ] HPA configured (`kubectl get hpa` shows targets)
- [ ] Can SSH into the master and explain the cluster architecture
- [ ] Setup documented in the MkDocs wiki; DevOps tools installed
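The namespace criterion above can be satisfied with a single manifest. The `healthpulse-<env>` naming is an assumption that matches the namespaces used in the later Docker/Kubernetes tasks:

```yaml
# Hypothetical kubernetes/namespaces.yml — one namespace per environment
apiVersion: v1
kind: Namespace
metadata:
  name: healthpulse-dev
  labels:
    app: healthpulse
    env: dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: healthpulse-qa
  labels:
    app: healthpulse
    env: qa
---
apiVersion: v1
kind: Namespace
metadata:
  name: healthpulse-prod
  labels:
    app: healthpulse
    env: prod
```

Apply it with `kubectl apply -f kubernetes/namespaces.yml`, then confirm with `kubectl get namespaces`.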
### TASK E: Monitoring & Observability (Datadog)

Install and configure Datadog agents on all servers.

Use the provided `monitoring/datadog/datadog-agent-setup.yml` Ansible playbook.

| Requirement | Detail |
|-------------|--------|
| Infrastructure metrics | CPU, memory, disk, network |
| Container monitoring | Docker container metrics |
| Process monitoring | Running process visibility |
| Server tagging | `app:healthpulse`, `env:<environment>`, `team:<team-name>` |
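The tagging and process-monitoring requirements above end up in the agent's main config file. A sketch of the relevant `datadog.yaml` fragment — values are examples, and in practice the Ansible playbook templates this for you:

```yaml
# Sketch of /etc/datadog-agent/datadog.yaml (illustrative values)
api_key: <YOUR_DATADOG_API_KEY>

# Tags make every host filterable by app, environment, and team
tags:
  - app:healthpulse
  - env:dev
  - team:pipeline-pirates

# Enable process-level visibility
process_config:
  process_collection:
    enabled: true
```

After editing, restart the agent (`sudo systemctl restart datadog-agent`) and check the host appears with its tags in the Datadog infrastructure list.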

**Acceptance Criteria:**

- [ ] Datadog agent running on all servers
- [ ] Infrastructure metrics visible in the Datadog dashboard
- [ ] Containers monitored with the Docker integration
- [ ] Process-level monitoring enabled
- [ ] All servers tagged and filterable by environment

### TASK F: DNS & Domain

Register a team domain and configure DNS.

| Requirement | Detail |
|-------------|--------|
| Domain | e.g., team-healthpulse.com |
| DNS Provider | Route 53 (preferred), GoDaddy, etc. |
| Records | A/CNAME pointing to the ALB |
| Environments | dev.team-healthpulse.com, uat.team-healthpulse.com, team-healthpulse.com |
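In Terraform, an environment record pointing at the ALB might look like this — a sketch only (resource names and the `aws_lb.app` reference are assumptions; see the linked Terraform script for the real layout):

```hcl
# Sketch of a Route 53 alias record for one environment (illustrative)
resource "aws_route53_zone" "main" {
  name = "team-healthpulse.com"
}

resource "aws_route53_record" "dev" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "dev.team-healthpulse.com"
  type    = "A"

  # Alias records point at the ALB directly — no IP to maintain
  alias {
    name                   = aws_lb.app.dns_name
    zone_id                = aws_lb.app.zone_id
    evaluate_target_health = true
  }
}
```

Repeat the record resource per environment (uat, prod/apex) with the appropriate hostname.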

**Acceptance Criteria:**

- [ ] Domain registered
- [ ] DNS records pointing to load balancers
- [ ] Application accessible via domain name

See guide: https://www.devopstreams.com/2026/05/task-f-guide.html
Terraform script: https://github.com/princexav/mkdocs/tree/main/dns


### TASK H: Containerization & Image Management (Docker)

Now take the same application you deployed as bare files and package it into a Docker container. Build it, run it locally, scan it for vulnerabilities, push it to a registry, then deploy it to your k3s cluster manually.

Detailed walkthrough: https://www.devopstreams.com/2026/05/task-h-guide-dockerize-app.html

Docker files: https://github.com/princexav/mkdocs/tree/main/docker

Why manual first? Every step you do by hand here becomes an automated pipeline stage in Task F. When the pipeline breaks, you'll know how to debug it — because you've done each step yourself.

#### H.1 — Understand the Dockerfile

Review the provided `docker/Dockerfile`:

Stage 1: Node 20 Alpine
  ├── corepack enable (activate pnpm)
  ├── pnpm install --frozen-lockfile
  └── pnpm build → produces dist/

Stage 2: Nginx Alpine
  ├── Copy dist/ from Stage 1
  ├── Copy custom nginx.conf
  └── Expose port 80

Key concept: The entire build environment (Node, pnpm, dependencies) exists only in Stage 1 and is discarded. The final image is just Nginx + your static files — small, fast, and secure.
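Written out, the two stages described above might look like this — a sketch consistent with the stage diagram, not a verbatim copy of the provided `docker/Dockerfile` (file paths and the nginx.conf location are assumptions):

```dockerfile
# Sketch of docker/Dockerfile (illustrative)

# Stage 1: build with Node 20 + pnpm
FROM node:20-alpine AS build
WORKDIR /app
RUN corepack enable
COPY package.json pnpm-lock.yaml ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm build                 # produces dist/

# Stage 2: serve the static files with Nginx; Stage 1 is discarded
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
COPY docker/nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
```

Copying `package.json` and the lockfile before the rest of the source lets Docker cache the dependency layer, so rebuilds after code-only changes skip `pnpm install`.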

#### H.2 — Build and Run Locally

```bash
# Build the Docker image
docker build -t healthpulse-portal:local -f docker/Dockerfile .

# Run it locally
docker run -d --name healthpulse -p 8080:80 healthpulse-portal:local

# Test it
curl http://localhost:8080/health
# → {"status":"healthy"}

# Open in browser
# → http://localhost:8080

# Check the running container
docker ps
docker logs healthpulse

# Stop and remove
docker stop healthpulse && docker rm healthpulse
```

#### H.3 — Use Docker Compose

```bash
# Start with docker-compose (uses docker/docker-compose.yml)
docker compose -f docker/docker-compose.yml up -d

# Check status
docker compose -f docker/docker-compose.yml ps

# View logs
docker compose -f docker/docker-compose.yml logs -f

# Tear down
docker compose -f docker/docker-compose.yml down
```

#### H.4 — Manual Security Scanning

Before pushing your image to a registry, scan it for vulnerabilities. This is what the CI pipeline will automate in Task F — do it manually first so you understand the output.

```bash
# Option 1: Trivy (open-source, recommended)
# Install: https://aquasecurity.github.io/trivy/
trivy image healthpulse-portal:local

# Option 2: Docker Scout (built into Docker Desktop)
docker scout cves healthpulse-portal:local

# Option 3: Snyk CLI (if installed)
snyk container test healthpulse-portal:local
```

What to look for:

| Severity | Action |
|----------|--------|
| CRITICAL | Must fix — update the base image or package |
| HIGH | Should fix — update if feasible |
| MEDIUM | Note and track — fix when time allows |
| LOW | Acceptable risk for a capstone |

Common fixes:

- Update `FROM nginx:alpine` to `FROM nginx:alpine3.20` (pin the version)
- Remove unnecessary packages in the final stage
- Use `apk add --no-cache` so no package index is left behind in the image

Document your findings in MkDocs: what vulnerabilities did you find? What did you fix? What was acceptable risk?

#### H.5 — Tag and Push to Registry

Choose one registry — either your team's Artifactory or Docker Hub.

**Option A: JFrog Artifactory (enterprise registry)**

```bash
docker tag healthpulse-portal:local <ARTIFACTORY_URL>/healthpulse-portal:1.0.0
docker tag healthpulse-portal:local <ARTIFACTORY_URL>/healthpulse-portal:latest
docker login <ARTIFACTORY_URL>
docker push <ARTIFACTORY_URL>/healthpulse-portal:1.0.0
docker push <ARTIFACTORY_URL>/healthpulse-portal:latest
```

**Option B: Docker Hub (public registry)**

```bash
docker tag healthpulse-portal:local <DOCKERHUB_USERNAME>/healthpulse-portal:1.0.0
docker tag healthpulse-portal:local <DOCKERHUB_USERNAME>/healthpulse-portal:latest
docker login
docker push <DOCKERHUB_USERNAME>/healthpulse-portal:1.0.0
docker push <DOCKERHUB_USERNAME>/healthpulse-portal:latest
```

Note: The CI pipeline (Task F) automates this on every build. Here you're doing it manually to understand the process.

#### H.6 — Deploy to k3s Manually

Now pull your image from the registry and deploy it to the k3s cluster by hand:

```bash
export KUBECONFIG=~/.kube/healthpulse-config

# Create an image pull secret (if using a private registry)
kubectl create secret docker-registry regcred \
  --docker-server=<REGISTRY_URL> \
  --docker-username=<USERNAME> \
  --docker-password=<PASSWORD> \
  -n healthpulse-dev

# Apply deployment (update image in deployment.yml first)
kubectl apply -f kubernetes/deployment.yml -n healthpulse-dev
kubectl apply -f kubernetes/service.yml -n healthpulse-dev

# Watch pods come up
kubectl get pods -n healthpulse-dev -w

# Test
curl http://<K3S_MASTER_IP>:<NODE_PORT>/health
```
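The `kubernetes/deployment.yml` being applied might look roughly like this — a sketch, with the image reference, replica count, and probe settings as assumptions (update the image line to your registry path, as noted above):

```yaml
# Sketch of kubernetes/deployment.yml (illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: healthpulse-portal
  namespace: healthpulse-dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: healthpulse-portal
  template:
    metadata:
      labels:
        app: healthpulse-portal
    spec:
      imagePullSecrets:
        - name: regcred                    # the secret created above
      containers:
        - name: portal
          image: <REGISTRY_URL>/healthpulse-portal:1.0.0
          ports:
            - containerPort: 80
          readinessProbe:                  # reuse the app's /health endpoint
            httpGet:
              path: /health
              port: 80
```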

This is the manual version of what the pipeline will automate. Feel how many commands it takes — that's why CI/CD exists.

#### H.7 — Compare: Bare-Metal vs Container

After running both ways, document the comparison in your MkDocs wiki:

| Aspect | Bare-Metal (Task C) | Container (Task H) |
|--------|---------------------|--------------------|
| Server setup | Install Node, Nginx, configure manually | `docker run` — everything is inside the image |
| Build output | `dist/` folder copied to server | Docker image with Nginx + `dist/` baked in |
| Deploy time | Minutes (download, extract, reload Nginx) | Seconds (pull image, start container) |
| Rollback | Restore from tar backup | `docker run previous-image:tag` |
| Environment parity | Hope configs match across servers | Guaranteed — same image everywhere |
| Dependencies | Installed on the OS — can conflict | Isolated inside the container |
| Reproducibility | "Works on my machine" problems | Same image runs everywhere |
| Security scanning | Manual audit of server packages | `trivy image` — automated CVE check |
| Cleanup | Files scattered across the OS | `docker rm` — clean removal |

**Acceptance Criteria:**

- [ ] Docker image builds successfully with `docker build`
- [ ] Application runs locally via `docker run` and is accessible at http://localhost:8080
- [ ] Health check returns 200 at `/health`
- [ ] Navigate through the app — all pages work (SPA routing via Nginx)
- [ ] Manual vulnerability scan completed (Trivy, Docker Scout, or Snyk)
- [ ] Scan findings documented — what was found, what was fixed, what was accepted
- [ ] Image pushed to registry (Artifactory or Docker Hub) with version tag
- [ ] Image pulled from registry and deployed to k3s cluster manually
- [ ] Bare-metal vs container comparison documented in MkDocs wiki
- [ ] Can explain: what is in the final Docker image? What was discarded?

### TASK I: Kubernetes Monitoring (Prometheus + Grafana)

Now that your applications are running on k3s, add Kubernetes-native monitoring using Prometheus and Grafana. This complements Datadog (Task E) by providing deep visibility into pod-level metrics, deployment health, and cluster performance.

Detailed walkthrough: see `guides/TASK-I-GUIDE.md` and https://www.devopstreams.com/2026/05/task-i-kubernetes-monitoring.html

Datadog vs Prometheus — why both?

| | Datadog (Task E) | Prometheus + Grafana (Task I) |
|---|------------------|-------------------------------|
| Scope | Infrastructure (OS-level) | Kubernetes (pod/container-level) |
| Runs where | Agent on each server → SaaS cloud | Inside the k3s cluster |
| Metrics | CPU, memory, disk, network, processes | Pod resource usage, deployment health, HPA scaling, request rates |
| Dashboards | Datadog web console | Grafana (self-hosted on k3s) |
| Cost | Free tier (5 hosts) → paid | Free (open source) |
| Industry | Used alongside Prometheus in most orgs | Standard for Kubernetes monitoring |

#### I.1 — Install via Helm

```bash
# Install Helm (if not installed)
# https://helm.sh/docs/intro/install/

# Add the Prometheus community Helm chart repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the kube-prometheus-stack (includes Prometheus + Grafana + Node Exporter)
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=healthpulse123
```

This single command installs:

| Component | Purpose |
|-----------|---------|
| Prometheus | Scrapes and stores metrics from all k8s components |
| Grafana | Visualization dashboards |
| Node Exporter | Hardware/OS metrics from each node |
| kube-state-metrics | Kubernetes object metrics (pods, deployments, etc.) |
| Alertmanager | Alert routing and notifications |

#### I.2 — Access Grafana

```bash
# Port-forward Grafana to your local machine
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80

# Open in browser: http://localhost:3000
# Login: admin / healthpulse123
```

#### I.3 — Explore Built-in Dashboards

The Helm chart includes pre-built dashboards. Navigate to Dashboards in Grafana and explore:

| Dashboard | What It Shows |
|-----------|---------------|
| Kubernetes / Compute Resources / Namespace (Pods) | CPU + memory per pod, per namespace |
| Kubernetes / Compute Resources / Node (Pods) | Which pods are using resources on each node |
| Node Exporter / Nodes | OS-level metrics per node (CPU, memory, disk, network) |
| Kubernetes / Networking / Namespace (Pods) | Network traffic per pod |

#### I.4 — Monitor Your HealthPulse Deployment

1. Go to the Namespace (Pods) dashboard
2. Select namespace: `healthpulse-prod`
3. You'll see CPU and memory usage for your HealthPulse pods
4. Deploy a new version and watch the metrics change in real time

#### I.5 — Create a Custom Dashboard

Create a dashboard with these panels:

1. Pod count by namespace — how many pods per environment
2. CPU usage by pod — which pods are consuming resources
3. Memory usage trend — are pods leaking memory over time?
4. Pod restart count — are pods crash-looping?
5. HPA replica count — is the autoscaler active?
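As starting points for those panels, here are hedged PromQL queries. The metric names come from kube-state-metrics and cAdvisor, which the kube-prometheus-stack installs; adjust the namespace filters to your environments:

```promql
# 1. Pod count by namespace
count(kube_pod_info) by (namespace)

# 2. CPU usage by pod (cores, averaged over 5m)
sum(rate(container_cpu_usage_seconds_total{namespace="healthpulse-prod"}[5m])) by (pod)

# 3. Memory usage trend by pod (working set, bytes)
sum(container_memory_working_set_bytes{namespace="healthpulse-prod"}) by (pod)

# 4. Pod restart count
sum(kube_pod_container_status_restarts_total{namespace="healthpulse-prod"}) by (pod)

# 5. HPA current replicas
kube_horizontalpodautoscaler_status_current_replicas{namespace="healthpulse-prod"}
```

Paste each query into a new panel's query editor; for the memory trend, a time-series panel over a few hours makes leaks visible.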

#### I.6 — Explore with k9s + Prometheus

Use k9s to cross-reference what Prometheus reports:

```bash
k9s
# :pods → see pod status
# Compare with Grafana dashboards — do the numbers match?
```

**Acceptance Criteria:**

- [ ] Prometheus + Grafana installed on k3s via Helm
- [ ] Grafana accessible and pre-built dashboards visible
- [ ] HealthPulse pod metrics visible in Grafana (CPU, memory)
- [ ] Custom dashboard created with at least 4 panels
- [ ] Can explain: what does Prometheus scrape? How does Grafana query it?
- [ ] Datadog vs Prometheus comparison documented in MkDocs wiki
