Sunday, 15 March 2026

# HealthPulse Portal — Complete Capstone Project

**HealthPulse Inc.** is a healthcare technology startup that has built a patient portal as a **React/TypeScript single-page application**. The application allows patients to view appointments, lab results, medications, and communicate with their care team.


Currently, the development team **manually builds and deploys** the application by:

1. Running `npm run build` on a developer's laptop

2. SCP-ing the `dist/` folder to a single Nginx server

3. SSHing into the server and restarting Nginx

The application is currently hosted on their private server; a reference demo is available at:

https://healthpulse-capstone.vercel.app/

This process takes **45 minutes per deployment**, is error-prone, and has caused **3 production outages** in the last quarter from misconfigurations. There is **no testing in the pipeline**, **no code quality checks**, **no security scanning**, and **no monitoring**.


**HealthPulse Inc. has hired your DevOps team** to design and implement a complete CI/CD pipeline, multi-environment infrastructure, container orchestration, and observability platform on **AWS**.


---

## Application Details


| Item | Detail |
|------|--------|
| **App Name** | HealthPulse Portal |
| **Tech Stack** | React 18, TypeScript, Vite, shadcn/ui, Tailwind CSS, Recharts |
| **Testing** | Vitest (unit), Playwright (e2e) |
| **Build Output** | Static files (`dist/`) served by Nginx |
| **Container** | Multi-stage Dockerfile (Node build → Nginx serve) |
| **Health Endpoint** | `GET /health` → `{"status":"healthy"}` |

## Repository Structure

healthpulse-capstone/
├── src/                        # Application source code
│   ├── components/ui/          # shadcn/ui components
│   ├── components/layout/      # Layout (Sidebar, Header)
│   ├── pages/                  # Login, Dashboard, Appointments, LabResults, etc.
│   ├── data/                   # Mock data
│   ├── types/                  # TypeScript types
│   ├── lib/                    # Utilities
│   └── test/                   # Unit tests
├── tests/e2e/                  # Playwright e2e tests

## Tools & Technologies

| Category | Tool | Purpose |
|----------|------|---------|
| **CI/CD** | Jenkins OR GitLab CI OR Azure DevOps | Pipeline automation (student chooses one) |
| **Cloud** | AWS (ECS Fargate, ALB, VPC, Route 53) | Infrastructure hosting |
| **IaC** | Terraform | Infrastructure provisioning |
| **Config Mgmt** | Ansible Tower | Application deployment & rollback |
| **Containers** | Docker | Application containerization |
| **Orchestration** | Kubernetes (EKS) | Container orchestration |
| **Artifact Repo** | JFrog Artifactory | Docker images + build artifacts |
| **Code Quality** | SonarQube | Static analysis + code coverage |
| **Security** | Snyk | Dependency vulnerability scanning |
| **Monitoring** | Datadog | Infrastructure + application monitoring |
| **Version Control** | Git (Bitbucket/GitHub/GitLab) | Source code management |

---






### TASK A: Documentation Platform (Docs-as-Code)

Set up a **MkDocs Material** documentation site using the docs-as-code approach. Documentation lives in the Git repository as Markdown files and is built/served via Docker.

#### Why Docs-as-Code?
This is how top DevOps teams (AWS, Kubernetes, Terraform) manage documentation — Markdown files in Git, built by CI, deployed as a static site. You'll use the same multi-stage Docker pattern as the main application.

| Requirement | Detail |
|-------------|--------|
| Tool | MkDocs with Material theme |
| Container Port | `84` |
| Build | Multi-stage Docker (mkdocs build → nginx serve) |
| Dev Mode | `mkdocs serve` with live reload on port `8084` |
| Location | `docs/` directory in the deployment repo |


| New File | Purpose |
|----------|---------|
| `docs/mkdocs.yml` | MkDocs config with Material theme, dark/light toggle, nav, extensions |
| `docs/Dockerfile` | Multi-stage build (mkdocs-material → nginx:alpine) |
| `docs/docker-compose.yml` | Prod on port 84 + live-reload dev mode on port 8084 |
| `docs/docs/index.md` | Home page with project overview, team roster template |
| `docs/docs/architecture.md` | ADR templates (CI/CD platform + container orchestration) |
| `docs/docs/environments.md` | Environment matrix table (Dev/UAT/QA/Prod) |
| `docs/docs/runbooks.md` | 4 runbook templates (deploy, rollback, scale, incident) |
| `docs/docs/pipeline.md` | CI/CD pipeline stage docs with diagrams |
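The `docs/Dockerfile` row above can be sketched as a minimal multi-stage build. This is an illustrative version only — base images, pins, and the `--strict` flag are assumptions; the provided file is the source of truth:

```dockerfile
# Sketch of docs/Dockerfile (illustrative — pin versions to taste)

# Stage 1: build the static site with MkDocs Material
FROM python:3.12-alpine AS build
WORKDIR /docs
RUN pip install --no-cache-dir mkdocs-material
COPY . .
RUN mkdocs build --site-dir /site

# Stage 2: serve the built site with Nginx; the build toolchain is discarded
FROM nginx:alpine
COPY --from=build /site /usr/share/nginx/html
EXPOSE 80
```

Note the same pattern the main application uses: the Python/MkDocs toolchain exists only in Stage 1, and the final image is just Nginx plus static HTML.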




#### Required Documentation Pages

| Page | Content |
|------|---------|
| Home (`index.md`) | Project overview, team roster, quick links |
| Architecture Decisions (`architecture.md`) | ADR-001: CI/CD Platform choice, ADR-002: Container orchestration choice |
| Environment Matrix (`environments.md`) | Dev/UAT/QA/Prod table with IPs, URLs, instance sizes |
| Runbooks (`runbooks.md`) | Deploy, rollback, scale, incident response procedures |
| CI/CD Pipeline (`pipeline.md`) | Pipeline stages, tools, configuration notes |

#### Commands
```bash
# Build and serve docs (production)
cd docs && docker-compose up docs-prod
# → Docs at http://localhost:84

# Live reload dev mode
cd docs && docker-compose up docs-dev
# → Docs at http://localhost:8084 (auto-refreshes on file save)
```

**Acceptance Criteria:**
- [ ] MkDocs site builds via multi-stage Dockerfile
- [ ] Docs served on port 84 via docker-compose
- [ ] Live reload dev mode working on port 8084
- [ ] All 5 documentation pages created with real content
- [ ] `mkdocs.yml` and all Markdown files committed to Git
- [ ] Docs auto-build in CI pipeline on changes to `docs/` folder


#### Task A Summary

1. Set up MkDocs with the Material theme inside the deployment repo
2. Create a `docker-compose.yml` to serve docs on port 84
3. Write initial documentation pages:
   - Team roster and roles
   - ADR: "Why we chose [Jenkins/GitLab/Azure DevOps]"
   - Environment matrix (Dev/UAT/QA/Prod)
   - Runbook template
4. Build docs via Docker (multi-stage: mkdocs build → nginx serve)
5. CI pipeline auto-builds the docs site on pushes to the `docs/` folder

Acceptance criteria:

- Docs served on port 84 via Docker
- `mkdocs.yml` and all Markdown files committed to Git
- Multi-stage Dockerfile builds and serves the docs
- All 5 documentation pages created with real content

#### More About MkDocs

**1. Live Reload Dev Mode**

When writing documentation (editing the Markdown files), you need to see how your changes look in real time. That's what dev mode does:

You edit `runbooks.md` → save the file → the browser auto-refreshes → you see the updated page instantly.

Without dev mode: edit Markdown → rebuild the Docker image → restart the container → refresh the browser → check the result. That's painful and slow.

With dev mode: MkDocs watches the files. The second you hit save, the browser updates automatically. It's the same concept as `npm run dev` for the React app — hot reload for docs.

In the `docker-compose.yml`, there are two services:

| Service | Port | Purpose |
|---------|------|---------|
| `docs-prod` | 84 | Built static site served by Nginx (what users/team see) |
| `docs-dev` | 8084 | Live preview with auto-refresh (only used while writing docs) |

Students use 8084 while writing, then build and deploy to 84 for production. It's a workflow thing — not two permanent servers.
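A minimal `docker-compose.yml` matching the two services might look like this. The image name, internal ports, and volume layout are assumptions — the provided file in the repo may differ:

```yaml
# Sketch of docs/docker-compose.yml (illustrative)
services:
  docs-prod:
    build: .                 # multi-stage Dockerfile: mkdocs build → nginx
    ports:
      - "84:80"              # host 84 → Nginx inside the container

  docs-dev:
    image: squidfunk/mkdocs-material:latest
    command: serve --dev-addr=0.0.0.0:8000
    ports:
      - "8084:8000"          # live reload while editing
    volumes:
      - .:/docs              # mount the docs source so edits are picked up
```

With this layout, `docker compose up docs-dev` gives the live-reload workflow, and `docker compose up docs-prod` builds and serves the production site.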

**2. Runbook Template**

A runbook is an operational instruction manual — step-by-step procedures for when things happen in production. Think of it like a recipe book, but for servers.

Every real DevOps team has them. When it's 2 AM and production is down, you don't want the on-call engineer guessing — you want them following a tested checklist.

Here's an example of what you would fill in as you complete the project:

RUNBOOK: Deploy New Version
═══════════════════════════
When to use:  New release ready for production
Who can run:  DevOps team lead

Steps:
  1. Verify build passed in Jenkins → check #healthpulse-builds Slack
  2. Confirm SonarQube quality gate passed
  3. Approve deployment in pipeline (manual gate)
  4. Monitor Datadog dashboard during rollout
  5. Verify /health endpoint returns 200
  6. If health check fails → pipeline auto-rolls back via Ansible

───────────────────────────

RUNBOOK: Rollback Production
════════════════════════════
When to use:  Production deployment caused errors
Who can run:  Any DevOps team member

Steps:
  1. Run: ./scripts/k8s-manage.sh rollback
     OR: Trigger Ansible Tower rollback job
  2. Verify previous version is serving traffic
  3. Check Datadog for error rate returning to normal
  4. Post incident summary in wiki

───────────────────────────

RUNBOOK: Scale Application
══════════════════════════
When to use:  High traffic / slow response times
Who can run:  Any DevOps team member

Steps:
  1. Check Datadog → confirm CPU/memory is the bottleneck
  2. Run: REPLICAS=6 ./scripts/k8s-manage.sh scale
  3. Monitor HPA: kubectl get hpa -n healthpulse-prod
  4. Scale back down after traffic normalizes

Tip: starter repo at https://github.com/princexav/mkdocs

Note: the starter repo's default docs port is 100 — change it to 84 for this project.

healthpulse-docs/
├── mkdocs.yml                  # Site config + navigation
├── Dockerfile                  # Multi-stage build (mkdocs → nginx)
├── docker-compose.yml          # Prod (port 84) + dev (port 8084)
└── docs/
    ├── index.md                # Home — project overview, team roster, quick links
    ├── architecture.md         # ADR templates (CI/CD choice, orchestration choice)
    ├── environments.md         # Environment matrix (IPs, URLs, sizing)
    ├── pipeline.md             # CI/CD pipeline stages and config
    ├── setup-template.md       # Reusable template — copy for each tool install
    ├── runbooks.md             # Deploy, rollback, scale, health check procedures
    ├── incidents.md            # Incident log template — track issues + root causes
    └── changelog.md            # Weekly progress log — what was built, when, by whom

#### How Students Use It

| Page | When |
|------|------|
| Setup Template | Copy to `setup-jenkins.md`, `setup-sonarqube.md`, `setup-artifactory.md`, `setup-ansible-tower.md`, `setup-datadog.md` — one per tool installed. Documents every command run. |
| Runbooks | Fill in real commands and URLs as you complete Tasks F-H |
| Incident Log | Every time something breaks during the project, log it |
| Changelog | Weekly entries tracking progress across all tasks |
| Architecture / Environments / Pipeline | Fill in as you make decisions and provision infrastructure |

One template; create as many copies as needed. Keeps it simple.

The docs site is fully self-contained — it'll build and run independently:

https://github.com/princexav/mkdocs


```bash
cd healthpulse-docs
docker compose up docs-prod   # → port 84
docker compose up docs-dev    # → port 8084 (live reload)
```




### TASK B: Version Control & Code Security

**Plan & Code**

App Name: HealthPulse

- WorkStation A - Team Pipeline Pirates - 3.15.209.165
- WorkStation B - Team DevopsAvengers - 3.143.221.53
- WorkStation C - Team Devius - 3.142.240.0

The developer workstations are Windows machines. Your project supervisor will provide the IP/DNS and credentials you will use to log into the machine assigned to your group. You can connect with MobaXterm or Remote Desktop; the username is `Administrator`.

On the developer workstation assigned to your group, you will find the code base at:

This PC → Desktop → healthpulseapp

#### B.1 — Repository Setup

Create two repositories:

| Repository | Purpose | Access |
|------------|---------|--------|
| `HealthPulse_App` | Application source code | Developers |
| `HealthPulse_Deployment` | IaC, Ansible, pipelines, scripts | DevOps team |

#### B.2 — Branching Strategy

Implement GitFlow in the App repository:

main ─────────────────────────────────────────►
  └── develop ─────────────────────────────────►
        ├── feature/login-page ──► (merge to develop)
        ├── feature/dashboard ───► (merge to develop)
        └── release/1.0.0 ───────► (merge to main + develop)

#### B.3 — Repository Security (Layer 1 & Layer 3)

Repository security follows a defense-in-depth approach with 3 layers. In this task you set up Layer 1 (local hooks) and Layer 3 (branch protection). Layer 2 (gitleaks in the CI pipeline) comes later in Task F once the pipeline exists.

Layer 1 (this task):  Local hooks      → fast feedback for developers
Layer 2 (Task F):     CI pipeline scan  → server-side safety net
Layer 3 (this task):  Branch protection → platform-enforced rules

**Layer 1: Local Git Hooks (pre-commit + pre-push)**

Install pre-commit and pre-push hooks so developers get early feedback when they accidentally commit secrets. Understand that developers can bypass these with `--no-verify` — that's why Layer 3 exists.

| Hook | Tool | Purpose |
|------|------|---------|
| pre-commit | detect-secrets | Scans staged changes for secrets using entropy + pattern analysis |
| pre-push | custom script | Warns on direct push to main/develop |

Use the provided `.pre-commit-config.yaml` and `scripts/setup-git-hooks.sh`.
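For orientation, a minimal version of that config might look like this — the `rev` tag and baseline argument are assumptions; pin whatever the provided file uses:

```yaml
# Sketch of .pre-commit-config.yaml (illustrative — the provided file is authoritative)
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0                          # pin to a real release tag
    hooks:
      - id: detect-secrets
        args: ["--baseline", ".secrets.baseline"]
```

The baseline file lets you whitelist known false positives so the hook only blocks genuinely new secrets.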

```bash
# Step 1: Download the provided config and install the pre-commit framework
curl -O https://raw.githubusercontent.com/princexav/security/refs/heads/main/.pre-commit-config.yaml
pip install pre-commit

# Step 2: Install hooks into the repo
pre-commit install

# Step 3: Test it — this should be BLOCKED
echo "AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" >> test.txt
git add test.txt && git commit -m "test secret"
# Expected: detect-secrets blocks the commit

# Step 4: Clean up
git checkout -- test.txt

# Step 5: Test the pre-push hook
git checkout main
git push origin main
# Expected: Warning message about direct push to protected branch
```

Key lesson: Run `git commit --no-verify -m "test"` and notice the hook is skipped entirely. This is why local hooks alone are NOT enough — you need Layer 3.
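The "custom script" pre-push hook from the table above could be sketched like this. This is a hypothetical implementation, not the provided `scripts/setup-git-hooks.sh` — Git feeds the pre-push hook one line per pushed ref on stdin, in the format `<local_ref> <local_sha> <remote_ref> <remote_sha>`:

```shell
#!/bin/sh
# Hypothetical pre-push hook sketch: warn when pushing straight to a protected branch.

warn_on_protected() {
  # Read each pushed ref from stdin and check the remote branch name
  while read -r _local_ref _local_sha remote_ref _remote_sha; do
    case "$remote_ref" in
      refs/heads/main|refs/heads/develop)
        echo "WARNING: direct push to ${remote_ref#refs/heads/} detected; open a pull request instead."
        ;;
    esac
  done
}

# Demo: simulate what git would feed the hook on `git push origin main`
printf 'refs/heads/main 1111111 refs/heads/main 2222222\n' | warn_on_protected
```

Because this is a local hook, `--no-verify` skips it entirely — which is exactly the lesson above.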

**Layer 3: Branch Protection Rules (platform-level — cannot be bypassed)**

Configure these in your Git hosting platform (GitHub / GitLab / Bitbucket). Unlike hooks, these are enforced by the server — no developer can skip them.

| Rule | Setting |
|------|---------|
| Require pull request before merging | main and develop |
| Require at least 1 approval | main and develop |
| Do not allow bypassing the above | Even admins must follow the rules |

Note: The rule "Require CI status checks to pass" will be added in Task F once your pipeline is built. For now, configure the PR and approval requirements.

```bash
# Test it — this should be REJECTED by the platform
git checkout main
git commit --allow-empty -m "testing direct push"
git push origin main
# Expected: Rejected — branch protection requires a pull request
```

**Acceptance Criteria:**

- [ ] Both repos created with proper access controls
- [ ] GitFlow branching strategy demonstrated (main, develop, feature/, release/)
- [ ] SSH key authentication configured for repo access
- [ ] `pre-commit install` runs successfully and hooks are active
- [ ] Demonstrate: committing a fake AWS key is blocked by detect-secrets
- [ ] Demonstrate: `--no-verify` bypasses the hook (explain why this matters)
- [ ] Demonstrate: pre-push hook warns on direct push to main
- [ ] Branch protection rules configured on main and develop (screenshot required)
- [ ] PR requires at least 1 approval before merge
- [ ] Direct push to main is rejected by the platform (not just the hook)
- [ ] Security setup documented in your MkDocs wiki

### TASK C: Bare-Metal Deployment (Nginx on EC2)

Before containers, deploy the application the traditional way — built files served directly by Nginx on an EC2 instance. This teaches what containers replace and why they exist.

#### C.1 — Provision the Server (Terraform)

Use the provided `terraform/baremetal/` configuration to create a VPC, subnet, and EC2 instance with Nginx pre-installed.

See guides:

- IAM setup: https://www.devopstreams.com/2026/03/aws-credentials-setup-best-practices.html
- Step-by-step guide: https://www.devopstreams.com/2026/03/task-c-bare-metal-deployment-nginx-on.html
- Terraform files: https://github.com/princexav/mkdocs/tree/main/baremetal

```bash
cd terraform/baremetal
terraform init
terraform plan -var-file=dev.tfvars -var="ssh_public_key=$(cat ~/.ssh/healthpulse-key.pub)"
terraform apply -var-file=dev.tfvars -var="ssh_public_key=$(cat ~/.ssh/healthpulse-key.pub)"
```

What Terraform creates:

| Resource | Detail |
|----------|--------|
| VPC + Subnet | Isolated network with internet gateway and route table |
| EC2 Instance | Ubuntu 22.04, t2.micro |
| Nginx | Installed and configured via `user_data` bootstrap |
| Security Group | Ports 22 (SSH), 80 (HTTP), 443 (HTTPS) |
| Elastic IP | Static public IP |
| Nginx Config | SPA fallback, gzip, security headers, `/health` endpoint |
| Deploy Path | `/var/www/healthpulse` |
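For orientation, the heart of such a configuration might look like this — a condensed, illustrative sketch only (resource names, AMI lookup, and bootstrap script are assumptions; the provided `terraform/baremetal/` files are the source of truth):

```hcl
# Condensed sketch of the EC2 + Nginx portion (illustrative)
resource "aws_instance" "web" {
  ami                    = data.aws_ami.ubuntu_2204.id
  instance_type          = "t2.micro"
  subnet_id              = aws_subnet.public.id
  vpc_security_group_ids = [aws_security_group.web.id]
  key_name               = aws_key_pair.healthpulse.key_name

  # Bootstrap Nginx on first boot — this is the "pre-installed" part
  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx
    mkdir -p /var/www/healthpulse
    systemctl enable --now nginx
  EOF

  tags = { Name = "healthpulse-baremetal" }
}

# Static public IP so DNS and SSH targets never change
resource "aws_eip" "web" {
  instance = aws_instance.web.id
  domain   = "vpc"
}
```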

Detailed walkthrough: See guides/TASK-G-GUIDE.md for step-by-step instructions.

Manual deploy (for learning):

```bash
# SSH into the server
ssh -i ~/.ssh/healthpulse-key.pem ubuntu@<ELASTIC_IP>

# On the server — this is what Ansible automates
cd /var/www/healthpulse
# Copy dist/ files here
sudo systemctl reload nginx

# Verify
curl http://localhost/health
# → {"status":"healthy","deploy":"baremetal"}
```
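The Nginx config row in the table above (SPA fallback, gzip, security headers, `/health`) might look roughly like this — a sketch, assuming the `user_data` bootstrap drops a server block like it; the provided config is authoritative:

```nginx
# Sketch of the Nginx server block for the SPA (illustrative)
server {
    listen 80;
    root /var/www/healthpulse;
    index index.html;

    gzip on;
    gzip_types text/css application/javascript application/json image/svg+xml;

    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "DENY" always;

    # SPA fallback: unknown paths serve index.html so client-side routing works
    location / {
        try_files $uri $uri/ /index.html;
    }

    # Health endpoint for deploy checks and load balancers
    location /health {
        default_type application/json;
        return 200 '{"status":"healthy","deploy":"baremetal"}';
    }
}
```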

**Acceptance Criteria:**

- [ ] EC2 instance provisioned via Terraform with Nginx running
- [ ] Application accessible at `http://<ELASTIC_IP>`
- [ ] Health check returns 200 at `/health`
- [ ] Pain points documented in the MkDocs wiki
- [ ] Can SSH into the server and explain what Nginx is serving, and from where

### TASK D: Set Up Your Infrastructure

1. Create a 3-node k3s cluster (1 master + 2 workers) using Terraform
2. Provision the DevOps tools:

| Tool | Instance Type | Purpose |
|------|---------------|---------|
| Jenkins / GitLab / GitHub Actions / Azure DevOps | t2.large | CI/CD server |
| SonarQube | t2.xlarge | Code analysis |
| Ansible Tower | t2.2xlarge | Configuration management |
| JFrog Artifactory | t2.2xlarge | Artifact repository |
**Acceptance Criteria:**

- [ ] k3s cluster provisioned with 3 nodes (1 master + 2 workers)
- [ ] `kubectl get nodes` shows all nodes Ready
- [ ] Infrastructure tagged properly (namespaces created for Dev, QA, Prod)
- [ ] Can `terraform destroy` and re-create cleanly
- [ ] HPA configured (`kubectl get hpa` shows targets)
- [ ] Can SSH into the master and explain the cluster architecture
- [ ] Setup documented in the MkDocs wiki; DevOps tools installed
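The namespace criterion above can be satisfied with a single manifest. The `healthpulse-<env>` naming is an assumption that matches the namespaces used in the later Docker/Kubernetes tasks:

```yaml
# Hypothetical kubernetes/namespaces.yml — one namespace per environment
apiVersion: v1
kind: Namespace
metadata:
  name: healthpulse-dev
  labels:
    app: healthpulse
    env: dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: healthpulse-qa
  labels:
    app: healthpulse
    env: qa
---
apiVersion: v1
kind: Namespace
metadata:
  name: healthpulse-prod
  labels:
    app: healthpulse
    env: prod
```

Apply it with `kubectl apply -f kubernetes/namespaces.yml`, then confirm with `kubectl get namespaces`.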
### TASK E: Monitoring & Observability (Datadog)

Install and configure Datadog agents on all servers.

Use the provided `monitoring/datadog/datadog-agent-setup.yml` Ansible playbook.

| Requirement | Detail |
|-------------|--------|
| Infrastructure metrics | CPU, memory, disk, network |
| Container monitoring | Docker container metrics |
| Process monitoring | Running process visibility |
| Server tagging | `app:healthpulse`, `env:<environment>`, `team:<team-name>` |
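The tagging and process-monitoring requirements above end up in the agent's main config file. A sketch of the relevant `datadog.yaml` fragment — values are examples, and in practice the Ansible playbook templates this for you:

```yaml
# Sketch of /etc/datadog-agent/datadog.yaml (illustrative values)
api_key: <YOUR_DATADOG_API_KEY>

# Tags make every host filterable by app, environment, and team
tags:
  - app:healthpulse
  - env:dev
  - team:pipeline-pirates

# Enable process-level visibility
process_config:
  process_collection:
    enabled: true
```

After editing, restart the agent (`sudo systemctl restart datadog-agent`) and check the host appears with its tags in the Datadog infrastructure list.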

**Acceptance Criteria:**

- [ ] Datadog agent running on all servers
- [ ] Infrastructure metrics visible in the Datadog dashboard
- [ ] Containers monitored with the Docker integration
- [ ] Process-level monitoring enabled
- [ ] All servers tagged and filterable by environment

### TASK F: DNS & Domain

Register a team domain and configure DNS.

| Requirement | Detail |
|-------------|--------|
| Domain | e.g., team-healthpulse.com |
| DNS Provider | Route 53 (preferred), GoDaddy, etc. |
| Records | A/CNAME pointing to the ALB |
| Environments | dev.team-healthpulse.com, uat.team-healthpulse.com, team-healthpulse.com |
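In Terraform, an environment record pointing at the ALB might look like this — a sketch only (resource names and the `aws_lb.app` reference are assumptions; see the linked Terraform script for the real layout):

```hcl
# Sketch of a Route 53 alias record for one environment (illustrative)
resource "aws_route53_zone" "main" {
  name = "team-healthpulse.com"
}

resource "aws_route53_record" "dev" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "dev.team-healthpulse.com"
  type    = "A"

  # Alias records point at the ALB directly — no IP to maintain
  alias {
    name                   = aws_lb.app.dns_name
    zone_id                = aws_lb.app.zone_id
    evaluate_target_health = true
  }
}
```

Repeat the record resource per environment (uat, prod/apex) with the appropriate hostname.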

**Acceptance Criteria:**

- [ ] Domain registered
- [ ] DNS records pointing to load balancers
- [ ] Application accessible via domain name

See guide: https://www.devopstreams.com/2026/05/task-f-guide.html
Terraform script: https://github.com/princexav/mkdocs/tree/main/dns


### TASK H: Containerization & Image Management (Docker)

Now take the same application you deployed as bare files and package it into a Docker container. Build it, run it locally, scan it for vulnerabilities, push it to a registry, then deploy it to your k3s cluster manually.

Detailed walkthrough: https://www.devopstreams.com/2026/05/task-h-guide-dockerize-app.html

Docker files: https://github.com/princexav/mkdocs/tree/main/docker

Why manual first? Every step you do by hand here becomes an automated pipeline stage in Task F. When the pipeline breaks, you'll know how to debug it — because you've done each step yourself.

#### H.1 — Understand the Dockerfile

Review the provided `docker/Dockerfile`:

Stage 1: Node 20 Alpine
  ├── corepack enable (activate pnpm)
  ├── pnpm install --frozen-lockfile
  └── pnpm build → produces dist/

Stage 2: Nginx Alpine
  ├── Copy dist/ from Stage 1
  ├── Copy custom nginx.conf
  └── Expose port 80

Key concept: The entire build environment (Node, pnpm, dependencies) exists only in Stage 1 and is discarded. The final image is just Nginx + your static files — small, fast, and secure.
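Written out, the two stages described above might look like this — a sketch consistent with the stage diagram, not a verbatim copy of the provided `docker/Dockerfile` (file paths and the nginx.conf location are assumptions):

```dockerfile
# Sketch of docker/Dockerfile (illustrative)

# Stage 1: build with Node 20 + pnpm
FROM node:20-alpine AS build
WORKDIR /app
RUN corepack enable
COPY package.json pnpm-lock.yaml ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm build                 # produces dist/

# Stage 2: serve the static files with Nginx; Stage 1 is discarded
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
COPY docker/nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
```

Copying `package.json` and the lockfile before the rest of the source lets Docker cache the dependency layer, so rebuilds after code-only changes skip `pnpm install`.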

#### H.2 — Build and Run Locally

```bash
# Build the Docker image
docker build -t healthpulse-portal:local -f docker/Dockerfile .

# Run it locally
docker run -d --name healthpulse -p 8080:80 healthpulse-portal:local

# Test it
curl http://localhost:8080/health
# → {"status":"healthy"}

# Open in browser
# → http://localhost:8080

# Check the running container
docker ps
docker logs healthpulse

# Stop and remove
docker stop healthpulse && docker rm healthpulse
```

#### H.3 — Use Docker Compose

```bash
# Start with docker-compose (uses docker/docker-compose.yml)
docker compose -f docker/docker-compose.yml up -d

# Check status
docker compose -f docker/docker-compose.yml ps

# View logs
docker compose -f docker/docker-compose.yml logs -f

# Tear down
docker compose -f docker/docker-compose.yml down
```

#### H.4 — Manual Security Scanning

Before pushing your image to a registry, scan it for vulnerabilities. This is what the CI pipeline will automate in Task F — do it manually first so you understand the output.

```bash
# Option 1: Trivy (open-source, recommended)
# Install: https://aquasecurity.github.io/trivy/
trivy image healthpulse-portal:local

# Option 2: Docker Scout (built into Docker Desktop)
docker scout cves healthpulse-portal:local

# Option 3: Snyk CLI (if installed)
snyk container test healthpulse-portal:local
```

What to look for:

| Severity | Action |
|----------|--------|
| CRITICAL | Must fix — update the base image or package |
| HIGH | Should fix — update if feasible |
| MEDIUM | Note and track — fix when time allows |
| LOW | Acceptable risk for a capstone |

Common fixes:

- Update `FROM nginx:alpine` to `FROM nginx:alpine3.20` (pin the version)
- Remove unnecessary packages in the final stage
- Use `apk add --no-cache` so no package index is left behind in the image

Document your findings in MkDocs: what vulnerabilities did you find? What did you fix? What was acceptable risk?

#### H.5 — Tag and Push to Registry

Choose one registry — either your team's Artifactory or Docker Hub.

**Option A: JFrog Artifactory (enterprise registry)**

```bash
docker tag healthpulse-portal:local <ARTIFACTORY_URL>/healthpulse-portal:1.0.0
docker tag healthpulse-portal:local <ARTIFACTORY_URL>/healthpulse-portal:latest
docker login <ARTIFACTORY_URL>
docker push <ARTIFACTORY_URL>/healthpulse-portal:1.0.0
docker push <ARTIFACTORY_URL>/healthpulse-portal:latest
```

**Option B: Docker Hub (public registry)**

```bash
docker tag healthpulse-portal:local <DOCKERHUB_USERNAME>/healthpulse-portal:1.0.0
docker tag healthpulse-portal:local <DOCKERHUB_USERNAME>/healthpulse-portal:latest
docker login
docker push <DOCKERHUB_USERNAME>/healthpulse-portal:1.0.0
docker push <DOCKERHUB_USERNAME>/healthpulse-portal:latest
```

Note: The CI pipeline (Task F) automates this on every build. Here you're doing it manually to understand the process.

#### H.6 — Deploy to k3s Manually

Now pull your image from the registry and deploy it to the k3s cluster by hand:

```bash
export KUBECONFIG=~/.kube/healthpulse-config

# Create an image pull secret (if using a private registry)
kubectl create secret docker-registry regcred \
  --docker-server=<REGISTRY_URL> \
  --docker-username=<USERNAME> \
  --docker-password=<PASSWORD> \
  -n healthpulse-dev

# Apply deployment (update image in deployment.yml first)
kubectl apply -f kubernetes/deployment.yml -n healthpulse-dev
kubectl apply -f kubernetes/service.yml -n healthpulse-dev

# Watch pods come up
kubectl get pods -n healthpulse-dev -w

# Test
curl http://<K3S_MASTER_IP>:<NODE_PORT>/health
```
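The `kubernetes/deployment.yml` being applied might look roughly like this — a sketch, with the image reference, replica count, and probe settings as assumptions (update the image line to your registry path, as noted above):

```yaml
# Sketch of kubernetes/deployment.yml (illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: healthpulse-portal
  namespace: healthpulse-dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: healthpulse-portal
  template:
    metadata:
      labels:
        app: healthpulse-portal
    spec:
      imagePullSecrets:
        - name: regcred                    # the secret created above
      containers:
        - name: portal
          image: <REGISTRY_URL>/healthpulse-portal:1.0.0
          ports:
            - containerPort: 80
          readinessProbe:                  # reuse the app's /health endpoint
            httpGet:
              path: /health
              port: 80
```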

This is the manual version of what the pipeline will automate. Feel how many commands it takes — that's why CI/CD exists.

#### H.7 — Compare: Bare-Metal vs Container

After running both ways, document the comparison in your MkDocs wiki:

| Aspect | Bare-Metal (Task C) | Container (Task H) |
|--------|---------------------|--------------------|
| Server setup | Install Node, Nginx, configure manually | `docker run` — everything is inside the image |
| Build output | `dist/` folder copied to server | Docker image with Nginx + `dist/` baked in |
| Deploy time | Minutes (download, extract, reload Nginx) | Seconds (pull image, start container) |
| Rollback | Restore from tar backup | `docker run previous-image:tag` |
| Environment parity | Hope configs match across servers | Guaranteed — same image everywhere |
| Dependencies | Installed on the OS — can conflict | Isolated inside the container |
| Reproducibility | "Works on my machine" problems | Same image runs everywhere |
| Security scanning | Manual audit of server packages | `trivy image` — automated CVE check |
| Cleanup | Files scattered across the OS | `docker rm` — clean removal |

**Acceptance Criteria:**

- [ ] Docker image builds successfully with `docker build`
- [ ] Application runs locally via `docker run` and is accessible at http://localhost:8080
- [ ] Health check returns 200 at `/health`
- [ ] Navigate through the app — all pages work (SPA routing via Nginx)
- [ ] Manual vulnerability scan completed (Trivy, Docker Scout, or Snyk)
- [ ] Scan findings documented — what was found, what was fixed, what was accepted
- [ ] Image pushed to registry (Artifactory or Docker Hub) with version tag
- [ ] Image pulled from registry and deployed to k3s cluster manually
- [ ] Bare-metal vs container comparison documented in MkDocs wiki
- [ ] Can explain: what is in the final Docker image? What was discarded?

### TASK I: Kubernetes Monitoring (Prometheus + Grafana)

Now that your applications are running on k3s, add Kubernetes-native monitoring using Prometheus and Grafana. This complements Datadog (Task E) by providing deep visibility into pod-level metrics, deployment health, and cluster performance.

Detailed walkthrough: see `guides/TASK-I-GUIDE.md` and https://www.devopstreams.com/2026/05/task-i-kubernetes-monitoring.html

Datadog vs Prometheus — why both?

| | Datadog (Task E) | Prometheus + Grafana (Task I) |
|---|------------------|-------------------------------|
| Scope | Infrastructure (OS-level) | Kubernetes (pod/container-level) |
| Runs where | Agent on each server → SaaS cloud | Inside the k3s cluster |
| Metrics | CPU, memory, disk, network, processes | Pod resource usage, deployment health, HPA scaling, request rates |
| Dashboards | Datadog web console | Grafana (self-hosted on k3s) |
| Cost | Free tier (5 hosts) → paid | Free (open source) |
| Industry | Used alongside Prometheus in most orgs | Standard for Kubernetes monitoring |

#### I.1 — Install via Helm

```bash
# Install Helm (if not installed)
# https://helm.sh/docs/intro/install/

# Add the Prometheus community Helm chart repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the kube-prometheus-stack (includes Prometheus + Grafana + Node Exporter)
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=healthpulse123
```

This single command installs:

| Component | Purpose |
|-----------|---------|
| Prometheus | Scrapes and stores metrics from all k8s components |
| Grafana | Visualization dashboards |
| Node Exporter | Hardware/OS metrics from each node |
| kube-state-metrics | Kubernetes object metrics (pods, deployments, etc.) |
| Alertmanager | Alert routing and notifications |

#### I.2 — Access Grafana

```bash
# Port-forward Grafana to your local machine
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80

# Open in browser: http://localhost:3000
# Login: admin / healthpulse123
```

#### I.3 — Explore Built-in Dashboards

The Helm chart includes pre-built dashboards. Navigate to Dashboards in Grafana and explore:

| Dashboard | What It Shows |
|-----------|---------------|
| Kubernetes / Compute Resources / Namespace (Pods) | CPU + memory per pod, per namespace |
| Kubernetes / Compute Resources / Node (Pods) | Which pods are using resources on each node |
| Node Exporter / Nodes | OS-level metrics per node (CPU, memory, disk, network) |
| Kubernetes / Networking / Namespace (Pods) | Network traffic per pod |

#### I.4 — Monitor Your HealthPulse Deployment

1. Go to the Namespace (Pods) dashboard
2. Select namespace: `healthpulse-prod`
3. You'll see CPU and memory usage for your HealthPulse pods
4. Deploy a new version and watch the metrics change in real time

#### I.5 — Create a Custom Dashboard

Create a dashboard with these panels:

1. Pod count by namespace — how many pods per environment
2. CPU usage by pod — which pods are consuming resources
3. Memory usage trend — are pods leaking memory over time?
4. Pod restart count — are pods crash-looping?
5. HPA replica count — is the autoscaler active?
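As starting points for those panels, here are hedged PromQL queries. The metric names come from kube-state-metrics and cAdvisor, which the kube-prometheus-stack installs; adjust the namespace filters to your environments:

```promql
# 1. Pod count by namespace
count(kube_pod_info) by (namespace)

# 2. CPU usage by pod (cores, averaged over 5m)
sum(rate(container_cpu_usage_seconds_total{namespace="healthpulse-prod"}[5m])) by (pod)

# 3. Memory usage trend by pod (working set, bytes)
sum(container_memory_working_set_bytes{namespace="healthpulse-prod"}) by (pod)

# 4. Pod restart count
sum(kube_pod_container_status_restarts_total{namespace="healthpulse-prod"}) by (pod)

# 5. HPA current replicas
kube_horizontalpodautoscaler_status_current_replicas{namespace="healthpulse-prod"}
```

Paste each query into a new panel's query editor; for the memory trend, a time-series panel over a few hours makes leaks visible.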

#### I.6 — Explore with k9s + Prometheus

Use k9s to cross-reference what Prometheus reports:

```bash
k9s
# :pods → see pod status
# Compare with Grafana dashboards — do the numbers match?
```

**Acceptance Criteria:**

- [ ] Prometheus + Grafana installed on k3s via Helm
- [ ] Grafana accessible and pre-built dashboards visible
- [ ] HealthPulse pod metrics visible in Grafana (CPU, memory)
- [ ] Custom dashboard created with at least 4 panels
- [ ] Can explain: what does Prometheus scrape? How does Grafana query it?
- [ ] Datadog vs Prometheus comparison documented in MkDocs wiki
