Action Plan

Format: Prioritized checklist with owner suggestion, effort estimate, success criteria. Owners are suggestions — adjust to your org structure.

🔴 P0 — Do now (within 24 hours)

P0.1 — Take Jira offline or migrate

Finding: CRZ-006
Owner: IT-ops + CTO decision
Effort: 2–4h to make offline; 1–3 months to migrate to Cloud/DC 10.x
Steps:
1. [ ] Put jira.corezoid.com behind VPN-only (restrict public ingress in ALB listener)
2. [ ] Announce read-only maintenance to users
3. [ ] Begin migration evaluation: Jira Cloud vs Data Center 10.x
4. [ ] Full Jira access-log review for suspicious activity over the last 7.5 years
5. [ ] Assume compromise of any credential pasted into a ticket/comment historically; rotate
Success criteria: jira.corezoid.com not reachable from public IP; migration plan signed off

P0.2 — Rotate all leaked secrets from public repos

Finding: CRZ-009
Owner: DevOps / SRE
Effort: 4–8 hours for rotation; 2 days for CloudTrail audit
Steps:
1. [ ] Deactivate AWS IAM key AKIAYQAMCNBUQ3PY5FO3
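2. [ ] Search CloudTrail for that Access Key ID over the last 400 days (Event history covers only the last 90 days; older events require a trail delivered to S3 or CloudTrail Lake)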
3. [ ] Rotate admin_bearer_token_secret (ungoh3mohM3valu6Zu1ohdiighie1EemoophaequohMoov)
4. [ ] Rotate all postgres admin passwords (NE4wzHnodH9sbTeEfNaxDx23scM0ZaLS76xBiSIBhqT7EL4M3e etc.)
5. [ ] Revoke and re-issue all TLS certs whose private keys appeared in git history; CT-log check for existing certs tied to compromised pubkeys
6. [ ] Rotate all "api key"-flagged tokens from the inventory in CRZ-009
7. [ ] Rotate ArgoCD tokens in terraform/modules/eks/variables.tf (confirm if real or example first)
8. [ ] Audit Ansible box-credentials and postgres reset scripts — rotate
Success criteria: All 41+ listed secrets replaced; CloudTrail shows no unauthorized use
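
The CloudTrail audit in step 2 can be sketched offline, assuming the trail's log files have already been copied from S3 to a local directory. The directory layout and the function names (events_for_key, scan_log_dir) are illustrative assumptions; the only format detail relied on is that each CloudTrail log file is gzipped JSON with a top-level "Records" list whose entries may carry userIdentity.accessKeyId.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def events_for_key(records: list[dict], access_key_id: str) -> Iterator[dict]:
    """Yield a compact summary of every record made with the given access key."""
    for rec in records:
        # userIdentity can be missing or null for some event types.
        identity = rec.get("userIdentity") or {}
        if identity.get("accessKeyId") == access_key_id:
            yield {
                "eventTime": rec.get("eventTime"),
                "eventSource": rec.get("eventSource"),
                "eventName": rec.get("eventName"),
                "sourceIPAddress": rec.get("sourceIPAddress"),
                "errorCode": rec.get("errorCode"),
            }


def scan_log_dir(log_dir: str, access_key_id: str) -> list[dict]:
    """Scan every *.json.gz file under log_dir (a hypothetical local copy of the trail)."""
    hits: list[dict] = []
    for path in sorted(Path(log_dir).rglob("*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            records = json.load(fh).get("Records", [])
        hits.extend(events_for_key(records, access_key_id))
    # ISO 8601 timestamps sort correctly as strings.
    return sorted(hits, key=lambda h: h["eventTime"] or "")
```

Reviewing the distinct sourceIPAddress and eventName values in the result is usually the quickest way to separate expected automation from unfamiliar callers. Pass the leaked key ID at call time rather than hard-coding it in the script.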

🟠 P1 — Do this week (within 7 days)

P1.1 — Lock down EKS control plane

Finding: CRZ-002
Owner: DevOps / Platform
Effort: 2–4 hours
Steps:
1. [ ] Identify why track.pre.corezoid.com DNS points at an EKS API (likely a Route53 misroute)
2. [ ] Change EKS cluster endpoint access to private or public+allowlist (via aws eks update-cluster-config)
3. [ ] If public+allowlist needed: limit to corporate VPN egress IPs + CI/CD build runners
4. [ ] Add CloudWatch alarm on /healthz, /readyz, /livez request volume from non-corporate IPs
5. [ ] Audit all other *.corezoid.com DNS entries for similar misroutes to cluster endpoints
Success criteria: curl https://track.pre.corezoid.com/ from any public network → connection refused / DNS fails
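
The success criterion above can be checked repeatably with a small probe, run from a host outside the corporate network. This is a minimal sketch: the function names and result labels are assumptions, and it treats a timeout or unreachable network as a pass alongside "connection refused" and DNS failure, since a security-group drop typically shows up as a timeout rather than a refusal.

```python
import socket


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Classify how a TCP connection attempt to host:port ends."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-fail"
    # Only the first resolved address is tried in this sketch.
    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "timeout"
        except OSError:
            return "unreachable"
    return "open"


def meets_success_criteria(result: str) -> bool:
    """Anything other than a completed TCP handshake counts as locked down."""
    return result != "open"
```

For example, meets_success_criteria(probe("track.pre.corezoid.com")) should return True once the endpoint is private or allowlisted.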

P1.2 — Remove public SSH on dev hosts

Finding: CRZ-007
Owner: SRE
Effort: 4–8 hours
Steps:
1. [ ] Block port 22 from internet on corezoid-ma.dev.corezoid.com (AWS security group update)
2. [ ] Audit all *.dev.corezoid.com and *.pre.corezoid.com hosts for similar public SSH exposure
3. [ ] Migrate developer access to:
  - AWS SSM Session Manager (no SSH daemon needed), or
  - VPN-only SSH, or
  - Teleport / Tailscale SSH
4. [ ] Upgrade OpenSSH on the host to ≥ 9.8p1 regardless
5. [ ] Audit /etc/ssh/sshd_config for PasswordAuthentication no, PermitRootLogin no
6. [ ] Rotate all SSH keys in ~/.ssh/authorized_keys on the host
Success criteria: nmap -p 22 against every *.dev.corezoid.com host returns filtered/closed from a public network
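
The sshd_config audit in step 5 can be sketched as a small parser. It assumes a simplified reading of the file: only the global section before the first Match block is considered, Include directives are not followed, and the first occurrence of a keyword wins, as sshd_config(5) describes. The names REQUIRED, effective_global_settings, and audit are illustrative. On the host itself, sshd -T prints the effective configuration and is the more authoritative check.

```python
# Directives the action plan asks for, keyed by lower-cased keyword.
REQUIRED = {"passwordauthentication": "no", "permitrootlogin": "no"}


def effective_global_settings(config_text: str) -> dict[str, str]:
    """Return the first value seen for each keyword in the global section."""
    settings: dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        value = parts[1].strip().lower() if len(parts) > 1 else ""
        settings.setdefault(keyword, value)  # first occurrence wins
    return settings


def audit(config_text: str) -> list[str]:
    """List every required directive that is missing or set to another value."""
    settings = effective_global_settings(config_text)
    problems = []
    for keyword, wanted in REQUIRED.items():
        actual = settings.get(keyword)
        if actual is None:
            problems.append(f"{keyword}: not set explicitly (wanted {wanted})")
        elif actual != wanted:
            problems.append(f"{keyword}: {actual} (wanted {wanted})")
    return problems
```

An empty list from audit(open("/etc/ssh/sshd_config").read()) means both directives are set explicitly to "no" in the global section.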

P1.3 — Fix admin cookie SameSite and CSRF gaps

Finding: CRZ-008
Owner: Backend (admin.corezoid.com team)
Effort: 1–2 days (including QA)
Steps:
1. [ ] Add explicit SameSite=Strict to mw cookie Set-Cookie header
2. [ ] Require both mw and __Host_mw cookies on ALL authenticated endpoints (currently GET endpoints accept mw alone)
3. [ ] Rename __Host_mw → __Host-mw (hyphen instead of underscore) so browsers enforce host-prefix rules — OR drop the fake-looking name
4. [ ] Change GET /logout to POST /logout with CSRF token
5. [ ] Consider reducing cookie Domain=.corezoid.com → Domain=admin.corezoid.com (blast-radius reduction)
6. [ ] QA cross-origin behavior from a test origin before deploying
Success criteria: Unit test: GET /auth/me with only mw cookie returns 401, not 200
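
Steps 1, 3, and 5 can be verified mechanically against the Set-Cookie headers the backend emits. The sketch below is a hand-rolled check, not the backend's actual code; the function names are assumptions. It encodes the browser rules for the __Host- prefix (Secure required, no Domain attribute, Path=/) and flags look-alike names such as __Host_mw that browsers treat as ordinary cookies.

```python
def parse_set_cookie(header_value: str) -> tuple[str, dict[str, str]]:
    """Split a Set-Cookie value into the cookie name and lower-cased attributes."""
    first, *rest = [part.strip() for part in header_value.split(";")]
    name, _, _value = first.partition("=")
    attrs: dict[str, str] = {}
    for part in rest:
        if not part:
            continue
        key, _, val = part.partition("=")
        attrs[key.strip().lower()] = val.strip()
    return name.strip(), attrs


def cookie_problems(header_value: str) -> list[str]:
    """Return the hardening gaps found in one Set-Cookie header value."""
    name, attrs = parse_set_cookie(header_value)
    problems = []
    if attrs.get("samesite", "").lower() != "strict":
        problems.append("SameSite is not Strict")
    if name.startswith("__Host-"):
        # Browsers reject a __Host- cookie unless all three rules hold.
        if "secure" not in attrs:
            problems.append("__Host- cookie must set Secure")
        if "domain" in attrs:
            problems.append("__Host- cookie must not set Domain")
        if attrs.get("path") != "/":
            problems.append("__Host- cookie must set Path=/")
    elif name.lower().startswith("__host"):
        problems.append("name resembles the __Host- prefix but lacks the hyphen")
    return problems
```

Running cookie_problems over each Set-Cookie header captured during QA (step 6) gives a quick regression check alongside the unit test named in the success criteria.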

🟡 P2 — Do this sprint (within 2 weeks)

P2.1 — Apply Kubernetes hardening defaults

Finding: CRZ-010
Owner: Platform team
Effort: 1–2 sprints, phased
Phase 1 (this sprint): umbrella chart values.yaml defaults
1. [ ] Set podSecurityContext + securityContext defaults as shown in CRZ-010
2. [ ] Set automountServiceAccountToken: false default; override per-chart where needed
3. [ ] Set CPU/memory requests + limits defaults
4. [ ] Verify in staging; expect some pods to crashloop (root-required workloads); add per-chart overrides
Phase 2: namespace migration
5. [ ] Create namespaces: account, apigw, workers, simulator, db, messaging, observability
6. [ ] Migrate resources off default namespace
Phase 3: network policies
7. [ ] Install Cilium or Calico (if not already)
8. [ ] Default-deny NetworkPolicy per namespace
9. [ ] Allow-list specific service-to-service traffic
Phase 4: supply chain
10. [ ] Pin all images by digest (not tag)
11. [ ] Extend cosign verification (already in account) cluster-wide via Kyverno / Gatekeeper
12. [ ] Add Renovate / digestabot for digest updates
Success criteria: Checkov re-run shows <50 failures (from 377); CI gate fails PRs that regress checkov score
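
The per-namespace default-deny policies in Phase 3 can be generated rather than hand-written. This sketch emits one NetworkPolicy per namespace from the Phase 2 list as a JSON List document, which kubectl apply -f accepts. The policy name default-deny-all and the function names are assumptions. Denying all egress also blocks DNS lookups, so the allow-list step would need an explicit rule for cluster DNS before this is applied outside staging.

```python
import json

# Namespace names taken from Phase 2 of the plan.
NAMESPACES = ["account", "apigw", "workers", "simulator", "db", "messaging", "observability"]


def default_deny(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so both are denied
        },
    }


def render_all(namespaces: list[str] = NAMESPACES) -> str:
    """Render all policies as a single JSON List manifest."""
    doc = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [default_deny(ns) for ns in namespaces],
    }
    return json.dumps(doc, indent=2)
```

Writing render_all() to a file and committing it keeps the baseline policies reviewable next to the Helm charts.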

P2.2 — Remove dead DNS/ALB entries

Findings: CRZ-003, CRZ-001
Owner: SRE
Effort: 2–4 hours
Steps:
1. [ ] Remove widget.simulator.company DNS record (or fix the nginx routing)
2. [ ] Remove admin-pre.corezoid.com from public Route53, move to Private Hosted Zone
3. [ ] Audit every other public DNS record — cross-check against current service inventory
4. [ ] Remove entries for: admin-oleg.dev.corezoid.com (if dev no longer active), corezoid-6102-* (old version deployments), corezoid69-* (even older), 4ages.sandbox.corezoid.com (customer sandbox, verify), etc.

P2.3 — Audit doc.corezoid.com Google Doc

Finding: CRZ-004
Owner: Docs team / CTO
Effort: 4 hours for audit; 2 weeks for migration
Steps:
1. [ ] Check share setting on the Google Doc (1-31BBNhy2DUIfu-EljVn3MJr3GSOVqJ3PIwjGPUi3So)
2. [ ] Review revision history for any sensitive content ever pasted
3. [ ] If share is "anyone with link can comment/edit" — lock down immediately
4. [ ] Plan migration to a controlled CMS (Docusaurus / ReadMe.io / Mkdocs-material)
Success criteria: docs are version-controlled and access is managed by a real authorization system

🔵 P3 — Do this quarter (within 90 days)

P3.1 — Org-wide secret management

Owner: Security team
Effort: 1 quarter
Steps:
1. [ ] Install pre-commit hooks (gitleaks, detect-secrets) org-wide
2. [ ] Enable GitHub secret scanning + push protection on all Corezoid repos
3. [ ] Migrate Helm values away from containing secrets — use Sealed Secrets, External Secrets Operator, or Vault
4. [ ] Enforce IAM key rotation policy (90 days max) via AWS Config + Lambda
5. [ ] Quarterly trufflehog+gitleaks sweep of all repos (including archived) — dashboard the results
6. [ ] Publish a "Secrets Handbook" for developers
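
To illustrate the kind of pattern the pre-commit hooks in step 1 look for, here is a minimal sketch that flags strings shaped like AWS access key IDs. It is not the rule set of gitleaks or detect-secrets, which cover many more secret types; the regex and function name are assumptions based on the 20-character AKIA/ASIA key ID format.

```python
import re

# AKIA = long-term key, ASIA = temporary (STS) key; 16 more upper-case
# alphanumerics follow. The lookarounds avoid matching inside longer tokens.
AWS_KEY_ID = re.compile(r"(?<![A-Z0-9])(?:AKIA|ASIA)[A-Z0-9]{16}(?![A-Z0-9])")


def find_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line number, key ID) for every AWS-style key ID in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

A hit only shows that a string has the right shape; whether the key is live still has to be confirmed in IAM before treating it as a leak.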

P3.2 — Close Jira migration + secrets cleanup loop

1. [ ] Complete the Jira migration (from P0.1)
2. [ ] Run git filter-repo on all public repos to scrub history (only after all secrets are rotated)
3. [ ] Confirm no cached references to old secrets remain in CI/CD logs, Slack backlogs, etc.

P3.3 — Expand testing coverage

1. [ ] Give the red team / pentesters access to internal GitLab (git.corezoid.com) for the actual service source code; Helm charts alone don't expose business logic
2. [ ] Request fresh HAR captures of mw.simulator.company authenticated sessions for future engagements (the provided HAR was essentially empty)
3. [ ] Commission a Layer 7 pentest with a 2-week scope on the Corezoid core API (/api/2/json); my 1-day engagement covered authz basics but left depth on the table

Tracking

Priority	Count	Findings
🔴 P0	2	CRZ-006 (Jira EOL), CRZ-009 (public-repo secrets)
🟠 P1	5	CRZ-002 (k8s public), CRZ-007 (OpenSSH regreSSHion), CRZ-008 (SameSite), CRZ-011 (VPN TLS), CRZ-015 (postMessage origin)
🟡 P2	4	CRZ-010 (k8s hardening), CRZ-013 (destructive-op audit), CRZ-003 (widget nginx), CRZ-001 (admin-pre DNS)
🔵 P3	4	CRZ-012 (SHA-1 signatures → HMAC-SHA256 migration), CRZ-004 (Google Doc), CRZ-014 (super-user scope), org-wide secret mgmt
⚪ Info	1	CRZ-005 (OpenVPN-AS version — monitor)

Total findings: 15 (1 Critical, 3 High, 5 Medium, 3 Low/Low-Med, 3 Info). Engagement complete; no more findings pending.