Action Plan
Format: Prioritized checklist with owner suggestion, effort estimate, success criteria. Owners are suggestions; adjust to your org structure.
🔴 P0 - Do now (within 24 hours)
P0.1 - Take Jira offline or migrate
- Finding: CRZ-006
- Owner: IT-ops + CTO decision
- Effort: 2–4h to make offline; 1–3 months to migrate to Cloud/DC 10.x
- Steps:
  - [ ] Put `jira.corezoid.com` behind VPN-only (restrict public ingress in ALB listener)
  - [ ] Announce read-only maintenance to users
  - [ ] Begin migration evaluation: Jira Cloud vs Data Center 10.x
  - [ ] Full Jira access-log review for suspicious activity over the last 7.5 years
  - [ ] Assume compromise of any credential pasted into a ticket/comment historically; rotate
- Success criteria: `jira.corezoid.com` not reachable from public IP; migration plan signed off
P0.2 - Rotate all leaked secrets from public repos
- Finding: CRZ-009
- Owner: DevOps / SRE
- Effort: 4–8 hours for rotation; 2 days for CloudTrail audit
- Steps:
  - [ ] Deactivate AWS IAM key `AKIAYQAMCNBUQ3PY5FO3`
  - [ ] CloudTrail search for that Access Key ID for last 400 days
  - [ ] Rotate `admin_bearer_token_secret` (`ungoh3mohM3valu6Zu1ohdiighie1EemoophaequohMoov`)
  - [ ] Rotate all postgres admin passwords (NE4wzHnodH9sbTeEfNaxDx23scM0ZaLS76xBiSIBhqT7EL4M3eetc.)
  - [ ] Revoke and re-issue all TLS certs whose private keys appeared in git history; CT-log check for existing certs tied to compromised pubkeys
  - [ ] Rotate all "api key"-flagged tokens from the inventory in CRZ-009
  - [ ] Rotate ArgoCD tokens in `terraform/modules/eks/variables.tf` (confirm if real or example first)
  - [ ] Audit Ansible box-credentials and postgres reset scripts, then rotate
- Success criteria: All 41+ listed secrets replaced; CloudTrail shows no unauthorized use
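The CloudTrail search step could be sketched as below. This is a minimal illustration, assuming trail log files have already been downloaded from S3 as gzipped JSON; CloudTrail's built-in event history reaches back only 90 days, so a 400-day search depends on archived logs existing. The directory layout and the function names are hypothetical; `Records`, `userIdentity.accessKeyId`, `eventTime`, `eventSource`, `eventName`, and `sourceIPAddress` are standard CloudTrail record fields.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable, Iterator


def events_for_access_key(log_dir: Path, access_key_id: str) -> Iterator[dict]:
    """Yield CloudTrail records whose userIdentity.accessKeyId matches."""
    for path in sorted(log_dir.rglob("*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            records = json.load(fh).get("Records", [])
        for record in records:
            # userIdentity can be absent or null on some event types.
            identity = record.get("userIdentity") or {}
            if identity.get("accessKeyId") == access_key_id:
                yield record


def summarize(records: Iterable[dict]) -> list[tuple[str, str, str, str]]:
    """Reduce records to sorted (time, service, action, source IP) rows."""
    return sorted(
        (
            r.get("eventTime", ""),
            r.get("eventSource", ""),
            r.get("eventName", ""),
            r.get("sourceIPAddress", ""),
        )
        for r in records
    )
```

Reviewing the summarized rows for unfamiliar source IPs or unexpected services is one way to support the "no unauthorized use" criterion; an empty result only means no use was logged in the files that were scanned.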
🟠 P1 - Do this week (within 7 days)
P1.1 - Lock down EKS control plane
- Finding: CRZ-002
- Owner: DevOps / Platform
- Effort: 2–4 hours
- Steps:
  - [ ] Identify why `track.pre.corezoid.com` DNS points at an EKS API (likely a Route53 misroute)
  - [ ] Change EKS cluster endpoint access to private or public+allowlist (via `aws eks update-cluster-config`)
  - [ ] If public+allowlist needed: limit to corporate VPN egress IPs + CI/CD build runners
  - [ ] Add CloudWatch alarm on `/healthz`, `/readyz`, `/livez` request volume from non-corporate IPs
  - [ ] Audit all other `*.corezoid.com` DNS entries for similar misroutes to cluster endpoints
- Success criteria: `curl https://track.pre.corezoid.com/` from any public network → connection refused / DNS fails
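The private versus public+allowlist choice can be expressed as a small payload builder. This is a sketch under assumptions: the function name is hypothetical, and the returned dictionary is meant to match the `resourcesVpcConfig` argument of boto3's `update_cluster_config` (the API behind `aws eks update-cluster-config`). The CIDR in the usage comment is a documentation address, not a real egress IP.

```python
import ipaddress


def endpoint_access_config(allowed_cidrs: list[str]) -> dict:
    """Build a resourcesVpcConfig payload for an EKS endpoint update.

    An empty allowlist means private-only access; otherwise the public
    endpoint stays enabled but is limited to the given CIDR blocks.
    """
    cidrs = []
    for cidr in allowed_cidrs:
        # strict=True rejects malformed input and CIDRs with host bits set.
        network = ipaddress.ip_network(cidr, strict=True)
        if network.prefixlen == 0:
            raise ValueError(f"{cidr} would allow the whole internet")
        cidrs.append(str(network))

    config = {
        "endpointPrivateAccess": True,
        "endpointPublicAccess": bool(cidrs),
    }
    if cidrs:
        config["publicAccessCidrs"] = cidrs
    return config


# Hypothetical usage with boto3 (not executed here; cluster name is a placeholder):
#   boto3.client("eks").update_cluster_config(
#       name="example-cluster",
#       resourcesVpcConfig=endpoint_access_config(["203.0.113.10/32"]),
#   )
```

Rejecting a `/0` block up front guards against an allowlist that is effectively the same as the current public exposure.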
P1.2 - Remove public SSH on dev hosts
- Finding: CRZ-007
- Owner: SRE
- Effort: 4–8 hours
- Steps:
  - [ ] Block port 22 from internet on `corezoid-ma.dev.corezoid.com` (AWS security group update)
  - [ ] Audit all `*.dev.corezoid.com` and `*.pre.corezoid.com` hosts for similar public SSH exposure
  - [ ] Migrate developer access to:
    - AWS SSM Session Manager (no SSH daemon needed), or
    - VPN-only SSH, or
    - Teleport / Tailscale SSH
  - [ ] Upgrade OpenSSH on the host to ≥ 9.8p1 regardless
  - [ ] Audit `/etc/ssh/sshd_config` for `PasswordAuthentication no`, `PermitRootLogin no`
  - [ ] Rotate all SSH keys in `~/.ssh/authorized_keys` on the host
- Success criteria: `nmap -p 22 *.dev.corezoid.com` returns filtered/closed from public
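The `sshd_config` audit step could be automated roughly as follows. This is a simplified sketch: it inspects only the global section (it stops at the first `Match` block), does not follow `Include` directives, and assumes whitespace-separated `Keyword value` lines. The function name and result format are hypothetical; running `sshd -T` on the host remains the more reliable way to see the effective configuration.

```python
REQUIRED = {"passwordauthentication": "no", "permitrootlogin": "no"}


def audit_sshd_config(text: str) -> dict[str, str]:
    """Return {keyword: problem} for required settings that are missing or wrong.

    Keywords are case-insensitive and the first occurrence wins, which is
    how sshd treats repeated keywords.
    """
    seen: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        value = parts[1].strip().lower() if len(parts) > 1 else ""
        seen.setdefault(keyword, value)

    problems: dict[str, str] = {}
    for keyword, wanted in REQUIRED.items():
        if keyword not in seen:
            problems[keyword] = "not set explicitly (compiled-in default applies)"
        elif seen[keyword] != wanted:
            problems[keyword] = f"is {seen[keyword]!r}, expected {wanted!r}"
    return problems
```

A missing keyword is reported rather than treated as compliant because the defaults (password authentication enabled, root login with keys permitted) do not meet the checklist.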
P1.3 - Fix auth cookie SameSite
- Finding: CRZ-008
- Owner: Backend (admin.corezoid.com team)
- Effort: 1–2 days (including QA)
- Steps:
  - [ ] Add explicit `SameSite=Strict` to `mw` cookie Set-Cookie header
  - [ ] Require both `mw` and `__Host_mw` cookies on ALL authenticated endpoints (currently GET endpoints accept `mw` alone)
  - [ ] Rename `__Host_mw` → `__Host-mw` (hyphen instead of underscore) so browsers enforce host-prefix rules, OR drop the fake-looking name
  - [ ] Change `GET /logout` to `POST /logout` with CSRF token
  - [ ] Consider reducing cookie `Domain=.corezoid.com` → `Domain=admin.corezoid.com` (blast-radius reduction)
  - [ ] QA cross-origin behavior from a test origin before deploying
- Success criteria: Unit test: `GET /auth/me` with only `mw` cookie returns 401, not 200
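The cookie changes and the success-criteria test can be illustrated with the standard library alone. This is a sketch, not the admin backend's code: the function names are hypothetical, the status check only tests for the presence of both cookies (real session validation is out of scope), and it assumes the rename to `__Host-mw` has been chosen. A `__Host-` prefixed cookie must be `Secure`, use `Path=/`, and carry no `Domain` attribute, so none is set here.

```python
from http.cookies import SimpleCookie

SESSION_COOKIES = ("mw", "__Host-mw")


def build_set_cookie(name: str, value: str) -> str:
    """Return a Set-Cookie header value with the hardened attributes."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["secure"] = True
    cookie[name]["httponly"] = True
    cookie[name]["samesite"] = "Strict"
    return cookie[name].OutputString()


def auth_status(cookie_header: str) -> int:
    """Return 200 only when every session cookie is present and non-empty."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    complete = all(
        name in cookie and cookie[name].value for name in SESSION_COOKIES
    )
    return 200 if complete else 401
```

With this logic a request carrying only `mw` maps to 401, which is the behavior the success criterion asks the real `/auth/me` endpoint to show.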
🟡 P2 - Do this sprint (within 2 weeks)
P2.1 - Apply Kubernetes hardening defaults
- Finding: CRZ-010
- Owner: Platform team
- Effort: 1–2 sprints, phased
- Phase 1 (this sprint): umbrella chart `values.yaml` defaults
  1. [ ] Set `podSecurityContext` + `securityContext` defaults as shown in CRZ-010
  2. [ ] Set `automountServiceAccountToken: false` default; override per-chart where needed
  3. [ ] Set CPU/memory requests + limits defaults
  4. [ ] Verify in staging; expect some pods to crashloop (root-required workloads); add per-chart overrides
- Phase 2: namespace migration
  5. [ ] Create namespaces: `account`, `apigw`, `workers`, `simulator`, `db`, `messaging`, `observability`
  6. [ ] Migrate resources off `default` namespace
- Phase 3: network policies
  7. [ ] Install Cilium or Calico (if not already)
  8. [ ] Default-deny NetworkPolicy per namespace
  9. [ ] Allow-list specific service-to-service traffic
- Phase 4: supply chain
  10. [ ] Pin all images by digest (not tag)
  11. [ ] Extend cosign verification (already in `account`) cluster-wide via Kyverno / Gatekeeper
  12. [ ] Add Renovate / digestabot for digest updates
- Success criteria: Checkov re-run shows <50 failures (from 377); CI gate fails PRs that regress checkov score
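For the digest-pinning step in Phase 4, a CI check along these lines could flag tag-only image references in rendered manifests (for example the output of `helm template`). It is a rough sketch: the function names are hypothetical, the manifest is scanned with a regular expression rather than a YAML parser, and only `sha256` digests are treated as pinned.

```python
import re

# Matches "image: <ref>" lines, with or without a leading list dash or quotes.
IMAGE_LINE_RE = re.compile(
    r"^[ \t]*(?:-[ \t]*)?image:[ \t]*[\"']?([^\s\"']+)", re.MULTILINE
)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def images_in_manifest(manifest_text: str) -> list[str]:
    """Extract image references from rendered Kubernetes YAML text."""
    return IMAGE_LINE_RE.findall(manifest_text)


def unpinned_images(images: list[str]) -> list[str]:
    """Return the references that are not pinned by a sha256 digest."""
    return [image for image in images if not DIGEST_RE.search(image)]
```

A CI job could fail whenever `unpinned_images(images_in_manifest(rendered))` is non-empty, which keeps tag-only references from reappearing once the initial pinning is done.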
P2.2 - Remove dead DNS/ALB entries
- Findings: CRZ-003, CRZ-001
- Owner: SRE
- Effort: 2–4 hours
- Steps:
  - [ ] Remove `widget.simulator.company` DNS record (or fix the nginx routing)
  - [ ] Remove `admin-pre.corezoid.com` from public Route53, move to Private Hosted Zone
  - [ ] Audit every other public DNS record and cross-check against current service inventory
  - [ ] Remove entries for: `admin-oleg.dev.corezoid.com` (if dev no longer active), `corezoid-6102-*` (old version deployments), `corezoid69-*` (even older), `4ages.sandbox.corezoid.com` (customer sandbox, verify), etc.
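The cross-check of public DNS records against the service inventory amounts to a set difference with wildcard support. The sketch below assumes both lists have already been exported as plain hostnames; the function name and the hostnames in the usage example are hypothetical.

```python
from fnmatch import fnmatch


def stale_records(public_records: list[str], inventory_patterns: list[str]) -> list[str]:
    """Return public DNS names that match no entry in the service inventory.

    Inventory entries may be exact names or shell-style wildcards such as
    "*.api.example.com". Names are compared case-insensitively and without
    the trailing dot used in zone files.
    """
    names = {record.lower().rstrip(".") for record in public_records}
    patterns = [pattern.lower().rstrip(".") for pattern in inventory_patterns]
    return sorted(
        name for name in names
        if not any(fnmatch(name, pattern) for pattern in patterns)
    )
```

For example, `stale_records(["old.example.com.", "app.example.com"], ["app.example.com"])` returns `["old.example.com"]`. Each name in the result is a candidate for removal, subject to the manual verification the checklist calls for.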
P2.3 - Audit doc.corezoid.com Google Doc
- Finding: CRZ-004
- Owner: Docs team / CTO
- Effort: 4 hours for audit; 2 weeks for migration
- Steps:
  - [ ] Check share setting on the Google Doc (`1-31BBNhy2DUIfu-EljVn3MJr3GSOVqJ3PIwjGPUi3So`)
  - [ ] Review revision history for any sensitive content ever pasted
  - [ ] If share is "anyone with link can comment/edit" → lock down immediately
  - [ ] Plan migration to a controlled CMS (Docusaurus / ReadMe.io / Mkdocs-material)
- Success criteria: docs are version-controlled and access is managed by a real authorization system
🔵 P3 - Do this quarter (within 90 days)
P3.1 - Org-wide secret management
- Owner: Security team
- Effort: 1 quarter
- Steps:
  - [ ] Install pre-commit hooks (`gitleaks`, `detect-secrets`) org-wide
  - [ ] Enable GitHub secret scanning + push protection on all Corezoid repos
  - [ ] Migrate Helm values away from containing secrets; use Sealed Secrets, External Secrets Operator, or Vault
  - [ ] Enforce IAM key rotation policy (90 days max) via AWS Config + Lambda
  - [ ] Quarterly trufflehog+gitleaks sweep of all repos (including archived) and dashboard the results
  - [ ] Publish a "Secrets Handbook" for developers
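The 90-day IAM key rotation rule could be evaluated with logic like the following. This is a sketch of the decision only: the input dictionaries are assumed to mirror the `AccessKeyMetadata` entries returned by IAM's ListAccessKeys (`AccessKeyId`, `Status`, `CreateDate`), and fetching them or wiring the check into AWS Config and Lambda is left out. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_KEY_AGE = timedelta(days=90)


def keys_due_for_rotation(keys: list[dict], now: Optional[datetime] = None) -> list[str]:
    """Return IDs of active access keys older than the 90-day limit.

    CreateDate values are expected to be timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    return [
        key["AccessKeyId"]
        for key in keys
        if key.get("Status") == "Active" and now - key["CreateDate"] > MAX_KEY_AGE
    ]
```

Inactive keys are skipped here because they can no longer authenticate; a stricter policy might also list them for deletion.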
P3.2 - Close Jira migration + secrets cleanup loop
P3.3 - Expand testing coverage
Tracking
| Priority | Count | Findings |
|---|---|---|
| 🔴 P0 | 2 | CRZ-006 (Jira EOL), CRZ-009 (public-repo secrets) |
| 🟠 P1 | 5 | CRZ-002 (k8s public), CRZ-007 (OpenSSH regreSSHion), CRZ-008 (SameSite), CRZ-011 (VPN TLS), CRZ-015 (postMessage origin) |
| 🟡 P2 | 4 | CRZ-010 (k8s hardening), CRZ-013 (destructive-op audit), CRZ-003 (widget nginx), CRZ-001 (admin-pre DNS) |
| 🔵 P3 | 4 | CRZ-012 (SHA-1 signatures → HMAC-SHA256 migration), CRZ-004 (Google Doc), CRZ-014 (super-user scope), org-wide secret mgmt |
| ⚪ Info | 1 | CRZ-005 (OpenVPN-AS version, monitor) |
Total findings: 15 (1 Critical, 3 High, 5 Medium, 3 Low/Low-Med, 3 Info). Engagement complete; no more findings pending.