Deploy ERP Patch 26B-CPU to Production · Change Control
Acme Corporation
Apply the Q1 2026 critical patch update to the ERP production stack using a rolling-node deployment. Implementation window 2026-05-09 02:00–04:00 UTC. Expected customer-visible downtime under 90 seconds.

Change ID: CHG-2026-0142
Owner: Priya Mehta, Platform Engineering
Affected system: Acme ERP · payments-db production cluster · us-east-1
Environment: Production · us-east-1 · 3 nodes (erp-prod-1/2/3)
Date:
Change type: Normal · CAB review
Summary: Apply the Q1 2026 critical patch update (Patch 36245718, "26B-CPU") to the ERP production stack. The patch addresses three security advisories (CVE-2026-0144, CVE-2026-0158, CVE-2026-0203) and includes one performance hot-fix to the AR posting engine. UAT in ERP-STAGE ran 2026-04-15 → 2026-04-28 with zero high-severity regressions.
Classification: Internal · Change Controlled
Change request details
Submitted: 2026-04-30 14:22 PT by Priya Mehta
Implementation window: Saturday 2026-05-09 · 02:00–04:00 UTC (Fri 19:00–21:00 PT)
Expected downtime: ≤ 90 seconds during pgbouncer flip in step 05
Risk rating: Medium (production data path, but UAT clean and rollback < 15 min per node)
Service impact: Customer-facing portals (vendor + carrier) and warehouse handheld order-entry will see a brief 503 during the pgbouncer flip. Reporting and analytics dashboards are unaffected because they read from the warehouse replica.
Affected stakeholders: Vendor Operations, Carrier Operations, Warehouse Operations, Finance (AR posting)
Backout plan

If verification (steps 06–07) fails or any P1 alert fires within the window:

  1. Halt the patch rollout immediately. Notify #change-control and the CAB chair.
  2. Flip pgbouncer back to the previous primary (erp-prod-1, pre-patch). Customer impact: another ~30 sec 503.
  3. Restore the patched node from the pre-patch snapshot taken in step 02. Restore time ≈ 12 minutes per node.
  4. File a post-incident review within 24 hours; do not retry the patch until the PIR is closed.
Approvals
Role | Approver | Decision | Timestamp
Change Manager | Naomi Park | Approved | 2026-05-01 11:08 PT
CAB Chair | Marcus Tan, VP Engineering | Approved | 2026-05-02 09:14 PT
Business Owner | Dan Okafor, VP Finance | Approved | 2026-05-02 09:46 PT
Security Reviewer | Hana Becker, InfoSec | Approved | 2026-05-02 10:33 PT
Communications plan
  • T−5 days — email vendor-ops@ and carrier-ops@ with the window and expected impact.
  • T−24 hours — post in #all-eng and on status.acmecorporation.com (scheduled maintenance banner).
  • T0 — implementation Slack huddle in #change-control; live updates every 15 min.
  • T+0 (post-cutover) — clear status page banner; email "all clear" to vendor-ops@ and carrier-ops@.
Implementation steps

Eight steps grouped into four phases (Pre-flight · Cutover · Verify · Post-cutover). Total estimated runtime ≈ 50 minutes. Roles: SRE on rotation, DBA on rotation, Tech Lead (Priya Mehta), QA (Anna Voss).
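
The 50-minute total can be cross-checked against the per-step estimates listed in steps 01 to 08. The sketch below is only that arithmetic; the `STEPS` list and function names are illustrative and not part of any existing tooling.

```python
from collections import defaultdict

# (step, phase, estimated minutes), copied from the step descriptions in this change.
STEPS = [
    ("01", "Pre-flight", 5),
    ("02", "Pre-flight", 10),
    ("03", "Pre-flight", 2),
    ("04", "Cutover", 8),
    ("05", "Cutover", 3),
    ("06", "Verify", 15),
    ("07", "Verify", 2),
    ("08", "Post-cutover", 5),
]


def total_runtime_minutes(steps):
    """Sum the estimated minutes across all steps."""
    return sum(minutes for _, _, minutes in steps)


def runtime_by_phase(steps):
    """Group the estimates by phase so the split per phase is visible."""
    totals = defaultdict(int)
    for _, phase, minutes in steps:
        totals[phase] += minutes
    return dict(totals)


print(total_runtime_minutes(STEPS))  # 50
print(runtime_by_phase(STEPS))
```

The total fits inside the 120-minute implementation window, with the remainder available for a backout if one is needed.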

Step 01 · Pre-flight · SRE
Verify replica lag is below 100 ms and record disk usage of the primary.

Run the SRE pre-flight notebook (sre/notebooks/erp-patch-preflight.ipynb). The notebook queries pg_stat_replication on the primary (erp-prod-1), which reports lag for each connected replica, and fails loudly if any replica's lag exceeds 100 ms. Capture the output and attach it to this change ticket. Estimated runtime: 5 minutes.

Grafana › ERP › Replica lag (https://grafana.acmecorporation.net/d/erp/replica-lag?orgId=1&from=now-1h)
payments-db · replica lag (ms): erp-prod-2 (replica) 12 ms · erp-prod-3 (replica) 18 ms · warehouse-replica 31 ms
Pass: all replicas below the 100 ms threshold. Cleared to proceed.
Fig 1 — Pre-flight replica lag dashboard. All three replicas under threshold.
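
The contents of the pre-flight notebook are not shown in this change, so the following is only a minimal sketch of the threshold gate that step 01 describes. It assumes the lag values have already been collected into a mapping of replica name to milliseconds; the function names are hypothetical.

```python
LAG_THRESHOLD_MS = 100.0  # threshold stated in step 01


def replicas_over_threshold(lag_ms_by_replica, threshold_ms=LAG_THRESHOLD_MS):
    """Return only the replicas whose lag is above the threshold."""
    return {
        name: lag
        for name, lag in lag_ms_by_replica.items()
        if lag > threshold_ms
    }


def assert_preflight_lag(lag_ms_by_replica, threshold_ms=LAG_THRESHOLD_MS):
    """Fail loudly (raise) if any replica is above the threshold."""
    offenders = replicas_over_threshold(lag_ms_by_replica, threshold_ms)
    if offenders:
        raise RuntimeError(f"Replica lag above {threshold_ms} ms: {offenders}")


# Values read from Fig 1.
fig1_lag = {"erp-prod-2": 12, "erp-prod-3": 18, "warehouse-replica": 31}
assert_preflight_lag(fig1_lag)  # passes silently: all three are under 100 ms
```

The comparison uses "greater than" to match the "lag > 100 ms" wording in the step, so a reading of exactly 100 ms would still pass.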
Step 02 · Pre-flight · DBA
Take a final snapshot (a physical base backup via pg_basebackup) and copy it to the cold-storage bucket.

Snapshot to s3://acme-erp-snapshots/preflight/2026-05-09/. Confirm the snapshot manifest hash matches the source manifest before proceeding. Estimated runtime: 10 minutes.

Note: Do not garbage-collect this snapshot until 7 days after the change closes. Backout plan depends on it.
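
One way to perform the manifest comparison in step 02 is to hash both manifest files and compare the digests. This is a sketch under assumptions: the change does not say which hash or tool is used, so SHA-256 is chosen here for illustration, and both manifests are assumed to be available as local files (for example after downloading the copy from the bucket).

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large manifests are not loaded whole."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifests_match(source_manifest, copied_manifest):
    """True only if both manifest files have the same SHA-256 digest."""
    return hmac.compare_digest(
        sha256_of_file(source_manifest),
        sha256_of_file(copied_manifest),
    )
```

A `False` result should block the cutover, since the backout plan depends on this snapshot being intact.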
Step 03 · Pre-flight · Tech Lead
Page #payments-oncall and post the cutover banner on the customer status page.

Open status.acmecorporation.com admin and publish the pre-scheduled "Scheduled maintenance — ERP patching" incident in "Investigating" state. Confirm vendor-ops and carrier-ops have acknowledged in Slack. Estimated runtime: 2 minutes.

Step 04 · Cutover · DBA
Promote the pg-16 replica (erp-prod-2) with pg_ctl promote and confirm it accepts writes.

SSH to erp-prod-2, then run pg_ctl promote -D /var/lib/pgsql/16/data. Confirm the promotion banner appears in the postgres log within 5 seconds. Test a write with a no-op INSERT into the heartbeat table. Estimated runtime: 8 minutes.
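
As a complement to watching the log, the 5-second promotion check can be sketched as a polling loop. Note this is a different technique from the log check the step prescribes: it assumes a caller-supplied `is_in_recovery` callable that runs `SELECT pg_is_in_recovery()` against erp-prod-2 and returns the boolean result. The callable, its connection handling, and the function name are hypothetical.

```python
import time
from typing import Callable


def wait_for_promotion(
    is_in_recovery: Callable[[], bool],
    timeout_s: float = 5.0,
    poll_s: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the node reports it has left recovery, or the timeout passes.

    Returns True if promotion was observed in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if not is_in_recovery():
            return True  # node is out of recovery, i.e. promoted
        if clock() >= deadline:
            return False  # still in recovery when the timeout expired
        sleep(poll_s)
```

A `False` return here would be a reason to stop before step 05 rather than point pgbouncer at a node that is still a replica.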

Step 05 · Cutover · SRE
Flip pgbouncer pool config to point at the new primary and reload.

Edit /etc/pgbouncer/databases.ini on each pgbouncer host (3×) and change the host= line for the payments-db pool to point at erp-prod-2. Then reload pgbouncer by sending it SIGHUP: pkill -HUP pgbouncer. This is the only customer-visible step: expect a brief 503 burst (~30 sec) as in-flight connections drain. Estimated runtime: 3 minutes.

sre@pgb-1:~$ sudo vi /etc/pgbouncer/databases.ini
# edit host= for [payments-db]; save :wq
sre@pgb-1:~$ sudo pkill -HUP pgbouncer && tail -n 2 /var/log/pgbouncer.log
2026-05-09 02:14:08 LOG file removed: signal received
2026-05-09 02:14:08 LOG reload signal received, reloading config
sre@pgb-1:~$ psql -h 127.0.0.1 -p 6432 -U svc_health -c "select 1"
?column?
----------
1
(1 row)
Fig 2 — pgbouncer reload completed cleanly on pgb-1. Repeat on pgb-2 and pgb-3 in sequence.
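
Because the same edit is repeated by hand on three hosts, it may help to script it. The sketch below assumes the file layout implied by the comment in Fig 2, namely a [payments-db] section containing its own host= line. A stock pgbouncer configuration usually lists one `name = host=...` entry per database under [databases] instead, so the pattern would need adapting in that case. The function name is hypothetical.

```python
import re

SECTION_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")
HOST_RE = re.compile(r"^(?P<prefix>\s*host\s*=\s*)(?P<value>\S+)(?P<rest>.*)$")


def repoint_pool(ini_text: str, pool: str, new_host: str) -> str:
    """Rewrite the host= line inside [pool]; leave every other line untouched."""
    out = []
    current_section = None
    changed = 0
    for line in ini_text.splitlines():
        section = SECTION_RE.match(line)
        if section:
            current_section = section.group("name").strip()
        elif current_section == pool:
            host = HOST_RE.match(line)
            if host:
                line = f"{host.group('prefix')}{new_host}{host.group('rest')}"
                changed += 1
        out.append(line)
    if changed != 1:
        # Refuse to guess: zero or several host= lines means the layout differs.
        raise ValueError(f"expected exactly one host= line in [{pool}], found {changed}")
    trailing_newline = "\n" if ini_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline
```

Raising when the match count is not exactly one keeps the script from silently writing a file that leaves the pool pointed at the old primary.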
Step 06 · Verify · QA
Run smoke suite against /charges, /refunds, /webhooks. Confirm p95 latency below 250 ms.

Trigger the post-deploy job in CI (qa/smoke-erp-prod). The suite runs ~40 synthetic transactions against the three critical endpoints. All assertions must pass; attach the latency report to the change ticket. Estimated runtime: 15 minutes.
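
The p95 gate in step 06 can be illustrated as follows. How the smoke suite actually computes its percentile is not stated in this change, so the nearest-rank method used here is an assumption, and the sample values are synthetic.

```python
P95_LIMIT_MS = 250.0  # limit stated in step 06


def p95_nearest_rank(samples_ms):
    """95th percentile by the nearest-rank method (no interpolation)."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_gate(samples_ms, limit_ms=P95_LIMIT_MS):
    """True if the p95 latency is strictly below the limit."""
    return p95_nearest_rank(samples_ms) < limit_ms


# Synthetic example: 40 samples, three of them slow outliers.
samples = [120.0] * 37 + [400.0] * 3
print(p95_nearest_rank(samples))  # 400.0 (rank 38 of 40 falls on an outlier)
print(latency_gate(samples))      # False
```

With only about 40 transactions, the p95 is decided by the third-slowest sample, so a handful of slow requests is enough to fail the gate.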

Step 07 · Verify · Tech Lead
Lift the customer status banner and send all-clear in #change-control.

Mark the status-page incident as "Resolved". Post a one-line all-clear in #change-control and email vendor-ops@ and carrier-ops@. Estimated runtime: 2 minutes.

Step 08 · Post-cutover · DBA
Set the old primary to read-only and tag its volume for 30-day retention.

Set erp-prod-1 to read-only mode with ALTER SYSTEM SET default_transaction_read_only = on, then reload the configuration (for example, SELECT pg_reload_conf()) so the setting takes effect. Tag the EBS volume retention=30d and notify infra-ops@ so it isn't garbage-collected early. Estimated runtime: 5 minutes.

Post-implementation sign-off
Verifier | Confirmation | Timestamp
QA (Anna Voss) | Smoke suite green; p95 = 184 ms across /charges, /refunds, /webhooks. | 2026-05-09 02:53 UTC
SRE on-call (Jordan Lee) | No P1/P2 alerts during or 30 min after window; pgbouncer pools healthy on all hosts. | 2026-05-09 03:29 UTC
Tech Lead (Priya Mehta) | Change closed in ServiceNow; status page resolved; all stakeholders notified. | 2026-05-09 03:42 UTC