Ship the Sync
Stop stale catalog data before customers ever feel it.
One worker. One source of truth. One launch gate.
Architecture Shape
flowchart LR
Supplier[Supplier event] --> Queue[Updates queue]
Queue --> Sync[Catalog sync worker]
Sync --> Store[(Catalog DB)]
Sync --> API[Storefront API]
Sync --> Metrics[Metrics]
The worker is small on purpose: consume, validate, write, publish.
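A minimal sketch of the consume, validate, write, publish loop, assuming an in-memory channel stands in for the updates queue. The `step` type, `handle`, and `pipeline` are hypothetical names for illustration, and the write and publish stages are no-op stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// SupplierEvent carries the fields shown in the domain model
// (UpdatedAt is omitted to keep the sketch short).
type SupplierEvent struct {
	SupplierID string
	SKU        string
	Quantity   int
}

// step is one named stage of the pipeline (hypothetical type).
type step struct {
	name string
	run  func(SupplierEvent) error
}

// handle runs the stages in order and stops at the first failure,
// prefixing the error with the stage that produced it.
func handle(ev SupplierEvent, steps []step) error {
	for _, s := range steps {
		if err := s.run(ev); err != nil {
			return fmt.Errorf("%s: %w", s.name, err)
		}
	}
	return nil
}

// validate rejects events the catalog could not store.
func validate(ev SupplierEvent) error {
	if ev.SKU == "" {
		return errors.New("empty SKU")
	}
	if ev.Quantity < 0 {
		return errors.New("negative quantity")
	}
	return nil
}

// pipeline wires validate to stand-in write and publish stages and
// appends each stage name to ran as it executes.
func pipeline(ran *[]string) []step {
	mark := func(name string, fn func(SupplierEvent) error) step {
		return step{name: name, run: func(ev SupplierEvent) error {
			*ran = append(*ran, name)
			return fn(ev)
		}}
	}
	ok := func(SupplierEvent) error { return nil }
	return []step{mark("validate", validate), mark("write", ok), mark("publish", ok)}
}

func main() {
	events := make(chan SupplierEvent, 2) // stands in for the updates queue
	events <- SupplierEvent{SupplierID: "s1", SKU: "A-1", Quantity: 3}
	events <- SupplierEvent{SupplierID: "s1", SKU: "", Quantity: 1}
	close(events)

	for ev := range events { // consume
		var ran []string
		err := handle(ev, pipeline(&ran))
		fmt.Println(ran, err)
	}
}
```

Stopping at the first failed stage means a rejected event never reaches the write or publish step.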
Event Handoff
sequenceDiagram
participant Supplier
participant Queue
participant Worker as Sync worker
participant Catalog
participant Storefront
Supplier->>Queue: publish inventory.updated
Queue->>Worker: deliver event
Worker->>Catalog: upsert normalized item
Catalog-->>Worker: version token
Worker->>Storefront: publish cache bust
Storefront-->>Worker: accepted
The sequence shows where retries belong and where customer-visible latency starts.
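One way to bound retries around a worker-owned call such as the catalog upsert or the cache bust. This is a sketch under assumptions: `retry`, `flaky`, and `demoWait` are hypothetical names, and the attempt count and wait times are placeholders, not launch values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// demoWait keeps the example fast; a real worker would wait longer.
const demoWait = time.Millisecond

// retry calls op up to attempts times, doubling the wait between tries.
// It returns the number of calls made and the last error, if any.
func retry(attempts int, base time.Duration, op func() error) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	wait := base
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return i, nil
		}
		if i < attempts {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return attempts, fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

// flaky returns an operation that fails its first n calls, then succeeds.
func flaky(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("storefront unavailable")
		}
		return nil
	}
}

func main() {
	calls, err := retry(3, demoWait, flaky(2))
	fmt.Println(calls, err)
	calls, err = retry(3, demoWait, flaky(5))
	fmt.Println(calls, err)
}
```

Capping the attempts keeps a stuck downstream call from holding the queue; the final error still wraps the last cause.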
Domain Model
classDiagram
class SupplierEvent {
string SupplierID
string SKU
int Quantity
time UpdatedAt
}
class CatalogItem {
string ID
string SKU
int Quantity
int Version
}
class PublishResult {
string ItemID
bool Accepted
}
SupplierEvent --> CatalogItem : normalizes into
CatalogItem --> PublishResult : publishes
The code should keep these responsibilities separate, even when the worker stays small.
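The diagram translates to Go types fairly directly. The sketch below adds a `normalize` step under stated assumptions: SKUs are trimmed and upper-cased, and the item ID is built as `SupplierID:SKU`. Neither rule comes from the source; they are placeholders for whatever the catalog actually requires.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// SupplierEvent mirrors the class diagram.
type SupplierEvent struct {
	SupplierID string
	SKU        string
	Quantity   int
	UpdatedAt  time.Time
}

// CatalogItem mirrors the class diagram.
type CatalogItem struct {
	ID       string
	SKU      string
	Quantity int
	Version  int
}

// normalize converts a supplier event into a catalog item.
// Assumptions: SKUs are trimmed and upper-cased, and the ID is
// "<SupplierID>:<SKU>". Version stays zero for the store to assign.
func normalize(ev SupplierEvent) (CatalogItem, error) {
	sku := strings.ToUpper(strings.TrimSpace(ev.SKU))
	if sku == "" {
		return CatalogItem{}, errors.New("malformed SKU: empty")
	}
	if ev.SupplierID == "" {
		return CatalogItem{}, errors.New("missing supplier ID")
	}
	if ev.Quantity < 0 {
		return CatalogItem{}, fmt.Errorf("negative quantity %d for SKU %s", ev.Quantity, sku)
	}
	return CatalogItem{ID: ev.SupplierID + ":" + sku, SKU: sku, Quantity: ev.Quantity}, nil
}

func main() {
	item, err := normalize(SupplierEvent{
		SupplierID: "s1",
		SKU:        " ab-1 ",
		Quantity:   4,
		UpdatedAt:  time.Now(),
	})
	fmt.Println(item.ID, item.Quantity, err)
}
```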
Storage Contract
erDiagram
SUPPLIER ||--o{ SUPPLIER_EVENT : emits
SUPPLIER_EVENT }o--|| CATALOG_ITEM : normalizes
CATALOG_ITEM ||--o{ PUBLISH_ATTEMPT : triggers
TENANT ||--o{ CATALOG_ITEM : owns
The data model gives support a durable answer to "what changed and why?"
Release Branch Story
gitGraph
commit id: "baseline"
branch sync-worker
checkout sync-worker
commit id: "normalize"
commit id: "publish"
checkout main
merge sync-worker id: "canary"
branch rollback
checkout rollback
commit id: "disable-flag"
The rollback branch exists before launch, not after the first bad metric.
The Problem
Every stale catalog record creates a support ticket, a failed order, or a customer who no longer trusts the storefront.
- supplier feeds arrive out of order
- retry behavior is split across services
- storefront owns the customer pain, but not the update path
- manual fixes cannot explain which event actually won
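A sketch of how the worker could record which event won when feeds arrive out of order. It assumes the event's `UpdatedAt` decides the winner; since supplier clocks can drift, a real system might prefer a sequence number instead. `Decision`, `decide`, and `at` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// SupplierEvent mirrors the domain model.
type SupplierEvent struct {
	SupplierID string
	SKU        string
	Quantity   int
	UpdatedAt  time.Time
}

// Decision records whether an incoming event was applied and why,
// so support can later explain the outcome.
type Decision struct {
	Applied bool
	Reason  string
}

// decide applies an event only if it is strictly newer than the
// last event applied for the same item.
func decide(lastApplied time.Time, ev SupplierEvent) Decision {
	switch {
	case ev.UpdatedAt.After(lastApplied):
		return Decision{Applied: true, Reason: "newer than last applied event"}
	case ev.UpdatedAt.Equal(lastApplied):
		return Decision{Applied: false, Reason: "same timestamp as last applied event"}
	default:
		return Decision{Applied: false, Reason: "older than last applied event"}
	}
}

// at builds a timestamp on 2026-06-01 UTC at the given minute (demo helper).
func at(minute int) time.Time {
	return time.Date(2026, time.June, 1, 0, minute, 0, 0, time.UTC)
}

func main() {
	last := at(5)
	events := []SupplierEvent{
		{SupplierID: "s1", SKU: "A-1", Quantity: 7, UpdatedAt: at(3)},
		{SupplierID: "s1", SKU: "A-1", Quantity: 9, UpdatedAt: at(8)},
	}
	for _, ev := range events {
		d := decide(last, ev)
		if d.Applied {
			last = ev.UpdatedAt
		}
		fmt.Printf("qty=%d applied=%t reason=%q\n", ev.Quantity, d.Applied, d.Reason)
	}
}
```

Storing the reason alongside the decision is what turns a manual fix into an explainable one.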
The Code That Matters
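func publishCatalog(ctx context.Context, event SupplierEvent) error {
	// Normalize first so malformed input never reaches the catalog.
	item, err := normalize(event)
	if err != nil {
		return fmt.Errorf("normalize supplier event: %w", err)
	}
	// Persist before publishing so the storefront never sees unsaved data.
	if err := catalog.Save(ctx, item); err != nil {
		return fmt.Errorf("save catalog item: %w", err)
	}
	// Wrap the publish failure too, so all three handoffs are labeled.
	if err := storefront.Publish(ctx, item.ID); err != nil {
		return fmt.Errorf("publish catalog item: %w", err)
	}
	return nil
}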
Every failure points to the exact handoff that needs attention.
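If alerting needs the handoff as a field rather than a string prefix, a typed error is one option. The sketch below is an assumption, not the worker's actual code: `HandoffError`, `handoffOf`, `annotate`, and `errTimeout` are hypothetical names. It relies only on the standard `errors.As` and `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
)

// HandoffError tags a failure with the handoff that produced it
// (hypothetical type, not part of the original code).
type HandoffError struct {
	Handoff string // "normalize", "save", or "publish"
	Err     error
}

func (e *HandoffError) Error() string { return e.Handoff + ": " + e.Err.Error() }

// Unwrap exposes the cause so errors.Is and errors.As can see through it.
func (e *HandoffError) Unwrap() error { return e.Err }

// errTimeout is a stand-in sentinel for a storage timeout.
var errTimeout = errors.New("transaction timeout")

// handoffOf finds the handoff label anywhere in the error chain.
func handoffOf(err error) string {
	var he *HandoffError
	if errors.As(err, &he) {
		return he.Handoff
	}
	return "unknown"
}

// annotate adds outer context the way a caller might.
func annotate(err error) error { return fmt.Errorf("sync worker: %w", err) }

// isTimeout reports whether the chain contains the timeout sentinel.
func isTimeout(err error) bool { return errors.Is(err, errTimeout) }

func main() {
	err := annotate(&HandoffError{Handoff: "save", Err: errTimeout})
	fmt.Println(err)
	fmt.Println(handoffOf(err), isTimeout(err))
}
```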
Replay Command
When a launch signal misses its target for the canary tenant, replay a bounded window before widening:
go run ./cmd/replay --tenant canary --since 2026-06-01T00:00:00Z
Keep the replay command boring. It should do one thing and print the item IDs it touched.
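A sketch of the argument handling such a command might use, built on the standard `flag` package. Only `--tenant` and `--since` come from the command shown; `replayOptions`, `parseArgs`, and the RFC 3339 requirement are assumptions, and the replay itself is left out.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"time"
)

// replayOptions holds the parsed command-line input (hypothetical type).
type replayOptions struct {
	Tenant string
	Since  time.Time
}

// parseArgs parses --tenant and --since and rejects missing or
// malformed values. --since is assumed to be an RFC 3339 timestamp.
func parseArgs(args []string) (replayOptions, error) {
	fs := flag.NewFlagSet("replay", flag.ContinueOnError)
	tenant := fs.String("tenant", "", "tenant to replay")
	since := fs.String("since", "", "RFC 3339 start of the replay window")
	if err := fs.Parse(args); err != nil {
		return replayOptions{}, err
	}
	if *tenant == "" {
		return replayOptions{}, errors.New("--tenant is required")
	}
	t, err := time.Parse(time.RFC3339, *since)
	if err != nil {
		return replayOptions{}, fmt.Errorf("--since must be RFC 3339: %w", err)
	}
	return replayOptions{Tenant: *tenant, Since: t}, nil
}

func main() {
	args := os.Args[1:]
	if len(args) == 0 {
		// Demo default so the sketch runs without arguments.
		args = []string{"--tenant", "canary", "--since", "2026-06-01T00:00:00Z"}
	}
	opts, err := parseArgs(args)
	if err != nil {
		fmt.Fprintln(os.Stderr, "replay:", err)
		os.Exit(2)
	}
	fmt.Printf("replaying tenant=%s since=%s\n", opts.Tenant, opts.Since.Format(time.RFC3339))
}
```

Rejecting a missing tenant or an unparseable timestamp up front keeps the replay from silently running over an unbounded window.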
Failure Modes
- Input risk
  - malformed SKU
  - supplier clock drift
  - duplicate event
- Write risk
  - stale catalog version
  - transaction timeout
  - publish attempt rejected
- Customer risk
  - product page shows old quantity
  - support cannot trace the winning event
This list drives alert labels, dashboard grouping, and support macros.
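One way to keep alert labels and dashboard groups in step with this list is a single lookup table. The sketch below is illustrative: the `FailureMode` constants, the label keys `risk` and `failure_mode`, and the `unclassified` fallback are all assumed names.

```go
package main

import "fmt"

// FailureMode names one entry from the failure list (hypothetical type).
type FailureMode string

const (
	MalformedSKU       FailureMode = "malformed_sku"
	ClockDrift         FailureMode = "supplier_clock_drift"
	DuplicateEvent     FailureMode = "duplicate_event"
	StaleVersion       FailureMode = "stale_catalog_version"
	TransactionTimeout FailureMode = "transaction_timeout"
	PublishRejected    FailureMode = "publish_rejected"
	OldQuantityShown   FailureMode = "old_quantity_shown"
	UntraceableWinner  FailureMode = "untraceable_winner"
)

// riskGroup maps each failure mode to the group it is listed under.
var riskGroup = map[FailureMode]string{
	MalformedSKU:       "input",
	ClockDrift:         "input",
	DuplicateEvent:     "input",
	StaleVersion:       "write",
	TransactionTimeout: "write",
	PublishRejected:    "write",
	OldQuantityShown:   "customer",
	UntraceableWinner:  "customer",
}

// alertLabels returns the label set for one failure mode.
// Modes missing from the table fall into "unclassified".
func alertLabels(m FailureMode) map[string]string {
	group, ok := riskGroup[m]
	if !ok {
		group = "unclassified"
	}
	return map[string]string{"risk": group, "failure_mode": string(m)}
}

func main() {
	fmt.Println(alertLabels(StaleVersion))
	fmt.Println(alertLabels(FailureMode("disk_full")))
}
```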
Launch Signals
| Signal | Target | Owner |
|---|---|---|
| stale items | < 1% | Platform |
| queue lag | < 90s | Infra |
| publish errors | < 0.5% | Storefront |
| trace coverage | 100% | Support Eng |
If any signal misses target, the canary stays closed.
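The gate rule can be checked mechanically. A minimal sketch, assuming the four targets from the table and treating trace coverage as a floor while the other three are strict ceilings. `Signal`, `launchSignals`, and `canaryMayOpen` are hypothetical names.

```go
package main

import "fmt"

// Signal is one launch signal with its measured value and target.
type Signal struct {
	Name    string
	Owner   string
	Value   float64
	Target  float64
	AtLeast bool // true: Value must reach Target; false: Value must stay below it
}

// met reports whether the signal is within its target.
func (s Signal) met() bool {
	if s.AtLeast {
		return s.Value >= s.Target
	}
	return s.Value < s.Target
}

// launchSignals pairs measured values with the targets from the table.
// Percentages are given as numbers (0.4 means 0.4%), lag in seconds.
func launchSignals(stalePct, lagSec, publishErrPct, tracePct float64) []Signal {
	return []Signal{
		{Name: "stale items", Owner: "Platform", Value: stalePct, Target: 1},
		{Name: "queue lag", Owner: "Infra", Value: lagSec, Target: 90},
		{Name: "publish errors", Owner: "Storefront", Value: publishErrPct, Target: 0.5},
		{Name: "trace coverage", Owner: "Support Eng", Value: tracePct, Target: 100, AtLeast: true},
	}
}

// canaryMayOpen returns false and the missed signals if any target is missed.
func canaryMayOpen(signals []Signal) (bool, []string) {
	var missed []string
	for _, s := range signals {
		if !s.met() {
			missed = append(missed, s.Name+" ("+s.Owner+")")
		}
	}
	return len(missed) == 0, missed
}

func main() {
	ok, missed := canaryMayOpen(launchSignals(0.4, 120, 0.1, 100))
	fmt.Println(ok, missed)
}
```

Returning the owner with each missed signal tells the release ticket who to page.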
Launch Checklist
- replay command tested against staging
- dashboard shows queue lag and publish errors
- canary tenant approved
- rollback note linked in release ticket
- support macro includes trace lookup steps
Decision Gate
- Run the replay against staging.
- Open the canary for one tenant.
- Watch queue lag, stale item rate, and publish errors for one business day.
- Widen by supplier only after support can trace three sampled updates.
The gate is operational: if support cannot explain the sync, engineering has not finished the launch.
Decision
Ship behind the canary tenant flag first, then widen by supplier once the launch signals stay green for one business day.