Ship the Sync

Stop stale catalog data before customers ever feel it.

One worker. One source of truth. One launch gate.

Architecture Shape

flowchart LR
    Supplier[Supplier event] --> Queue[Updates queue]
    Queue --> Sync[Catalog sync worker]
    Sync --> Store[(Catalog DB)]
    Sync --> API[Storefront API]
    Sync --> Metrics[Metrics]

The worker is small on purpose: consume, validate, write, publish.

Event Handoff

sequenceDiagram
    participant Supplier
    participant Queue
    participant Worker as Sync worker
    participant Catalog
    participant Storefront
    Supplier->>Queue: publish inventory.updated
    Queue->>Worker: deliver event
    Worker->>Catalog: upsert normalized item
    Catalog-->>Worker: version token
    Worker->>Storefront: publish cache bust
    Storefront-->>Worker: accepted

The sequence shows where retries belong and where customer-visible latency starts.
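If retries are placed around the worker's call to the storefront, a bounded retry with backoff is one way to keep them in a single place. The sketch below is illustrative only: `retry`, `runDemo`, the attempt count, and the wait values are assumptions, not part of the worker described here.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

// retry calls fn until it succeeds, attempts run out, or ctx is cancelled.
// The wait doubles after each failure. Name and parameters are illustrative.
func retry(ctx context.Context, attempts int, wait time.Duration, fn func() error) error {
    if attempts < 1 {
        attempts = 1
    }
    var err error
    for i := 0; i < attempts; i++ {
        if err = fn(); err == nil {
            return nil
        }
        if i == attempts-1 {
            break // no sleep after the final attempt
        }
        select {
        case <-ctx.Done():
            return fmt.Errorf("retry cancelled: %w", ctx.Err())
        case <-time.After(wait):
            wait *= 2
        }
    }
    return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

// runDemo simulates a storefront that rejects the first two cache busts.
func runDemo() (calls int, err error) {
    err = retry(context.Background(), 3, time.Millisecond, func() error {
        calls++
        if calls < 3 {
            return errors.New("storefront unavailable")
        }
        return nil
    })
    return calls, err
}

func main() {
    calls, err := runDemo()
    fmt.Printf("calls=%d err=%v\n", calls, err)
}
```

Keeping the retry in one helper means the attempt budget is visible in one place instead of being split across services.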

Domain Model

classDiagram
    class SupplierEvent {
        string SupplierID
        string SKU
        int Quantity
        time UpdatedAt
    }
    class CatalogItem {
        string ID
        string SKU
        int Quantity
        int Version
    }
    class PublishResult {
        string ItemID
        bool Accepted
    }
    SupplierEvent --> CatalogItem : normalizes into
    CatalogItem --> PublishResult : publishes

The code should keep these responsibilities separate, even when the worker stays small.
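The three classes can be sketched as Go structs with a separate `normalize` step. The field names follow the diagram; the ID scheme (supplier ID plus SKU) and the validation rules are assumptions made for illustration.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
    "time"
)

type SupplierEvent struct {
    SupplierID string
    SKU        string
    Quantity   int
    UpdatedAt  time.Time
}

type CatalogItem struct {
    ID       string
    SKU      string
    Quantity int
    Version  int
}

type PublishResult struct {
    ItemID   string
    Accepted bool
}

// normalize validates an event and maps it to a catalog item.
// The ID format and the checks below are hypothetical.
func normalize(e SupplierEvent) (CatalogItem, error) {
    sku := strings.ToUpper(strings.TrimSpace(e.SKU))
    if sku == "" {
        return CatalogItem{}, errors.New("malformed SKU: empty")
    }
    if e.Quantity < 0 {
        return CatalogItem{}, fmt.Errorf("negative quantity %d for SKU %s", e.Quantity, sku)
    }
    return CatalogItem{ID: e.SupplierID + ":" + sku, SKU: sku, Quantity: e.Quantity}, nil
}

func main() {
    event := SupplierEvent{SupplierID: "acme", SKU: " ab-12 ", Quantity: 4, UpdatedAt: time.Now()}
    item, err := normalize(event)
    fmt.Println(item, err)

    // PublishResult is filled in by the publish step, not by normalize.
    fmt.Println(PublishResult{ItemID: item.ID, Accepted: err == nil})
}
```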

Storage Contract

erDiagram
    SUPPLIER ||--o{ SUPPLIER_EVENT : emits
    SUPPLIER_EVENT }o--|| CATALOG_ITEM : normalizes
    CATALOG_ITEM ||--o{ PUBLISH_ATTEMPT : triggers
    TENANT ||--o{ CATALOG_ITEM : owns

The data model gives support a durable answer to "what changed and why?"

Release Branch Story

gitGraph
  commit id: "baseline"
  branch sync-worker
  checkout sync-worker
  commit id: "normalize"
  commit id: "publish"
  checkout main
  merge sync-worker id: "canary"
  branch rollback
  checkout rollback
  commit id: "disable-flag"

The rollback branch exists before launch, not after the first bad metric.

The Problem

Every stale catalog record creates a support ticket, a failed order, or a customer who no longer trusts the storefront.

  • supplier feeds arrive out of order
  • retry behavior is split across services
  • storefront owns the customer pain, but not the update path
  • manual fixes cannot explain which event actually won

The Code That Matters

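// publishCatalog runs one supplier event through normalize, save, and publish.
// Each error is wrapped with the name of the handoff that failed.
func publishCatalog(ctx context.Context, event SupplierEvent) error {
    item, err := normalize(event)
    if err != nil {
        return fmt.Errorf("normalize supplier event: %w", err)
    }

    if err := catalog.Save(ctx, item); err != nil {
        return fmt.Errorf("save catalog item: %w", err)
    }

    if err := storefront.Publish(ctx, item.ID); err != nil {
        return fmt.Errorf("publish catalog item: %w", err)
    }
    return nil
}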

Every failure points to the exact handoff that needs attention.
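Wrapping with `%w` keeps the underlying cause matchable, so a caller can turn an error back into a handoff name. The sketch below assumes one sentinel error per handoff; `ErrNormalize`, `ErrSave`, `ErrPublish`, `wrap`, and `handoff` are hypothetical names, not part of the worker's code.

```go
package main

import (
    "errors"
    "fmt"
)

// Hypothetical sentinel errors, one per handoff in the worker.
var (
    ErrNormalize = errors.New("normalize handoff")
    ErrSave      = errors.New("save handoff")
    ErrPublish   = errors.New("publish handoff")
)

// wrap adds detail to a sentinel while keeping it matchable via errors.Is.
func wrap(step error, detail string) error {
    return fmt.Errorf("%s: %w", detail, step)
}

// handoff maps a (possibly wrapped) error back to the step that produced it.
func handoff(err error) string {
    switch {
    case err == nil:
        return "none"
    case errors.Is(err, ErrNormalize):
        return "normalize"
    case errors.Is(err, ErrSave):
        return "save"
    case errors.Is(err, ErrPublish):
        return "publish"
    default:
        return "unknown"
    }
}

func main() {
    // Two layers of wrapping; errors.Is still finds the sentinel.
    err := fmt.Errorf("sync worker: %w", wrap(ErrSave, "transaction timeout"))
    fmt.Println(handoff(err), "<-", err)
}
```

The returned name could feed a metric label or a log field, so dashboards group failures by handoff rather than by message text.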

Replay Command

When the canary tenant misses a signal, replay a bounded window before widening:

go run ./cmd/replay --tenant canary --since 2026-06-01T00:00:00Z

Keep the replay command boring. It should do one thing and print the item IDs it touched.
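A minimal sketch of the argument handling for such a command, using only the standard `flag` package. The `--tenant` and `--since` flags come from the invocation above; `parseReplayArgs`, the RFC 3339 requirement, and the fallback arguments in `main` are assumptions, and the actual replay step is left as a comment.

```go
package main

import (
    "errors"
    "flag"
    "fmt"
    "os"
    "time"
)

// parseReplayArgs reads --tenant and --since and validates both.
func parseReplayArgs(args []string) (string, time.Time, error) {
    fs := flag.NewFlagSet("replay", flag.ContinueOnError)
    tenant := fs.String("tenant", "", "tenant to replay")
    sinceRaw := fs.String("since", "", "RFC 3339 start of the replay window")
    if err := fs.Parse(args); err != nil {
        return "", time.Time{}, err
    }
    if *tenant == "" {
        return "", time.Time{}, errors.New("--tenant is required")
    }
    since, err := time.Parse(time.RFC3339, *sinceRaw)
    if err != nil {
        return "", time.Time{}, fmt.Errorf("parse --since: %w", err)
    }
    return *tenant, since, nil
}

func main() {
    args := os.Args[1:]
    if len(args) == 0 {
        // Fall back to the invocation shown above so the sketch runs as-is.
        args = []string{"--tenant", "canary", "--since", "2026-06-01T00:00:00Z"}
    }
    tenant, since, err := parseReplayArgs(args)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(2)
    }
    // A real replay would re-deliver events here and print each item ID it touched.
    fmt.Printf("replaying tenant=%s since=%s\n", tenant, since.Format(time.RFC3339))
}
```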

Failure Modes

  • Input risk
    • malformed SKU
    • supplier clock drift
    • duplicate event
  • Write risk
    1. stale catalog version
    2. transaction timeout
    3. publish attempt rejected
  • Customer risk
    • product page shows old quantity
    • support cannot trace the winning event

This list drives alert labels, dashboard grouping, and support macros.
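Duplicate and late-arriving events can be filtered before the write with a small "newest timestamp wins" guard. This is a sketch under assumptions: `applied`, `shouldApply`, and `at` are hypothetical names, and comparing supplier timestamps is itself exposed to the clock-drift risk listed above, so a real worker might compare catalog versions instead.

```go
package main

import (
    "fmt"
    "time"
)

// applied remembers the newest supplier timestamp written per SKU.
type applied map[string]time.Time

// shouldApply reports whether an event is newer than the last one applied
// for the same SKU, and records it if so. Older and duplicate events lose.
func (a applied) shouldApply(sku string, updatedAt time.Time) bool {
    if last, ok := a[sku]; ok && !updatedAt.After(last) {
        return false
    }
    a[sku] = updatedAt
    return true
}

// at is a small helper for building timestamps in the example.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
    seen := applied{}
    fmt.Println(seen.shouldApply("AB-12", at(20))) // first event for the SKU
    fmt.Println(seen.shouldApply("AB-12", at(10))) // arrives late, older timestamp
    fmt.Println(seen.shouldApply("AB-12", at(20))) // duplicate of the first
    fmt.Println(seen.shouldApply("AB-12", at(30))) // newer, wins
}
```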

Launch Signals

Signal           Target   Owner
stale items      < 1%     Platform
queue lag        < 90s    Infra
publish errors   < 0.5%   Storefront
trace coverage   100%     Support Eng

If any signal misses its target, the canary stays closed.
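The gate can be expressed as a single check over the four signals. In this sketch the targets and owners come from the launch signals above; the `Signal` type, the `misses` function, and the observed values in `main` are made up for illustration.

```go
package main

import "fmt"

// Signal pairs an observed value with its launch target.
type Signal struct {
    Name    string
    Value   float64 // observed value; the numbers in main are invented
    Target  float64
    AtLeast bool // true: Value must reach Target; false: Value must stay below it
    Owner   string
}

// met reports whether the signal satisfies its target.
func (s Signal) met() bool {
    if s.AtLeast {
        return s.Value >= s.Target
    }
    return s.Value < s.Target
}

// misses returns the names of signals that fail their target.
func misses(signals []Signal) []string {
    var out []string
    for _, s := range signals {
        if !s.met() {
            out = append(out, s.Name)
        }
    }
    return out
}

func main() {
    signals := []Signal{
        {Name: "stale items", Value: 0.4, Target: 1, Owner: "Platform"},
        {Name: "queue lag", Value: 120, Target: 90, Owner: "Infra"},
        {Name: "publish errors", Value: 0.2, Target: 0.5, Owner: "Storefront"},
        {Name: "trace coverage", Value: 100, Target: 100, AtLeast: true, Owner: "Support Eng"},
    }
    if m := misses(signals); len(m) > 0 {
        fmt.Println("canary stays closed:", m)
        return
    }
    fmt.Println("all signals green")
}
```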

Launch Checklist

  • replay command tested against staging
  • dashboard shows queue lag and publish errors
  • canary tenant approved
  • rollback note linked in release ticket
  • support macro includes trace lookup steps

Decision Gate

  1. Run the replay against staging.
  2. Open the canary for one tenant.
  3. Watch queue lag, stale item rate, and publish errors for one business day.
  4. Widen by supplier only after support can trace three sampled updates.

The gate is operational: if support cannot explain the sync, engineering has not finished the launch.

Decision

Ship behind the canary tenant flag first, then widen by supplier once the launch signals stay green for one business day.
