
Vendure eCommerce up and running with Dev & Prod


My containerized Vendure instance now works with dev & prod environments

In January I began exploring the use of the Vendure TypeScript framework for setting up ecommerce shops for clients. There were quite a few hiccups along the way.

  • The project wasn’t organized to be containerized for local development.
  • Dockerfiles for the server, worker, and storefront were nonexistent.
  • It lacked any CI/CD for deploying containers to a remote production server.

From working at CDK Global, I knew I wanted to start the project off with a good foundation — a turnkey, containerized philosophy where spinning up the full app is a single command, not a 10-step Confluence doc that breaks for every new developer.

And now it’s done.


Why Containerize From the Start?

I saw first-hand at CDK Global what happens when a 15-year-old codebase isn’t containerized. Onboarding a new developer meant navigating an ever-drifting installation doc, platform-specific dependency issues, and a half-day (or more) of troubleshooting before they could even run the app. The engineering lead eventually spent ~3 months dockerizing that $25M product.

Starting containerized from day one avoids all of that. The advantages compound over time:

  • Production/dev parity. Both environments run the same Linux OS, the same Node version, the same Postgres version — same everything. Bugs that only appear in prod become much rarer.
  • One-command startup. docker compose -f docker-compose.local.yml up — that’s it. No global npm installs, no version juggling, no machine-specific gotchas.
  • Clean isolation. No conflicts with other projects’ Node versions or global packages.
  • Portability. The entire app is a packaged, self-contained artifact. Easy to hand off, easy to deploy, easy to sell.
  • Reproducibility. The Dockerfiles and docker-compose files are the documentation. Committed to version control, they define exactly what the app needs to run.

Idiosyncrasies of the Vendure Scaffold

The Vendure scaffold (npx @vendure/create my-shop) arrives set up to use npm workspaces, with the root package.json managing the server’s dependencies via concurrently. The storefront, by contrast, is self-contained. This structure made sense for the “run everything with one script” approach Vendure ships with — but it’s slightly awkward when you want each app component in its own container with its own isolated dependency graph.

The symptom: the server directory doesn’t generate its own package-lock.json after scaffolding, because npm workspaces consolidate everything at the root. The storefront does have one (likely a scaffolding artifact created before the workspace config takes full effect), but the server doesn’t. For a Docker-first project where each container needs a deterministic lockfile to run npm ci, this was a problem.

I reorganized the project to not use npm workspaces at all. I moved the original root package.json to archived-original-backup/ and set up each subdirectory — server and storefront — to function as an independent npm project with its own lockfile. It’s still a monorepo, just not a workspace-managed one. Each container builds cleanly from its own package-lock.json, which is exactly what npm ci expects.


The Mac vs. Linux Binary Problem

Here’s a subtle one that tripped me up: npm isn’t a perfect cross-platform package manager. Most of the time, it resolves the right binaries for your OS — but not always.

In this case, the culprit was lightningcss, a native module with C++ bindings that compiles differently on macOS vs Linux. If you run npm install on an ARM Mac, npm installs lightningcss-darwin-arm64. When a Linux container tries to use that lockfile, it fails looking for lightningcss-linux-arm64-gnu:

Error: Cannot find module '../lightningcss.linux-arm64-gnu.node'

The fix was a shell script — generate-lockfiles.sh — that spins up a temporary Node.js Linux container, runs npm install from inside it (with the project directory bind-mounted as a volume), and then exits. The result: a package-lock.json generated by Linux, for Linux. It gets written back to the host filesystem so it can be committed to version control and used consistently by every developer, every CI/CD run, and every production build.

Without this approach, a Dockerfile could generate a lockfile during the build phase — but it would be ephemeral, living only inside the image layer, never syncing back to your machine, never committed to git, and regenerated on every build. The script solves the problem at development time, once, cleanly.
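A minimal sketch of what such a script could look like — the app paths follow this project's layout, but the `node:22-slim` image tag and the overridable `DOCKER` variable are illustrative assumptions, not the project's exact script:

```shell
#!/usr/bin/env sh
# Sketch of a generate-lockfiles.sh: run npm install inside a throwaway
# Linux Node container so package-lock.json resolves Linux-native binaries.
set -eu

# The docker binary is overridable so the script can be dry-run.
DOCKER="${DOCKER:-docker}"

generate_lockfiles() {
  for app in server storefront; do
    # Bind-mount the app directory; the lockfile the container writes
    # lands on the host filesystem, ready to be committed.
    "$DOCKER" run --rm \
      -v "$(pwd)/my-shop-juniper/apps/$app:/usr/src/app" \
      -w /usr/src/app \
      node:22-slim \
      npm install
  done
}
```

Invoking generate_lockfiles on a Mac leaves each app with a Linux-generated package-lock.json on the host, ready to commit.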

💡

Keep the package-lock.json files, even after deleting node_modules. They are the source of truth for dependency versions. Deleting them and reinstalling risks version drift.


Bind Mounts vs. Named Volumes (And Why It Matters)

This distinction caused real bugs and is worth understanding clearly.

A bind mount maps a directory from your host machine directly into a container:

- ./my-shop-juniper/apps/server:/usr/src/app

Everything in that host folder is shared with the container — including node_modules, dist, .next, and any other generated artifacts. This is great for source code (hot reload works because edits on your Mac are immediately visible inside the container), but dangerous for dependencies and build artifacts. If macOS-compiled binaries end up in node_modules and that folder is bind-mounted into a Linux container, you get the exact lightningcss error described above.

A named volume is managed entirely by Docker:

- server-node-modules:/usr/src/app/node_modules

Docker creates a storage location inside its own managed directory, invisible to your host OS. The container installs its own dependencies there, with no interference from anything on your Mac. It’s isolated, Linux-native, and clean.

The architectural principle: bind mounts are for editing; named volumes are for runtime state.

Data type       Who owns it         Volume type
Source code     Developer           Bind mount
node_modules    Container           Named volume
dist / .next    Container           Named volume
Database data   Container runtime   Named volume

In the docker-compose.local.yml for this project, source code is bind-mounted for hot reload, but node_modules, dist, .tanstack, and .next are all named volumes — installed and owned by the container, never contaminated by the host OS.

Anonymous volumes (- /usr/src/app/node_modules) can work similarly in theory, but in practice they’re less reliable. Named volumes are explicit, debuggable, and predictable.
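Put together, the volumes section for the server service might combine both kinds — a sketch using the mount paths and volume names from the examples above; the service name and the dist volume name are illustrative:

```yaml
services:
  vendure-server:
    volumes:
      # Bind mount: source code lives on the host, edits hot-reload in the container
      - ./my-shop-juniper/apps/server:/usr/src/app
      # Named volumes: runtime state owned by the container, never touched by the host OS
      - server-node-modules:/usr/src/app/node_modules
      - server-dist:/usr/src/app/dist

volumes:
  server-node-modules:
  server-dist:
```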



Migrations: TypeScript at Authorship, JavaScript at Runtime

Vendure migrations are authored in TypeScript, but they’re compiled to JavaScript during the Docker build step. In production, the container runs Node.js directly — no ts-node. This means TypeORM must only ever load .js migration files at runtime.

This is an easy footgun. TypeORM documentation and many examples show:

migrations: [path.join(__dirname, './migrations/*.+(js|ts)')],

But in a compiled production container, this causes TypeORM to attempt loading .ts files — which Node.js can’t execute. The result is a cryptic:

SyntaxError: Unexpected strict mode reserved word

The correct production configuration:

migrations: [path.join(__dirname, './migrations/*.js')],

This ensures only compiled JavaScript is loaded. TypeScript migrations are build-time artifacts; JavaScript migrations are runtime artifacts.

The trickier part: the error only surfaces at runtime. The CI build succeeds. The image builds fine. The container starts — and then crashes. This makes it easy to miss during development, and painful to debug in production.

One more migration quirk: running npx vendure migrate locally requires that the CLI can reach the database. Because the CLI runs on your host machine and connects through the Docker-mapped port, DB_HOST needs to be localhost, not the Docker network alias vendure-database, which only resolves for container-to-container communication inside the Docker network.


Admin Dashboard: Build-Time vs. Runtime Environment

The Vendure Admin Dashboard is a compiled Vite frontend. Its API configuration is baked into the JavaScript bundle at build time — not read dynamically at runtime.

In development, you specify host + port explicitly:

api: {
  host: 'http://localhost',
  port: 3000,
}

In production, the Admin UI runs behind a reverse proxy (Nginx Proxy Manager in this case). Specifying a port in the compiled bundle means the browser calls host:port directly — bypassing the proxy, triggering CORS preflight requests, causing multi-second timeouts, and resulting in a blank or delayed dashboard load.

The fix is to omit the port in production and let same-origin routing handle it:

const IS_DEV = process.env.APP_ENV === "dev";

api: IS_DEV
  ? { host: "http://localhost", port: 3000 }
  : { host: process.env.VENDURE_API_HOST };
// No port in prod — same-origin routing via reverse proxy

But here’s the subtle part: APP_ENV must be present at build time, not just at runtime. During a Docker multi-stage build, the builder stage defines reality for compiled assets. The runtime stage cannot change what’s already baked into the bundle.

This means the CI/CD pipeline must pass APP_ENV as a Docker build argument:

docker build \
  --build-arg APP_ENV=prod \
  --build-arg VENDURE_API_HOST=https://admin.myshop.com \
  ...

And the Dockerfile builder stage must expose it:

ARG APP_ENV=prod
ENV APP_ENV=$APP_ENV

ARG VENDURE_API_HOST
ENV VENDURE_API_HOST=$VENDURE_API_HOST

RUN npm run build:dashboard

In this project, VENDURE_API_HOST is stored as a GitHub repository secret (not in any committed env file) because the repo is intended to be portfolio-public. Most other env vars are passed via env files injected at container run time. But anything that affects the compiled frontend bundle must arrive before npm run build.


CI/CD Structure

The production deployment runs via GitHub Actions. At a high level:

  1. Docker images are built with production build args (APP_ENV=prod, VENDURE_API_HOST) and pushed to a container registry.
  2. The runner SSHes into the Debian production server.
  3. Containers are pulled from the registry and run with production secrets injected as environment variables (stored in GitHub repository/environment secrets).
  4. Nginx Proxy Manager — running in its own container network — is connected to the Vendure container network via docker network connect vendure-network nginx-proxy-mgr-011526, and routes traffic to the appropriate services.

The distinction between build-time env vars (baked into images) and run-time env vars (injected via secrets during docker run) is the core of how this pipeline is organized. Default/non-secret env vars live in .github/defaults/env-defaults.yml and get baked in during the build. Secret production values live in GitHub secrets and overwrite those defaults at container startup.


The Dockerfiles: A Walkthrough

Each of the three app components — server, worker, and storefront — has its own Dockerfile using a multi-stage build pattern. All of them share the same foundational philosophy: a deps stage installs dependencies, a dev stage supports local hot-reload development, a builder stage compiles for production, and a prod stage is the lean runtime image that actually gets deployed.

A Shared Lockfile Guard

Every Dockerfile opens with a strict check: if package-lock.json isn’t present, the build fails immediately with a clear error message explaining that this is a container-first project and that lockfiles must be generated from within a Linux container. This is the enforcement mechanism for the generate-lockfiles.sh workflow described above — it makes it impossible to accidentally build with macOS-generated or missing lockfiles.

RUN if [ ! -f package-lock.json ]; then \
      echo "❌ ERROR: package-lock.json not found!"; \
      exit 1; \
    fi
RUN npm ci

Dockerfile.server

The server Dockerfile has four stages:

deps — Copies package.json and package-lock.json, runs the lockfile guard, then npm ci. This is a separate stage so its layer can be cached and reused by both dev and builder.

dev — Installs procps (required because Vendure’s dev process spawns ps), copies source, exposes port 3000, and starts with npm run dev:container. This is the target used locally via docker-compose.local.yml.

builder — This is where the build-time environment variable concern from earlier becomes concrete. VENDURE_API_HOST and APP_ENV are both accepted as build arguments and exported as environment variables before npm run build and npm run build:dashboard run. Without both present at this stage, the compiled Admin UI will bake in development defaults. After building, it copies compiled migration files from src/migrations/ into dist/migrations/ so they’re available at runtime as .js files.

prod — Copies only what’s needed from builder: the dist/ directory, node_modules, package.json, static/, and a few other files. It creates a non-root vendure user and group for security, bakes in default env vars from env-defaults.env (which will be overwritten at container startup by production secrets), and starts with npm run start:server.

Dockerfile.worker

The worker Dockerfile is nearly identical to the server’s, which makes sense — the worker runs on the same codebase and shares the same apps/server directory. The key difference is that it has no port exposure (the worker doesn’t serve HTTP traffic; it processes background jobs pulled directly from the database’s job queue) and its production CMD is npm run start:worker instead of start:server. It also accepts APP_ENV as a build arg for consistency, even though it doesn’t affect the worker’s compiled output in the same way it does the dashboard.

Dockerfile.storefront

The storefront (Next.js) Dockerfile has a few notable differences from the server/worker setup.

The builder stage handles a tricky situation: the Next.js build requires the Vendure API to be reachable, but during docker build the API container isn’t running. The solution is a SKIP_NEXTJS_BUILD flag in the default env file. If that flag is set to true, the builder stage skips npm run build and just creates a placeholder .next/skip-build marker file. The actual Next.js build then happens at container startup via an entrypoint.sh script, at which point the network is available and the Vendure server is running.

The prod stage copies everything the runtime build needs — source files, config files, the partially-built .next directory, and the entrypoint script — and uses ENTRYPOINT ["./entrypoint.sh"] rather than a direct CMD. This lets the entrypoint check for the skip-build marker and conditionally run npm run build before starting the server. The nextjs user is given write permissions to /app to support this runtime build step.
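A sketch of what that entrypoint logic might look like. The .next/skip-build marker follows the article; the APP_DIR/NPM variables and exact npm scripts are assumptions:

```shell
#!/usr/bin/env sh
# Sketch of the storefront entrypoint.sh: build at startup if the image
# build was skipped, then start the Next.js server.
set -eu

APP_DIR="${APP_DIR:-/app}"
NPM="${NPM:-npm}"   # overridable for dry runs

start_storefront() {
  if [ -f "$APP_DIR/.next/skip-build" ]; then
    # The image was built with SKIP_NEXTJS_BUILD=true, so run the real
    # Next.js build now, when the network and Vendure API are available.
    rm "$APP_DIR/.next/skip-build"
    "$NPM" run build
  fi
  "$NPM" run start
}
```

The container's ENTRYPOINT would then simply invoke start_storefront.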


The CI/CD Pipeline

The GitHub Actions pipeline is organized as a main orchestrator workflow (z-main.yml) that calls four reusable sub-workflows in sequence: database init, server deploy, worker deploy, and storefront deploy. Each sub-workflow is independently responsible for its component, and the main workflow threads secrets into each one.

Deployment Order and Dependencies

The pipeline enforces a strict deployment order via needs:

database → server → worker → storefront

This ensures the database is always up before the application servers start, and the server is running before the worker tries to connect. A final summary job collects outputs from all four and renders a deployment summary in the GitHub Actions UI showing what changed and what action was taken for each component.
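In GitHub Actions terms, the ordering in the orchestrator might be expressed roughly like this — db-init.yml is named in the article, but the other sub-workflow filenames and the use of `secrets: inherit` are assumptions:

```yaml
jobs:
  database:
    uses: ./.github/workflows/db-init.yml
    secrets: inherit
  server:
    needs: database
    uses: ./.github/workflows/deploy-server.yml    # filename assumed
    secrets: inherit
  worker:
    needs: server
    uses: ./.github/workflows/deploy-worker.yml    # filename assumed
    secrets: inherit
  storefront:
    needs: worker
    uses: ./.github/workflows/deploy-storefront.yml  # filename assumed
    secrets: inherit
```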

Smart Change Detection

Each app sub-workflow (server, worker, storefront) uses a deployment marker pattern to avoid rebuilding and redeploying containers when nothing has changed. On every successful deploy, the current git commit hash is saved to a file and uploaded as a GitHub Actions artifact. On the next run, that artifact is downloaded, and git diff is used to check whether any files in the relevant app directory have changed since the last deploy.
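The core of that check can be sketched as a small shell function — the marker filename and the inverted exit-status convention are illustrative, not the workflow's actual steps:

```shell
# Returns success (0) when any file under $2 changed since the commit
# recorded in the marker file $1 — a sketch of the change check.
changed_since_marker() {
  marker_file="$1"   # contains the commit hash saved by the last deploy
  app_dir="$2"       # e.g. my-shop-juniper/apps/server

  last_deployed="$(cat "$marker_file")"
  # git diff --quiet exits 1 when there are differences; invert it so
  # "changed" reads as success.
  ! git diff --quiet "$last_deployed" HEAD -- "$app_dir"
}
```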

The decision logic is a simple matrix:

Container exists?   Marker artifact exists?   Changes detected?   Decision
No                  Any                       Any                 Build & Deploy (initial)
Yes                 Yes                       Yes                 Build & Deploy
Yes                 Yes                       No                  Skip
Yes                 No                        N/A                 Skip (incomplete state)

This means pushes that only touch the storefront don’t trigger a server rebuild, and vice versa. That saves build time and avoids unnecessary container restarts.
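The matrix can be expressed as straightforward logic — a sketch; the argument values and decision labels are illustrative, not the workflow's actual identifiers:

```shell
# Sketch of the deploy decision matrix as a shell function.
decide_deploy() {
  container_exists="$1"   # yes|no
  marker_exists="$2"      # yes|no
  changes_detected="$3"   # yes|no

  if [ "$container_exists" = "no" ]; then
    echo "build-and-deploy (initial)"
  elif [ "$marker_exists" = "no" ]; then
    # Container exists but no marker: incomplete state, play it safe.
    echo "skip"
  elif [ "$changes_detected" = "yes" ]; then
    echo "build-and-deploy"
  else
    echo "skip"
  fi
}
```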

Database: Deploy Once, Leave It Alone

The database sub-workflow (db-init.yml) takes a deliberately conservative approach: it checks whether the database container already exists on the production server, and if it does, it does nothing. The database is only created on first deployment. This is intentional — you don’t want a routine CI/CD push to recreate your database container and potentially interact with persistent volume state. If you need to make database changes, that’s handled via Vendure’s migration system, not by the CI/CD pipeline.

The Two-Layer Environment Variable Strategy

The env var setup deserves its own explanation because it took some thought to get right.

Images are built with default env vars baked in. These come from .github/defaults/env-defaults.yml — a file committed to the repo that contains non-sensitive placeholder values (default DB credentials, localhost URLs, etc.). During the build step, the CI/CD pipeline uses yq to extract the relevant section of that YAML file and writes it to a temporary env-defaults.env file, which the Dockerfile then COPYs into the image. After the build, the temporary file is deleted from the runner.

When containers are started on the production server, production secrets overwrite those defaults. The pipeline SSHes into the server, writes the production secret env file (pulled from GitHub repository secrets) to a temporary location, passes it to docker run --env-file, and then immediately deletes it. The container starts with the real credentials; the defaults that were baked into the image are never used in production.

This two-layer approach means:

  • The repo can be public without exposing production credentials
  • Images can be inspected without leaking secrets (they only contain placeholder values)
  • Production secrets only ever exist transiently on the server during the docker run step
  • The VENDURE_API_HOST used to compile the Admin Dashboard is kept as a separate repository secret, since it needs to be available at build time as a --build-arg rather than at run time

One env var worth noting that wasn’t in the original Vendure template: SKIP_NEXTJS_BUILD=true in the storefront defaults. The original template didn’t need to account for a CI/CD environment where the API isn’t reachable during image build. Adding that flag, along with the entrypoint script logic, was a necessary addition to make the containerized storefront work correctly in both build and runtime contexts.


Where It Stands Now

The project has hit two version benchmarks:

  • v1.0 — Fully functional local development environment with Docker Compose.
  • v2.0 — Prototype production deployment via CI/CD, running on a Terraform-provisioned Debian server with basic security hardening (UFW, Fail2ban, SSH key-only auth, kernel hardening via sysctl).

Next up (v3.0) is production hardening: Vendure’s HardenPlugin, rate limiting, Cloudflare integration, HTTPS enforcement, automated database backups, structured error logging, and switching payment providers (Stripe, PayPal) to live mode.

Then: storefront customization, product catalog, two-language schema (English/Spanish), and eventually blue/green deployments.

But that’s for future posts.