What this guide covers (and who it is for)
This is a practical CI/CD reference for tech leads, platform engineers, and senior developers who own release safety—not a single vendor tutorial. It explains Continuous Integration (merge often, automated build + test), Continuous Delivery (main is always releasable; humans approve prod), and Continuous Deployment (passing main auto-ships to production). You will get a clear stage model, branching tradeoffs, how to handle secrets and databases, deployment patterns, and how to measure whether your pipeline actually helps the business.
Search and AI context: related intents include CI/CD best practices, GitHub Actions workflow, GitLab CI/CD, Jenkins pipeline, deployment strategies, canary deployment, blue green deployment, rollback strategy, OIDC GitHub Actions, trunk-based development, feature flags CI/CD, database migrations in deployment, and software supply chain security.
Glossary (quick definitions)
- Pipeline: Automated sequence from code commit to artifact and optionally to deployed runtime.
- Job / stage: A unit of work (e.g. “test,” “build image”); stages often run in order with dependencies.
- Artifact: Immutable output of a build (container image digest, tarball, static bundle) promoted across environments.
- Quality gate: Required check (tests, review, scan) that must pass before merge or deploy.
- OIDC: OpenID Connect—lets CI obtain short-lived cloud tokens without storing long-lived access keys.
- Canary: Send a small fraction of traffic to a new version before full promotion.
- SBOM: Software Bill of Materials—machine-readable list of dependencies for audit and incident response.
CI, CD, and “continuous deployment” in one place
CI proves that integrated code works: compile, lint, unit tests, fast feedback on every push or PR. Continuous Delivery adds packaging, integration tests, and deployment automation up to a pre-prod or staging environment; production still needs an explicit approval. Continuous Deployment removes that human gate for production—only appropriate when tests, observability, and rollback are strong and the org accepts that pace. Many teams say “CI/CD” but run delivery with manual prod promote; that is still valuable if releases are small and frequent.
Pipeline stages (typical order)
Order matters for fail fast—cheap checks before expensive ones.
- Checkout an immutable commit SHA, not a moving branch tip, for reproducible builds.
- Install dependencies using a lockfile (package-lock.json, poetry.lock, etc.); cache by lockfile hash so caches invalidate when deps change.
- Lint, format check, typecheck—seconds to minutes; catches whole classes of bugs before tests run.
- Unit tests in parallel shards; avoid shared mutable global state between tests.
- Build production artifacts: optimized bundles, OCI images, or compiled binaries.
- Integration / contract tests against disposable services (containers, ephemeral DB) or consumer-driven contracts.
- Security: dependency and license scan (SCA), container image scan, optional SAST; block or warn based on policy.
- Deploy to staging with the same mechanism as prod (treat staging as a rehearsal, not a different shell script).
- Smoke / synthetic tests hit critical paths; post-deploy verification confirms migrations and config, not only HTTP 200 on /.
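As a sketch, the stage ordering above might look like this in a GitHub Actions workflow. Job names, the Node toolchain, and the npm scripts are illustrative assumptions, not prescriptions; the shard flag assumes a test runner such as Jest that supports `--shard`:

```yaml
name: ci
on:
  pull_request:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4        # checks out the exact commit for this event
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm                     # cache keyed by the lockfile hash
      - run: npm ci                      # install from the lockfile; fail if it drifts
      - run: npm run lint && npm run typecheck

  test:
    needs: lint                          # cheap checks gate the expensive ones
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]              # parallel test shards
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npm test -- --shard=${{ matrix.shard }}/4

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t myapp:${{ github.sha }} .   # immutable, SHA-tagged artifact
```

The `needs:` edges are what encode "fail fast": lint feedback arrives in seconds, and the expensive image build never runs on a red test job.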
CI platforms: patterns, not tribalism
GitHub Actions fits GitHub-centric teams: workflows as YAML in .github/workflows, reusable actions, environments with protection rules, and OIDC to AWS/Azure/GCP. GitLab CI uses .gitlab-ci.yml and ships a strong built-in registry and review apps. Jenkins remains common for complex on-prem glue with plugins—budget for its maintenance. CircleCI, Buildkite, TeamCity and others differ in pricing and parallelism; the principles here transfer. Prefer pipeline-as-code in Git so changes are reviewed like application code.
Branching: trunk-based vs long-lived branches
Trunk-based development uses short-lived feature branches merged to main frequently; CI keeps main green. That minimizes merge debt and makes “what is in production?” answerable by SHA. GitFlow-style long-lived develop / release/* branches help rare scheduled releases but add merge and cherry-pick cost; regulated industries sometimes need release trains anyway—then automate the train, don’t hand-crank it. Release branches for hotfixes on old minors are fine; just document the mapping from branch to artifact tag.
Branch protection and required checks
Require PR reviews, required status checks from CI, and (where useful) CODEOWNERS for sensitive paths. Prevent force-push to main. These are organizational guardrails—without them, “we have a pipeline” does not mean “we have safe shipping.”
Artifacts and immutability
Tag container images with git SHA or semver+SHA; avoid relying on latest in production references. Promote the same digest from staging to production so you never debug “prod was rebuilt with different env.” Store build metadata (commit, pipeline run id) as OCI labels for support. Generate and retain an SBOM where policy requires it; it accelerates vulnerability response.
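A hedged sketch of a build step that tags by SHA and attaches build metadata as OCI labels (the registry hostname and image name are hypothetical; the `org.opencontainers.image.*` keys are the standard annotation names):

```yaml
- name: Build and push an immutable, SHA-tagged image
  run: |
    docker build \
      --label org.opencontainers.image.revision=${{ github.sha }} \
      --label org.opencontainers.image.source=${{ github.server_url }}/${{ github.repository }} \
      --label ci.pipeline-run-id=${{ github.run_id }} \
      -t registry.example.com/myapp:${{ github.sha }} .
    docker push registry.example.com/myapp:${{ github.sha }}
```

Later environments then reference this exact tag (or better, the pushed digest) rather than rebuilding, so staging and production run byte-identical artifacts.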
Secrets and environments
Prefer OIDC federation from CI to cloud IAM so runners assume roles without static cloud keys in secrets. Scope secrets per environment (dev/staging/prod); use separate cloud accounts or projects where feasible. Rotate credentials on a schedule; never print secrets to logs (mask in CI UIs). For third-party tokens, use minimum scope and short TTL where APIs allow it.
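A minimal GitHub Actions fragment showing the OIDC pattern with AWS, assuming the `aws-actions/configure-aws-credentials` action; the role ARN and region are illustrative placeholders:

```yaml
permissions:
  id-token: write                      # allow the job to request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production            # environment-scoped secrets and protection rules
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy   # hypothetical role
          aws-region: eu-west-1
      # Subsequent steps receive short-lived credentials from STS;
      # no long-lived access keys are stored as CI secrets.
```

The cloud side trusts the CI provider's OIDC issuer and constrains which repository and branch may assume the role, which is where the real security boundary lives.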
Testing strategy: pyramid, contracts, flakes
The test pyramid still holds: many fast unit tests, fewer integration tests, very few full E2E journeys covering critical revenue paths. Flaky tests are worse than no tests—they train teams to ignore red builds. Quarantine flakes, fix or delete, and track flake rate. Contract tests (Pact-style or schema tests) catch API breakage between services without spinning entire stacks for every PR. For UI, prefer stable selectors and parallelizable suites with explicit test data setup.
Database migrations in the pipeline
Schema changes are a leading source of deploy-ordering bugs. Prefer expand–contract patterns: add new columns or tables compatible with old code, deploy the app, then remove the old paths. Run migrations as an explicit pipeline step with timeouts set and the locks they take understood; avoid long exclusive locks on hot tables at peak traffic. Keep rollback plans—sometimes a forward-only migration plus a feature flag is safer than automatic down migrations in prod.
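One way to make the migration step explicit in a pipeline, sketched here assuming Alembic and a secret named `STAGING_DATABASE_URL` (both assumptions, not requirements):

```yaml
- name: Run database migrations (expand phase only)
  timeout-minutes: 10                  # fail fast instead of holding locks indefinitely
  env:
    DATABASE_URL: ${{ secrets.STAGING_DATABASE_URL }}   # hypothetical secret name
  run: |
    # Migrations applied here must remain compatible with the application
    # version that is currently deployed, per the expand-contract pattern.
    alembic upgrade head
```

The timeout is the pipeline-level safety net; the database-level equivalent (for example a lock timeout on the migration session) is still worth setting so a stuck DDL statement fails rather than queuing behind hot-table traffic.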
Infrastructure as code in CI
Run terraform plan (or equivalent) on PRs for visibility; apply from protected branches or with manual approval. Pin provider and module versions. Separate plan artifacts from apply so auditors can see what changed. For Kubernetes, lint manifests and run policy checks (OPA, Kyverno) before deploy steps.
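A sketch of plan-on-PR with the plan retained as an artifact, assuming the `hashicorp/setup-terraform` action; the pinned version number is illustrative:

```yaml
plan:
  if: github.event_name == 'pull_request'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: 1.7.5       # pinned, never "latest"
    - run: terraform init -input=false
    - run: terraform plan -input=false -out=tfplan
    - uses: actions/upload-artifact@v4 # retain the plan for audit and a later apply
      with:
        name: tfplan
        path: tfplan
```

A separate apply job (gated on a protected branch or manual approval) can then consume the saved plan file, giving auditors a record of exactly what was proposed versus what was applied.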
Monorepos and “affected” builds
In monorepos, build and test only affected projects (Nx, Turborepo, Bazel, or custom diff logic) so CI time scales with change size, not repo size. Cache outputs keyed by inputs; share remote cache across developers and CI when safe.
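For example, with Nx (one of the tools named above; the target names are assumptions about the workspace), the CI step diffs against the base branch and runs only affected projects:

```yaml
- uses: actions/checkout@v4
  with:
    fetch-depth: 0                     # full history so Nx can diff against the base
- run: npm ci
- run: npx nx affected -t lint test build --base=origin/main
```

The `fetch-depth: 0` matters: a shallow clone has no base commit to diff against, so "affected" silently degrades to "everything".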
Deployment strategies
- Rolling: Replace instances incrementally; simple for stateless apps. Watch error rates during each wave, and let failing health checks pause the rollout or trigger rollback.
- Blue/green: Stand up full “green” stack, switch load balancer or DNS, keep blue for fast rollback. Needs capacity or efficient pooling; watch connection draining.
- Canary: Route 1–5% (then step up) to new version; automate promotion on golden signals (error rate, latency) and rollback on regression. Requires mature metrics and traffic shaping.
- Recreate: Acceptable for dev/staging or stateless batch; usually too blunt for customer-facing zero-downtime SLOs.
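One way to express a stepped canary declaratively, sketched here assuming Argo Rollouts on Kubernetes (a common choice, but only one of several ways to do traffic shaping):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5            # 5% of traffic to the new version
        - pause: {duration: 10m}  # window to watch golden signals
        - setWeight: 25
        - pause: {duration: 10m}
        - setWeight: 100          # full promotion
```

Argo Rollouts can also attach automated analysis to these pauses so that regressions in error rate or latency abort and roll back without a human in the loop—the "automate promotion on golden signals" described above.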
Rollback, feature flags, and kill switches
Maintain one-click rollback to the previous known-good artifact (image tag or release version). Feature flags decouple deploy from exposure—ship dark, enable gradually, disable without redeploy if something misbehaves. Document which flags are temporary vs long-lived to avoid flag debt.
Security and supply chain in the pipeline
Beyond SCA: pin GitHub Actions to full commit SHAs for third-party actions, use org-level allow lists, and enable dependency review on PRs. Sign artifacts where policy requires (Sigstore/cosign patterns). Scan IaC for misconfigurations. Treat build nodes as sensitive—harden runners, use ephemeral environments, and restrict who can trigger production deploy workflows.
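Pinning an action to a full commit SHA looks like this (the action name and SHA below are illustrative, not a real release):

```yaml
steps:
  # Pin third-party actions to a full commit SHA, not a mutable tag.
  # The trailing comment records the human-readable version for reviewers.
  - uses: some-org/some-action@2f3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b # v2.1.0 (illustrative SHA)
```

A tag like `@v2` can be moved to malicious code after you adopt it; a full SHA cannot, which is why Dependabot-style tooling updates the pin and the comment together.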
Observability after deploy
Tag traces and logs with release version or git SHA. Watch error rate, saturation (CPU, queue depth), and latency percentiles during canary windows. Align with on-call: if deploy triggers pages, runbooks should include “rollback first, debug second” for user-visible incidents.
Measuring pipeline and delivery health
DORA metrics (deployment frequency, lead time for changes, change failure rate, time to restore) help justify investment. Internally, track CI duration, queue time, main branch red time, and mean time to recovery after failed deploys. If developers bypass the pipeline “to go faster,” fix the pipeline—not the policy.
Git-triggered deploys and managed platforms
Many teams connect the same principles here to a managed app platform: push or merge to a protected branch triggers build, test, and deploy with environment-specific config. Whether you self-host runners or use a platform, the non-negotiables stay: immutable artifacts, reviewed pipeline changes, secrets scoped by environment, and observable rollbacks.
FAQ
CI is slow—what should we fix first?
Profile the critical path: parallelize independent jobs, use remote build cache, shrink Docker build context, split giant test suites, and delete redundant E2E. Slow CI encourages skipping checks—treat feedback time as a product requirement.
How often should we deploy?
Smaller changes more often usually reduce risk per change and improve learning. The limit is organizational: review capacity, on-call readiness, and customer change windows—not a magic weekly vs daily rule.
Should production deploys run automatically from main?
Only if tests, canaries, feature flags, and rollback are trustworthy and stakeholders accept continuous deployment. Otherwise use delivery with explicit promote—still automate everything up to that button.
How do we test infrastructure changes safely?
Use isolated accounts or namespaces, apply from CI with plan/apply separation, and keep a known rollback (previous Terraform state or previous manifest revision) documented.
What about compliance and audit trails?
Retain pipeline logs and artifact provenance (who built what, from which commit). Map required checks to control objectives; required reviews plus immutable artifacts support many SOC2-style narratives—work with your security team on specifics.
Are self-hosted runners worth it?
When you need custom hardware, strict data residency, or cheaper heavy builds at scale—yes, with the cost of patching and securing those runners yourself.
Key takeaways
- Automate build → test → artifact → promote; treat the artifact as immutable once built.
- Use short feedback loops on PRs and strong main protection so green means trustworthy.
- Prefer OIDC and scoped secrets over long-lived cloud keys in CI variables.
- Pair deploys with canary or blue/green, feature flags, and fast rollback—plus migrations that tolerate old code during rollout.
- Measure pipeline time, flake rate, and DORA-style outcomes so improvements are data-driven, not ceremonial.