Standalone Envoy Gateway

In a multi-AZ or multi-region deployment, the Envoy Gateway layer (the controller, the EnvoyProxy data plane pods, the GatewayClass, and the cloud load balancer) must outlive any individual Kloudfuse cluster. During a failover you bring up a fresh Kloudfuse install in the target AZ and then repoint the existing Envoy Gateway at the new cluster; the load balancer IP, DNS, and TLS termination stay put.

To support that, install envoy-gateway as a separate Helm release in its own namespace, outside the Kloudfuse chart. Each Kloudfuse install then references the shared gateway by GatewayClass name and only installs its own Gateway and HTTPRoute resources — no controller, no proxy pods, no load balancer.

When to Use Standalone Envoy

Use this pattern when any of the following apply:

  • You need to fail over between Kloudfuse clusters in different AZs or regions without DNS changes or TLS re-issuance.

  • You want to spin up a new Kloudfuse install in parallel with an existing one and cut traffic over once verified.

  • The Envoy load balancer must survive helm delete of the Kloudfuse release (for example, blue/green upgrades).

Failover repoints a single standalone release between clusters and keeps one IP (or one set of EIPs). It does not run two load balancers on the same address, because an IP or EIP attaches to only one load balancer at a time. For a true side-by-side deployment, give each load balancer its own IPs and switch DNS once the new install is verified.

For a single-cluster install where Envoy and Kloudfuse are managed together, use the in-chart configuration described in Configure Envoy Ingress instead.

Prerequisites

  • Kubernetes cluster 1.27 or later with kubectl configured.

  • Helm 3.x.

  • Gateway API CRDs installed cluster-wide (see Install Envoy CRDs).

  • cert-manager installed cluster-wide (or AWS ACM configured) for TLS.

  • A reserved static external IP (and an internal IP if the deployment needs an internal LB).

  • OCI registry access for oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/envoy-gateway. The login commands below use the gcloud CLI (authenticated to a principal with read access to the registry); if gcloud is not available, substitute any other method of obtaining a registry token — e.g. helm registry login -u _json_key --password-stdin …​ < key.json with a service account key — before running helm install.

  • Image pull secret for us.gcr.io/mvp-demo-301906/kfuse/envoy-gateway-fips in the install namespace — referenced by global.imagePullSecrets in your values.yaml (the controller and proxy won’t start without it). Copy the existing kfuse-image-pull-credentials secret from the Kloudfuse namespace, or create a dockerconfigjson secret from your registry key:

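    # Copy the secret, dropping namespace-bound metadata so it can be re-created in the target namespace
    kubectl get secret kfuse-image-pull-credentials -n <kfuse-namespace> -o yaml \
      | grep -v 'namespace:\|resourceVersion:\|uid:\|creationTimestamp:' \
      | kubectl apply -n envoy-gateway-system -f -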
    bash
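Before installing, it can help to confirm that the secret actually landed in the install namespace. The helper below is a minimal sketch, not part of the Kloudfuse tooling: the function name check_pull_secret is hypothetical, and it assumes the secret is of type kubernetes.io/dockerconfigjson, as the prerequisite above suggests.

```shell
# Sketch: confirm the pull secret exists in the install namespace and has the
# dockerconfigjson type. Defaults mirror the names used on this page.
# check_pull_secret is a hypothetical helper name.
check_pull_secret() {
  local secret="${1:-kfuse-image-pull-credentials}"
  local ns="${2:-envoy-gateway-system}"
  local secret_type
  # Prints the Secret's type; an empty result means the lookup failed.
  secret_type="$(kubectl get secret "$secret" -n "$ns" -o jsonpath='{.type}' 2>/dev/null || true)"
  if [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]; then
    echo "ok: $secret in $ns"
  else
    echo "missing or wrong type: $secret in $ns (type='${secret_type}')" >&2
    return 1
  fi
}
```

Run check_pull_secret with no arguments to check the default names, or pass a secret name and namespace explicitly.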

Disable the Bundled Envoy in the Kloudfuse Chart

Before installing the standalone controller, configure each Kloudfuse install so it emits routing manifests only — no controller, no proxy pods, no load balancer.

Add the following to each Kloudfuse custom_values.yaml:

envoy-gateway:
  enabled: false                          (1)
  installGatewayRoutes: true              (2)
  gatewayClassName: "envoy-gateway-az1"   (3)
  envoyService:
    internal:
      enabled: true                       (4)
      gatewayClassName: "envoy-gateway-az1-internal"

ingress-nginx:
  enabled: false
  installIngressRules: false
yaml
1 enabled: false skips the controller Deployment, the EnvoyProxy data plane, and the load balancer Service. The standalone release provides all three.
2 installGatewayRoutes: true keeps the Gateway, HTTPRoute, SecurityPolicy, and BackendTrafficPolicy resources — these tell the standalone controller where to send traffic for this Kloudfuse install.
3 Must match the external GatewayClass name produced by the standalone release. See Configure the Standalone Controller for naming.
4 Only if you run an internal load balancer. The internal Gateway is a second GatewayClass and must be wired separately: set envoyService.internal.gatewayClassName here to match the internal class produced by the standalone release (envoyService.internal.gatewayClassName there). Omit this whole envoyService block if you have no internal LB.

gatewayClassName is just a name that has to match on both sides (Kloudfuse install ↔ standalone release). If you omit it on the standalone release, the chart derives it from the install namespace: <namespace>-envoy-gateway for the external class and <namespace>-envoy-gateway-internal for the internal one. Whatever the standalone release ends up with, the Kloudfuse custom_values.yaml above must reference the exact same strings.
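The namespace-derived defaults described above can be written out as a small helper, which makes it easy to see which strings the Kloudfuse custom_values.yaml would have to reference when the standalone release sets no gatewayClassName. This is only an illustration of the naming rule stated above; the function name default_gateway_classes is hypothetical.

```shell
# Sketch: print the class names the chart derives from the install namespace
# when gatewayClassName is omitted on the standalone release.
# Line 1 = external class, line 2 = internal class.
default_gateway_classes() {
  local ns="$1"
  printf '%s\n' "${ns}-envoy-gateway" "${ns}-envoy-gateway-internal"
}
```

For a standalone release installed into envoy-gateway-system, default_gateway_classes envoy-gateway-system prints envoy-gateway-system-envoy-gateway and envoy-gateway-system-envoy-gateway-internal.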

Both envoy-gateway.enabled and ingress-nginx.enabled must be false in this pattern. The Kloudfuse chart enforces that the two are mutually exclusive but does not require either to be on.

Configure the Standalone Controller

Create a values.yaml for the standalone envoy-gateway chart. The release must:

  • Pin the controller and EnvoyProxy pods to the target AZ’s node group via nodeSelector and tolerations.

  • Set a unique gatewayClassName per AZ so each Kloudfuse install can target the right gateway.

  • Configure the cloud load balancer (external + optional internal) to match the IP / EIP / DNS already in use.

global:
  # Image pull credentials for us.gcr.io/mvp-demo-301906/kfuse/envoy-gateway-fips.
  # Required — without this, both the controller and the EnvoyProxy data plane
  # pods will ImagePullBackOff. The named secret must already exist in the
  # install namespace (see Prerequisites for how to copy it).
  imagePullSecrets:
    - kfuse-image-pull-credentials       (1)

  # Optional: set true only if your cluster uses kfRoles node pools
  # (kf_role label/taint); otherwise pin via deployment.pod.* below.
  # kfRoles:
  #   enabled: true

# GatewayClass name referenced by Kloudfuse installs' envoy-gateway.gatewayClassName
gatewayClassName: "envoy-gateway-az1"      (2)

# External LB Service shape — must match the NLB / EIPs the Nginx or
# previous Envoy install was using, so DNS records do not change.
envoyService:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: 'true'
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: <YOUR_EIP_ALLOC_IDS>  (3)
  patch:
    externalTrafficPolicy: Local
  external:
    enabled: true
  internal:
    enabled: true
    gatewayClassName: "envoy-gateway-az1-internal"   (4)
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-type: nlb

# Pin to the AZ's node group
deployment:
  pod:
    nodeSelector:
      ng_label: <az-name>                  (5)
    tolerations:
    - key: "ng_taint"
      operator: "Equal"
      value: "<az-name>"
      effect: "NoSchedule"

# certgen Job needs the same pod-placement constraints (not inherited from deployment)
certgen:
  job:
    nodeSelector:
      ng_label: <az-name>
    tolerations:
    - key: "ng_taint"
      operator: "Equal"
      value: "<az-name>"
      effect: "NoSchedule"
yaml
1 Image pull secret name — the secret itself must exist in the install namespace. See Prerequisites for the copy-from-Kloudfuse command.
2 Must match envoy-gateway.gatewayClassName (the external class) in every Kloudfuse custom_values.yaml that is supposed to register routes against this gateway.
3 Comma-separated Elastic IP allocation IDs. Without these, AWS provisions a fresh NLB with new public IPs, and DNS breaks.
4 Internal GatewayClass name — must match envoy-gateway.envoyService.internal.gatewayClassName in the Kloudfuse install. Only needed when the internal LB is enabled. Drop the internal block entirely if you have no internal LB.
5 <az-name>: AZ identifier matching the node group label (for example az1). The same selector/tolerations must be on the certgen Job — the chart does not inherit them from the deployment block.

Install the Standalone Controller

Authenticate against the OCI registry, then install the chart into a dedicated namespace:

gcloud auth print-access-token | helm registry login -u oauth2accesstoken \
  --password-stdin us-east1-docker.pkg.dev

helm upgrade --install envoy-gateway \
  oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/envoy-gateway \
  --version <VERSION> \
  --namespace envoy-gateway-system \
  --create-namespace \
  -f values.yaml
bash

Chart version. Pin the standalone envoy-gateway chart to the same version that your Kloudfuse release bundles, so the controller, Gateway API CRDs, and proxy data plane stay in lockstep with the Gateway/HTTPRoute objects Kloudfuse emits. The currently published version is v1.7.1. To find the version your Kloudfuse release expects, read the envoy-gateway dependency in its chart:

helm show chart oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
  --version <kfuse-version> | grep -A2 'name: envoy-gateway'
bash

Substitute it for <VERSION> in the helm install / upgrade commands on this page.
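If you prefer to capture the version in a variable rather than read it off the grep output, a small filter can pull it out of the helm show chart output. This is a sketch under one assumption: within the envoy-gateway dependency entry, the name line appears before the version line (which is how the grep -A2 command above expects it). The function name envoy_dep_version is hypothetical.

```shell
# Sketch: read `helm show chart` output on stdin and print the version of the
# envoy-gateway dependency. Assumes `name:` precedes `version:` in the entry.
envoy_dep_version() {
  awk '
    # A new YAML list item starts: forget any previous match.
    /^[[:space:]]*-[[:space:]]/ { in_dep = 0 }
    # Entering the envoy-gateway dependency entry.
    /name:[[:space:]]*envoy-gateway[[:space:]]*$/ { in_dep = 1 }
    # First version line inside that entry: strip quotes, print, stop.
    in_dep && /^[[:space:]]*version:/ { gsub(/"/, "", $2); print $2; exit }
  '
}
```

Usage: pipe the helm show chart command above into envoy_dep_version and pass the result to --version.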

Repeat this step once per AZ. For each AZ, replace the release name envoy-gateway in the command above with a unique one, set a distinct gatewayClassName (envoy-gateway-az1, envoy-gateway-az2, …), and supply the AZ-specific node selector and load balancer configuration.
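One way to keep the per-AZ installs consistent is to generate the commands from a list of AZ names and review them before running anything. The sketch below only prints commands. The release naming (envoy-gateway-<az>), the values file naming (values-<az>.yaml), the ENVOY_NS variable, and the function name are all assumptions made for illustration; the namespace defaults to the one used in the install command on this page and may need to differ per AZ in your environment.

```shell
# Sketch: print one helm install command per AZ (dry run, nothing is executed).
# Release and values-file naming are hypothetical conventions.
print_az_installs() {
  local version="$1"; shift
  local ns="${ENVOY_NS:-envoy-gateway-system}"
  local chart="oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/envoy-gateway"
  local az
  for az in "$@"; do
    printf 'helm upgrade --install envoy-gateway-%s %s --version %s --namespace %s --create-namespace -f values-%s.yaml\n' \
      "$az" "$chart" "$version" "$ns" "$az"
  done
}
```

For example, print_az_installs <VERSION> az1 az2 prints two commands, one per AZ, each pointing at its own values file.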

Verify the Install

At install time the chart creates only the controller, the EnvoyProxy CR, and the GatewayClass. The data plane pods and cloud load balancer come up lazily — when a Gateway first references the GatewayClass, which happens when you install Kloudfuse against it (next section). So after helm install, just confirm the control plane:

kubectl get pods -n envoy-gateway-system          # controller Running
kubectl get gatewayclass envoy-gateway-az1        # ACCEPTED: True
bash

There is no load balancer Service yet — that is expected; it appears once a Gateway attaches.
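In a script, the same check can block until the GatewayClass is accepted instead of polling by hand. A minimal sketch, assuming the class name from the example values and a hypothetical helper name; kubectl wait returns a non-zero status if the condition is not met within the timeout.

```shell
# Sketch: block until the GatewayClass reports Accepted=True, or time out.
# Defaults: the example class name from this page and a 120s timeout.
wait_gatewayclass_accepted() {
  local name="${1:-envoy-gateway-az1}"
  local timeout="${2:-120s}"
  kubectl wait --for=condition=Accepted "gatewayclass/${name}" --timeout="${timeout}"
}
```

Call it right after helm upgrade --install, for example wait_gatewayclass_accepted envoy-gateway-az1 60s.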

Install Kloudfuse Against the Standalone Gateway

With the standalone controller running, install or upgrade each Kloudfuse cluster using the custom_values.yaml from Disable the Bundled Envoy (Envoy off, routes on, gatewayClassName matching the standalone release). Then verify it registered against the gateway:

# Gateway should show ADDRESS = the standalone LB IP, PROGRAMMED: True
kubectl get gateway -n <kfuse-namespace>

# All HTTPRoutes accepted
kubectl get httproute -n <kfuse-namespace>
bash
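If a Gateway does not become programmed, the most common thing to rule out is a class-name mismatch between the two sides. The sketch below compares each Gateway's gatewayClassName in a namespace against the GatewayClasses that exist in the cluster. The function name check_gateway_classes is hypothetical, and the check covers only name matching, not the other reasons a Gateway can fail to program.

```shell
# Sketch: flag Gateways whose gatewayClassName matches no existing GatewayClass.
# Returns non-zero if any mismatch is found.
check_gateway_classes() {
  local ns="$1"
  local classes gateways gw class rc=0
  # All GatewayClass names, one per line (GatewayClass is cluster-scoped).
  classes="$(kubectl get gatewayclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')"
  # "<gateway-name> <class-name>" per line for the given namespace.
  gateways="$(kubectl get gateway -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.gatewayClassName}{"\n"}{end}')"
  while read -r gw class; do
    [ -n "$gw" ] || continue
    if printf '%s\n' "$classes" | grep -qxF -- "$class"; then
      echo "ok: $gw -> $class"
    else
      echo "MISMATCH: $gw references $class, which no GatewayClass provides"
      rc=1
    fi
  done <<EOF
$gateways
EOF
  return "$rc"
}
```

Run check_gateway_classes <kfuse-namespace>; any MISMATCH line points at a gatewayClassName that has to be corrected in custom_values.yaml or in the standalone release.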

Migrating from standalone Nginx to Envoy (in-place LB repoint)

If the cluster currently serves traffic through a standalone Nginx controller (its own Helm release, with Kloudfuse emitting only Ingress rules), you can cut over to Envoy in place — reusing the existing Nginx load balancer so the IP / EIPs and DNS never change. It is the same trick the in-chart migration uses (Upgrade from Nginx to Envoy): repoint the Nginx LB Service selector at the Envoy proxy pods. The difference is that a standalone Nginx LB is owned by the upstream ingress-nginx chart, which has no envoyMigration logic — so the selector swap is done manually.

A Service can only select pods in its own namespace. Install the standalone Envoy controller into the same namespace as the Nginx controller so its managed proxy pods land where the Nginx LB Service can select them.

  1. Create the standalone Envoy values.yaml. Start from Configure the Standalone Controller and add the migration overrides — global.envoyMigration.enabled: true makes the Envoy proxy Service ClusterIP, so it does not create its own load balancer and contend for the EIPs:

    global:
      envoyMigration:
        enabled: true          # proxy Service = ClusterIP, no LB
        external: false
        internal: false
    envoyService:
      external:
        enabled: true          # external GatewayClass still created
      internal:
        enabled: true
    yaml
  2. helm upgrade --install the standalone Envoy controller into the same namespace as the Nginx controller, with a release name distinct from the Nginx release:

    helm upgrade --install envoy-gateway \
      oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/envoy-gateway \
      --version <VERSION> -n <nginx-namespace> \
      -f values.yaml
    
    kubectl get gatewayclass            # ACCEPTED: True
    bash
  3. Edit the Kloudfuse custom_values.yaml to emit the Envoy Gateway and HTTPRoute resources while keeping the Nginx Ingress rules (overlap), so traffic keeps flowing through Nginx until you flip the LB:

    envoy-gateway:
      enabled: false
      installGatewayRoutes: true
      gatewayClassName: "<standalone-class>"
    ingress-nginx:
      enabled: false
      installIngressRules: true     # keep Nginx rules until the cutover is verified
    yaml
  4. helm upgrade Kloudfuse to apply it. This brings up the Envoy proxy pods:

    helm upgrade kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
      --version <kfuse-version> -n <kfuse-namespace> -f custom_values.yaml
    bash

    Verify, and note the proxy listener ports (Envoy maps port 80 to 10080 and 443 to 10443):

    kubectl get gateway -n <kfuse-namespace>            # PROGRAMMED: True
    kubectl get pods -n <nginx-namespace> -l app.kubernetes.io/managed-by=envoy-gateway
    kubectl get svc envoy -n <nginx-namespace> \
      -o jsonpath='{range .spec.ports[*]}{.port}{"=>"}{.targetPort}{"\n"}{end}'   # 443=>10443
    bash
  5. Add helm.sh/resource-policy: keep to the Nginx LB Service(s) in the Nginx values.yaml. Do this before repointing — it lands the annotation in Helm’s stored manifest, the only form helm uninstall honors (a kubectl-patched annotation is ignored — see the warning below):

    controller:
      service:
        annotations:
          helm.sh/resource-policy: keep        # external LB
        internal:
          annotations:
            helm.sh/resource-policy: keep      # internal LB, if used
    yaml
  6. helm upgrade the Nginx release to apply the annotation. The selector still points at Nginx at this stage, so Nginx keeps serving traffic without disruption (an unauthenticated probe still returns 302):

    # reuse the version you already run -- check it with: helm list -n <nginx-namespace>
    helm upgrade <nginx-release> ingress-nginx/ingress-nginx --version <installed-version> \
      -n <nginx-namespace> -f values.yaml
    bash
  7. Repoint the Nginx LB Service at the Envoy proxy pods — same Service, same load balancer, same IP / EIPs / DNS, no recreation:

    kubectl patch svc <nginx-controller-svc> -n <nginx-namespace> --type=json -p '[
      {"op":"replace","path":"/spec/selector","value":{
         "app.kubernetes.io/component":"proxy",
         "app.kubernetes.io/managed-by":"envoy-gateway",
         "app.kubernetes.io/name":"envoy",
         "gateway.envoyproxy.io/owning-gateway-name":"kfuse",
         "gateway.envoyproxy.io/owning-gateway-namespace":"<kfuse-namespace>"}},
      {"op":"replace","path":"/spec/ports/0/targetPort","value":10443}
    ]'
    bash

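    owning-gateway-namespace is the namespace where Kloudfuse emits the Gateway (not the controller’s namespace). The patch path /spec/ports/0 addresses the first entry in the Service’s port list, so confirm that this entry is the 443 port before applying it. The cutover is effectively instant once the LB target group reconverges; an unauthenticated probe flips from 302 (Nginx login redirect) to 401 (Envoy auth):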

    curl -sk https://<your-dns-host>/ -o /dev/null -w "%{http_code}\n"   # expect 401
    bash

    To roll back at this point, patch selector and targetPort back to the Nginx controller pods — the LB never moved.

    Internal LB. If you also run an internal load balancer, repeat this patch on the internal Service (<nginx-controller-svc>-internal), but target the internal Gateway: gateway.envoyproxy.io/owning-gateway-name: kfuse-internal (everything else, including targetPort: 10443, is identical). Each Service is patched independently — unlike the in-chart migration, where global.envoyMigration.external / internal flip both automatically.

  8. helm uninstall Nginx. The LB Service survives (its keep annotation is in the stored manifest from the earlier step) and keeps serving Envoy — no LB recreation, zero downtime:

    helm uninstall <nginx-release> -n <nginx-namespace>
    # Helm prints: "These resources were kept due to the resource policy: [Service] ..."
    bash

    The kept Service retains the live selector you patched, so it keeps serving Envoy on the same NLB / IP / EIPs. It is now unmanaged by Helm — adopt it into the Envoy release later if you want it Helm-managed.
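Because the external and internal Services take the same patch with a different owning Gateway name, the patch body can be generated rather than edited by hand. The sketch below reproduces the JSON patch from the repoint step with the Gateway name, Gateway namespace, port index, and target port as parameters. The function name build_repoint_patch is hypothetical, and the defaults (index 0, port 10443) simply mirror the values used in that step.

```shell
# Sketch: emit the JSON patch used to repoint an Nginx LB Service at the
# Envoy proxy pods. Args: gateway name, gateway namespace,
# port index (default 0), target port (default 10443).
build_repoint_patch() {
  local gw_name="$1" gw_ns="$2"
  local port_index="${3:-0}" target_port="${4:-10443}"
  cat <<EOF
[
  {"op":"replace","path":"/spec/selector","value":{
    "app.kubernetes.io/component":"proxy",
    "app.kubernetes.io/managed-by":"envoy-gateway",
    "app.kubernetes.io/name":"envoy",
    "gateway.envoyproxy.io/owning-gateway-name":"${gw_name}",
    "gateway.envoyproxy.io/owning-gateway-namespace":"${gw_ns}"}},
  {"op":"replace","path":"/spec/ports/${port_index}/targetPort","value":${target_port}}
]
EOF
}
```

The output can be passed straight to kubectl patch with --type=json -p "$(build_repoint_patch kfuse-internal <kfuse-namespace>)". Saving the Service's current selector first (kubectl get svc ... -o jsonpath='{.spec.selector}') gives you the values needed for a rollback.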

The keep annotation only works when it is chart-rendered into Helm’s stored manifest — set it in the Nginx values.yaml (controller.service.annotations."helm.sh/resource-policy": keep) and helm install / upgrade. A helm.sh/resource-policy: keep added after the fact with kubectl annotate / patch is ignored by helm uninstall (Helm evaluates the annotation from its stored manifest, not the live object), and the Service and its cloud load balancer are deleted with the release, dropping traffic.
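Since Helm reads the annotation from its stored manifest, you can check that manifest directly before uninstalling. The following is a rough sketch with a hypothetical function name. It only confirms that the annotation appears somewhere in the release's stored manifest, so inspect the output of helm get manifest yourself to confirm it sits on the LB Service(s).

```shell
# Sketch: succeed only if the release's stored manifest contains a
# helm.sh/resource-policy: keep annotation (quoted or unquoted).
keep_in_stored_manifest() {
  local release="$1" ns="$2"
  helm get manifest "$release" -n "$ns" |
    grep -Eq 'helm\.sh/resource-policy"?:[[:space:]]*"?keep'
}
```

A guarded uninstall could then look like: keep_in_stored_manifest <nginx-release> <nginx-namespace> && helm uninstall <nginx-release> -n <nginx-namespace>.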

AZ / Region Failover

Standalone Envoy makes failover an in-place helm upgrade of the standalone release — no DNS change, no TLS re-issuance, no LB recreation.

The typical procedure:

  1. Bring up a fresh Kloudfuse install in the target AZ’s namespace, pointing at the new AZ’s GatewayClass (gatewayClassName: "envoy-gateway-az2"). Verify Kloudfuse is healthy.

    kubectl get pods -n <kfuse-az2-namespace>
    kubectl get gateway,httproute -n <kfuse-az2-namespace>
    bash
  2. Upgrade the standalone Envoy release so its gatewayClassName matches the new AZ:

    # values.yaml for the standalone release
    gatewayClassName: "envoy-gateway-az2"
    yaml
    helm upgrade envoy-gateway \
      oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/envoy-gateway \
      --version <VERSION> \
      --namespace envoy-gateway-system \
      -f values.yaml
    bash
  3. Confirm the new AZ’s Gateway is PROGRAMMED: True and traffic flows through:

    kubectl get gateway -n <kfuse-az2-namespace>
    
    curl -sk https://<your-dns-host>/ -o /dev/null -w "%{http_code}\n"
    bash
  4. Drain the old AZ’s Kloudfuse install once you’ve verified the new one is serving traffic.

For a fully cut-over deployment (no rollback expected), helm delete the old Kloudfuse release. For a blue/green deployment, leave the old install up but with installGatewayRoutes: false to detach its routes from the gateway.
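During a failover it is convenient to poll the public endpoint until it answers with the status code you expect, rather than re-running curl by hand. The sketch below wraps the curl probe used on this page in a retry loop. The function name probe_until is hypothetical, and the expected status code is a parameter because the healthy value depends on your deployment (the Nginx migration section uses 401 for an unauthenticated probe against Envoy).

```shell
# Sketch: poll a URL until it returns the expected HTTP status code.
# Args: url, expected code, attempts (default 30), delay seconds (default 2).
probe_until() {
  local url="$1" expected="$2"
  local attempts="${3:-30}" delay="${4:-2}"
  local i=1 code
  while [ "$i" -le "$attempts" ]; do
    # Same probe as on this page: status code only, TLS verification skipped.
    code="$(curl -sk "$url" -o /dev/null -w '%{http_code}' || true)"
    if [ "$code" = "$expected" ]; then
      echo "attempt $i: $code (expected)"
      return 0
    fi
    echo "attempt $i: got ${code:-no response}, want $expected"
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, probe_until https://<your-dns-host>/ 401 retries for about a minute with the defaults and returns non-zero if the expected code never appears.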

Uninstall

helm delete envoy-gateway -n envoy-gateway-system removes the controller, but the EnvoyProxy data plane Deployments and Services it created at runtime are not owned by Helm and stay behind. Clean them up explicitly:

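# Remove the data plane Deployments and Services the controller created at runtime
kubectl delete deploy,svc -l app.kubernetes.io/managed-by=envoy-gateway \
  -n envoy-gateway-system

# Clear finalizers on the matching GatewayClasses so deletion does not hang.
# The grep matches by name; review the list first if other classes use similar names.
for gc in $(kubectl get gatewayclass -o name | grep envoy-gateway); do
  kubectl patch "$gc" --type=merge -p '{"metadata":{"finalizers":[]}}'
  kubectl delete "$gc"
done

# Finally, remove the namespace
kubectl delete namespace envoy-gateway-system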
bash
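The loop above selects GatewayClasses by name, which misses classes with custom names and can catch unrelated ones. An alternative is to list the classes by the controller that owns them and review that list before deleting. The sketch below assumes the Kloudfuse build keeps the upstream Envoy Gateway controller name, gateway.envoyproxy.io/gatewayclass-controller; verify this against spec.controllerName on one of your GatewayClasses. The function name is hypothetical.

```shell
# Sketch: list GatewayClasses owned by a given controller (dry run, no deletes).
# The default controller name is the upstream Envoy Gateway value and is an
# assumption for the Kloudfuse build.
envoy_gatewayclasses() {
  local controller="${1:-gateway.envoyproxy.io/gatewayclass-controller}"
  kubectl get gatewayclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.controllerName}{"\n"}{end}' |
    awk -F'\t' -v c="$controller" '$2 == c { print $1 }'
}
```

The printed names can then be fed into the patch-and-delete loop in place of the grep-based selection.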

Troubleshooting

Symptom: Gateway stays PROGRAMMED: False on the Kloudfuse side
Cause / fix: The gatewayClassName in the Kloudfuse custom_values.yaml does not match the gatewayClassName set in the standalone release. Check both values and re-run helm upgrade on Kloudfuse.

Symptom: HTTPRoute objects show Accepted: False with ParentRefNotAllowed
Cause / fix: The standalone Envoy’s GatewayClass does not allow the Kloudfuse namespace to attach routes. Confirm the standalone install’s allowedRoutes configuration includes the namespaces that contain Kloudfuse HTTPRoute objects.

Symptom: LB IP changed after failover
Cause / fix: The loadBalancerIP (GCP/Azure) or EIP allocations (AWS) in the standalone release’s values.yaml must stay constant across upgrades. Confirm the values file used for the failover upgrade still pins the same IP / EIPs.

Symptom: Stale EnvoyProxy pods after switching gatewayClassName
Cause / fix: The controller reconciles within ~30s. If pods persist beyond a minute, check kubectl logs -n envoy-gateway-system deploy/envoy-gateway. Stale pods owned by the old GatewayClass can be removed with kubectl delete pod -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gatewayclass=<old-name>.

Symptom: helm install fails with cannot re-use a name that is still in use
Cause / fix: A standalone release of the same name already exists. List with helm list -n envoy-gateway-system; either helm upgrade the existing release or pick a new release name.