Standalone Envoy Gateway
In a multi-AZ or multi-region deployment, the Envoy Gateway stack (the controller, the EnvoyProxy data plane pods, the GatewayClass, and the cloud load balancer) must outlive any individual Kloudfuse cluster. During a failover you bring up a fresh Kloudfuse install in the target AZ and then repoint the existing Envoy Gateway at the new cluster; the load balancer IP, DNS, and TLS termination stay put.
To support that, install envoy-gateway as a separate Helm release in its own namespace, outside the Kloudfuse chart. Each Kloudfuse install then references the shared gateway by GatewayClass name and only installs its own Gateway and HTTPRoute resources — no controller, no proxy pods, no load balancer.
When to Use Standalone Envoy
Use this pattern when any of the following apply:
- You need to fail over between Kloudfuse clusters in different AZs or regions without DNS changes or TLS re-issuance.
- You want to spin up a new Kloudfuse install in parallel with an existing one and cut traffic over once verified.
- The Envoy load balancer must survive helm delete of the Kloudfuse release (for example, blue/green upgrades).
| Failover repoints one standalone release between clusters, keeping a single IP / set of EIPs — it does not run two load balancers on the same address (an IP/EIP attaches to one LB at a time). For true side-by-side, give each LB distinct IPs and switch DNS once verified. |
For a single-cluster install where Envoy and Kloudfuse are managed together, use the in-chart configuration described in Configure Envoy Ingress instead.
Prerequisites
- Kubernetes cluster 1.27 or later with kubectl configured.
- Helm 3.x.
- Gateway API CRDs installed cluster-wide (see Install Envoy CRDs).
- cert-manager installed cluster-wide (or AWS ACM configured) for TLS.
- A reserved static external IP (and an internal IP if the deployment needs an internal LB).
- OCI registry access for oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/envoy-gateway. The login commands below use the gcloud CLI (authenticated to a principal with read access to the registry); if gcloud is not available, substitute any other method of obtaining a registry token (for example, helm registry login -u _json_key --password-stdin … < key.json with a service account key) before running helm install.
- Image pull secret for us.gcr.io/mvp-demo-301906/kfuse/envoy-gateway-fips in the install namespace, referenced by global.imagePullSecrets in your values.yaml (the controller and proxy won’t start without it). Copy the existing kfuse-image-pull-credentials secret from the Kloudfuse namespace, or create a dockerconfigjson secret from your registry key:

      kubectl get secret kfuse-image-pull-credentials -n <kfuse-namespace> -o yaml \
        | grep -v 'namespace:\|resourceVersion:\|uid:\|creationTimestamp:' \
        | kubectl apply -n envoy-gateway-system -f -
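The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the chart: the function name check_prereqs is hypothetical, and it assumes the three standard Gateway API CRD names plus the namespace and secret names used in this guide.

```shell
#!/bin/sh
# Pre-flight sketch: report which prerequisites appear to be missing.
# check_prereqs is a hypothetical helper, not something the chart ships.
check_prereqs() {
  ns="${1:-envoy-gateway-system}"
  secret="${2:-kfuse-image-pull-credentials}"
  missing=0

  # Both CLIs must be on PATH.
  for tool in kubectl helm; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool"
      missing=1
    fi
  done

  # Gateway API CRDs must be installed cluster-wide.
  for crd in gatewayclasses gateways httproutes; do
    if ! kubectl get crd "$crd.gateway.networking.k8s.io" >/dev/null 2>&1; then
      echo "missing CRD: $crd.gateway.networking.k8s.io"
      missing=1
    fi
  done

  # The image pull secret must already exist in the install namespace.
  if ! kubectl get secret "$secret" -n "$ns" >/dev/null 2>&1; then
    echo "missing secret: $secret in namespace $ns"
    missing=1
  fi

  return "$missing"
}
```

Calling check_prereqs with no arguments uses the defaults from this guide; a non-zero exit status means at least one item was reported. It does not check cert-manager, the reserved IPs, or registry access.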
Disable the Bundled Envoy in the Kloudfuse Chart
Before installing the standalone controller, configure each Kloudfuse install so it emits routing manifests only — no controller, no proxy pods, no load balancer.
Add the following to each Kloudfuse custom_values.yaml:
envoy-gateway:
  enabled: false                           # (1)
  installGatewayRoutes: true               # (2)
  gatewayClassName: "envoy-gateway-az1"    # (3)
  envoyService:
    internal:
      enabled: true                        # (4)
      gatewayClassName: "envoy-gateway-az1-internal"
ingress-nginx:
  enabled: false
  installIngressRules: false
| 1 | enabled: false skips the controller Deployment, the EnvoyProxy data plane, and the load balancer Service. The standalone release provides all three. |
| 2 | installGatewayRoutes: true keeps the Gateway, HTTPRoute, SecurityPolicy, and BackendTrafficPolicy resources — these tell the standalone controller where to send traffic for this Kloudfuse install. |
| 3 | Must match the external GatewayClass name produced by the standalone release. See Configure the Standalone Controller for naming. |
| 4 | Only if you run an internal load balancer. The internal Gateway is a second GatewayClass and must be wired separately: set envoyService.internal.gatewayClassName here to match the internal class produced by the standalone release (envoyService.internal.gatewayClassName there). Omit this whole envoyService block if you have no internal LB. |
Configure the Standalone Controller
Create a values.yaml for the standalone envoy-gateway chart. The release must:
- Pin the controller and EnvoyProxy pods to the target AZ’s node group via nodeSelector and tolerations.
- Set a unique gatewayClassName per AZ so each Kloudfuse install can target the right gateway.
- Configure the cloud load balancer (external + optional internal) to match the IP / EIP / DNS already in use.

The examples that follow cover, in order:

- AWS (EKS)
- GCP (GKE)
- Azure (AKS)
# AWS (EKS)
global:
  # Image pull credentials for us.gcr.io/mvp-demo-301906/kfuse/envoy-gateway-fips.
  # Required: without this, both the controller and the EnvoyProxy data plane
  # pods will ImagePullBackOff. The named secret must already exist in the
  # install namespace (see Prerequisites for how to copy it).
  imagePullSecrets:
    - kfuse-image-pull-credentials    # (1)
  # Optional: set true only if your cluster uses kfRoles node pools
  # (kf_role label/taint); otherwise pin via deployment.pod.* below.
  # kfRoles:
  #   enabled: true
# GatewayClass name referenced by Kloudfuse installs' envoy-gateway.gatewayClassName
gatewayClassName: "envoy-gateway-az1"    # (2)
# External LB Service shape: must match the NLB / EIPs the Nginx or
# previous Envoy install was using, so DNS records do not change.
envoyService:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: 'true'
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: <YOUR_EIP_ALLOC_IDS>    # (3)
  patch:
    externalTrafficPolicy: Local
  external:
    enabled: true
  internal:
    enabled: true
    gatewayClassName: "envoy-gateway-az1-internal"    # (4)
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
# Pin to the AZ's node group
deployment:
  pod:
    nodeSelector:
      ng_label: <az-name>    # (5)
    tolerations:
      - key: "ng_taint"
        operator: "Equal"
        value: "<az-name>"
        effect: "NoSchedule"
# certgen Job runs the same pod-placement constraints
certgen:
  job:
    nodeSelector:
      ng_label: <az-name>
    tolerations:
      - key: "ng_taint"
        operator: "Equal"
        value: "<az-name>"
        effect: "NoSchedule"
| 1 | Image pull secret name — the secret itself must exist in the install namespace. See Prerequisites for the copy-from-Kloudfuse command. |
| 2 | Must match envoy-gateway.gatewayClassName (the external class) in every Kloudfuse custom_values.yaml that is supposed to register routes against this gateway. |
| 3 | Comma-separated Elastic IP allocation IDs. Without these, AWS provisions a fresh NLB with new public IPs, and DNS breaks. |
| 4 | Internal GatewayClass name — must match envoy-gateway.envoyService.internal.gatewayClassName in the Kloudfuse install. Only needed when the internal LB is enabled. Drop the internal block entirely if you have no internal LB. |
| 5 | <az-name>: AZ identifier matching the node group label (for example az1). The same selector/tolerations must be on the certgen Job — the chart does not inherit them from the deployment block. |
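Because a wrong value in the EIP annotation only shows up after the NLB has been provisioned, a quick format check before helm install can help. This is a sketch under assumptions: valid_eip_allocations is a hypothetical helper, it only checks that each comma-separated entry has the eipalloc- prefix followed by lowercase hex digits, and it conservatively rejects spaces and empty entries. It does not confirm that the allocations exist in your account.

```shell
#!/bin/sh
# Returns 0 when every comma-separated entry looks like an EIP allocation ID.
# Hypothetical helper; a format check only, it never calls AWS.
valid_eip_allocations() {
  rest="$1"
  while :; do
    id="${rest%%,*}"                 # first entry of what is left
    case "$id" in
      eipalloc-*) hex="${id#eipalloc-}" ;;
      *) return 1 ;;                 # wrong prefix, empty entry, or placeholder
    esac
    case "$hex" in
      ''|*[!0-9a-f]*) return 1 ;;    # nothing after the prefix, or non-hex
    esac
    case "$rest" in
      *,*) rest="${rest#*,}" ;;      # move on to the next entry
      *) break ;;
    esac
  done
  return 0
}
```

For example, valid_eip_allocations "eipalloc-0123456789abcdef0,eipalloc-0fedcba9876543210" succeeds, while the unedited placeholder <YOUR_EIP_ALLOC_IDS> fails.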
# GCP (GKE)
global:
  # Required image pull secret; see Prerequisites for setup.
  imagePullSecrets:
    - kfuse-image-pull-credentials    # (1)
  # Optional kfRoles pool isolation (see the AWS example for details).
  # kfRoles:
  #   enabled: true
gatewayClassName: "envoy-gateway-az1"    # (2)
envoyService:
  patch:
    loadBalancerIP: "<EXTERNAL_STATIC_IP>"    # (3)
  external:
    enabled: true
  internal:
    enabled: true
    # Internal class: must match envoy-gateway.envoyService.internal.gatewayClassName
    # on the Kloudfuse side. Omit the whole internal block if you have no internal LB.
    gatewayClassName: "envoy-gateway-az1-internal"
    patch:
      loadBalancerIP: "<INTERNAL_STATIC_IP>"
    annotations:
      networking.gke.io/load-balancer-type: "Internal"
      cloud.google.com/load-balancer-type: "Internal"
deployment:
  pod:
    nodeSelector:
      ng_label: <az-name>    # (4)
    tolerations:
      - key: "ng_taint"
        operator: "Equal"
        value: "<az-name>"
        effect: "NoSchedule"
certgen:
  job:
    nodeSelector:
      ng_label: <az-name>
    tolerations:
      - key: "ng_taint"
        operator: "Equal"
        value: "<az-name>"
        effect: "NoSchedule"
| 1 | Image pull secret name — see Prerequisites. |
| 2 | Must match envoy-gateway.gatewayClassName in every Kloudfuse custom_values.yaml. |
| 3 | Pre-allocated GCP static external IP. Must be the IP currently mapped to the public DNS hostname. |
| 4 | <az-name>: AZ identifier matching the node group label. The same selector/tolerations must be on the certgen Job — the chart does not inherit them from the deployment block. |
# Azure (AKS)
global:
  # Required image pull secret; see Prerequisites for setup.
  imagePullSecrets:
    - kfuse-image-pull-credentials    # (1)
  # Optional kfRoles pool isolation (see the AWS example for details).
  # kfRoles:
  #   enabled: true
gatewayClassName: "envoy-gateway-az1"    # (2)
envoyService:
  patch:
    loadBalancerIP: "<EXTERNAL_PUBLIC_IP>"    # (3)
    externalTrafficPolicy: Local
  external:
    enabled: true
  internal:
    enabled: true
    # Internal class: must match envoy-gateway.envoyService.internal.gatewayClassName
    # on the Kloudfuse side. Omit the whole internal block if you have no internal LB.
    gatewayClassName: "envoy-gateway-az1-internal"
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
deployment:
  pod:
    nodeSelector:
      ng_label: <az-name>    # (4)
certgen:
  job:
    nodeSelector:
      ng_label: <az-name>
| 1 | Image pull secret name — see Prerequisites. |
| 2 | Must match envoy-gateway.gatewayClassName in every Kloudfuse custom_values.yaml. |
| 3 | Pre-allocated Azure public IP. Must be the IP currently mapped to the public DNS hostname. |
| 4 | <az-name>: AZ identifier matching the node group label. The same selector must be on the certgen Job — the chart does not inherit it from the deployment block. |
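The callouts require the GatewayClass names in the Kloudfuse custom_values.yaml and the standalone values.yaml to agree. A rough way to compare them without a YAML parser is sketched below; class_names and classes_match are hypothetical helpers, and the sketch assumes one gatewayClassName key per line with the value on the same line, as in the examples above.

```shell
#!/bin/sh
# Print the distinct gatewayClassName values found in a values file, sorted.
# Commented-out lines are skipped; surrounding quotes are stripped.
class_names() {
  awk '/^[[:space:]]*gatewayClassName:/ {
         v = $2
         gsub("[\"\047]", "", v)   # drop double and single quotes
         print v
       }' "$1" | sort -u
}

# Succeeds when both files declare exactly the same set of class names.
classes_match() {
  [ "$(class_names "$1")" = "$(class_names "$2")" ]
}
```

A typical call would be classes_match custom_values.yaml values.yaml. The comparison is set-based, so it also flags an internal class that is present on one side only.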
Install the Standalone Controller
Authenticate against the OCI registry, then install the chart into a dedicated namespace:
gcloud auth print-access-token | helm registry login -u oauth2accesstoken \
  --password-stdin us-east1-docker.pkg.dev

helm upgrade --install envoy-gateway \
  oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/envoy-gateway \
  --version <VERSION> \
  --namespace envoy-gateway-system \
  --create-namespace \
  -f values.yaml
|
Chart version. Pin the standalone
bash Substitute it for |
Repeat this step once per AZ, each with a different release name (substitute it for envoy-gateway in the command above), a distinct gatewayClassName (envoy-gateway-az1, envoy-gateway-az2, …), and the AZ-specific node selector / load balancer config.
Verify the Install
At install time the chart creates only the controller, the EnvoyProxy CR, and the GatewayClass. The data plane pods and cloud load balancer come up lazily — when a Gateway first references the GatewayClass, which happens when you install Kloudfuse against it (next section). So after helm install, just confirm the control plane:
kubectl get pods -n envoy-gateway-system # controller Running
kubectl get gatewayclass envoy-gateway-az1 # ACCEPTED: True
There is no load balancer Service yet — that is expected; it appears once a Gateway attaches.
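In a script, the two checks above can block until the control plane is ready instead of being read by eye. The sketch below wraps kubectl wait; verify_control_plane is a hypothetical name, and it assumes the controller runs as a Deployment and that the GatewayClass reports the standard Accepted condition.

```shell
#!/bin/sh
# Block until the controller Deployment(s) are Available and the
# GatewayClass is Accepted, or until the timeout expires.
# Hypothetical wrapper around kubectl wait.
verify_control_plane() {
  class="$1"
  ns="${2:-envoy-gateway-system}"
  timeout="${3:-120s}"
  kubectl wait --for=condition=Available deployment --all \
    -n "$ns" --timeout="$timeout" &&
    kubectl wait --for=condition=Accepted "gatewayclass/$class" \
      --timeout="$timeout"
}
```

For example, verify_control_plane envoy-gateway-az1 returns non-zero if either condition is not met within two minutes. It deliberately waits on Deployments rather than Pods, so a completed certgen Job pod does not hold it up.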
Install Kloudfuse Against the Standalone Gateway
With the standalone controller running, install or upgrade each Kloudfuse cluster using the custom_values.yaml from Disable the Bundled Envoy (Envoy off, routes on, gatewayClassName matching the standalone release). Then verify it registered against the gateway:
# Gateway should show ADDRESS = the standalone LB IP, PROGRAMMED: True
kubectl get gateway -n <kfuse-namespace>
# All HTTPRoutes accepted
kubectl get httproute -n <kfuse-namespace>
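To confirm the Gateway really picked up the reserved address rather than a freshly provisioned one, the ADDRESS column can be compared against the expected value. A minimal sketch, assuming the address is reported in the standard Gateway status.addresses field; gateway_serves_address is a hypothetical helper, and on AWS the reported address may be the NLB hostname rather than an IP.

```shell
#!/bin/sh
# Succeeds when any Gateway in the namespace reports the expected address.
# Hypothetical helper; reads .status.addresses of every Gateway in $1.
gateway_serves_address() {
  ns="$1"
  want="$2"
  for addr in $(kubectl get gateway -n "$ns" \
      -o jsonpath='{.items[*].status.addresses[*].value}'); do
    if [ "$addr" = "$want" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, gateway_serves_address <kfuse-namespace> <EXTERNAL_STATIC_IP> fails if the Gateway has no address yet or reports a different one.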
Migrating from standalone Nginx to Envoy (in-place LB repoint)
If the cluster currently serves traffic through a standalone Nginx controller (its own Helm release, with Kloudfuse emitting only Ingress rules), you can cut over to Envoy in place — reusing the existing Nginx load balancer so the IP / EIPs and DNS never change. It is the same trick the in-chart migration uses (Upgrade from Nginx to Envoy): repoint the Nginx LB Service selector at the Envoy proxy pods. The difference is that a standalone Nginx LB is owned by the upstream ingress-nginx chart, which has no envoyMigration logic — so the selector swap is done manually.
|
A Service can only select pods in its own namespace. Install the standalone Envoy controller into the same namespace as the Nginx controller so its managed proxy pods land where the Nginx LB Service can select them. |
- Create the standalone Envoy values.yaml. Start from Configure the Standalone Controller and add the migration overrides. global.envoyMigration.enabled: true makes the Envoy proxy Service ClusterIP, so it does not create its own load balancer and contend for the EIPs:

      global:
        envoyMigration:
          enabled: true      # proxy Service = ClusterIP, no LB
          external: false
          internal: false
      envoyService:
        external:
          enabled: true      # external GatewayClass still created
        internal:
          enabled: true

- helm upgrade --install the standalone Envoy controller into the same namespace as the Nginx controller, with a release name distinct from the Nginx release:

      helm upgrade --install envoy-gateway \
        oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/envoy-gateway \
        --version <VERSION> -n <nginx-namespace> \
        -f values.yaml
      kubectl get gatewayclass   # Accepted: True

- Edit the Kloudfuse custom_values.yaml to emit the Envoy Gateway and HTTPRoute resources while keeping the Nginx Ingress rules (overlap), so traffic keeps flowing through Nginx until you flip the LB:

      envoy-gateway:
        enabled: false
        installGatewayRoutes: true
        gatewayClassName: "<standalone-class>"
      ingress-nginx:
        enabled: false
        installIngressRules: true   # keep Nginx rules until the cutover is verified

- helm upgrade Kloudfuse to apply it. This brings up the Envoy proxy pods:

      helm upgrade kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
        --version <kfuse-version> -n <kfuse-namespace> -f custom_values.yaml

  Verify, and note the proxy listener ports (Envoy maps port 80 to 10080 and 443 to 10443):

      kubectl get gateway -n <kfuse-namespace>   # PROGRAMMED: True
      kubectl get pods -n <nginx-namespace> -l app.kubernetes.io/managed-by=envoy-gateway
      kubectl get svc envoy -n <nginx-namespace> \
        -o jsonpath='{range .spec.ports[*]}{.port}{"=>"}{.targetPort}{"\n"}{end}'
      # 443=>10443

- Add helm.sh/resource-policy: keep to the Nginx LB Service(s) in the Nginx values.yaml. Do this before repointing: it lands the annotation in Helm’s stored manifest, the only form helm uninstall honors (a kubectl-patched annotation is ignored; see the warning below):

      controller:
        service:
          annotations:
            helm.sh/resource-policy: keep     # external LB
          internal:
            annotations:
              helm.sh/resource-policy: keep   # internal LB, if used

- helm upgrade the Nginx release to apply the annotation. The selector still points at Nginx here, so it keeps serving 302 with no disruption:

      # reuse the version you already run -- check it with: helm list -n <nginx-namespace>
      helm upgrade <nginx-release> ingress-nginx/ingress-nginx --version <installed-version> \
        -n <nginx-namespace> -f values.yaml

- Repoint the Nginx LB Service at the Envoy proxy pods: same Service, same load balancer, same IP / EIPs / DNS, no recreation:

      kubectl patch svc <nginx-controller-svc> -n <nginx-namespace> --type=json -p '[
        {"op":"replace","path":"/spec/selector","value":{
          "app.kubernetes.io/component":"proxy",
          "app.kubernetes.io/managed-by":"envoy-gateway",
          "app.kubernetes.io/name":"envoy",
          "gateway.envoyproxy.io/owning-gateway-name":"kfuse",
          "gateway.envoyproxy.io/owning-gateway-namespace":"<kfuse-namespace>"}},
        {"op":"replace","path":"/spec/ports/0/targetPort","value":10443}
      ]'

  owning-gateway-namespace is the namespace where Kloudfuse emits the Gateway (not the controller’s namespace). The cutover is effectively instant once the LB target group reconverges; an unauthenticated probe flips from 302 (Nginx login redirect) to 401 (Envoy auth):

      curl -sk https://<your-dns-host>/ -o /dev/null -w "%{http_code}\n"   # expect 401

  To roll back at this point, patch selector and targetPort back to the Nginx controller pods; the LB never moved.

  Internal LB. If you also run an internal load balancer, repeat this patch on the internal Service (<nginx-controller-svc>-internal), but target the internal Gateway: gateway.envoyproxy.io/owning-gateway-name: kfuse-internal (everything else, including targetPort: 10443, is identical). Each Service is patched independently, unlike the in-chart migration, where global.envoyMigration.external/internal flip both automatically.

- helm uninstall Nginx. The LB Service survives (its keep annotation is in the stored manifest from the earlier step) and keeps serving Envoy, with no LB recreation and zero downtime:

      helm uninstall <nginx-release> -n <nginx-namespace>
      # Helm prints: "These resources were kept due to the resource policy: [Service] ..."

  The kept Service retains the live selector you patched, so it keeps serving Envoy on the same NLB / IP / EIPs. It is now unmanaged by Helm; adopt it into the Envoy release later if you want it Helm-managed.
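The 302-to-401 flip described in the repoint step can be watched with a small polling loop rather than repeated manual curl calls. This is a sketch only: http_code and wait_for_code are hypothetical helpers, and the retry count and delay are arbitrary defaults, not values the LB guarantees.

```shell
#!/bin/sh
# Print the HTTP status code returned for a URL (certificate checks skipped,
# matching the curl -sk probe used in this guide).
http_code() {
  curl -sk "$1" -o /dev/null -w '%{http_code}'
}

# Poll a URL until it returns the wanted status code.
# Args: url, wanted code, max attempts (default 30), delay seconds (default 2).
wait_for_code() {
  url="$1"; want="$2"; tries="${3:-30}"; delay="${4:-2}"
  i=0
  code="none"
  while [ "$i" -lt "$tries" ]; do
    code=$(http_code "$url") || code="000"   # treat curl failure as "no answer"
    if [ "$code" = "$want" ]; then
      echo "got $want after $i retries"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still $code after $tries attempts" >&2
  return 1
}
```

After the selector patch, wait_for_code https://<your-dns-host>/ 401 would return once Envoy is answering; running it with 302 after a rollback checks the opposite direction.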
|
The keep annotation only works when it is chart-rendered into Helm’s stored manifest — set it in the Nginx |
AZ / Region Failover
Standalone Envoy makes failover an in-place helm upgrade of the standalone release — no DNS change, no TLS re-issuance, no LB recreation.
The typical procedure:
- Bring up a fresh Kloudfuse install in the target AZ’s namespace, pointing at the new AZ’s GatewayClass (gatewayClassName: "envoy-gateway-az2"). Verify Kloudfuse is healthy.

      kubectl get pods -n <kfuse-az2-namespace>
      kubectl get gateway,httproute -n <kfuse-az2-namespace>

- Upgrade the standalone Envoy release so its gatewayClassName matches the new AZ:

      # values.yaml for the standalone release
      gatewayClassName: "envoy-gateway-az2"

      helm upgrade envoy-gateway \
        oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/envoy-gateway \
        --version <VERSION> \
        --namespace envoy-gateway-system \
        -f values.yaml

- Confirm the new AZ’s Gateway is PROGRAMMED: True and traffic flows through:

      kubectl get gateway -n <kfuse-az2-namespace>
      curl -sk https://<your-dns-host>/ -o /dev/null -w "%{http_code}\n"

- Drain the old AZ’s Kloudfuse install once you’ve verified the new one is serving traffic.
For a fully cut-over deployment (no rollback expected), helm delete the old Kloudfuse release. For a blue/green deployment, leave the old install up but with installGatewayRoutes: false to detach its routes from the gateway.
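The values edit in the upgrade step can be scripted so the failover runbook does not depend on hand-editing. A minimal sketch, assuming the external class is the only top-level gatewayClassName key in the standalone values.yaml; set_gateway_class is a hypothetical helper. It leaves the indented internal class untouched, so adjust that separately if your internal LB moves as well.

```shell
#!/bin/sh
# Print a copy of a values file with the top-level gatewayClassName replaced.
# Only a key starting in column 0 is rewritten; nested keys are left alone.
# The class name must not contain "|" or "&" (sed delimiter and backreference).
set_gateway_class() {
  file="$1"
  class="$2"
  sed "s|^gatewayClassName:.*|gatewayClassName: \"$class\"|" "$file"
}
```

For example, set_gateway_class values.yaml envoy-gateway-az2 > values-az2.yaml writes a new file that can be passed to helm upgrade with -f, leaving the original in place for rollback.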
Uninstall
helm delete envoy-gateway -n envoy-gateway-system removes the controller, but the EnvoyProxy data plane Deployments and Services it created at runtime are not owned by Helm and stay behind. Clean them up explicitly:
kubectl delete deploy,svc -l app.kubernetes.io/managed-by=envoy-gateway \
  -n envoy-gateway-system
for gc in $(kubectl get gatewayclass -o name | grep envoy-gateway); do
  kubectl patch "$gc" --type=merge -p '{"metadata":{"finalizers":[]}}'
  kubectl delete "$gc"
done
kubectl delete namespace envoy-gateway-system
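Before deleting the namespace, it can be worth confirming that the label-based cleanup actually removed the runtime-created resources. The sketch below counts what is left; leftover_count is a hypothetical helper that uses the same label selector as the delete command above.

```shell
#!/bin/sh
# Print how many Deployments and Services created by the controller remain.
# Hypothetical helper; a kubectl error is counted as zero leftovers.
leftover_count() {
  ns="${1:-envoy-gateway-system}"
  kubectl get deploy,svc -n "$ns" \
    -l app.kubernetes.io/managed-by=envoy-gateway -o name 2>/dev/null |
    wc -l | tr -d '[:space:]'
}
```

A result of 0 from leftover_count means nothing with the managed-by label remains in envoy-gateway-system; pass a different namespace as the first argument if the controller was installed elsewhere.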
Troubleshooting
| Symptom | Cause / fix |
|---|---|
|  | The |
|  | The standalone Envoy’s |
| LB IP changed after failover | The |
| Stale | The controller reconciles within ~30s. If pods persist beyond a minute, check |
|  | A standalone release of the same name already exists. List with |