Load Balancer
LoadBalancer Services in Kubernetes play a crucial role in exposing the "serverless inference endpoint" to external traffic. Since the serverless inference offering will be deployed on-premises, you will need to configure and deploy a tool like MetalLB.
Info
Although services can be exposed using alternatives like NodePort or externalIP, these approaches do not scale well and are not recommended for highly available deployments.
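For reference, here is a minimal sketch of the kind of LoadBalancer Service that MetalLB would serve; the name, labels, and ports are illustrative assumptions, not part of this guide. On bare metal, without a load balancer implementation such a Service stays stuck with a `<pending>` EXTERNAL-IP.

``` yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-endpoint     # hypothetical service name
spec:
  type: LoadBalancer
  selector:
    app: inference             # hypothetical pod label
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
```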
MetalLB can scale according to demand and handle various load balancing requirements. It dynamically adjusts IP address allocation to accommodate traffic growth, making it well-suited for environments with changing needs.
- MetalLB operates by creating speaker pods as a DaemonSet in your Kubernetes cluster. These speaker pods advertise the external IP addresses of your LoadBalancer Services to the surrounding network, making your services accessible externally.
- To announce these external IPs beyond the cluster, MetalLB speaker pods use either ARP (Address Resolution Protocol) or BGP (Border Gateway Protocol).
MetalLB Operational Modes
MetalLB supports two operational modes, described below.
Layer 2 Mode (ARP)
Layer 2 networking is your classic Ethernet-style network communication. It operates within the same broadcast domain (i.e. your local network) and uses protocols like ARP (Address Resolution Protocol) to map IP addresses to MAC addresses. You should consider this as the “local” option for service announcements.
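To see this mapping in practice, you can inspect a Linux client's neighbor (ARP) cache; the service IP below is illustrative:

``` shell
# The service IP resolves to the MAC address of whichever node
# MetalLB has elected as the current leader for that service.
ip neigh show 192.168.10.100
```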
Benefits
For smaller setups or POCs, Layer 2 is simpler to manage and deploy. The advantages of the Layer 2 Mode are:
- Simplicity: Layer 2 doesn't require complex router configurations or a deep understanding of networking protocols. This mode does not require coordination with the networking and security teams in your datacenter.
- Low Resource Requirements: There's no need for additional CPU cycles or memory to handle route advertisements and calculations.
Important
When you scale beyond a few nodes or need robust failover and load-balancing guarantees (i.e. SLAs), we recommend the Layer 3 (BGP) mode.
Considerations
- Single Node Bottleneck
Because MetalLB routes all traffic for a service through a single node, that node can become a bottleneck and limit performance. Layer 2 mode limits the ingress bandwidth for your service to the bandwidth of a single node. This is a fundamental limitation of using ARP and NDP to direct traffic.
- Slow Failover
Failover between nodes depends on cooperation from the clients. When a failover occurs, MetalLB sends gratuitous ARP packets to notify clients that the MAC address associated with the service IP has changed.
Most client operating systems handle gratuitous ARP packets correctly and update their neighbor caches promptly, in which case failover completes within a few seconds and clients typically reach the new node within 10 seconds. However, some client operating systems either do not handle gratuitous ARP packets at all or have outdated implementations that delay the cache update.
- Recent versions of common operating systems such as Windows, macOS, and Linux implement layer 2 failover correctly. Issues with slow failover are not expected except for older and less common client operating systems.
- To minimize the impact of a planned failover on outdated clients, keep the old node running for a few minutes after flipping leadership. The old node can continue to forward traffic for outdated clients until their caches refresh.
- During an unplanned failover, the service IPs are unreachable until the outdated clients refresh their cache entries.
Layer 3 Mode (BGP)
BGP (Border Gateway Protocol), on the other hand, is a Layer 3 routing protocol designed to handle communication between networks. Instead of working only within a local network, it advertises which IP addresses or subnets belong where, both inside and outside your data center.
Benefits
BGP allows your Kubernetes cluster to advertise service IPs directly to your routers, removing the need for Layer 2’s broadcast-based communication. This comes with some serious advantages:
- Better Scalability: L2 networking works fine in small clusters, but as your environment grows, so does the broadcast noise. Imagine dozens, or even hundreds, of nodes shouting at each other with ARP requests. BGP scales much better because it doesn't rely on broadcasting; it simply tells routers, "Hey, I've got this IP over here."
- High Availability and Failover: With L2, if a node holding a service's IP goes down, there's a delay while Kubernetes shifts things around. BGP makes this transition fast and seamless. When a node fails, the route to its IP is withdrawn from the network, and traffic automatically flows to another node hosting the service.
- Network-Wide Integration: L2 is limited to the local network, but BGP can go beyond that. It communicates directly with upstream routers and peers, enabling external devices to know exactly where to send traffic. This is especially useful in multi-data-center or hybrid cloud setups.
Considerations
- Node Failure impacting Active Connections
When a BGP session terminates (e.g. when a node fails or when a speaker pod restarts), the session termination might result in resetting all active connections. End users can experience a Connection reset by peer message.
The consequence of a terminated BGP session is implementation-specific for each router manufacturer. However, you can anticipate that a change in the number of speaker pods affects the number of BGP sessions and that active connections with BGP peers will break.
To avoid or reduce the likelihood of a service interruption, you can specify a node selector when you add a BGP peer. By limiting the number of nodes that start BGP sessions, a fault on a node that does not have a BGP session has no effect on connections to the service.
Requirements
Before you deploy and configure MetalLB, ensure the following requirements are satisfied:
- Kubernetes Cluster
- v1.31.x or higher
- Access with cluster admin privileges
- Ensure there are no other Load Balancers installed on the cluster
- Ensure that the cluster has a CNI compatible with MetalLB
- For L3/BGP mode
- Ensure your routers are BGP compatible
- Ensure port (179/TCP) is open between the external router and the MetalLB-enabled nodes
- For L2/ARP mode
- Ensure that port 7946 (TCP/UDP) is allowed to pass through on each node for the memberlist functionality.
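Assuming a standard Linux toolchain, the port requirements can be sanity-checked from a node with `nc`; the addresses below are illustrative:

``` shell
# L3/BGP mode: verify TCP 179 is reachable on the external router
nc -zv 192.168.10.1 179

# L2/ARP mode: verify the memberlist port between nodes
nc -zv <other-node-ip> 7946
```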
Deployment Steps
Once the requirements are satisfied, follow the steps described below. For simplicity of administration, it is recommended to use the MetalLB Kubernetes Operator.
Step 1: MetalLB Operator
Install the MetalLB Operator using a cluster-admin credential within the "metallb-system" namespace.
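As a sketch, on clusters running the Operator Lifecycle Manager (OLM) the install can be expressed as a Subscription; the channel and catalog source names below are assumptions and should be verified against your operator catalog. Depending on your OLM setup, the namespace may also need an OperatorGroup.

``` yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metallb-operator
  namespace: metallb-system
spec:
  name: metallb-operator
  channel: stable                  # assumed channel name; check your catalog
  source: operatorhubio-catalog    # assumed catalog source
  sourceNamespace: olm             # assumed catalog namespace
```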
Step 2: MetalLB Custom Resource
After installing the Operator, configure a single instance of a MetalLB custom resource. Once you apply it, the Operator starts MetalLB on your cluster by deploying the core components (controller and speaker).
``` yaml
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
```
Apply it using kubectl:

``` shell
kubectl apply -f metallb-cr.yaml
```
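Once the custom resource is applied, you can confirm that the Operator rolled out the components:

``` shell
# Expect one controller pod, plus one speaker pod per node (DaemonSet)
kubectl get pods -n metallb-system
```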
Step 3: AddressPool & Advertisement
You can configure MetalLB so that the IP address is advertised with Layer 2 protocols, the BGP protocol, or both. MetalLB supports advertising using L2 and BGP for the same set of IP addresses.
- With Layer 2, MetalLB provides a fault-tolerant external IP address.
- With BGP, MetalLB provides fault-tolerance for the external IP address and load balancing.
Layer-2 Example
For Layer-2, create an AddressPool to define the IP addresses MetalLB will assign to services.
**IPAddressPool**
``` yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.10.100-192.168.10.120
```
**L2Advertisement**
``` yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2adv
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool
```
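To verify the Layer-2 configuration end to end, you can expose a throwaway deployment (the name and image here are illustrative) and check the assigned address:

``` shell
kubectl create deployment echo --image=nginx
kubectl expose deployment echo --port=80 --type=LoadBalancer
# The EXTERNAL-IP column should show an address from default-pool
# (192.168.10.100-192.168.10.120) rather than <pending>.
kubectl get svc echo
```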
Layer-3 Example
For Layer-3, let's assume the following:
- Your service VIP range: 192.168.10.200-192.168.10.220
- Your router: 192.168.10.1, ASN 65001
- Your Kubernetes cluster (MetalLB) ASN: 64513
Now, create the following Kubernetes resources on your cluster using kubectl apply.
**IPAddressPool**
``` yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: bgp-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.10.200-192.168.10.220
```
**BGPAdvertisement**
``` yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgp-adv
  namespace: metallb-system
spec:
  ipAddressPools:
  - bgp-pool
  aggregationLength: 32   # advertise individual IPs (host routes)
  localPref: 100          # optional, BGP attribute
```
**BGPPeer**
Create a peer definition for your external router. On the router side (FRR/Cisco/etc.) you'd configure BGP neighbor 192.168.10.x (the node IPs that run MetalLB speakers) with remote-AS 64513, and allow those host routes.
``` yaml
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: router-1
  namespace: metallb-system
spec:
  myASN: 64513               # ASN used by MetalLB / cluster
  peerASN: 65001             # ASN of your router
  peerAddress: 192.168.10.1  # router IP (reachable from nodes)
  # optional, but common:
  ebgpMultiHop: false        # set to true if the router is not directly (L2) adjacent
  nodeSelectors:
  - matchLabels:
      kubernetes.io/hostname: worker-1   # or omit to peer from all nodes
```
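For reference, the router side of this peering on an FRR-based device might be sketched as follows; the speaker node IP is illustrative, and prefix filters are omitted for brevity:

```
router bgp 65001
 neighbor 192.168.10.10 remote-as 64513   ! a MetalLB speaker node (illustrative IP)
 address-family ipv4 unicast
  neighbor 192.168.10.10 activate
 exit-address-family
```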
Troubleshooting
MetalLB has published detailed documentation on troubleshooting.
