This chapter describes how to use NGINX and NGINX Plus as a load balancer.

Overview

Load balancing across multiple application instances is a commonly used technique for optimizing resource utilization, maximizing throughput, reducing latency, and ensuring fault-tolerant configurations.

Watch the NGINX Load Balancing Software Webinar On Demand for a deep dive on techniques that NGINX users employ to build large-scale, highly available web services.

NGINX can be used in different deployment scenarios as a very efficient HTTP load balancer.

Proxying Traffic to a Group of Servers

To start using NGINX with a group of servers, you first need to define the group with the upstream directive. The directive is placed in the http context.

Servers in the group are configured using the server directive (not to be confused with the server block that defines a virtual server running on NGINX). For example, the following configuration defines a group named backend that consists of three server configurations (which may resolve to more than three actual servers):

http {
    upstream backend {
        server backend1.example.com weight=5;
        server backend2.example.com;
        server 192.0.0.1 backup;
    }
}

To pass requests to a server group, the name of the group is specified in the proxy_pass directive (or fastcgi_pass, memcached_pass, uwsgi_pass, scgi_pass depending on the protocol). In the next example, a virtual server running on NGINX passes all requests to the backend server group defined in the previous example:

server {
    location / {
        proxy_pass http://backend;
    }
}

The following example combines the two snippets above and shows how to proxy requests to the backend server group, where the group consists of three servers: two of them run instances of the same application while the third is a backup server. NGINX applies HTTP load balancing to distribute the requests:

http {
    upstream backend {
        server backend1.example.com;
        server backend2.example.com;
        server 192.0.0.1 backup;
    }
    server {
        location / {
            proxy_pass http://backend;
        }
    }
}

Choosing a Load Balancing Method

NGINX supports four load balancing methods, and NGINX Plus adds a fifth method:

  1. The round-robin method: requests are distributed evenly across the servers with server weights taken into consideration. This method is used by default (there is no directive for enabling it):

    upstream backend {
        server backend1.example.com;
        server backend2.example.com;
    }
  2. The least_conn method: a request is sent to the server with the least number of active connections with server weights taken into consideration:

    upstream backend {
        least_conn;
    
        server backend1.example.com;
        server backend2.example.com;
    }
  3. The ip_hash method: the server to which a request is sent is determined from the client IP address. In this case, either the first three octets of the IPv4 address or the whole IPv6 address is used to calculate the hash value. The method guarantees that requests from the same address get to the same server unless that server is unavailable.

    upstream backend {
        ip_hash;
    
        server backend1.example.com;
        server backend2.example.com;
    }

    If one of the servers needs to be temporarily removed, it can be marked with the down parameter in order to preserve the current hashing of client IP addresses. Requests that were to be processed by this server are automatically sent to the next server in the group:

    upstream backend {
        server backend1.example.com;
        server backend2.example.com;
        server backend3.example.com down;
    }
  4. The generic hash method: the server to which a request is sent is determined from a user-defined key, which can be a text string, a variable, or a combination of the two. For example, the key may be a paired source IP address and port, or a URI:

    upstream backend {
        hash $request_uri consistent;
    
        server backend1.example.com;
        server backend2.example.com;
    }

    The optional consistent parameter of the hash directive enables ketama consistent-hash load balancing. Requests are evenly distributed across all upstream servers based on the user-defined hashed key value. If an upstream server is added to or removed from an upstream group, only a few keys are remapped, which minimizes cache misses in the case of load-balancing cache servers and other applications that accumulate state.

  5. The least_time method (NGINX Plus): for each request, NGINX Plus selects the server with the lowest average latency and the lowest number of active connections, where the lowest average latency is calculated based on which parameter is included in the least_time directive, header (time to receive the response header) or last_byte (time to receive the full response):

    upstream backend {
        least_time header;
    
        server backend1.example.com;
        server backend2.example.com;
    }

Note: When configuring any method other than round-robin, put the corresponding directive (least_conn, ip_hash, hash, least_time) above the list of server directives in the upstream block.

Server Weights

By default, NGINX distributes requests among the servers in the group according to their weights using the round-robin method. The weight parameter of the server directive sets the weight of a server; by default, it is 1:

upstream backend {
    server backend1.example.com weight=5;
    server backend2.example.com;
    server 192.0.0.1 backup;
}

In the example, backend1.example.com has weight 5; the other two servers have the default weight (1), but the one with IP address 192.0.0.1 is marked as a backup server and does not receive requests unless both of the other servers are unavailable. With this configuration of weights, out of every six requests, five are sent to backend1.example.com and one to backend2.example.com.

Server Slow Start

The server slow start feature prevents a recently recovered server from being overwhelmed by connections, which may time out and cause the server to be marked as failed again.

In NGINX Plus, slow start allows an upstream server to gradually recover its weight from zero to its nominal value after it has recovered or become available. This can be done with the slow_start parameter of the server directive:

upstream backend {
    server backend1.example.com slow_start=30s;
    server backend2.example.com;
    server 192.0.0.1 backup;
}

The time value (here, 30 seconds) sets the time during which the server recovers its weight.

Note that if there is only a single server in a group, the max_fails, fail_timeout, and slow_start parameters are ignored and the server is never considered unavailable.

Enabling Session Persistence

Session persistence means that NGINX Plus identifies user sessions and routes all requests in a given session to the same upstream server.

NGINX Plus supports three session persistence methods: cookie, route, and learn. The methods are set with the sticky directive.
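
For illustration, here is a minimal sketch of the cookie method; the cookie name srv_id and the expires, domain, and path values are illustrative, not required:

upstream backend {
    server backend1.example.com;
    server backend2.example.com;

    # NGINX Plus inserts a session cookie that identifies the server chosen for the first request
    sticky cookie srv_id expires=1h domain=.example.com path=/;
}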

Limiting the Number of Connections

With NGINX Plus, it is possible to limit the number of connections to an upstream server by setting the connection limit with the max_conns parameter.

If the max_conns limit has been reached, a request can be placed in a queue for further processing, provided that the queue directive is also specified. The directive sets the maximum number of requests that can be in the queue at the same time:

upstream backend {
    server backend1.example.com  max_conns=3;
    server backend2.example.com;

    queue 100 timeout=70;
}

If the queue is filled up with requests or the upstream server cannot be selected during the timeout specified in the optional timeout parameter, the client will receive an error.

Note that the max_conns limit will be ignored if there are idle keepalive connections opened in other worker processes. As a result, the total number of connections to the server may exceed the max_conns value in a configuration where the memory is shared with multiple worker processes.

Passive Health Monitoring

When NGINX considers a server unavailable, it temporarily stops sending requests to that server until it is considered active again. The following parameters of the server directive configure the conditions for considering a server unavailable: the fail_timeout parameter sets the time during which the specified number of failed attempts must happen for the server to be considered unavailable (and also the time for which it is then considered unavailable), and the max_fails parameter sets the number of failed attempts.

The default values are 10 seconds and 1 attempt. So if NGINX fails to send a request to some server or does not receive a response from this server at least once, it immediately considers the server unavailable for 10 seconds. The following example shows how to set these parameters:

upstream backend {                
    server backend1.example.com;
    server backend2.example.com max_fails=3 fail_timeout=30s;
    server backend3.example.com max_fails=2;
}

NGINX Plus offers some more sophisticated features for tracking server availability, described next.

Active Health Monitoring

NGINX Plus can monitor the availability of upstream servers by periodically sending special requests to each server and checking for a response that satisfies certain conditions.

To enable this type of health monitoring, in your nginx.conf file the location that passes requests to the group should include the health_check directive. In addition, the server group should be made dynamically configurable with the zone directive:

http {
    upstream backend {
        zone backend 64k;

        server backend1.example.com;
        server backend2.example.com;
        server backend3.example.com;
        server backend4.example.com;
    }

    server {
        location / {
            proxy_pass http://backend;
            health_check;
        }
    }
}

This configuration defines a server group and a virtual server with a single location that passes all requests to the server group. It also enables health monitoring with default parameters: every 5 seconds NGINX Plus sends a request for “/” to each server in the backend group. If any communication error or timeout occurs (or a proxied server responds with a status code other than 2xx or 3xx), the health check fails for that server. Any server that fails a health check is considered unhealthy, and NGINX Plus stops sending client requests to it until it once again passes a health check.

The zone directive defines a memory zone that is shared among worker processes and is used to store the configuration of the server group. This enables the worker processes to use the same set of counters to keep track of responses from the servers in the group. The zone directive also makes the group dynamically configurable.

This behavior can be overridden using the parameters of the health_check directive:

location / {
    proxy_pass http://backend;
    health_check interval=10 fails=3 passes=2;
}

Here, the duration between 2 consecutive health checks has been increased to 10 seconds using the interval parameter. In addition, a server will be considered unhealthy after 3 consecutive failed health checks by setting the fails=3 parameter. Finally, using the passes parameter, we have made it so that a server needs to pass 2 consecutive checks to be considered healthy again.

It is possible to set a specific URI to request in a health check. Use the uri parameter for this purpose:

location / {
    proxy_pass http://backend;
    health_check uri=/some/path;
}

The provided URI will be appended to the server domain name or IP address specified for the server in the upstream directive. For example, for the first server in the backend group declared above, a health check request will have the http://backend1.example.com/some/path URI.

Finally, it is possible to set custom conditions that a healthy response must satisfy. The conditions are specified in a match block, which is referenced by the match parameter of the health_check directive.

http {
    ...

    match server_ok {
        status 200-399;
        body !~ "maintenance mode";
    }

    server {
        ...

        location / {
            proxy_pass http://backend;
            health_check match=server_ok;
        }
    }
}

Here a health check is passed if the response has the status in the range from 200 to 399, and its body does not match the provided regular expression.

The match directive allows NGINX Plus to check the status, header fields, and the body of a response. Using this directive it is possible to verify whether the status is in the specified range, whether a response includes a header, or whether the header or body matches a regular expression. The match directive can contain one status condition, one body condition, and multiple header conditions. To correspond to the match block, the response must satisfy all of the conditions specified within it.

For example, the following match directive looks for responses that have status code 200, contain the Content-Type header with the exact value text/html, and have the text “Welcome to nginx!” in the body:

match welcome {
    status 200;
    header Content-Type = text/html;
    body ~ "Welcome to nginx!";
}

In the following example of using the exclamation point (!), the conditions match responses where the status code is anything other than 301, 302, 303, or 307, and the Refresh header is not present.

match not_redirect {
    status ! 301-303 307;
    header ! Refresh;
}

Health checks can also be enabled for non-HTTP protocols, such as FastCGI, uwsgi, SCGI, and memcached.
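
For example, FastCGI health checks are enabled the same way: the health_check directive goes in the location that contains fastcgi_pass, and the upstream group still needs a shared memory zone. The sketch below assumes a single PHP-FPM instance listening on 127.0.0.1:9000; adjust names and addresses to your environment:

http {
    upstream fastcgi_backend {
        zone fastcgi_backend 64k;
        server 127.0.0.1:9000;   # assumed PHP-FPM instance
    }

    server {
        location / {
            include      fastcgi_params;
            fastcgi_pass fastcgi_backend;
            health_check;
        }
    }
}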

Sharing Data with Multiple Worker Processes

If the upstream block does not include the zone directive, each worker process keeps its own copy of the server group configuration and maintains its own set of related counters. The counters include the current number of connections to each server in the group and the number of failed attempts to pass a request to a server. As a result, the server group configuration cannot be modified on the fly.

If the upstream block does include the zone directive, the configuration of the server group is placed in a memory area shared among all worker processes. This scenario is dynamically configurable, because the worker processes access the same copy of the group configuration and use the same related counters.

The zone directive is mandatory for health checks and on-the-fly reconfiguration of the server group. However, other features of the server groups can benefit from the use of this directive as well.

For example, if the configuration of a group is not shared, each worker process maintains its own counter for failed attempts to pass a request to a server (see the max_fails parameter). In this case, each request gets to only one worker process. When the worker process that is selected to process a request fails to transmit the request to a server, the other worker processes don't know anything about it. While some worker process may consider a server unavailable, others may still send requests to it. For a server to be definitively considered unavailable, the number of failed attempts within the timeframe set by fail_timeout must reach max_fails multiplied by the number of worker processes. The zone directive, on the other hand, guarantees the expected behavior.

The least_conn load balancing method might also not work as expected without the zone directive, at least on small loads. This method of TCP and HTTP load balancing passes a request to the server with the smallest number of active connections. If the configuration of the group is not shared, each worker process uses its own counter for the number of connections, so one worker process may pass a request to a server right after another worker process has sent a request to the same server. This effect diminishes as the number of requests grows: on high loads requests are distributed among worker processes evenly, and the least_conn load balancing method works as expected.
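
For reference, a minimal upstream block that shares its configuration and counters among all worker processes looks like this; the zone name and the 64k size are illustrative:

upstream backend {
    zone backend 64k;    # shared memory zone used by all worker processes

    least_conn;
    server backend1.example.com;
    server backend2.example.com;
}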

Setting the Size for the Zone

It is not possible to recommend an exact zone size, because usage patterns vary widely. Each feature, such as sticky cookie/route/learn session persistence, health checks, or re-resolving, affects the required zone size.

For example, the 256 Kb zone with the sticky_route session persistence method and a single health check can hold up to:

Configuring HTTP Load Balancing Using DNS

The configuration of a server group can be modified at run time using DNS.

NGINX Plus can monitor changes of the IP addresses that correspond to a server's domain name and automatically apply these changes to NGINX Plus without requiring a restart. This is done with the resolver directive, which must be specified in the http block, and the resolve parameter of the server directive in the server group:

http {
    resolver 10.0.0.1 valid=300s ipv6=off;
    resolver_timeout 10s;

    server {
        location / {
            proxy_pass http://backend;
        }
    }
   
    upstream backend {
        zone backend 32k;
        least_conn;
        ...
        server backend1.example.com resolve;
        server backend2.example.com resolve;
    }
}

In the example, the resolve parameter of the server directive tells NGINX Plus to periodically re-resolve backend1.example.com and backend2.example.com into IP addresses. By default, NGINX re-resolves DNS records based on their TTL, but the TTL value can be overridden with the valid parameter of the resolver directive; in our example it is 5 minutes.

The optional ipv6=off parameter allows resolving only to IPv4 addresses, though both IPv4 and IPv6 resolving is supported.

If a domain name resolves to several IP addresses, the addresses are saved to the upstream configuration and load balanced. In our example, the servers are load balanced according to the least_conn method. If one or more IP addresses change, or addresses are added or removed, the servers are re-balanced.

Load Balancing of Microsoft Exchange Servers

In Release 7 and later, NGINX Plus can proxy Microsoft Exchange traffic to a server or a group of servers and load balance it.

To set up load balancing of Microsoft Exchange Servers:

  1. In a location, configure proxying to the Microsoft Exchange upstream server group with the proxy_pass directive:
    location / {
        proxy_pass https://exchange;
        ...
    }
  2. In order for Microsoft Exchange connections to pass to the upstream servers, in the location block set the proxy_http_version directive value to 1.1 and the proxy_set_header directive to clear the Connection header (Connection ""), just as for a keepalive connection:
    location / {
        ...
        proxy_http_version 1.1;
        proxy_set_header   Connection "";
        ...
    }
  3. In the http block, configure a load balancing group with Microsoft Exchange servers. Specify the upstream directive with the name of the server group previously specified in the proxy_pass directive. Then specify the ntlm directive that will allow the group to accept the requests with NTLM authentication:
    http {
        ...
        upstream exchange {
            zone exchange 64k;
            ntlm;
            ...
        }
    }
  4. Add Microsoft Exchange servers to the upstream group and optionally specify an HTTP load balancing method:
    http {
        ...
        upstream exchange {
            zone exchange 64k;
            ntlm;
            server exchange1.example.com;
            server exchange2.example.com;
            ...
        }
    }

NTLM Example

http {
    ...
    upstream exchange {
        zone exchange 64k;
        ntlm;
        server exchange1.example.com;
        server exchange2.example.com;
    }

    server {
        listen              443 ssl;
        ssl_certificate     /etc/nginx/ssl/company.com.crt;
        ssl_certificate_key /etc/nginx/ssl/company.com.key;
        ssl_protocols       TLSv1 TLSv1.1 TLSv1.2;

        location / {
            proxy_pass         https://exchange;
            proxy_http_version 1.1;
            proxy_set_header   Connection "";
        }
    }
}

For more information about configuring Microsoft Exchange and NGINX Plus, see the Load Balance Microsoft Exchange Servers guide (PDF).

On-the-Fly Configuration

With NGINX Plus, the configuration of a server group can be modified on the fly using the HTTP interface. A configuration command can be used to view all servers or a particular server in a group, modify the parameters of a particular server, and add or remove servers.

Setting Up the On-the-Fly Configuration

  1. Include the zone directive in the upstream block. The zone directive configures a zone in the shared memory and sets the zone name and size. The configuration of the server group is kept in this zone, so all worker processes use the same configuration:
    http {
        ...
        upstream appservers {
            zone appservers 64k;
            server appserv1.example.com      weight=5;
            server appserv2.example.com:8080 fail_timeout=5s;
            server reserve1.example.com:8080 backup;
            server reserve2.example.com:8080 backup;
        }
    }
  2. Place the upstream_conf directive in a separate location:
    server {
        location /upstream_conf {
            upstream_conf;
            ...
        }
    }

    It is highly recommended to restrict access to this location, for example by allowing access only from the 127.0.0.1 address:

    server {
        location /upstream_conf {
            upstream_conf;
            allow 127.0.0.1;
            deny  all;
        }
    }

A complete example of this configuration:

http {
    ...
    # Configuration of the server group
    upstream appservers {
        zone appservers 64k;

        server appserv1.example.com      weight=5;
        server appserv2.example.com:8080 fail_timeout=5s;

        server reserve1.example.com:8080 backup;
        server reserve2.example.com:8080 backup;
    }

    server {
        # Location that proxies requests to the group
        location / {
            proxy_pass http://appservers;
            health_check;
        }

        # Location for configuration requests
        location /upstream_conf {
            upstream_conf;
            allow 127.0.0.1;
            deny  all;
        }
    }
}

In the example, access to the second location is allowed only from the 127.0.0.1 IP address. Access from all other IP addresses is denied.

Configuring Persistence of On-the-Fly Configuration

The configuration from the previous example allows storing on-the-fly changes only in shared memory. These changes are discarded whenever the NGINX Plus configuration file is reloaded.

To make the changes persistent across configuration reloads, you need to move the list of upstream servers from the upstream block to a special file that keeps the state of the upstream servers. The path to the file is set with the state directive. The recommended path for Linux distributions is /var/lib/nginx/state/, and for FreeBSD distributions /var/db/nginx/state/:

http {
    ...
    upstream appservers {
        zone appservers 64k;
        state /var/lib/nginx/state/appservers.conf;

        # All these servers should be moved to the file using the upstream_conf API:
        # server appserv1.example.com      weight=5;
        # server appserv2.example.com:8080 fail_timeout=5s;
        # server reserve1.example.com:8080 backup;
        # server reserve2.example.com:8080 backup;
    }
}

Keep in mind that this file can be modified only with configuration commands from the upstream_conf API interface; modifying the file directly should be avoided.

Configuring Upstream Servers On-the-Fly

To pass a configuration command to NGINX, send an HTTP request. The request should have an appropriate URI to get into the location that includes the upstream_conf directive. The request should also include the upstream argument set to the name of the server group.

For example, to view all backup servers (marked with backup) in the group, send:

http://127.0.0.1/upstream_conf?upstream=appservers&backup=

To add a new server to the group, send a request with add and server arguments:

http://127.0.0.1/upstream_conf?add=&upstream=appservers&server=appserv3.example.com:8080&weight=2&max_fails=3

To remove a server, send a request with the remove command and the id argument identifying the server:

http://127.0.0.1/upstream_conf?remove=&upstream=appservers&id=2

To modify a parameter of a specific server, send a request with the id argument identifying the server and the parameter:

http://127.0.0.1/upstream_conf?upstream=appservers&id=2&down=

See the upstream_conf module for more examples.