FreeBSD Nginx Performance

At $work, we have been looking at Nginx Plus because unlike the free open-source version of Nginx, Plus has load-balancing features. We’re pretty excited about that because it means we could potentially replace our reliance on HAProxy. Below is a simple graphic showing the potential infrastructure change.

With an Nginx Plus license, you can eliminate the need for HAProxy

The next thing we had to do was determine which platform we would be using for our test of Nginx Plus. That ended up being limited to just two choices based on our constraints — must be physical hardware and must support HTTP/2 and ALPN.

While the system requirements page doesn’t say this directly, we were informed that the only configurations currently supporting HTTP/2 and ALPN on physical hardware are FreeBSD 11.0 and Ubuntu 16.04 LTS. Others such as CentOS were disqualified because, for example, the OpenSSL version was too old to support ALPN.

With a little bit of work in massaging our PXE server (resulting in a complete rewrite of pxe-config for FreeBSD deployment automation), we had the FreeBSD systems deployed in a matter of days. The generation of a Debian Installer preseed for Ubuntu proved to be far more time-consuming and challenging, but after a week we had the Ubuntu systems deployed as well.

We then registered for a free 30-day trial of Nginx Plus and went through the installation instructions for each operating system. The steps for FreeBSD and Ubuntu are similar: you configure your system to add their package server and use the standard package utilities (pkg on FreeBSD and apt-get on Ubuntu) to install a binary package.

With FreeBSD 11.0-RELEASE-p1 and Ubuntu 16.04 LTS on identical servers in the same rack/switch, we were ready to do some benchmarking to help us determine whether Nginx Plus can deliver comparable performance in a load-balanced array compared with an HAProxy cluster sitting in front of the free and open-source version of Nginx.

To generate the right type of load for our performance benchmark, we are using hey by Jaana B. Dogan. Below is the usage statement from hey -h:

Usage: hey [options...] <url>

Options:
  -n  Number of requests to run. Default is 200.
  -c  Number of requests to run concurrently. Total number of requests cannot
      be smaller than the concurrency level. Default is 50.
  -q  Rate limit, in seconds (QPS).
  -o  Output type. If none provided, a summary is printed.
      "csv" is the only supported alternative. Dumps the response
      metrics in comma-separated values format.

  -m  HTTP method, one of GET, POST, PUT, DELETE, HEAD, OPTIONS.
  -H  Custom HTTP header. You can specify as many as needed by repeating the flag.
      For example, -H "Accept: text/html" -H "Content-Type: application/xml" .
  -t  Timeout for each request in seconds. Default is 20, use 0 for infinite.
  -A  HTTP Accept header.
  -d  HTTP request body.
  -D  HTTP request body from file. For example, /home/user/file.txt or ./file.txt.
  -T  Content-type, defaults to "text/html".
  -a  Basic authentication, username:password.
  -x  HTTP Proxy address as host:port.
  -h2 Enable HTTP/2.

  -host	HTTP Host header.

  -disable-compression  Disable compression.
  -disable-keepalive    Disable keep-alive, prevents re-use of TCP
                        connections between different HTTP requests.
  -cpus                 Number of used cpu cores.
                        (default for current machine is 8 cores)
  -more                 Provides information on DNS lookup, dialup, request and
                        response timings.

NOTE: Although our requirements include HTTP/2 and hey has the -h2 flag to enable HTTP/2, our performance benchmarks will be using HTTP/1[.1] because our current edge infrastructure to which we can make comparisons does not yet support HTTP/2.

The command that we used to test the performance of our setup is as follows:

hey -n 3000 -c 300 -m GET -disable-keepalive <url>

This asks hey to perform a total of 3000 HTTP/1[.1] GET requests for <url> with up to 300 concurrent requests.

When <url> points to our vanilla Ubuntu 16.04 LTS test box running Nginx Plus, the results are as follows:

Nginx Plus and Ubuntu
The hey command, hammering a vanilla Ubuntu 16.04 LTS server running Nginx Plus

When <url> points instead to our vanilla FreeBSD 11.0-RELEASE-p1 test box, the results are as follows:

Nginx Plus and FreeBSD
The hey command, hammering a vanilla FreeBSD 11.0-RELEASE-p1 server running Nginx Plus

Focusing on the Total in the Summary, we can see that vanilla FreeBSD takes 3x longer than Ubuntu; something is causing high response times on FreeBSD.

SPOILER: After much digging, we discovered that the nginx binary for Nginx Plus was linking against an OpenSSL that was compiled from ports without the ASM optimizations (highly optimized assembly routines for speeding up calculations on supported CPU architectures).

The instructions for installing Nginx Plus on FreeBSD include “pkg install nginx-plus” and this brings in security/openssl from ports instead of using the OpenSSL that comes with the FreeBSD base Operating System. This is generally a good thing because ports are updated more frequently than base which helps keep Nginx Plus up-to-date with the latest OpenSSL.

The standard UNIX utility ldd shows us that /usr/local/sbin/nginx (as installed by the nginx-plus package) links not against the system’s /usr/lib/libssl.so.8 but instead against the non-base (read: ports) version located at /usr/local/lib/libssl.so.9.
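A quick way to repeat this check is to filter the ldd output down to the two OpenSSL libraries. The sketch below assumes the nginx-plus package installed the binary at /usr/local/sbin/nginx, as described above; the ssl_libs helper name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only the libssl/libcrypto lines of ldd-style
# output read from stdin.
ssl_libs()
{
	grep -E 'lib(ssl|crypto)\.so'
}

# Path as installed by the nginx-plus package (adjust if yours differs).
NGINX=/usr/local/sbin/nginx

# Only run ldd when the binary is actually present on this machine.
if [ -x "$NGINX" ]; then
	ldd "$NGINX" | ssl_libs
fi
```

A path under /usr/local/lib in the result means the ports OpenSSL is in use.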

However, as we will see in the screenshot below, both the system OpenSSL in /usr/bin and the ports OpenSSL in /usr/local/bin are the same version compiled on the same calendar day (despite having different shared library suffixes).

Nginx Plus links to the ports OpenSSL

Though the two versions of OpenSSL appear to be the same, they are actually quite different. When OpenSSL is compiled with ASM optimizations, it can take full advantage of AES-NI and PCLMULQDQ, two important CPU instructions that increase the efficiency of cryptographic calculations.

Testing OpenSSL for AES-NI and PCLMULQDQ to ensure ASM optimizations in support of CPU based crypto

The OPENSSL_ia32cap environment variable is a bit mask of the processor capabilities that OpenSSL will use, which allows us to disable AES-NI and PCLMULQDQ. Combining the values ~0x200000000000000 (disable AES-NI) and ~0x200000000 (disable PCLMULQDQ) gives ~0x200000200000000, which disables both AES-NI and PCLMULQDQ for individual runs of “openssl speed”.

In the two commands below, if your CPU supports AES-NI and your OpenSSL has been compiled with ASM optimizations, the first command will be many times faster than the second (in which the optimizations, if available, are disabled).

% openssl speed -elapsed -evp aes-256-cbc
% env OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-256-cbc
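
The mask arithmetic can be double-checked in the shell. This sketch assumes a shell with 64-bit arithmetic (typical of /bin/sh on 64-bit FreeBSD and Linux) and takes the bit positions from the hex values given above: bit 57 for AES-NI and bit 33 for PCLMULQDQ.

```shell
#!/bin/sh
# Bit positions implied by the hex values in the text (see lead-in).
AESNI_BIT=57      # 1 << 57 == 0x200000000000000
PCLMUL_BIT=33     # 1 << 33 == 0x200000000

# OR the two single-bit masks together and print the result in hex,
# prefixed with the "~" that OPENSSL_ia32cap expects.
mask=$(( (1 << AESNI_BIT) | (1 << PCLMUL_BIT) ))
printf '~0x%x\n' "$mask"
```

The printed value, ~0x200000200000000, is what gets passed in OPENSSL_ia32cap; the leading ~ tells OpenSSL to clear those bits rather than set them.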

NOTE: See https://wiki.freebsd.org/SSHPerf for additional details.

As one might expect, the OpenSSL in /usr/bin shows a huge performance increase when AES-NI is not disabled. It was quite the shock to find that the ports OpenSSL in /usr/local/bin showed no differences in performance between the two commands.

We, the FreeBSD committers, took a look at the security/openssl port and discovered that it did not enable ASM optimizations by default at the time the binary packages were last compiled for FreeBSD 11. So I worked with the maintainer of the port to fix that for the next time the packages get recompiled.

Review: D9480: security/openssl: Enable ASM by default
Submitted: rP433671

The next step is to determine the impact that an un-optimized OpenSSL has on our hey tests. The FreeBSD dynamic linker supports configurable dynamic object mapping through libmap.conf(5), so it is a fairly simple matter of telling /usr/local/sbin/nginx to use a different OpenSSL.

Creating /usr/local/etc/libmap.d/nginx.conf with the following contents will cause nginx to use the OpenSSL libraries that came with the base Operating System:

#
# origin		target
#
[/usr/local/sbin/nginx]
libssl.so.9		libssl.so.8
libcrypto.so.9		libcrypto.so.8

After creating this file and restarting nginx with “service nginx restart”, the hey performance tests now show FreeBSD ahead of Ubuntu in a head-to-head test.

To better illustrate the effects that the unoptimized OpenSSL had on the hey benchmarks, I wrote a utility that generates JSON from the output.

NOTE: While there are many ways to benchmark, this test focused on “time to completion” for 3000 requests with up to 300 concurrent requests. The JSON generated depicts the non-linear approach toward completion.

Wrapper script for hey named hey_genlog for generating a log capable of being converted into JSON:

#!/bin/sh
############################################################ IDENT(1)
#
# $Title: Script to generate statistics from hey against a host $
# $Copyright: 2017 Devin Teske. All rights reserved. $
# $Smule$
#
############################################################ INFORMATION
#
# Statistics are logged to stdout. Use hey2graph to generate JSON.
# JSON is designed for highcharts/highstock API.
# NB: Sub-second timestamps rely on a date(1) that supports %N, such as
# GNU date.
#
############################################################ CONFIGURATION

#
# hey utility from https://github.com/rakyll/hey
#
HEY=hey

#
# File to request
#
FILE=/aud1.m4a

#
# Total number of requests to perform
#
TOTAL=3000

#
# Maximum number of concurrent requests
#
CONCURRENT=300

#
# QoS rate limiting
# NB: Set to NULL to disable rate limiting
#
#RATE_LIMIT=1 # seconds
RATE_LIMIT= # seconds

#
# Should we use Secure-HTTP (https)?
# NB: Set to NULL to disable https
#
SECURE=1

############################################################ GLOBALS

pgm="${0##*/}" # Program basename

#
# Global exit status
#
SUCCESS=0
FAILURE=1

#
# Command-line arguments
#
HOST=$1

############################################################ FUNCTIONS

usage()
{
	exec >&2
	printf "Usage: %s HOST\n" "$pgm"
	exit $FAILURE
}

############################################################ MAIN

case "$HOST" in
"") usage ;; # NOTREACHED
*:*) : fall through ;;
*)
	if [ "$SECURE" ]; then
		HOST="$HOST:443"
	else
		HOST="$HOST:80"
	fi
esac
echo "Performing $TOTAL total requests"
echo "Maximum $CONCURRENT concurrent requests"
set -x
$HEY \
	-n $TOTAL \
	-c $CONCURRENT \
	${RATE_LIMIT:+-q $RATE_LIMIT} \
	-m GET \
	-disable-keepalive \
	http${SECURE:+s}://$HOST$FILE |
	awk -v cmd="date +%s.%N" '
		BEGIN {
			cmd | getline start
			close(cmd)
		}
		/requests done/ {
			cmd | getline date
			close(cmd)
			date = sprintf("%0.4f", date - start)
			sub(/^/, date " ")
		}
		1
	' # END-QUOTE

################################################################################
# END
################################################################################

Sample output:

[root@tc1.sf.smle.co ~]# ./hey_genlog a291.sf.smle.co | tee hey.freebsdbase.log
Performing 3000 total requests
Maximum 300 concurrent requests
0.5026 51 requests done.
1.0026 235 requests done.
1.5028 429 requests done.
2.0029 610 requests done.
2.5028 832 requests done.
3.0033 1025 requests done.
3.5031 1224 requests done.
4.0033 1427 requests done.
4.5038 1617 requests done.
5.0037 1816 requests done.
5.5039 2015 requests done.
6.0039 2240 requests done.
6.5041 2431 requests done.
7.0041 2639 requests done.
7.5041 2874 requests done.
7.7668 All requests done.

Summary:
  Total:	7.7650 secs
  Slowest:	4.9109 secs
  Fastest:	0.0157 secs
  Average:	0.6898 secs
  Requests/sec:	386.3489
  Total data:	8580021000 bytes
  Size/request:	2860007 bytes

Status code distribution:
  [200]	3000 responses

Response time histogram:
  0.016 [1]	|
  0.505 [1253]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.995 [1174]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  1.484 [360]	|∎∎∎∎∎∎∎∎∎∎∎
  1.974 [140]	|∎∎∎∎
  2.463 [52]	|∎∎
  2.953 [13]	|
  3.442 [5]	|
  3.932 [0]	|
  4.421 [1]	|
  4.911 [1]	|

Latency distribution:
  10% in 0.2235 secs
  25% in 0.3667 secs
  50% in 0.5729 secs
  75% in 0.8668 secs
  90% in 1.3381 secs
  95% in 1.6279 secs
  99% in 2.3221 secs
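
As a rough cross-check, the payload throughput for a run can be derived from the summary by dividing Total data by Total. The sketch below is one way to do that with awk; the hey_throughput name is made up for illustration, and the figure covers response bodies only (no TLS, TCP, or Ethernet overhead), so it is not directly comparable to a line-rate figure measured on the wire.

```shell
#!/bin/sh
# Hypothetical helper: read a hey_genlog log on stdin and print payload
# throughput in Gbit/s, computed from the "Total data:" and "Total:" lines.
hey_throughput()
{
	awk '
		$1 == "Total:"                 { secs = $2 }
		$1 == "Total" && $2 == "data:" { bytes = $3 }
		END {
			# Skip the output entirely if no Total: line was seen.
			if (secs > 0)
				printf "%.2f Gbit/s\n", bytes * 8 / secs / 1e9
		}
	'
}

# Example: hey_throughput < hey.freebsdbase.log
```

For the sample log above, this works out to roughly 8.84 Gbit/s of payload.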

Script for converting output from hey_genlog into JSON, hey2graph:

#!/usr/bin/awk -f
BEGIN { fmt = "[%0.4f, %u],\n"; printf fmt, 0, 0 }
/total requests/ { total = $2 }
/Total:/ { time = $2 }
/requests done/ && $2 ~ /^[0-9]+$/ { printf fmt, $1, $2 - y; y = $2 }
END { printf fmt, time, total - y }

Sample output:

[root@tc1.sf.smle.co ~]# ./hey2graph hey.freebsdbase.log
[0.0000, 0],
[0.5026, 51],
[1.0026, 184],
[1.5028, 194],
[2.0029, 181],
[2.5028, 222],
[3.0033, 193],
[3.5031, 199],
[4.0033, 203],
[4.5038, 190],
[5.0037, 199],
[5.5039, 199],
[6.0039, 225],
[6.5041, 191],
[7.0041, 208],
[7.5041, 235],
[7.7650, 126],
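
The lines printed by hey2graph are array elements with trailing commas rather than a complete JSON document. If a single valid JSON array is needed, one option is a small wrapper like the sketch below; the to_json_array name is made up for illustration, and whether your charting code wants this exact shape is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: wrap hey2graph output (stdin) in [ ... ] and drop the
# comma after the final data point so the result parses as JSON.
to_json_array()
{
	echo '['
	sed '$s/,$//'
	echo ']'
}

# Example: ./hey2graph hey.freebsdbase.log | to_json_array > freebsdbase.json
```

Keeping hey2graph itself unchanged means its raw output can still be pasted straight into an existing series definition.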

The process of generating JSON graph data was performed for Ubuntu, FreeBSD with ports OpenSSL, and FreeBSD with base OpenSSL. As you can see in the below graph, FreeBSD with base OpenSSL is the fastest with Ubuntu very close behind, and FreeBSD with an unoptimized ports OpenSSL coming in at 3x slower.

Satisfied that we had eliminated the performance issue causing FreeBSD to be 3x slower, we then asked: why is Ubuntu slower than FreeBSD?

Intensive throughput benchmarks showed that FreeBSD is capable of reaching 87.1% of line rate, while Ubuntu was only capable of 86.5%. With both systems given an Intel 10GE network interface, FreeBSD appears to be utilizing the hardware more efficiently. At the switch, we can see that FreeBSD is 99.1% efficient on 10GE, resulting in a measured 10.3% TCP overhead at the time of testing.

Switch Graph
FreeBSD line rate test on Intel 10GE, switch-level throughput graph

The result of our testing is that FreeBSD running Nginx Plus is a suitable replacement for our HAProxy and Nginx topology. You and everyone reading this won’t have to worry about the documented issue with OpenSSL because I worked with Bernard Spil and Allan Jude to get it fixed in the FreeBSD ports tree. The security/openssl port has been updated to enable ASM optimizations by default, and fairly soon the binary packages will be rebuilt. Until then, you can use the above /usr/local/etc/libmap.d/nginx.conf file to temporarily use the base OpenSSL if you’re unable to update /usr/ports/security/openssl/Makefile and recompile it yourself.

Cheers!