Benchmark TCP/IP, Unix domain socket and Named pipe

For local calls via gRPC, there are better transport choices than TCP/IP: Unix domain sockets (UDS) and named pipes.

CONTENTS

Why should you consider UDS and named pipes?
Benchmark
- Server
- Client
- Results
  - Reqs/Sec
  - Avg Latency (ms)
Conclusion

Why should you consider UDS and named pipes?

Both Unix domain sockets (UDS) and named pipes (known as FIFOs on Unix) can offer better performance than TCP/IP for local calls because they avoid much of the networking overhead. In our scenario, a variable number of worker processes run on a single server, and managing ports and firewall rules for all of them is a nightmare. UDS uses file paths instead of IP addresses and port numbers, which simplifies configuration and setup. The same goes for named pipes, which are addressed by name.

Benchmark

The setup is quite similar to the one in "Compare the performance of HTTP/1.1, HTTP/2 and HTTP/3", except that we are testing a gRPC service here instead of a web service.

Server

The server code is based on the template generated by dotnet new grpc -o GrpcGreeter, with the 2 changes below:

- Besides the default TCP/IP mode, I added Unix domain socket and named pipe endpoints, so the server listens on all 3 transports at the same time. See:
  - Inter-process communication with gRPC and Unix domain sockets
  - Inter-process communication with gRPC and Named pipes
- I added a new rpc method, Download, that returns the requested number of bytes (varying from 10 bytes to 1 MB) to mimic real-world scenarios.
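
A minimal sketch of how the three listeners might be wired up in Program.cs, following the approach in the two articles listed above. It assumes .NET 8 or later (where Kestrel's named pipe transport is available) and the GreeterService class generated by the template. The port, socket path and pipe name are chosen to match the client URLs used in this post; everything else is illustrative rather than the exact benchmark code.

```csharp
using GrpcGreeter.Services;
using Microsoft.AspNetCore.Server.Kestrel.Core;

var builder = WebApplication.CreateBuilder(args);

// Assumed path: <temp folder>\greeter.tmp, matching the unix: URL used by the client.
var socketPath = Path.Combine(Path.GetTempPath(), "greeter.tmp");

builder.WebHost.ConfigureKestrel(options =>
{
    // TCP/IP over HTTPS on the port seen in the benchmark output (https://localhost:5001).
    options.ListenLocalhost(5001, listen =>
    {
        listen.Protocols = HttpProtocols.Http2;
        listen.UseHttps();
    });

    // Unix domain socket, addressed by a file path.
    options.ListenUnixSocket(socketPath, listen =>
    {
        listen.Protocols = HttpProtocols.Http2;
    });

    // Named pipe, addressed by a pipe name.
    options.ListenNamedPipe("PipeGreeter", listen =>
    {
        listen.Protocols = HttpProtocols.Http2;
    });
});

builder.Services.AddGrpc();

var app = builder.Build();
app.MapGrpcService<GreeterService>();
app.Run();
```

Only the TCP/IP endpoint is given TLS here, mirroring the https URL in the benchmark; the two local transports are left unencrypted in this sketch.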

Client

The client is a wrk-like benchmark tool for gRPC, written in C#.

Key parameters:

- 3 runs: the final result shown below is the median of the 3 runs
- In each run, we request responses of different sizes, varying from 10 bytes to 1 MB
- For each size, the test runs for 20 seconds with 400 concurrent connections across 5 threads
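
To make the aggregation step concrete, here is a small sketch of taking the median across the 3 rounds. The Median helper is hypothetical (it is not the benchmark tool's code); the sample values are the Requests/sec figures for the 10-byte response from the TCP/IP output in the Results section.

```csharp
using System;
using System.Linq;

// Requests/sec for the 10-byte response over TCP/IP, one value per round.
double[] requestsPerSec = { 154696, 168012, 168386 };

Console.WriteLine(Median(requestsPerSec)); // prints 168012

// Median of a non-empty sample: the middle element after sorting,
// or the mean of the two middle elements when the count is even.
static double Median(double[] values)
{
    var sorted = values.OrderBy(v => v).ToArray();
    int mid = sorted.Length / 2;
    return sorted.Length % 2 == 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2.0;
}
```

Using the median rather than the mean keeps a single slow round (such as the first one above) from dragging the reported number down.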

Results

Full output of benchmarking TCP/IP

PS C:\Users\xuryan> GrpcGreeterClient.exe -U https://localhost:5001
===================================
Round 1/3
===================================
Running 20s test @ https://localhost:5001 with size 10
5 threads and 400 connections
3093920 requests in 20s, 0.03GB read
Requests/sec: 154696
Average Latency: 2.59ms

Running 20s test @ https://localhost:5001 with size 100
5 threads and 400 connections
3367059 requests in 20s, 0.31GB read
Requests/sec: 168352
Average Latency: 2.38ms

Running 20s test @ https://localhost:5001 with size 1000
5 threads and 400 connections
3064905 requests in 20s, 2.85GB read
Requests/sec: 153245
Average Latency: 2.61ms

Running 20s test @ https://localhost:5001 with size 10000
5 threads and 400 connections
1308877 requests in 20s, 12.19GB read
Requests/sec: 65443
Average Latency: 6.11ms

Running 20s test @ https://localhost:5001 with size 100000
5 threads and 400 connections
330798 requests in 20s, 30.81GB read
Requests/sec: 16539
Average Latency: 24.19ms

Running 20s test @ https://localhost:5001 with size 1000000
5 threads and 400 connections
32037 requests in 20s, 29.84GB read
Requests/sec: 1601
Average Latency: 250.98ms

===================================
Round 2/3
===================================
Running 20s test @ https://localhost:5001 with size 10
5 threads and 400 connections
3360259 requests in 20s, 0.03GB read
Requests/sec: 168012
Average Latency: 2.38ms

Running 20s test @ https://localhost:5001 with size 100
5 threads and 400 connections
3339618 requests in 20s, 0.31GB read
Requests/sec: 166980
Average Latency: 2.40ms

Running 20s test @ https://localhost:5001 with size 1000
5 threads and 400 connections
2998046 requests in 20s, 2.79GB read
Requests/sec: 149902
Average Latency: 2.67ms

Running 20s test @ https://localhost:5001 with size 10000
5 threads and 400 connections
1391264 requests in 20s, 12.96GB read
Requests/sec: 69563
Average Latency: 5.75ms

Running 20s test @ https://localhost:5001 with size 100000
5 threads and 400 connections
331975 requests in 20s, 30.92GB read
Requests/sec: 16598
Average Latency: 24.11ms

Running 20s test @ https://localhost:5001 with size 1000000
5 threads and 400 connections
31251 requests in 20s, 29.10GB read
Requests/sec: 1562
Average Latency: 257.07ms

===================================
Round 3/3
===================================
Running 20s test @ https://localhost:5001 with size 10
5 threads and 400 connections
3367724 requests in 20s, 0.03GB read
Requests/sec: 168386
Average Latency: 2.38ms

Running 20s test @ https://localhost:5001 with size 100
5 threads and 400 connections
3354996 requests in 20s, 0.31GB read
Requests/sec: 167749
Average Latency: 2.38ms

Running 20s test @ https://localhost:5001 with size 1000
5 threads and 400 connections
3001872 requests in 20s, 2.80GB read
Requests/sec: 150093
Average Latency: 2.66ms

Running 20s test @ https://localhost:5001 with size 10000
5 threads and 400 connections
1361260 requests in 20s, 12.68GB read
Requests/sec: 68063
Average Latency: 5.88ms

Running 20s test @ https://localhost:5001 with size 100000
5 threads and 400 connections
320817 requests in 20s, 29.88GB read
Requests/sec: 16040
Average Latency: 24.95ms

Running 20s test @ https://localhost:5001 with size 1000000
5 threads and 400 connections
31752 requests in 20s, 29.57GB read
Requests/sec: 1587
Average Latency: 253.54ms

Final results:
10, 168012, 2.38
100, 167749, 2.38
1000, 150093, 2.66
10000, 68063, 5.88
100000, 16539, 24.19
1000000, 1587, 253.54
PS C:\Users\xuryan>

For the other two transports, the value of the -U (URL) parameter is unix:C:\Users\xuryan\AppData\Local\Temp\greeter.tmp for UDS and pipe:/PipeGreeter for the named pipe.
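
With Grpc.Net.Client, UDS and named pipe connections are typically set up through a custom ConnectCallback on SocketsHttpHandler. The sketch below shows one way a client might map the unix: and pipe:/ prefixes to channels. ChannelFactory and its method names are hypothetical and are not taken from the benchmark tool; the code assumes the Grpc.Net.Client package is referenced.

```csharp
using System;
using System.IO;
using System.IO.Pipes;
using System.Net.Http;
using System.Net.Sockets;
using System.Security.Principal;
using Grpc.Net.Client;

// Hypothetical helper: names and URL parsing are illustrative only.
public static class ChannelFactory
{
    public static GrpcChannel Create(string url)
    {
        if (url.StartsWith("unix:", StringComparison.Ordinal))
            return ForUnixSocket(url.Substring("unix:".Length));
        if (url.StartsWith("pipe:/", StringComparison.Ordinal))
            return ForNamedPipe(url.Substring("pipe:/".Length));
        return GrpcChannel.ForAddress(url); // plain TCP/IP, e.g. https://localhost:5001
    }

    private static GrpcChannel ForUnixSocket(string socketPath)
    {
        var endPoint = new UnixDomainSocketEndPoint(socketPath);
        var handler = new SocketsHttpHandler
        {
            // Replace the default TCP connect with a connect to the socket file.
            ConnectCallback = async (context, cancellationToken) =>
            {
                var socket = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
                try
                {
                    await socket.ConnectAsync(endPoint, cancellationToken).ConfigureAwait(false);
                    return new NetworkStream(socket, ownsSocket: true);
                }
                catch
                {
                    socket.Dispose();
                    throw;
                }
            }
        };
        // The address is a placeholder; the callback decides where the connection goes.
        return GrpcChannel.ForAddress("http://localhost", new GrpcChannelOptions { HttpHandler = handler });
    }

    private static GrpcChannel ForNamedPipe(string pipeName)
    {
        var handler = new SocketsHttpHandler
        {
            // Replace the default TCP connect with a connect to the local named pipe.
            ConnectCallback = async (context, cancellationToken) =>
            {
                var pipe = new NamedPipeClientStream(
                    serverName: ".",
                    pipeName: pipeName,
                    direction: PipeDirection.InOut,
                    options: PipeOptions.WriteThrough | PipeOptions.Asynchronous,
                    impersonationLevel: TokenImpersonationLevel.Anonymous);
                try
                {
                    await pipe.ConnectAsync(cancellationToken).ConfigureAwait(false);
                    return pipe;
                }
                catch
                {
                    pipe.Dispose();
                    throw;
                }
            }
        };
        return GrpcChannel.ForAddress("http://localhost", new GrpcChannelOptions { HttpHandler = handler });
    }
}
```

A call such as ChannelFactory.Create("pipe:/PipeGreeter") would then return a channel that can be passed to the generated gRPC client as usual.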

The final results are shown in the tables below:

Reqs/Sec

Response Size (byte)	TCP/IP	UDS	Named Pipe
10	168012	192494	191990
100	167749	191069	192055
1000	150093	173510	151084
10000	68063	62253	47306
100000	16539	24515	7565
1000000	1587	2472	813

Avg Latency (ms)

Response Size (byte)	TCP/IP	UDS	Named Pipe
10	2.38	2.08	2.08
100	2.38	2.09	2.08
1000	2.66	2.31	2.65
10000	5.88	6.43	8.46
100000	24.19	16.32	52.92
1000000	253.54	162.25	496.73

[Figure: performance comparison of TCP/IP, UDS and named pipe]

Conclusion

UDS is a significant improvement over TCP/IP. Specifically, we observed roughly 14% to 16% more requests per second when the response is small (up to 1 KB), and an impressive increase of about 48% and 56% for responses of 100 KB and 1 MB. That is, UDS performs much better when we need to transmit large data. The one exception is the 10 KB response, where UDS was about 9% slower than TCP/IP. Named pipes are as efficient as UDS for small responses (10 and 100 bytes), but they get much worse as the response grows, dropping to roughly half of TCP/IP's throughput at 100 KB and 1 MB. In a word, we should use UDS as much as possible for local calls.
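
The percentages above can be reproduced from the Reqs/Sec table with a few lines of arithmetic. The sketch below uses TCP/IP as the baseline; PercentChange is a hypothetical helper name, and the numbers are copied from the table.

```csharp
using System;

// Reqs/Sec from the results table: response size in bytes, then TCP/IP, UDS, named pipe.
(int Size, double Tcp, double Uds, double Pipe)[] rows =
{
    (10, 168012, 192494, 191990),
    (100, 167749, 191069, 192055),
    (1000, 150093, 173510, 151084),
    (10000, 68063, 62253, 47306),
    (100000, 16539, 24515, 7565),
    (1000000, 1587, 2472, 813),
};

foreach (var row in rows)
{
    Console.WriteLine(
        $"{row.Size,8} bytes: UDS {PercentChange(row.Tcp, row.Uds):F1}%, " +
        $"named pipe {PercentChange(row.Tcp, row.Pipe):F1}%");
}

// Relative change of `value` against `baseline`, in percent.
static double PercentChange(double baseline, double value) =>
    (value - baseline) / baseline * 100.0;
```

On these numbers, UDS comes out roughly 14% to 16% ahead of TCP/IP up to 1,000 bytes, about 9% behind at 10,000 bytes, and about 48% and 56% ahead at 100,000 and 1,000,000 bytes.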