The DTN (Data Transfer Node) Focus Group has performed several tests on GTS (GÉANT Testbed Service) in order to obtain results that the NRENs can compare and replicate in their own tests. The following matrix summarises the set-up of each test, the parameters tuned, the software installed and the performance achieved, as well as links to information on how to install each software tool on GTS. Finally, some comments related to the set-up and the tests are included.
Setting up DTN tests on GTS
The “GÉANT Testbeds Service” (GTS) provides its users with dedicated experimental networks for the network research community. The aim of GTS is the testing of novel networking and telecommunications concepts, at scale, and across a geographically realistic European footprint. GTS is intended to aid research teams exploring novel SDN-based solutions that require a high-performance distributed infrastructure. It can also be used by application and software development teams needing an isolated testbed to demonstrate their designs without affecting live Internet traffic. GTS is logically isolated from the production GÉANT network to guarantee the integrity of live applications, and can support multiple isolated networks concurrently, allowing teams to work without affecting each other [GTS].
The following figure shows the GTS nodes setup map in Europe:
The tests run in the GTS testbed were:
- Virtual Machines, short distance (AMS-AMS):
- 1 CPU
- 2 CPU
- 4 CPU
- Virtual Machines, long distance (AMS-LON):
- 1 CPU
- 2 CPU
- 4 CPU
- Bare metal servers, short distance (HAM-PAR).
- Bare metal servers, long distance (LON-PRA):
- R430
- R520
- Dockerised environment on bare metal servers, short distance (HAM-PAR).
- Dockerised environment on bare metal servers, long distance (LON-PRA).
The tests used up to four Bare Metal Servers (BMSs), in two different hardware set-ups, with both set-ups connected directly with 10 Gbps links. The Focus Group has also produced examples of tests and set-ups on BMSs and in virtualised environments, using both VMs (provided as a set-up from the GTS testbed administration page) and Docker containers.
Simplified tables:
Virtual Machines:

| Nodes/Tools | 1 CPU, AMS-AMS | 1 CPU, AMS-LON | 2 CPU, AMS-AMS | 2 CPU, AMS-LON | 4 CPU, AMS-AMS | 4 CPU, AMS-LON |
|---|---|---|---|---|---|---|
| iPerf | 9.90 Gb/s | 9.90 Gb/s | 9.90 Gb/s | 9.90 Gb/s | 9.90 Gb/s | 9.90 Gb/s |
| gridFTP | 8.30 Gb/s | 8.36 Gb/s | 8.86 Gb/s | 8.47 Gb/s | 8.50 Gb/s | 7.51 Gb/s |
| FDT | 9.32 Gb/s | 7.90 Gb/s | 9.19 Gb/s | 8.49 Gb/s | 8.98 Gb/s | 7.77 Gb/s |
| Xrootd | 2.60 Gb/s | 2.60 Gb/s | 2.60 Gb/s | 2.60 Gb/s | 2.60 Gb/s | 2.60 Gb/s |
Hardware testing, Dockerised environment:

| Nodes/Tools | HAM-PAR (R430) | LON-PRA (R430) |
|---|---|---|
| iPerf | 9.2 Gb/s | 9.0 Gb/s |
| gridFTP | 8.53 Gb/s | 8.50 Gb/s |
| FDT | 8.87 Gb/s | 8.70 Gb/s |
| Xrootd | 8.00 Gb/s | 8.00 Gb/s |
Hardware testing, Bare Metal Servers (BMSs):

| Nodes/Tools | HAM-PAR (R430) | LON-PRA (R520) | LON-PRA (R430) |
|---|---|---|---|
| iPerf | 9.41 Gb/s | 9.32 Gb/s | 9.43 Gb/s |
| gridFTP | 8.58 Gb/s | 3.30 Gb/s | 8.52 Gb/s |
| FDT | 9.39 Gb/s | 4.12 Gb/s | 9.39 Gb/s |
| Xrootd | 8.00 Gb/s | 3.13 Gb/s | 7.99 Gb/s |
More test details:
The tests started with DTNs using several software tools installed in virtual machines (VMs) with 1, 2 or 4 CPUs, and continued with the same software tools on Bare Metal Servers (BMSs).
Hardware: two types of Bare Metal Servers (BMS) were used:
- Dell R430 (2x20C/40T Intel® Xeon® E5-2660 v3 @ 2.6 GHz, 25 MB cache, 128 GB ECC DDR4 2133 MHz RAM, 6x SSD, 372 GB, 6.0 Gb/s)
- Dell R520 (1x8C/16T Intel® Xeon® E5-2450 v2 @ 2.5 GHz, 10 MB cache, 32 GB ECC DDR3 1600 MHz RAM, 2x SSD, 372 GB, 6.0 Gb/s).
O/S used for client and server: Ubuntu 18.04 server
Performance metric: performance was measured as the effective transfer rate (Data File Size / Elapsed Real Time). For each transfer, the time taken to move a large file (e.g. 250 GB or 420 GB) from the client to the server was measured, and the bandwidth utilisation was calculated from the file size and the elapsed time.
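As an illustration of this metric, the elapsed real time of a transfer can be taken with a simple shell wrapper and divided into the file size; the host name, port and file path below are placeholders, not the exact commands used in these tests:

# time a single transfer and compute the effective rate in Gb/s (placeholder host and path)
FILE=/data/test-250GB.dat
START=$(date +%s)
globus-url-copy -vb file://$FILE gsiftp://dtn-server.example.org/data/test-250GB.dat
END=$(date +%s)
echo "scale=2; $(stat -c %s $FILE) * 8 / ($END - $START) / 1000000000" | bc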
Parameters tuned:
sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"
sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647"
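These sysctl settings are lost on reboot; to make them persistent on the test hosts, the same values can also be written to a sysctl configuration file (a minimal sketch, the file name is only an example):

# /etc/sysctl.d/90-dtn-tuning.conf (example file name)
net.ipv4.tcp_rmem = 4096 16777216 2147483647
net.ipv4.tcp_wmem = 4096 16777216 2147483647
# reload all sysctl configuration files without rebooting
sudo sysctl --system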
In the first investigation, DTNs in virtualised environments using VMs were tested. iPerf was used for the direct measurement of the links between the client site and the server. The results in virtualised environments showed that all the programs evaluated achieve high bandwidth utilisation except XRootd. The reason was the virtualisation itself: XRootd is highly dependent on hardware resources, and the virtualised environment, with a hypervisor on the physical system and no direct access to resources (i.e. the physical link, the disks and the buffer memory), resulted in low bandwidth utilisation. The other DTN services and client programs investigated (i.e. FDT and gridFTP) achieved higher bandwidth utilisation. This first experiment was executed on virtual servers with different distances between the client and the server: both VMs were in Amsterdam for the short-distance tests, while one VM was in Amsterdam and the other in London for the long-distance tests. Additionally, the investigation showed that, in the VM testing, better results are achieved with 2-CPU VMs than with 4-CPU VMs. These differences can have several causes, such as the server architecture, Hyper-V VM load balancing, or the DTN tools not making good use of the available CPU resources. It is important to note that this investigation did not have access to change VM parameters.
As mentioned before, the results may be affected by the fact that the links are not dedicated and other tests could have been running in GTS simultaneously. This could explain why the tests run between Amsterdam and London with 4 CPUs gave worse results than all the others, as they were run on different days. The tests were repeated at different time slots to see whether any performance pattern could be found, but no significant changes were observed.
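For reference, the raw link measurements mentioned above can be reproduced with a standard iPerf client/server pair; the host name, stream count and duration below are illustrative and not necessarily the exact options used in these tests (iperf3 is shown, but the classic iperf accepts the same -c/-P/-t options):

# on the server VM
iperf3 -s
# on the client VM: 4 parallel TCP streams for 30 seconds
iperf3 -c dtn-server.example.org -P 4 -t 30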
| Set-up (VMs or BMS and location)¹ | Number of CPUs | Operating System | Links | Parameters tuned | DTN Software | Other software installed | Performance | How-to install this tool on GTS |
|---|---|---|---|---|---|---|---|---|
| VMs AMS-AMS | 1 CPU | Ubuntu 18.04 LTS | 10 Gbps | net.ipv4.tcp_rmem="4096 16777216 2147483647", net.ipv4.tcp_wmem="4096 16777216 2147483647" | gridFTP (https://opensciencegrid.org/technology/policy/gridftp-gsi-migration/) | iPerf | 8.30 Gb/s | apt install globus-gass-copy-progs globus-gridftp-server-progs |
| VMs AMS-LON | 1 CPU | | | | | | 8.36 Gb/s | |
| VMs AMS-AMS | 2 CPU | | | | | | 8.86 Gb/s | |
| VMs AMS-LON | 2 CPU | | | | | | 8.47 Gb/s | |
| VMs AMS-AMS | 4 CPU | | | | | | 8.50 Gb/s | |
| VMs AMS-LON | 4 CPU | | | | | | 7.51 Gb/s | |
| VMs AMS-AMS | 1 CPU | Ubuntu 18.04 LTS | 10 Gbps | net.ipv4.tcp_rmem="4096 16777216 2147483647", net.ipv4.tcp_wmem="4096 16777216 2147483647" | FDT (https://github.com/fast-data-transfer/fdt) | | 9.32 Gb/s | apt install default-jre; wget http://monalisa.cern.ch/FDT/lib/fdt.jar |
| VMs AMS-LON | 1 CPU | | | | | | 7.90 Gb/s | |
| VMs AMS-AMS | 2 CPU | | | | | | 9.19 Gb/s | |
| VMs AMS-LON | 2 CPU | | | | | | 8.49 Gb/s | |
| VMs AMS-AMS | 4 CPU | | | | | | 8.98 Gb/s | |
| VMs AMS-LON | 4 CPU | | | | | | 7.77 Gb/s | |
| VMs AMS-AMS | 1 CPU | Ubuntu 18.04 LTS | 10 Gbps | | Xrootd (https://xrootd.slac.stanford.edu/) | | 2.6 Gb/s | Ubuntu: https://root.cern.ch/installing-xrootd ; xrootd -I v4 -p 1094 -b |
| VMs AMS-LON | 1 CPU | | | | | | 2.6 Gb/s | |
| VMs AMS-AMS | 2 CPU | | | | | | 2.6 Gb/s | |
| VMs AMS-LON | 2 CPU | | | | | | 2.6 Gb/s | |
| VMs AMS-AMS | 4 CPU | | | | | | 2.6 Gb/s | |
| VMs AMS-LON | 4 CPU | | | | | | 2.6 Gb/s | |
| VMs AMS-AMS | 1 CPU | CentOS 8 | 10 Gbps | | Xrootd (https://xrootd.slac.stanford.edu/) | | 1.7 Gb/s | CentOS: |
| VMs AMS-LON | 1 CPU | | | | | | 1.6 Gb/s | |
| VMs AMS-AMS | 2 CPU | | | | | | 2.2 Gb/s | |
| VMs AMS-LON | 2 CPU | | | | | | 2.0 Gb/s | |
| VMs AMS-AMS | 4 CPU | | | | | | 5.3 Gb/s | |
| VMs AMS-LON | 4 CPU | | | | | | 5.3 Gb/s | |
1: AMS: Amsterdam; LON: London; BMS: Bare Metal Server; VM: Virtual Machine
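To give an idea of how the tools listed above are driven from the client side, the following commands show typical transfers with gridFTP, FDT and XrootD; the host names, ports and file paths are examples only, and the exact options used in the GTS tests may have differed:

# gridFTP: pull a test file from the server (4 parallel streams, verbose performance output)
globus-url-copy -vb -p 4 gsiftp://dtn-server.example.org/data/test-250GB.dat file:///data/
# FDT: pull the same file from an FDT server started with "java -jar fdt.jar"
java -jar fdt.jar -c dtn-server.example.org -pull -d /data /data/test-250GB.dat
# XrootD: copy the file from an xrootd server listening on port 1094
xrdcp root://dtn-server.example.org:1094//data/test-250GB.dat /data/test-250GB.dat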
In the next investigation, the examination of the DTN services (the same as above) focused on the use of physical Bare Metal Servers (BMSs). First, the performance of the DTN services and applications on BMSs was examined over a short distance (Hamburg-Paris), with the latest technology servers available in GTS (Dell R430). The results for all the DTN services and applications were good, and all the services performed well in transferring large amounts of data (as usual, more than 250 GB). The hardware was not a bottleneck in the process.
Afterwards, longer-distance data transfers were investigated over the GTS testbed between London and Prague using BMSs. The London and Prague nodes, offering the longest optical distance with BMS availability, were used with a combination of Dell R430 and Dell R520 hardware. The available system was therefore not the same as in the previous tests: a Dell R520 acted as the client and a Dell R430 as the server. The Dell R520 is weaker in terms of CPU and memory than the R430 that acted as client in the previous tests. The results (shown in the results table) show that the bandwidth utilisation dropped for all the DTN services and applications, highlighting the importance of the hardware for data transfer. It should also be noted that, during the test run with the Dell R520, the GÉANT GTS team was implementing hardware replacements and making additional network configuration changes, which also contributed to the poorer results.
In the final investigation, long-distance transfers between London and Prague using BMSs were examined again, but this time the available system was the same as in the earlier short-distance tests (Dell R430). The bandwidth utilisation results showed that measurements nearly identical to those between Hamburg and Paris can be achieved. Distances within Europe are not large enough to show differences such as those observed with South Africa or Australia in the AENEAS project, for instance [AENEAS].
Overall, the best data rates achieved with 10 Gbps links were obtained with FDT, reaching 9.4 Gb/s; gridFTP followed with 8.9 Gb/s, and finally XrootD with 8.0 Gb/s. From the observations, all services need the servers to have disk RAIDs as fast as 6.0 Gb/s in order to achieve these speeds. There is also a large memory demand, as the transferred data is held in the memory buffer before being written to disk. It is also worth mentioning that BMS and "Dockerised BMS" set-ups gave similar results in the tests run in the BMS environment.
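As a quick sanity check that local storage can sustain such rates before running a transfer test, a simple sequential write/read measurement can be used (illustrative only; the path, block size and file size are arbitrary):

# sequential write and read-back of ~10 GB on the RAID, bypassing the page cache
dd if=/dev/zero of=/data/ddtest bs=1M count=10000 oflag=direct
dd if=/data/ddtest of=/dev/null bs=1M iflag=direct
rm /data/ddtest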
Hardware:
Dell R430 server (Prague): 2x20C/40T Intel® Xeon® E5-2660 v3 @ 2.6 GHz, 25 MB cache, 128 GB ECC DDR4 2133 MHz RAM, 6x SSD, 372 GB, 6.0 Gb/s
Dell R520 (London): 1x8C/16T Intel® Xeon® E5-2450 v2 @ 2.5 GHz, 10 MB cache, 32 GB ECC DDR3 1600 MHz RAM, 2x SSD, 372 GB, 6.0 Gb/s
Bare Metal Server Testbed: BMS Hamburg - BMS Paris (Dell R430)

| Operating System | Links | Parameters tuned | DTN Software | Other software installed | Performance | How-to install this tool on GTS | Comments |
|---|---|---|---|---|---|---|---|
| Ubuntu 18.04 server | 10 Gbps | | XrootD | iPerf (9.41 Gb/s) | 8.00 Gb/s | Ubuntu: https://root.cern.ch/installing-xrootd ; xrootd -I v4 -p 1094 -b | |
| Ubuntu 18.04 server | 10 Gbps | | gridFTP | | 8.58 Gb/s | apt install globus-gass-copy-progs globus-gridftp-server-progs | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
| Ubuntu 18.04 server | 10 Gbps | | FDT | Java Runtime Environment 8+ | 9.39 Gb/s | wget http://monalisa.cern.ch/FDT/lib/fdt.jar | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
Bare Metal Server Testbed: Dell R430 (Prague) - Dell R520 (London)

Parameters tuned on the test hosts:

# allow testing with buffers up to 64MB
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
#
# increase Linux autotuning TCP buffer limit to 32MB
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
#
# recommended default congestion control is htcp
net.ipv4.tcp_congestion_control=htcp
#
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1
#
# recommended for CentOS7+/Debian8+ hosts
net.core.default_qdisc = fq

ifconfig ethXXX txqueuelen 10000
ifconfig ethXXX mtu 8986

| Operating System | Links | DTN Software | Performance |
|---|---|---|---|
| Ubuntu 18.04 server | 10 Gbps | XrootD | 3.128 Gb/s |
| Ubuntu 18.04 server | 10 Gbps | gridFTP | 3.304 Gb/s |
| Ubuntu 18.04 server | 10 Gbps | FDT | 4.114 Gb/s |
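To confirm that the tuning listed above is actually in effect on a host before a run, the values can be read back (a simple check, not part of the original test procedure; replace ethXXX with the interface name, as in the settings above):

# read back the TCP and queueing parameters
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc net.ipv4.tcp_rmem net.ipv4.tcp_wmem
# verify MTU and transmit queue length on the data interface
ip link show dev ethXXX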
Dockerised Tests
Hardware: Dell R430 server (2x20C/40T Intel® Xeon® E5-2660 v3 @ 2.6 GHz, 25 MB cache, 128 GB ECC DDR4 2133 MHz RAM, 6x SSD, 372 GB, 6.0 Gb/s)
O/S used for client and Dockerised server: Ubuntu 18.04 server
DevOps software used: Docker.
Performance metric: Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time)
Parameters tuned:
sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"
sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647"
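As an illustration of how a Dockerised DTN service can be started on the BMS (the image name, mount point and FDT port below are assumptions, not the exact containers used in these tests), host networking lets the container use the tuned TCP stack of the BMS directly:

# run an FDT server in a container with host networking; /data holds fdt.jar and the test files
sudo docker run -d --name fdt-server --network host -v /data:/data \
    openjdk:8-jre java -jar /data/fdt.jar -p 54321
# follow live per-container resource usage during a transfer
sudo docker stats fdt-server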
In the first investigation, Paris hosted the Dockerised environment with the services, and Hamburg was the client running the client script. This was the set-up with the shortest distance between the client and the server.
In the second investigation, London hosted the Dockerised environment with the services, and Prague was the client running the client script. This was the set-up with the longest distance between the client and the server. Comparing both tables, there is no major change from the shorter-distance BMS test, although the bandwidth was slightly reduced.
From the ctop results it can be seen that each container required more memory and CPU than Docker had initially assigned to it. This is due to the requirements of the DTN services.
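If a running container turns out to be constrained, its CPU and memory limits can be raised on the fly; the values below are illustrative only and assume the hypothetical fdt-server container from the earlier sketch:

# raise the limits of a running container without recreating it
sudo docker update --cpus 8 --memory 16g --memory-swap 16g fdt-server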
For Dockerised environments, the best data rates were also achieved with FDT, reaching 9.2 Gb/s; gridFTP followed with 8.6 Gb/s, and finally XrootD with 8.1 Gb/s. As in the regular tests, all services need the servers to have disk RAIDs as fast as 6.0 Gb/s in order to achieve these speeds, and there is a large memory demand, as the transferred data is held in the memory buffer before being written to disk.
Bare Metal Server Testbed (Dockerised Environment):

| Set-up | Operating System | Links | Parameters tuned | DTN Software | Other software installed | Performance | How-to install this tool on GTS | Comments |
|---|---|---|---|---|---|---|---|---|
| BMS Hamburg - BMS Paris | Ubuntu 18.04 server | 10 Gbps | sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"; sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647" | XrootD | bmon/iftop (8.1 Gb/s) | 8.00 Gb/s | Use provided scripts | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
| BMS Hamburg - BMS Paris | Ubuntu 18.04 server | 10 Gbps | sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"; sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647" | gridFTP | bmon/iftop (8.6 Gb/s) | 8.53 Gb/s | Use provided scripts | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
| BMS Hamburg - BMS Paris | Ubuntu 18.04 server | 10 Gbps | sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"; sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647" | FDT | Java Runtime Environment 8+, bmon/iftop (9.2 Gb/s) | 8.87 Gb/s | Use provided scripts | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
| BMS Hamburg - BMS Paris | Ubuntu 18.04 server | 10 Gbps | sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"; sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647" | iPerf | bmon/iftop (9.2 Gb/s) | 9.2 Gb/s | Use provided scripts | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
Bare Metal Server Testbed (Dockerised Environment):

| Set-up | Operating System | Links | Parameters tuned | DTN Software | Other software installed | Performance | How-to install this tool on GTS | Comments |
|---|---|---|---|---|---|---|---|---|
| BMS London - BMS Prague | Ubuntu 18.04 server | 10 Gbps | sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"; sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647" | XrootD | bmon/iftop (8.1 Gb/s) | 8.00 Gb/s | Use provided scripts | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
| BMS London - BMS Prague | Ubuntu 18.04 server | 10 Gbps | sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"; sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647" | gridFTP | bmon/iftop (8.6 Gb/s) | 8.50 Gb/s | Use provided scripts | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
| BMS London - BMS Prague | Ubuntu 18.04 server | 10 Gbps | sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"; sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647" | FDT | Java Runtime Environment 8+, bmon/iftop (9.2 Gb/s) | 8.70 Gb/s | Use provided scripts | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
| BMS London - BMS Prague | Ubuntu 18.04 server | 10 Gbps | sudo sysctl -w net.ipv4.tcp_rmem="4096 16777216 2147483647"; sudo sysctl -w net.ipv4.tcp_wmem="4096 16777216 2147483647" | iPerf | bmon/iftop (9.2 Gb/s) | 9.0 Gb/s | Use provided scripts | Performance measured as the effective transfer rate (Data File Size / Elapsed Real Time) |
Comparing the results achieved with and without Docker, with different numbers of CPUs in the VMs, and over long and short distances, as shown in the tables above, the following comments summarise the findings: