A summary of the recommended performance settings for configuring DFTS with a V7 card kit.
These two parameters should be set to 0, which is their default:
- INTRA_PKT_INTERVAL
- Intra Packet Interval is the number of packet writes that will occur before the sender sleeps for the INTRA_PKT_DELAY time.
- A value of 1 will force the system to delay at the INTRA_PKT_DELAY for every packet.
- A value of 10 will only delay for every 10 packets, etc.
- In DFTS 3.0.0.2, this is set to 0 by default in the owlDftsSend script.
- INTRA_PKT_DELAY
- Intra Packet Delay is the delay in milliseconds (msec) that the application will sleep after an INTRA_PKT_INTERVAL number of packets have been sent to the driver.
- Each file may take multiple packets before the entire file has been transferred.
- This allows time on the receiving side to process the packet and to perform the necessary validation before continuing to the next packet.
- In DFTS 3.0.0.2, this is set to 0 by default in the owlDftsSend script.
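The pacing these two parameters describe can be sketched as a counter over packet writes. A toy POSIX-shell illustration (the real pacing happens inside the DFTS sender; write_packet is only a placeholder):

```shell
# Toy model of intra-packet pacing: after every INTRA_PKT_INTERVAL packet
# writes, the sender sleeps for INTRA_PKT_DELAY msec; 0 disables pacing.
# pacing_sleeps N INTERVAL -> number of pacing sleeps while sending N packets
pacing_sleeps() {
    n=$1; interval=$2; count=0
    [ "$interval" -eq 0 ] && { echo 0; return; }   # 0 = pacing disabled (default)
    pkt=1
    while [ "$pkt" -le "$n" ]; do
        # write_packet "$pkt"                      # placeholder for the real send
        [ $(( pkt % interval )) -eq 0 ] && count=$(( count + 1 ))
        pkt=$(( pkt + 1 ))
    done
    echo "$count"
}
pacing_sleeps 25 10   # interval 10: sleeps after packets 10 and 20 -> 2
pacing_sleeps 25 1    # interval 1: a delay after every packet -> 25
pacing_sleeps 25 0    # interval 0 (the default): never sleeps -> 0
```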
To make file transfers go faster, I recommend changing these settings:
- fsinterval
- Forced Sleep Interval defines the number of MBs sent before sleeping.
- In the DFTS 3.0.0.2 owlDftsSend script, this parameter is not set by default, so the default value of 32 (MB) is used.
- To change this, go to owlDftsSend line 207 and add the parameter and value, like:
OPT_FSINTERVAL=" -fsinterval 100"
- fsdelay
- Forced Sleep Delay defines the number of milliseconds to sleep at fsinterval MB.
- In the DFTS 3.0.0.2 owlDftsSend script, this parameter is not set by default, so the default value of 1000 (ms), i.e. 1 second, is used.
- To make things go faster, lower this value, all the way to 0 if you want.
- To change this, go to owlDftsSend line 208 and add the parameter and value, like:
OPT_FSDELAY=" -fsdelay 0"
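For intuition on why these two defaults matter: the sender moves fsinterval MB, then sleeps fsdelay ms, so pacing alone caps sustained throughput at roughly fsinterval / (fsdelay / 1000), which lines up with the ~30 MB/s seen with defaults. A quick shell check using the default values above:

```shell
# Back-of-the-envelope ceiling imposed by default pacing: 32 MB sent per
# 1000 ms sleep means throughput cannot exceed 32 MB/s (ignoring send time).
FSINTERVAL_MB=32   # default when -fsinterval is not set
FSDELAY_MS=1000    # default when -fsdelay is not set
echo "pacing cap: $(( FSINTERVAL_MB * 1000 / FSDELAY_MS )) MB/s"
```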
These changes should enable you to increase performance (for sufficiently large files) from ~30 MB/s to over 400 MB/s.
File transfer failure is almost always caused by the receive-side DFTS software not being able to write the file to disk quickly enough. If your disk/RAID write speed can't keep up with data from the diode, buffers will fill up and data will eventually be lost.
Sleep Algorithm:
For files over 100 MB, the DFTS software, by default, sleeps for 1 to 20 seconds at the start of a file transfer. This was implemented because some customers experienced transfer failures when pushing multiple files with the same name: the DFTS receive software tries to rename the old file before writing the new one, and while this should be a very fast operation, we found that it could take several seconds in some environments. That delay was enough to cause data loss. If you are not concerned about this scenario, you can eliminate the sleep by disabling this feature. This is done in owlDftsSend, line 209, by changing line:
OPT_SLEEPALG=""
to:
OPT_SLEEPALG=" -NOSLEEPALG"
File Hash Verification
The default behavior for DFTS is to hash the file on the send side, and send that hash over the diode after the file itself. Then the receive side recalculates the hash to verify integrity. We have found this application layer hash check to be redundant. There is already a hash at the data diode hardware level, and there is already a data diode packet counter to detect dropped packets. Based on past experience, this hash check is not needed in DFTS.
Disabling the hash check increases max throughput from over 400 MB/s to over 1000 MB/s. To disable it, edit owlDftsSend line 195 from:
DEFAULT_VERTYPE="B2B"
to:
DEFAULT_VERTYPE="NONE"
Summary of changes, excerpt from owlDftsSend:
Now that all of the changes have been described individually, here is what they look like together in the owlDftsSend script:
######################################################################
#
# Default DFTS (required) parameters
#
######################################################################
DEFAULT_VERTYPE="NONE"
BUFF_FULL_DELAY=0
INTRA_PKT_INTERVAL=0
INTRA_PKT_DELAY=0
EOF_DELAY=0
REPORT_OPTION=S
FILE_LOCK=N
######################################
# DFTS (optional) parameters
#####################################
# more pacing parameters
OPT_FSINTERVAL=" -fsinterval 0"
OPT_FSDELAY=" -fsdelay 0"
OPT_SLEEPALG=" -NOSLEEPALG"
DFTS receive process priority:
Increasing the priority of the dd_file_receive process can theoretically increase reliability by giving dd_file_receive CPU and IO priority over other system processes. This can be done with the Linux "renice" command.
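Applied concretely, a minimal sketch (the dd_file_receive process name comes from this section; the privileged lines are commented out and shown only for syntax):

```shell
# Sketch of raising dd_file_receive priority with renice. Niceness runs
# from -20 (highest priority) to 19 (lowest); going below 0 requires root.
# PID=$(pgrep -x dd_file_receive)   # in production: find the receive process
# sudo renice -n -10 -p "$PID"      # give it priority over other processes
# Unprivileged demonstration of the syntax on the current shell instead:
renice -n 5 -p $$
ps -o ni= -p $$    # shows the new niceness
```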
File Write and Diode Read times in receive logs:
For larger files, you will see statistics every 100 MB, like:
10/02-15:52:54 Info: test.txt, Size: 104,857,600 (bytes) . . . receiving (send pausing 3 seconds)
10/02-15:53:02 Rcvd: 104,857,600 (100.00%) Rate: 22.665 (MB/s) Dur: 00:00:05 DR: 2.19/880 msec FW: 0.06/1 msec
10/02-15:53:02 File: test.txt, Trans: 104,857,600 (bytes), Time: 00:00:07.335, Rate: 13.633 (MB/s), Key: 0x6999CF78, B2B Passed
DR – (Diode Read) The average/peak times in milliseconds for DIODE READ during the reporting interval.
FW – (File Write) The average/peak times in milliseconds for FILE WRITE during the reporting interval.
You will see the FW times creeping up if disk writes start to take longer. You will also see the DR times go down, as the driver already has data waiting when the application requests it. When the DR times are 0.01/1 or 0.00/1, you know the driver already has data by the time the application finishes writing to disk and asks for more. (The peak time is always reported as 1 or higher, never 0.) In this condition, the driver buffers are likely already filling. These numbers are useful to watch when tuning your system for optimal performance.
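To watch the FW trend without reading every line, the DR and FW fields can be pulled out with standard text tools. A sketch using awk on the sample log line above (your log format is assumed to match the sample):

```shell
# Extract the DR and FW avg/peak fields from a DFTS receive log line so the
# trend can be watched over time (e.g. by piping tail -f of the log here).
line='10/02-15:53:02 Rcvd: 104,857,600 (100.00%) Rate: 22.665 (MB/s) Dur: 00:00:05 DR: 2.19/880 msec FW: 0.06/1 msec'
echo "$line" | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i == "DR:") dr = $(i + 1)   # field after the DR: label
        if ($i == "FW:") fw = $(i + 1)   # field after the FW: label
    }
    print "DR=" dr, "FW=" fw
}'
```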
Reliability concerns:
With the changes described above, we were able to get large file performance of up to 1099 MB/s over the diode. Note that the Owl V7 high capacity card driver reserves 1G of RAM for a buffer. This is only about 1 second of data. Therefore, while this setup can work reliably, it is sensitive to any disruption in disk write speed performance. Any setup that relies on network storage introduces more potential risk, as network congestion or bursts of other IO on that network storage may degrade performance enough to lose data.
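To put the "about 1 second" figure in numbers: assuming the full 1 GB (1024 MB) buffer and the 1099 MB/s peak rate above, a complete disk-write stall drains the buffer in under a second. A quick integer-arithmetic check:

```shell
# Time the V7 driver's 1 GB receive buffer survives a total disk stall at
# the observed peak rate. Computed in milliseconds to stay in integer math.
BUFFER_MB=1024
RATE_MBS=1099
echo "buffer drains in ~$(( BUFFER_MB * 1000 / RATE_MBS )) ms"
```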