Linux sc-gns3 3.11.0-17-generic #31-Ubuntu SMP Mon Feb 3 21:52:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
The default maximum Linux TCP buffer sizes are far too small for high-bandwidth, high-latency paths. TCP memory is calculated automatically based on system memory; you can find the actual values in the following files:
/proc/sys/net/ipv4/tcp_rmem - memory reserved for TCP rcv buffers
/proc/sys/net/ipv4/tcp_wmem - memory reserved for TCP snd buffers
These are arrays of three values: the minimum, default (initial), and maximum buffer size.
$ cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 6291456
$ cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304
Adjust the default socket buffer size for all TCP connections by setting the middle tcp_rmem value to the calculated BDP.
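As a sketch only (the sizes are illustrative, assuming a calculated BDP of about 4 MB; tune them to your own path):
# example: raise the maxima to 16 MB and set the default (middle) value to the calculated BDP
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 4194304 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 4194304 16777216"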
# tcpdump -i eth1 -nn 'tcp[tcpflags] & tcp-syn != 0'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
x > 37.221.174.2.80: Flags [S], seq 3487034889, win 29200, options [mss 1460,sackOK,TS val 38133976 ecr 0,nop,wscale 7], length 0
37.221.174.2.80 > x: Flags [S.], seq 43484932, ack 3487034890, win 14480, options [mss 1460,sackOK,TS val 2074630628 ecr 38133976,nop,wscale 7], length 0
TCP-Window-Size-in-bits / Latency-in-seconds = Bits-per-second-throughput
BDP = Bandwidth * RTT
Bandwidth = 50 Megabits per seconds = 50*10^6 bits / seconds
RTT = 2.7 milliseconds = 0.0027 seconds
BDP = 50*10^6 * 0.0027 = 135000 bits = 16875 bytes
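The same arithmetic can be scripted (awk assumed available); bandwidth in Mbps, RTT in ms:
awk -v bw=50 -v rtt=2.7 'BEGIN { bdp = bw*10^6 * rtt/1000; printf "%.0f bits = %.0f bytes\n", bdp, bdp/8 }'
# 135000 bits = 16875 bytes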
# ping buc.voxility.net
rtt min/avg/max/mdev = 13.444/13.486/13.539/0.076 ms
# wget --report-speed=bits -O /dev/null http://buc.voxility.net/10GB.bin
15% [============> ] 1.665.467.968 934Mb/s eta 82s
BDP = 1000*10^6 * 0.0134 = 13.4*10^6 bits = 1675000 bytes = 1.6 MBytes
# ping fra.voxility.net
rtt min/avg/max/mdev = 40.927/40.944/40.960/0.128 ms
# wget --report-speed=bits -O /dev/null http://fra.voxility.net/10GB.bin
7% [=====> ] 771.948.504 550Mb/s eta 2m 48s
BDP = 1000*10^6 * 0.0409 = 40.9*10^6 bits = 5112500 bytes = 4.87 MBytes
# ping lon.voxility.net
rtt min/avg/max/mdev = 55.588/55.636/55.677/0.028 ms
# wget --report-speed=bits -O /dev/null http://lon.voxility.net/10GB.bin
9% [=======> ] 1.068.301.920 414Mb/s eta 3m 32s
# ping mia.voxility.net
PING mia.voxility.net (5.254.114.10) 56(84) bytes of data.
rtt min/avg/max/mdev = 170.595/170.674/170.895/0.487 ms
# wget --report-speed=bits -O /dev/null http://mia.voxility.net/10GB.bin
4% [==> ] 473.993.752 140Mb/s eta 12m 36s
# ping was.voxility.net
rtt min/avg/max/mdev = 166.263/166.405/166.535/0.422 ms
# wget --report-speed=bits -O /dev/null http://was.voxility.net/10GB.bin
5% [===> ] 550.957.688 143Mb/s eta 11m 22s
# ping lax.voxility.net
rtt min/avg/max/mdev = 206.140/210.484/215.703/4.683 ms
# wget --report-speed=bits -O /dev/null http://lax.voxility.net/10GB.bin
5% [===> ] 609.654.936 109Mb/s eta 15m 12s
BDP = 1000*10^6 * 0.2105 = 210.5*10^6 bits = 26312500 bytes = 25.1 MBytes
# tcpdump -i eth1 -nn 'tcp[tcpflags] & tcp-syn != 0'
me > dst: Flags [S], seq 370607417, win 29200, options [mss 1460,sackOK,TS val 38641052 ecr 0,nop,wscale 9], length 0
dst > me: Flags [S.], seq 2345490013, ack 370607418, win 14480, options [mss 1460,sackOK,TS val 2076658892 ecr 38641052,nop,wscale 7], length 0
dst advertises a max TCP window of 14480 * 2^7 (wscale 7) = 14480*128 = 1853440 bytes = 1.77 MBytes = 14.83*10^6 bits
As the RTT on this path is 210.5 ms, the maximum TCP speed is limited by dst to:
14.83*10^6 bits / 0.2105 s = 70.4 Mbps
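The same estimate, scripted from the SYN-ACK values above (window 14480, wscale 7, RTT 210.5 ms):
awk -v win=14480 -v ws=7 -v rtt_ms=210.5 'BEGIN { b = win*2^ws; printf "%.0f bytes -> %.2f Mbps max\n", b, b*8/(rtt_ms/1000)/10^6 }'
# 1853440 bytes -> 70.44 Mbps max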
Linux host
wget http://www.lcp.nrl.navy.mil/nuttcp/beta/nuttcp-7.2.1.c
cc -O3 -o nuttcp nuttcp-7.2.1.c
nuttcp -S
http://www.lcp.nrl.navy.mil/nuttcp/nuttcp-6.1.2/nuttcp-6.1.2.c
http://www.lcp.nrl.navy.mil/nuttcp/nuttcp-6.1.2/Makefile
yum groupinstall "Development Tools"
yum install kernel-devel kernel-headers
yum install gcc gcc-c++ autoconf automake
service iptables stop
TCP tuning requires expert knowledge
Application problems:
- Inefficient or inappropriate application designs
Operating System or TCP problems:
- Negotiated TCP features (SACK, WSCALE, etc.)
- Failed MTU discovery
- Too small retransmission or reassembly buffers
Network problems:
- Packet losses, congestion, etc.
- Packets arriving out of order or even duplicated
- “Scenic” IP routing or excessive round trip times
- Improper packet size limits (MTU)
Nearly all symptoms scale with RTT
Examples of flaws that scale:
- Chatty application (e.g., 50 transactions per request)
  On a 1 ms LAN, this adds 50 ms to user response time
  On a 100 ms WAN, this adds 5 s to user response time
- Fixed TCP socket buffer space (e.g., 32 kBytes)
  On a 1 ms LAN, limits throughput to 200 Mb/s
  On a 100 ms WAN, limits throughput to 2 Mb/s
- Packet loss (e.g., 0.1% loss at 1500 bytes; the loss model is sketched after this list)
  On a 1 ms LAN, models predict 300 Mb/s
  On a 100 ms WAN, models predict 3 Mb/s
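The "300 Mb/s vs 3 Mb/s" loss figures come from a loss-based throughput model; the widely cited Mathis et al. approximation is rate <= C * MSS / (RTT * sqrt(loss)), with the constant C roughly between 0.7 and 1.22 depending on ACKing assumptions. A sketch with assumed values (1460-byte MSS, C = 0.7):
awk -v mss=1460 -v rtt=0.001 -v p=0.001 -v C=0.7 'BEGIN { printf "%.0f Mbps\n", C*mss*8/(rtt*sqrt(p))/10^6 }'
# prints ~259 Mbps at 1 ms RTT; with rtt=0.1 the same 0.1% loss allows only ~2.6 Mbps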
http://wiki.maxgigapop.net/twiki/bin/view/MAX/PerformanceTuning
http://code.google.com/p/perfsonar-ps/
http://www.wcisd.hpc.mil/tools/
www.speedguide.net/analyzer.php
http://www.linuxfoundation.org/collaborate/workgroups/networking/tcptesting
http://speedtest.umflint.edu/
# netstat -bdhI vmx3f0 1
/etc/rc.conf
hostname="nuttcp-intern-219"
defaultrouter="x.x.x.217"
ifconfig_vmx3f0=" inet x.x.x.219 netmask 255.255.255.248"
/etc/inetd.conf
nuttcp stream tcp nowait/3/10 nobody /usr/local/bin/nuttcp /usr/local/bin/nuttcp -S
#nuttcp stream tcp nowait/3/10 nobody /usr/local/bin/nuttcp /usr/local/bin/nuttcp -S -w1m
/etc/hosts.allow
nuttcp : ALL : allow
/etc/services
nuttcp 5000/tcp
nuttcp-data 5001/tcp
nuttcp6 5000/tcp
nuttcp6-data 5001/tcp
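With inetd restarted, the server configured above can be exercised from any client without needing an account on it; a quick sanity check (the host name is the one from rc.conf above and must resolve from the client, or use its IP):
nuttcp -i1 -T10 nuttcp-intern-219      # 10-second transmit test with 1-second interval reports
nuttcp -r -i1 -T10 nuttcp-intern-219   # the same test in the receive direction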
* The Network Diagnostic Tool (NDT) is a client/server program that provides network configuration and performance testing to a user's desktop or laptop computer. The system is composed of a client program (command line or Java applet) and a pair of server programs (a web server and a testing/analysis engine). Both command-line and web-based clients communicate with a Web100-enhanced server to perform these diagnostic functions. Multi-level results allow novice and expert users to view and understand the test results.
* NPAD (Network Path and Application Diagnosis) is designed to diagnose network performance problems in your end-system (the machine your browser is running on) or the network between it and your nearest NPAD server. For each diagnosed problem, the server prescribes corrective actions with instructions suitable for non-experts.
TCP throughput basics
TCP rate is influenced by packet loss, packet size/MTU, round trip time and host window size.
Intro
-----------------
The Internet has been optimized for
- millions of users behind low speed connections
- thousands of high bandwidth servers serving millions of low speed streams
Single high-speed to high-speed flows get little commercial attention
UDP
speed (bits/s) = packet rate (packets/s) * packet length (bytes) * 8 (bits/byte)
TCP header layout (one 32-bit word per row):
Offset (bytes/bits) | Contents
0 / 0 | Source port (16 bits) | Destination port (16 bits)
4 / 32 | Sequence number
8 / 64 | Acknowledgment number (if ACK set)
12 / 96 | Data offset | Reserved | NS CWR ECE URG ACK PSH RST SYN FIN | Window Size (16 bits)
16 / 128 | Checksum | Urgent pointer (if URG set)
20... / 160... | Options (if Data Offset > 5, padded at end with "0" bytes if necessary)
Window size (16 bits) – the size of the receive window, which specifies the number of bytes (beyond the sequence number in the acknowledgment field) that the receiver is currently willing to receive (see Flow control and Window Scaling)
The RWIN size specifies how much data can be sent to the receiver without waiting for an acknowledgment from the receiver.
This allows the sender to send several packets without waiting for an acknowledgment.
It also allows the receiver to acknowledge several packets in one go.
http://smallvoid.com/article/tcpip-rwin-size.html
---------------------------------------------------------------------------------------------------
test http://lg.softlayer.com/#
http://wikitech.wikimedia.org/view/Main_Page
http://ndb1.internet2.edu/perfAdmin/directory.cgi
http://www.measurementlab.net/who
http://onlamp.com/pub/a/onlamp/2005/11/17/tcp_tuning.html?page=2
http://www.erg.abdn.ac.uk/~gorry/eg3567/lan-pages/enet-calc.html
How to achieve Gigabit speeds with Linux
http://datatag.web.cern.ch/datatag/howto/tcp.html
High Performance Networking for Data Grids
http://cseminar.web.cern.ch/cseminar/2000/1025/slides.pdf
http://www.wcisd.hpc.mil/~phil/sc2006/M07-2_files/frame.htm
http://fasterdata.es.net/fasterdata/host-tuning
http://www.icir.org/models/tools.html
http://staff.psc.edu/mathis/papers/JET200102
http://staff.psc.edu/mathis/papers/Cisco200102/mgp00006.gif
BDP (bandwidth-delay product) - an amount of data, measured in bits (or bytes), equal to the maximum amount of data that can be in flight on the network circuit at any given time, i.e. data that has been transmitted but not yet received.
BDP (bits) = link_bandwidth (bits/sec) * RTT (sec)
e.g. 100 Mbps * 0.140 s = 14*10^6 bits = 1.75*10^6 bytes ≈ 1.7 MB
TCP performance mostly depends on the congestion control algorithm on the sender side.
Throughput is very dependent on the TCP window size.
For maximum TCP performance, RWIN must be >= BDP.
Throughput = TCP maximum receive window size / RTT
For long-distance paths, parallel streams should also be tested.
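Turning the formula around gives the window needed for a target rate, e.g. to fill 1 Gbps at the 210.5 ms RTT measured to lax above:
awk -v mbps=1000 -v rtt_ms=210.5 'BEGIN { printf "%.1f MBytes of receive window needed\n", mbps*10^6*rtt_ms/1000/8/2^20 }'
# 25.1 MBytes, matching the BDP computed earlier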
Performance issues:
1) RTT
2) Host TCP/IP stack implementation (default RWIN, TCP auto-scaling of RWIN)
3) Network buffer length (small is good, e.g. 128 KB)
4) Errors, if present (high-loss networks can dramatically decrease TCP throughput because of frequent timeouts and retransmissions)
The central problem with high-speed TCP communication over long distances is that it becomes very fragile to packet loss. http://proj.sunet.se/E2E/
FreeBSD
Command | Description |
sysctl net.inet.tcp.rfc1323=1 | Activate window scaling and timestamp options according to RFC 1323. |
sysctl kern.ipc.maxsockbuf=[sbmax] | Set maximum size of TCP window. |
sysctl net.inet.tcp.recvspace=[wstd] | Set default size of TCP receive window. |
sysctl net.inet.tcp.sendspace=[wstd] | Set default size of TCP transmit window. |
sysctl kern.ipc.nmbclusters | View maximum number of mbuf clusters. Used for storage of data packets to/from the network interface. Can only be set at boot time. |
57# sysctl net.inet.tcp.cc.algorithm
net.inet.tcp.cc.algorithm: newreno
net.inet.tcp.recvbuf_max=16777216 # TCP receive buffer space
net.inet.tcp.sendbuf_max=16777216 # TCP send buffer space
net.inet.tcp.recvspace=8192 # decrease buffers for incoming data
net.inet.tcp.sendspace=16384 # decrease buffers for outgoing data
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536
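To make FreeBSD settings like these persist across reboots, they would normally go in /etc/sysctl.conf; a sketch with example values only (kern.ipc.nmbclusters, as noted above, is set at boot time, typically via /boot/loader.conf):
# /etc/sysctl.conf (FreeBSD) - example values, tune to your BDP
net.inet.tcp.rfc1323=1
kern.ipc.maxsockbuf=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_max=16777216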
GNU/Linux
The table below shows which parameters may need to be changed. These are true for both 2.4 and 2.6 kernels. With these changes it is possible to get results of the same order as our NetBSD tests, with the exception that Linux does a data copy in the transmit path, so the transmitting machine will be more loaded. A persistent /etc/sysctl.conf equivalent is sketched after the table.
Command | Description |
echo "1" > /proc/sys/net/ipv4/tcp_window_scaling | Activate window scaling according to RFC 1323 |
echo "1" > /proc/sys/net/ipv4/tcp_timestamps | Activate timestamps according to RFC 1323 |
echo [wmax] > /proc/sys/net/core/rmem_max | Set maximum size of TCP receive window. |
echo [wmax] > /proc/sys/net/core/wmem_max | Set maximum size of TCP transmit window. |
echo [wmax] > /proc/sys/net/core/rmem_default | Set default size of TCP receive window. |
echo [wmax] > /proc/sys/net/core/wmem_default | Set default size of TCP transmit window. |
echo "[wmin] [wstd] [wmax]" > /proc/sys/net/ipv4/tcp_rmem | Set min, default, max receive window. Used by the autotuning function. |
echo "[wmin] [wstd] [wmax]" > /proc/sys/net/ipv4/tcp_wmem | Set min, default, max transmit window. Used by the autotuning function. |
echo "bmin bdef bmax" > /proc/sys/net/ipv4/tcp_mem | Set maximum total TCP buffer-space allocatable. Used by the autotuning function. |
ifconfig eth? txqueuelen 1000 | Define length of transmit queue. Replace "?" with actual interface number. |
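A persistent equivalent of the echo commands above, as a sketch only (sizes are examples, not recommendations); save in /etc/sysctl.conf and load with sysctl -p:
# /etc/sysctl.conf (Linux) - example sizes, tune the maxima to your own BDP
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216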
$ sysctl -a | grep tcp
### IPV4 specific settings
sysctl net.ipv4.tcp_window_scaling
net.ipv4.tcp_window_scaling = 1
# sets min/default/max TCP read buffer space, default 4096 87380 174760
# (setting max to 100M - 10M is too small for cross country)
sysctl net.ipv4.tcp_rmem
net.ipv4.tcp_rmem = 4096 87380 4194304
sysctl net.ipv4.tcp_wmem
net.ipv4.tcp_wmem = 4096 16384 4194304
### CORE settings (for socket and UDP effect)
# maximum receive socket buffer size, default 131071
sysctl net.core.rmem_max
net.core.rmem_max = 131071
# maximum send socket buffer size, default 131071
sysctl net.core.wmem_max
net.core.wmem_max = 131071
# default receive socket buffer size, default 65535
sysctl net.core.rmem_default
net.core.rmem_default = 137216
# default send socket buffer size, default 65535
sysctl net.core.wmem_default
net.core.wmem_default = 137216
Windows 2000/XP/2003
Tests performed with Windows 2003 give results similar to the NetBSD results. To achieve this we had to change some variables in the registry.
Registry entry | Description |
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts=1 | Turn on window scaling option |
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize=[wmax] | Set maximum size of TCP window |
$ nuttcp/nuttcp-v5.1.3.x86fc1 -i1 -T10 -w192 -N15 -l128 serverIP
1.5345 MB / 1.00 sec = 12.8597 Mbps
10.0659 MB / 1.00 sec = 84.5236 Mbps
10.5265 MB / 1.00 sec = 88.3023 Mbps
10.6707 MB / 1.00 sec = 89.5117 Mbps
10.6053 MB / 1.00 sec = 88.9634 Mbps
10.5425 MB / 1.00 sec = 88.4367 Mbps
10.7289 MB / 1.00 sec = 89.9994 Mbps
10.6979 MB / 1.00 sec = 89.7402 Mbps
10.6509 MB / 1.00 sec = 89.3452 Mbps
10.6613 MB / 1.00 sec = 89.4330 Mbps
96.9509 MB / 10.02 sec = 81.1363 Mbps 71 %TX 9 %RX
$ nuttcp/nuttcp-v5.1.3.x86fc1 -i1 -T10 -w192 -N15 -l128 -r -F serverIP
0.6702 MB / 1.00 sec = 5.6167 Mbps
2.5525 MB / 1.00 sec = 21.4133 Mbps
3.7084 MB / 1.00 sec = 31.1112 Mbps
4.1675 MB / 1.00 sec = 34.9621 Mbps
4.8322 MB / 1.00 sec = 40.5363 Mbps
5.4730 MB / 1.00 sec = 45.9174 Mbps
6.2402 MB / 1.00 sec = 52.3511 Mbps
7.7106 MB / 1.00 sec = 64.6867 Mbps
8.3600 MB / 1.00 sec = 70.1350 Mbps
9.4098 MB / 1.00 sec = 78.8619 Mbps
54.9774 MB / 10.18 sec = 45.3209 Mbps 6 %TX 9 %RX
Ethernet/IP/TCP
^^^^^^|****[=======|~~~~~~~~~~|-----------|..................DATA....................|==]
12 8 14 20 20 6-1460 4
^ IFG 12 bytes
* Preamble 8 bytes
= Ethernet Header/Trailer 18 bytes (14+4)
~ IP header - 20 bytes
- TCP header 20 bytes
. DATA 46-1500 bytes
ETHERNET/VLAN/IP/TCP
^^^^^^|****[=======|++|~~~~~~~~~~|-----------|..................DATA....................|==]
12 8 14 4 20 20 6-1460 4
^ IFG 12 bytes
* Preamble 8 bytes
= Ethernet Header/Trailer 18 bytes (14+4)
++ VLAN ID 4 bytes (Optional)
~ IP header - 20 bytes
- TCP header 20 bytes
. Ethernet DATA 46-1500 bytes
http://www.wcisd.hpc.mil/~phil/sc2002/SC2002M12.pdf
TCP Throughput: Bandwidth*Delay Product and TCP (window/rtt)
• The smallest of three windows determines throughput:
– sbuf, or sender side socket buffers
– rwin, the receive window size
– cwin, TCP congestion window
• Receive window (rwin) and/or sbuf are still the most common performance limiters
– E.g. 8kB window, 87 msec ping time = 753 kbps
– E.g. 64kB window, 14 msec rtt = 37 Mbps
(Dykstra, SC2002, slide 67)
• TCP needs a receive window (rwin) equal to or greater than the BW*Delay product to achieve maximum throughput
• TCP needs sender side socket buffers of 2*BW*Delay to recover from errors
• You need to send about 3*BW*Delay bytes for TCP to reach maximum speed
(Dykstra, SC2002, slide 69)
System Tuning: FreeBSD
# FreeBSD 3.4 defaults are 524288 max, 16384 default
/sbin/sysctl -w kern.ipc.maxsockbuf=1048576
/sbin/sysctl -w net.inet.tcp.sendspace=32768
/sbin/sysctl -w net.inet.tcp.recvspace=32768
(Dykstra, SC2002, slide 77)
Important Points About TCP
• TCP is adaptive
• It is constantly trying to go faster
• It slows down when it detects a loss
• How much it sends is controlled by windows
• When it sends is controlled by received ACKs (or timeouts)
- Packet loss
Each packet lost must be retransmitted. This uses extra network bandwidth and can delay the transmission of any future packets as the endpoints 'recover' from the lost packet.
- Packet size
IP payload is encapsulated in an IP header [20 bytes] and either a UDP [8 byte] or, more typically, a TCP [20 byte] header. Ethernet framing adds 38 bytes of overhead per frame untagged, or 42 bytes with an 802.1q tag, and an Ethernet frame typically carries at most 1500 bytes of IP data. The largest typical TCP payload is therefore 1500 [ethernet MTU] - 20 [IP header] - 20 [TCP header] = 1460 bytes, carried in 1500 + 38 = 1538 bytes on the wire; (1460/1538) * 100 = 94.9%, meaning that in general the -best case scenario- is that about 5% of your Ethernet capacity goes to overhead. As the size of the payload decreases, more bandwidth gets 'wasted' on overhead.
The advantage of small payloads is decreased latency between payloads. This historically was more important with low link speeds and real time applications. It takes 0.21 sec to queue a 1500 byte packet down a 56k modem [1500 byte * 8 bits in a byte = 12000 bits. 12000 bits / 56000 = 0.21]. This would be unacceptable latency for a VOIP call. It only takes 0.0012 sec to queue the same packet down a 10Mb ethernet, and 0.0000012 sec to queue on a 10Gb ethernet. Note that the time to queue a packet is different than the time it takes to transmit a packet a certain distance.
In internet time, 56k was ages ago. In 'real time', it wasn't that long ago. Many internet users still use this type of access. As link speeds have increased, there has not, in general, been an increase in the maximum Ethernet MTU, because all hosts on an Ethernet LAN -must- agree on the maximum MTU; otherwise, they are -not- able to communicate on the wire.
- Round trip time and host window size
You can't beat the speed of light [yet]. Latency, or round trip time, is a consideration for UDP or TCP applications. However, TCP is much more sensitive to round trip time than UDP because of the TCP host window. The TCP host window, in a nutshell, is a buffer that describes how much TCP data for a given host can be in transit at any given time without hearing an 'ACK' back from the receiver. There are many schemes for controlling the size of this window, and instead of going into an exhaustive discussion of those schemes, I will focus on a single example that illustrates what the TCP window is and why round trip time is important.
Let's say that you have 100 items that you need to send. There is a contraption between your current location and the destination that consists of 200 containers, and each container can hold one item. Let's assume that it takes 1 hour for this contraption to make a 'full cycle'. At any given time, 100 containers are headed towards the destination, and 100 containers are headed towards the source [in a loop].
If your shipping department operated like UDP, it would place an item in each container as soon as the container became available. At time 0, the shipping department would have 100 items left, and place an item in the first container. That first item won't arrive at the destination for 30 minutes, but that's okay, because the shipping department will place the next item in the next available container when it becomes available in 60 minutes / 200 containers = 18 seconds.
30 minutes into this example, all items have been placed into transit (100 items * 18 second gap between each). It will still take 30 minutes for that last unit to arrive at the destination, but the total shipping time for the cargo will be one hour. You don't know if any of the units actually arrived, but the shipping department goes home happy.
If your shipping department operated like TCP, it would only be able to send out a certain number of packages at a time. The amount of packages it can have in transit in any given time is akin to the TCP window size.
In the above example, let's say it could send out 50 packages at a time. The best possible total shipping time for the cargo will now be two hours and fifteen minutes. How?
The first 50 items get shipped as fast as they can; 50 * 18 seconds = 15 minutes. The shipping department must then wait 45 more minutes [T=60 minutes] before it gets confirmation that the first package, at ship time T=0, has been received. At T=60 minutes, the shipping department could start shipping the remaining 50 items one at a time as it receives delivery confirmation. At T=75 minutes, all items will have been shipped, but it won't be until T=135 minutes that the shipping department gets confirmation that the last package was received and can go home.
If the shipping department could only ship 25 packages at a time, 4 hours and 7.5 minutes are needed, and the shipping pattern would look something like this:
T=0; start shipping
T=7.5min, first 25 packages in transit.
T=60; start shipping packages 26-50
T=67.5; next 25 packages in transit
T=120; start shipping packages 51-75
T=127.5; next 25 packages in transit
T=180; start shipping packages 76-100
T=187.5; all packages in transit
T=247.5; confirmation that the final package has been received finally arrives.
There is no way to 'optimize' this by spacing out the shipments, but in the real world such bursty shipments could really clog up the cargo lines if anyone else wanted to use them. And of course, if any of those packages get lost, mayhem ensues!
As you can see, both the round trip time and the window size greatly affect the performance of a TCP application.
Thoughts
On the real internet, things are not so simple. Switches and routers have to buffer traffic. Sometimes a transition between two different media types or speeds will occur [100 Mb ethernet in, 10 Mb ethernet out] or [ethernet in, sonet out]. There are millions of concurrent users and traffic flows at any moment. Specifics aside, the 'theoretical maximum' rarely occurs in practice.
Nuttcp speedtest
C:\Users\SER>C:\nuttcp\nuttcp.exe -i1 -w16m damp-ssc.dren.net damp-nrl.dren.net
50.1875 MB / 1.00 sec = 420.9671 Mbps 0 retrans
111.1875 MB / 1.00 sec = 932.7653 Mbps 0 retrans
108.3750 MB / 1.00 sec = 909.1181 Mbps 0 retrans
109.1250 MB / 1.00 sec = 915.3821 Mbps 0 retrans
109.0000 MB / 1.00 sec = 914.3848 Mbps 0 retrans
108.6875 MB / 1.00 sec = 911.7387 Mbps 0 retrans
108.9375 MB / 1.00 sec = 913.8194 Mbps 0 retrans
108.7500 MB / 1.00 sec = 912.2748 Mbps 0 retrans
109.0625 MB / 1.00 sec = 914.8826 Mbps 0 retrans
108.8750 MB / 1.00 sec = 913.3106 Mbps 0 retrans
1046.4799 MB / 10.20 sec = 860.7017 Mbps 35 %TX 14 %RX 0 retrans
C:\Users\SER>C:\nuttcp\nuttcp.exe -4 -xt damp-ssc.dren.net damp-nrl.dren.net
traceroute to damp-nrl.dren.net (2001:480:6:23f::123), 30 hops max, 40 byte packets
1 * * *
2 * * *
3 2001:480:7:230::1 (2001:480:7:230::1) 68.801 ms 68.811 ms 68.786 ms
4 damp-nrl-6 (2001:480:6:23f::123) 68.304 ms 68.278 ms 68.291 ms
traceroute to 2001:480:6:190f::123 (2001:480:6:190f::123), 30 hops max, 40 byte packets
1 * * *
2 * * *
3 2001:480:7:1900::1 (2001:480:7:1900::1) 68.585 ms 68.582 ms 68.546 ms
4 damp-ssc-6 (2001:480:6:190f::123) 68.700 ms 68.685 ms 68.673 ms
TCP/IP can't use the full 100 Mbps of 100 Mbps network hardware, as most packets are paired (data, ACK; request, ACK). A link carrying full-MTU data packets and their corresponding ACKs will presumably be carrying only 50 Mbps. A better measure of network capacity is the packet throughput. An estimate of the packet throughput comes from the network capacity (100 Mbps) divided by the MTU size (1500 bytes = 12000 bits) = 8333 packets/sec.
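That packet-rate estimate as a one-liner (note the divisor is in bits, not bytes):
awk 'BEGIN { printf "%.0f packets/sec\n", 100*10^6 / (1500*8) }'   # 8333 packets/sec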
UDP
UDP over Ethernet:
Add 20 IPv4 header or 40 IPv6 header (no options)
Add 8 UDP header
Max UDP Payload data rates over ethernet are thus:
(1500-28)/(38+1500) = 95.7087 % IPv4
(1500-28)/(42+1500) = 95.4604 % 802.1q, IPv4
(1500-48)/(38+1500) = 94.4083 % IPv6
(1500-48)/(42+1500) = 94.1634 % 802.1q, IPv6
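The percentages above can be reproduced with a small loop (28/48 bytes of IP+UDP header, 38/42 bytes of untagged/802.1q Ethernet overhead, awk assumed available):
for hdr in 28 48; do
  for eth in 38 42; do
    awk -v h=$hdr -v e=$eth 'BEGIN { printf "hdr=%d eth=%d: %.4f %%\n", h, e, (1500-h)/(1500+e)*100 }'
  done
done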
[root@speedtest /home/sc]# nuttcp -i1 -T10 -u -Ri100m -v -l8000 9.9.9.9
nuttcp-t: Warning: IP frags or no data reception since buflen=8000 > ctlconnmss=1448
nuttcp-t: v7.1.5: socket
nuttcp-t: buflen=8000, nstream=1, port=5101 udp -> 9.9.9.9
nuttcp-t: time limit = 10.00 seconds
nuttcp-t: rate limit = 100.000 Mbps (instantaneous), 1562 pps
nuttcp-t: send window size = 9216, receive window size = 42080
nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=8000, nstream=1, port=5101 udp
nuttcp-r: interval reporting every 1.00 second
nuttcp-r: send window size = 9216, receive window size = 42080
11.4136 MB / 1.00 sec = 95.6681 Mbps 0 / 1496 ~drop/pkt 0.00 ~%loss
11.3983 MB / 1.00 sec = 95.7115 Mbps 0 / 1494 ~drop/pkt 0.00 ~%loss
11.4136 MB / 1.00 sec = 95.7436 Mbps 0 / 1496 ~drop/pkt 0.00 ~%loss
11.4136 MB / 1.00 sec = 95.7434 Mbps 0 / 1496 ~drop/pkt 0.00 ~%loss
11.4136 MB / 1.00 sec = 95.7439 Mbps 0 / 1496 ~drop/pkt 0.00 ~%loss
11.4136 MB / 1.00 sec = 95.7435 Mbps 0 / 1496 ~drop/pkt 0.00 ~%loss
11.4136 MB / 1.00 sec = 95.7427 Mbps 0 / 1496 ~drop/pkt 0.00 ~%loss
11.4136 MB / 1.00 sec = 95.7446 Mbps 0 / 1496 ~drop/pkt 0.00 ~%loss
11.4136 MB / 1.00 sec = 95.7435 Mbps 0 / 1496 ~drop/pkt 0.00 ~%loss
11.4059 MB / 1.00 sec = 95.6797 Mbps 0 / 1495 ~drop/pkt 0.00 ~%loss
nuttcp-t: 114.7614 MB in 10.00 real seconds = 11750.38 KB/sec = 96.2591 Mbps
nuttcp-t: 15069 I/O calls, msec/call = 0.68, calls/sec = 1506.75
nuttcp-t: 2.4user 7.1sys 0:10real 96% 152i+2619d 764maxrss 0+1pf 64+13629csw
nuttcp-r: 114.7614 MB in 10.06 real seconds = 11686.08 KB/sec = 95.7324 Mbps
nuttcp-r: 0 / 15042 drop/pkt 0.00% data loss
nuttcp-r: 15046 I/O calls, msec/call = 0.68, calls/sec = 1496.22
nuttcp-r: 0.0user 0.0sys 0:10real 0% 100i+1340d 610maxrss 0+0pf 15054+0csw
[root@speedtest /home/sc]#
[sc@speedtest ~]$ nuttcp -i1 -T10 -u -Ri100m -l1400 9.9.9.9
11.3901 MB / 1.00 sec = 95.4691 Mbps 0 / 8531 ~drop/pkt 0.00 ~%loss
11.3728 MB / 1.00 sec = 95.4969 Mbps 0 / 8518 ~drop/pkt 0.00 ~%loss
11.3834 MB / 1.00 sec = 95.4904 Mbps 0 / 8526 ~drop/pkt 0.00 ~%loss
11.3848 MB / 1.00 sec = 95.5022 Mbps 0 / 8527 ~drop/pkt 0.00 ~%loss
11.3834 MB / 1.00 sec = 95.4911 Mbps 0 / 8526 ~drop/pkt 0.00 ~%loss
11.3848 MB / 1.00 sec = 95.5018 Mbps 0 / 8527 ~drop/pkt 0.00 ~%loss
11.3834 MB / 1.00 sec = 95.4908 Mbps 0 / 8526 ~drop/pkt 0.00 ~%loss
11.3848 MB / 1.00 sec = 95.5021 Mbps 0 / 8527 ~drop/pkt 0.00 ~%loss
11.3848 MB / 1.00 sec = 95.5019 Mbps 0 / 8527 ~drop/pkt 0.00 ~%loss
11.3834 MB / 1.00 sec = 95.4908 Mbps 0 / 8526 ~drop/pkt 0.00 ~%loss
114.7596 MB / 10.08 sec = 95.4941 Mbps 95 %TX 2 %RX 0 / 85953 drop/pkt 0.00 %loss
[sc@speedtest ~]$
C:\Users\SER>C:\nuttcp\nuttcp.exe -T10 -i1 -u -Ri100m -l1460 9.9.9.9
nuttcp-t: Warning: IP frags or no data reception since buflen=1460 > ctlconnmss=0
11.2976 MB / 1.00 sec = 94.7296 Mbps 51 / 8165 ~drop/pkt 0.62 ~%loss
11.3937 MB / 1.00 sec = 95.6730 Mbps 54 / 8237 ~drop/pkt 0.66 ~%loss
11.4049 MB / 1.00 sec = 95.6707 Mbps 36 / 8227 ~drop/pkt 0.44 ~%loss
11.4062 MB / 1.00 sec = 95.6820 Mbps 163 / 8355 ~drop/pkt 1.95 ~%loss
11.4049 MB / 1.00 sec = 95.6706 Mbps 375 / 8566 ~drop/pkt 4.38 ~%loss
11.4049 MB / 1.00 sec = 95.6704 Mbps 370 / 8561 ~drop/pkt 4.32 ~%loss
11.4049 MB / 1.00 sec = 95.6705 Mbps 367 / 8558 ~drop/pkt 4.29 ~%loss
11.4062 MB / 1.00 sec = 95.6821 Mbps 376 / 8568 ~drop/pkt 4.39 ~%loss
11.4049 MB / 1.00 sec = 95.6706 Mbps 357 / 8548 ~drop/pkt 4.18 ~%loss
11.4062 MB / 1.00 sec = 95.6822 Mbps 361 / 8553 ~drop/pkt 4.22 ~%loss
115.4228 MB / 10.13 sec = 95.5814 Mbps 99 %TX 3 %RX 2585 / 85482 drop/pkt 3.02 %loss
C:\Users\SER>
What does the reported data rate really mean?
All variations of ttcp/iperf report payload or user data rates, i.e. no overhead bytes from headers (TCP, UDP, IP, etc.) are included in the reported data rates. When comparing to "line" rates or "peak" rates, it is important to consider all of this overhead. It is also important to understand what the tools mean by "K" or "M" bits or bytes per second. Versions of the tools differ on this point.
Computer memory is measured in powers of two, e.g. 1 KB = 2^10 = 1024 bytes; 1 MB = 2^20 = 1024*1024 = 1048576 bytes. Data communication rates, however, should always be stated in simple bits per second. For example "100 megabit ethernet" can send exactly 100,000,000 bits per second.
- K and M in ttcp/nttcp/iperf
- ttcp original: K = 1024 (M is not used)
- nttcp: K = 1024, M = 1000*1024
- iperf 1.1.1 and earlier: K = 1024, M = 1024*1024
- iperf 1.2 and above: K=1000, M=1000000
- nuttcp: K=1000, M=1000000
nuttcp
ftp://ftp.lcp.nrl.navy.mil/pub/nuttcp/
One of the best! Highly recommended. The author calls it n-u-t-t-c-p, but many of us affectionately call it nut-c-p. nuttcp can run as a server and passes all output back to the client side, so you don't need an account on the server side to see the results.
Usage: nuttcp or nuttcp -h      prints this usage info
Usage: nuttcp -V                prints version info
Usage: nuttcp -xt [-m] host     forward and reverse traceroute to/from server
Usage (transmitter): nuttcp -t [-options] host [3rd-party] [out]
        -4      Use IPv4
        -6      Use IPv6
        -l##    length of network read buf (default 8192/udp, 65536/tcp)
        -s      don't sink (discard): prints all data from network to stdout
        -n##    number of bufs for server to write to network (default 2048)
        -w##    receiver window size in KB (or (m|M)B or (g|G)B)
        -ws##   server transmit window size in KB (or (m|M)B or (g|G)B)
        -wb     braindead Solaris 2.8 (sets both xmit and rcv windows)
        -p##    port number to listen at (default 5001)
        -P##    port number for control connection (default 5000)
        -B      Only output full blocks, as specified in -l## (for TAR)
        -u      use UDP instead of TCP
        -m##    use multicast with specified TTL instead of unicast (UDP)
        -N##    number of streams (starting at port number), implies -B
        -R##    server transmit rate limit in Kbps (or (m|M)bps or (g|G)bps)
        -T##    server transmit timeout in seconds (or (m|M)inutes or (h|H)ours)
        -i##    client interval reporting in seconds (or (m|M)inutes)
        -Ixxx   identifier for nuttcp output (max of 40 characters)
        -F      flip option to reverse direction of data connection open
        -xP##   set nuttcp process priority (must be root)
        -d      set TCP SO_DEBUG option on data socket
        -v[v]   verbose [or very verbose] output
        -b      brief output (default)
Usage (server): nuttcp -S [-options]
        note server mode excludes use of -s
        -4      Use IPv4 (default)
        -6      Use IPv6
        -1      oneshot server mode (implied with inetd/xinetd), implies -S
        -P##    port number for server connection (default 5000)
                note don't use with inetd/xinetd (use services file instead)
        -xP##   set nuttcp process priority (must be root)
        --no3rdparty    don't allow 3rd party capability
        --nofork        don't fork server
Format options:
        -fxmitstats     also give transmitter stats (MB) with -i (UDP only)
        -frunningtotal  also give cumulative stats on interval reports
        -f-drops        don't give packet drop info on brief output (UDP)
        -f-percentloss  don't give %loss info on brief output (UDP)
        -fparse         generate key=value parsable output