MTU
The physical and link layers impose a Maximum Transmission Unit (MTU). For Ethernet, the standard MTU is 1500 bytes: this is the maximum size of the payload in an Ethernet frame, including the encapsulated TCP and IP headers. All machines on the same link must agree on the MTU; links along the route of a packet may have different MTUs, and the smallest of them limits the packet size that can cross the path without fragmentation.
# ip link show | grep mtu
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc pfifo_fast master bond0 state DOWN qlen 1000
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
4: venet0: <BROADCAST,POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
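The per-interface MTU can also be pulled out of this output programmatically. The following is a minimal Python sketch, not part of any standard tool: the function name parse_mtus and the regular expression are assumptions made for illustration, and the sample text is copied from the output above.

```python
import re

def parse_mtus(ip_link_output: str) -> dict:
    """Map interface name -> MTU from `ip link show` style text."""
    mtus = {}
    for line in ip_link_output.splitlines():
        # Matches lines such as "2: eth0: <...> mtu 1500 qdisc ..."
        m = re.match(r"\s*\d+:\s+([^:@\s]+)[^:]*:.*?\bmtu\s+(\d+)", line)
        if m:
            mtus[m.group(1)] = int(m.group(2))
    return mtus

sample = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
"""
print(parse_mtus(sample))  # {'lo': 16436, 'bond0': 1500}
```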
If a packet arriving at a router is too big for the MTU of the next-hop network, one of the following happens:
- The router splits the packet into fragments sized for that MTU, or
- The router drops the packet and uses ICMP to alert the sender that fragmentation is needed, and the sender then adjusts its packet size accordingly.
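To make the first case concrete, the sketch below estimates the sizes of the fragments a router would produce. It is an illustration only: it assumes a 20-byte IPv4 header without options, relies on the IPv4 rule that every fragment except the last carries a multiple of 8 bytes of data, and the name fragment_sizes is hypothetical.

```python
IP_HEADER = 20  # bytes, assuming an IPv4 header without options

def fragment_sizes(packet_len: int, mtu: int) -> list:
    """Return the total length (header + data) of each fragment."""
    if packet_len <= mtu:
        return [packet_len]  # fits as-is, no fragmentation needed
    # Data per fragment must be a multiple of 8 bytes (except the last one).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any data")
    data = packet_len - IP_HEADER
    sizes = []
    while data > 0:
        chunk = min(per_fragment, data)
        sizes.append(chunk + IP_HEADER)  # each fragment gets its own header
        data -= chunk
    return sizes

print(fragment_sizes(4000, 1500))  # [1500, 1500, 1040]
```

Note that the fragments add up to more than the original 4000 bytes, because the IP header is repeated in every fragment.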
Path MTU Discovery (pMTU):
One way to avoid fragmentation is to find, in advance, the smallest MTU supported by any link on the path and never send packets larger than that. In Path MTU discovery, packets are sent with the "Don't Fragment" (DF) flag set. A router that would otherwise have to fragment such a packet drops it and responds to the sender with an ICMP "fragmentation needed" error that carries the MTU of the next hop. The kernel caches the path MTU (pMTU) for the destination locally in the routing cache (/proc/net/rt_cache) and resends smaller packets.
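The discovery loop can be sketched as a small simulation. This is not how the kernel implements it; it only models the behaviour described above, assuming each hop is represented by the MTU of its outgoing link. The function name and the hop list are made up for illustration.

```python
def discover_path_mtu(local_mtu: int, hop_mtus: list) -> tuple:
    """Return (path_mtu, probes_sent) for a simulated path."""
    size = local_mtu
    probes = 0
    while True:
        probes += 1
        # The first hop whose MTU is smaller than the packet drops it and
        # reports its MTU back (the ICMP "fragmentation needed" reply).
        blocker = next((m for m in hop_mtus if m < size), None)
        if blocker is None:
            return size, probes  # packet crossed every hop unfragmented
        size = blocker  # sender retries with the reported MTU

print(discover_path_mtu(1500, [1500, 1400, 1500, 1280]))  # (1280, 3)
```

In this example the sender needs three attempts: 1500 bytes is rejected by the 1400-byte hop, 1400 bytes is rejected by the 1280-byte hop, and 1280 bytes gets through.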
Path MTU discovery is enabled by default. This can be verified by checking the sysctl setting net.ipv4.ip_no_pmtu_disc:
# sysctl -a | grep pmtu
net.ipv4.ip_no_pmtu_disc = 0
It can be turned off with sysctl by setting net.ipv4.ip_no_pmtu_disc to 1.
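The same check can be scripted. On Linux, sysctl settings are exposed as files under /proc/sys, so net.ipv4.ip_no_pmtu_disc corresponds to /proc/sys/net/ipv4/ip_no_pmtu_disc. The sketch below reads that file if it exists; the helper name is an assumption for illustration, and it treats only the value 0 as "enabled".

```python
from pathlib import Path

# File backing the net.ipv4.ip_no_pmtu_disc sysctl on Linux.
PMTU_SYSCTL = Path("/proc/sys/net/ipv4/ip_no_pmtu_disc")

def pmtu_enabled_from(raw: str) -> bool:
    """True when the raw setting is 0, i.e. discovery is not disabled."""
    return raw.strip() == "0"

if PMTU_SYSCTL.exists():
    print("Path MTU discovery enabled:", pmtu_enabled_from(PMTU_SYSCTL.read_text()))
else:
    print("Setting not available on this system")
```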
tracepath is a tool that traces a route and reports the path MTU along it.
However, if a router on the path filters all ICMP traffic, Path MTU discovery (pMTU) breaks, which hurts performance by either
- Causing intermediate fragmentation at routers, or
- Causing hosts to assume the minimum datagram size that every IPv4 host must accept (576 bytes) when higher MTUs are possible.
Note:
- Filtering all UDP, ICMP 'time-exceeded', and ICMP 'dest-unreach/port-unreach' traffic breaks tracepath and traceroute.
- The traceroute -I and mtr commands break if ICMP 'time-exceeded' and ICMP 'echo-request'/'echo-reply' are filtered.
Jumbo Frames:
The standard payload size (MTU) is 1500 bytes. Increasing the payload size (MTU) can improve performance. With larger packets,
- Fewer packets are transmitted for the same amount of data
- Fewer packets mean fewer interrupts to handle (CPU service time), and thus a lower CPU load
- Less space, on average, is spent on headers
Jumbo frames are Ethernet frames with more than 1,500 bytes of payload (MTU). Conventionally, jumbo frames can carry up to 9,000 bytes of payload. Using a larger MTU value (jumbo frames) can significantly speed up bulk network transfers. All the networking hardware in the route of a jumbo frame must support jumbo frames.
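The effect on packet count and header overhead can be estimated with some simple arithmetic. The sketch below is a rough illustration: it assumes 40 bytes of TCP/IP headers per packet (no options), ignores the Ethernet header and trailer, and uses a hypothetical 1 GB transfer.

```python
HEADERS = 40  # bytes: 20 IP + 20 TCP, assuming no options

def packets_and_overhead(payload_bytes: int, mtu: int) -> tuple:
    """Return (packet count, header bytes as a percentage of all bytes sent)."""
    per_packet = mtu - HEADERS
    # Integer ceiling division: a partly filled last packet still counts.
    packets = (payload_bytes + per_packet - 1) // per_packet
    header_bytes = packets * HEADERS
    overhead = 100 * header_bytes / (payload_bytes + header_bytes)
    return packets, overhead

for mtu in (1500, 9000):
    count, percent = packets_and_overhead(1_000_000_000, mtu)
    print(f"MTU {mtu}: {count} packets, {percent:.2f}% header overhead")
```

Under these assumptions, the 9000-byte MTU needs roughly one sixth of the packets that the 1500-byte MTU needs for the same transfer.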
Why can jumbo frames not exceed 9000 bytes?
Because the 32-bit CRC field of an Ethernet frame isn’t long enough to reliably detect errors in much larger frames; 9000 bytes is the conventional limit that stays within its effective range.
How to set an MTU size greater than 1500 bytes for a NIC?
1) Set the MTU on the interface:
# ip link set ethX mtu Y
or
# ifconfig ethX mtu Y
where
ethX is the Ethernet adapter (eth0, eth1, etc.) and
Y is the MTU size in bytes (1500, 4000, 9000).
2) Disable path MTU discovery, as follows
net.ipv4.ip_no_pmtu_disc = 1
MSS
MSS is the Maximum Segment Size: the largest amount of TCP payload that fits in one segment. This is usually the local MTU minus the TCP and IP headers.
For example, if the MTU is 1500 bytes, the MSS is usually 1460 bytes (1500 - 20 bytes TCP header - 20 bytes IP header).
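The same subtraction applies to any MTU. The sketch below assumes 20-byte IP and TCP headers, as in the example above; header options would reduce the MSS further. The function name mss_for is chosen for illustration.

```python
def mss_for(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """MSS = MTU minus the IP and TCP headers (no options assumed)."""
    mss = mtu - ip_header - tcp_header
    if mss <= 0:
        raise ValueError("MTU too small for the given headers")
    return mss

for mtu in (1500, 4000, 9000):
    print(f"MTU {mtu} -> MSS {mss_for(mtu)}")
# MTU 1500 -> MSS 1460
# MTU 4000 -> MSS 3960
# MTU 9000 -> MSS 8960
```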