Friday, June 20, 2025

Network Path MTU Issues: PMTUD Black Holes, ICMP, and MSS Optimization

Efficient network communication relies on understanding and managing the Maximum Transmission Unit (MTU) across the entire end-to-end path. MTU mismatches or improperly handled Path MTU Discovery (PMTUD) can result in silent drops (black holes), degraded performance, or outright connection failures.

This post explains:

  • What MTU and MSS are
  • How PMTUD works
  • Why PMTUD black holes occur
  • The role of ICMP in PMTUD
  • Best practices for avoiding issues, including MSS clamping

Maximum Transmission Unit (MTU)

  • The largest packet size (in bytes) that can be transmitted without fragmentation over a link.
  • Common MTUs:
    • Ethernet: 1500 bytes
    • IPsec VPN (ESP): ~1400 bytes (due to encapsulation overhead)
    • GRE tunnels: ~1476 bytes

Maximum Segment Size (MSS)

  • The maximum amount of TCP payload data a device is willing to receive in a single segment.
  • Calculated as: MSS = MTU - IP header (20 bytes) - TCP header (20 bytes)
    • For MTU 1500, MSS is typically 1460.

Tunnel Type              | Encapsulation Overhead | Suggested MSS
IPsec (ESP)              | ~60 bytes              | 1380
GRE                      | ~24 bytes              | 1436
SD-WAN (overhead varies) | 60–100 bytes           | 1300–1360
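As a hedged sanity check of the arithmetic above for a hypothetical IPsec tunnel (host names are placeholders; ping's -s value excludes the 20-byte IP and 8-byte ICMP headers, so payload = MTU - 28). Note the suggested MSS of 1380 in the table leaves extra headroom below the computed 1400:

    # Assumed: 1500 - 60 (ESP overhead) = 1440 effective tunnel MTU
    # Safe MSS: 1440 - 20 (IP) - 20 (TCP) = 1400
    ping -M do -s 1412 <tunnel-peer>   # 1412 + 28 = 1440 bytes: should pass
    ping -M do -s 1413 <tunnel-peer>   # 1441 bytes: should fail with DF set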

Common Symptoms of MTU Issues

  • TCP connections hang during TLS handshake (often during Server Hello).
  • Long delays followed by timeouts or retransmissions.
  • Specific applications fail while others succeed.
  • Only large payloads are affected (e.g., HTTP POSTs, file uploads).

MSS Overshoot:
  • If MSS + 40 bytes of headers exceeds the path MTU, packets must be fragmented or are dropped.
  • Example: MSS=1434 against path MTU=1383 → 1474-byte packets → Fragmentation Needed.

What Is MSS Clamping?

  • A router/firewall rewrites the TCP MSS option in SYN packets.
  • Ensures TCP sessions agree on a safe MSS that fits below the true path MTU (a configuration sketch follows the list below).

When to Use

  • When PMTUD is unreliable or ICMP cannot be guaranteed.
  • In environments with tunnels (IPsec, GRE, MPLS) that reduce the effective MTU.
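As an illustrative sketch, MSS clamping on a Linux router/firewall is commonly done with the iptables TCPMSS target; the interface name wan0 and the value 1343 are assumptions for a 1383-byte tunnel MTU (a Cisco IOS equivalent appears later under Recommendations):

    # Clamp MSS on forwarded SYNs to the discovered path MTU:
    iptables -t mangle -A FORWARD -o wan0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
    # Or pin an explicit value (1383 tunnel MTU - 40 = 1343):
    iptables -t mangle -A FORWARD -o wan0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1343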

 Payload Fragmentation

    When It Happens: if DF=0 and the segment plus headers exceeds the path MTU.

    Risks:
  •  Increases latency.
  •  Some networks/firewalls drop fragments (security policies).

Does MSS Change Based on Internet Speed/Bandwidth?

MSS is determined by MTU (Maximum Transmission Unit) and protocol overhead, not by bandwidth or speed fluctuations.
  • Example: Whether your link is 10 Mbps or 1 Gbps, MSS remains fixed at MTU - 40 bytes (TCP/IPv4 header).
  • Exception: If the path MTU changes (e.g., due to VPN tunnel adjustments), MSS may be renegotiated during TCP handshake.

 Factors That Can Influence MSS (Client-Side)

Factor                                         | Impact on MSS                                                                          | Example
1. MTU of the interface (Wi-Fi, Ethernet, LTE) | Directly sets MSS: MSS = MTU - 40 (TCP/IPv4 headers); the primary determinant of MSS   | Ethernet MTU 1500 → MSS 1460
2. Tunnel overhead                             | Reduces the effective MTU (and thus MSS)                                               | IPsec adds 50 bytes → MSS = 1500 - 50 - 40 = 1410
3. MSS clamping (by a local router/firewall)   | Firewalls/SD-WAN can enforce MSS limits to prevent fragmentation                       | Force MSS ≤ 1343 for VPN tunnels
4. Path MTU Discovery (PMTUD)                  | Dynamically adjusts the usable segment size when intermediate links have smaller MTUs  | Router with MTU 1400 → MSS 1360
5. TCP stack settings / dual-stack (IPv4 vs IPv6 headers differ in size → different MSS) | The OS/kernel can override the default MSS (e.g., per-route advmss on Linux) | Manual MSS setting for POS terminals

 What Does Not Influence MSS?

Factor                                     | Reason
Bandwidth/speed (e.g., 1 Mbps vs 100 Mbps) | MSS is a size limit, not throughput-related.
Latency/jitter (ping time)                 | Affects performance but not segment size.
Packet loss                                | Triggers retransmissions but doesn’t change MSS.
Encryption (TLS/SSL)                       | Adds payload overhead but doesn’t alter the TCP MSS (handled above the transport layer).

PMTUD (Path MTU Discovery)

Path MTU Discovery (PMTUD) is a mechanism that helps a sender find the maximum IP packet size (MTU) that can traverse the network without fragmentation. Each link in a network path can have a different MTU.  PMTUD helps avoid:

  • Sending packets too big, which would get dropped if DF (Don't Fragment) is set
  • The overhead of IP fragmentation, which can hurt performance

How PMTUD Works:

  • Sender sends packets with the "Don’t Fragment" (DF) flag set.
  • If a router along the path encounters a packet larger than its MTU, it:
    • Drops the packet.
    • Sends an ICMP Type 3 (Code 4: "Fragmentation Needed") message back to the sender, including the next-hop MTU.
  • The sender then reduces its MSS/MTU to match.
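To exercise PMTUD by hand, a hedged example with standard Linux tools (the host and sizes are illustrative; ping -s counts only the ICMP payload, so add 28 bytes of headers):

    ping -M do -s 1472 <host>   # sends 1500-byte packets with DF set
    ping -M do -s 1372 <host>   # retry at 1400 bytes if the first size fails
    tracepath -n <host>         # walks the path and reports the per-hop PMTU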

PMTUD Black Holes

  • A black hole occurs when:
    • An intermediate router drops a packet that exceeds the next-hop MTU because the DF bit is set, and
    • The ICMP "Fragmentation Needed" message it sends back is blocked or filtered.
  • Result: the sender never learns to reduce its packet size → packets are silently dropped.

 Blocking ICMP Type 3 Breaks PMTUD:

  • If firewalls block these ICMP messages, the sender never learns it needs to reduce MTU/MSS.
  • Result: Packets are silently dropped, causing timeouts and retries.

Best Practices for Enabling PMTUD

On Firewalls/Routers: For proper Path MTU Discovery (PMTUD) to work, "ICMP Type 3 (Destination Unreachable)- Code 4 (Fragmentation Needed but DF set)" must be allowed in both directions (inbound and outbound) across firewalls, routers, and hosts.

  • Allow outbound ICMP Type 3, Code 4 (from routers to senders).
  • Allow inbound ICMP Type 3, Code 4 (if hosts need to receive PMTUD messages).
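A minimal Linux iptables sketch of these rules, assuming default-deny ICMP policies elsewhere (chain placement will vary by deployment):

    # Accept "Fragmentation Needed" destined to this host:
    iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT
    # And let it through to hosts behind this firewall:
    iptables -A FORWARD -p icmp --icmp-type fragmentation-needed -j ACCEPT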

 How enabling ICMP Type 3 helps our scenario

Enabling ICMP Type 3 ("Fragmentation Needed") on firewalls is critical for proper Path MTU Discovery (PMTUD) to work. Here's why it resolves our MSS/MTU issues and how to implement it:

Before (ICMP Blocked)                           | After (ICMP Allowed)
MSS=1380 fails (no feedback)                    | Router sends ICMP Type 3, telling the client to use MTU=1290.
Client blindly retries with MSS=1250 (guessing) | Client immediately adjusts MSS to 1250 (1290 - 40).
Inefficient retries and latency                 | First attempt succeeds with the correct MSS.


MSS Clamping vs. PMTUD

Criteria       | Solution 1: MSS Clamping (Manual Adjustment)                      | Solution 2: PMTUD Enabled (Automatic Detection)
Mechanism      | Forcefully sets the TCP MSS option to a fixed value (e.g., 1250)  | Relies on ICMP Type 3, Code 4 (error + PMTUD cache update) to adjust the MTU dynamically
Trigger        | Pre-configured MTU mismatch (e.g., tunnel)                        | ICMP "Frag Needed" (Type 3, Code 4)
When Applied   | During the TCP handshake (SYN/SYN-ACK)                            | After packet loss (retransmission)
Implementation | Configured on data center firewalls / store network               | Requires allowing ICMP "Fragmentation Needed" end-to-end
Pros           | Guaranteed packet size reduction; no dependency on ICMP           | Auto-adapts to path changes; RFC-compliant (RFC 1191)
Cons           | Static (fails if the path MTU changes); may still fragment        | Fails if ICMP is blocked; slight initial delay
Impact         | Suboptimal for dynamic networks                                   | Industry best practice for reliability
Performance    | Prevents fragmentation upfront                                    | May cause delays due to retransmissions
Verification   | Check SYN packets for the clamped MSS (e.g., tcpdump)             | Test with ping -M do -s 1400 for ICMP responses
Recommended?   | Fallback option                                                   | Primary solution (enable ICMP Type 3, Code 4)
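The two verification steps from the table, written out as hedged one-liners (the interface wan0 and the MSS value are assumptions):

    # Solution 1: confirm the clamped MSS on forwarded SYNs (look for "mss 1343" in the options)
    tcpdump -vni wan0 'tcp[tcpflags] & tcp-syn != 0'
    # Solution 2: confirm ICMP Frag-Needed feedback comes back for oversized probes
    ping -M do -s 1400 <host>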

Why PMTUD (Option 2) is the Right Approach

  • SD-WAN Confirmed the Root Cause: ICMP blocking at 38.97.129.101 is breaking PMTUD.
  • "Black Hole" Issue Resolved: Unblocking ICMP ensures the server receives MTU feedback, preventing silent failures.
  • Future-Proof: Works seamlessly even if the tunnel MTU changes.

Key Decision Factors

  • If ICMP can be unblocked: PMTUD (Preferred) – Self-healing and scalable.
  • If ICMP must stay blocked: MSS Clamping – Static but predictable (set MSS=1250).

Immediate Actions Requested

  • Unblock ICMP Type 3, Code 4 on all firewalls/routers between the server and SD-WAN tunnel.
  • Monitor: confirm the sender (client/server/router/firewall) auto-adjusts MSS after receiving ICMP feedback.

 Why MSS Changes Despite Fixed Tunnel MTU

Intermediate Device Restrictions: a router/firewall along the path may have a smaller MTU (e.g., 1290), forcing TCP to adjust MSS dynamically. The tunnel MTU is 1383, but the router caps packets at 1290 → MSS = 1290 - 40 = 1250.

PMTUD Behavior: if the initial packet (MSS=1380) is dropped because it exceeds the path MTU, ICMP "Fragmentation Needed" messages force the client to retry with a smaller MSS. Some networks block ICMP, breaking PMTUD and causing persistent failures.

Asymmetric Paths : Outbound/inbound paths may differ (e.g., traffic shaping on one leg). The client sees the strictest MTU.

TCP Stack Heuristics : Modern OSes (Linux/Windows) may aggressively reduce MSS after failures, even if the root cause isn't MTU.

Why the client might reduce its MSS from 1380 to 1250 despite both tunnels having the same MTU (1383)

      
Observation                    | Possible Explanation
First attempt (MSS=1380) fails | Path MTU Discovery (PMTUD) detects the overshoot and triggers an MSS reduction.
Retry (MSS=1250) succeeds      | The client adapts to a narrower bottleneck (e.g., an intermediate device with a smaller MTU).
Same tunnel MTU (1383)         | The tunnel endpoints support 1383, but the path may have a stricter limit (e.g., 1290).

The client reduces MSS because the path MTU is narrower than the tunnel MTU.

What is ICMP Type 3?

ICMP (Internet Control Message Protocol) Type 3 is a "Destination Unreachable" message sent by a router or host to indicate that a packet cannot be delivered to its intended destination. It includes various codes (subtypes) that specify the reason, such as:

  • Code 0 (Net Unreachable) – Network is not accessible.
  • Code 1 (Host Unreachable) – Host is not reachable.
  • Code 3 (Port Unreachable) – The requested port is closed.
  • Code 4 (Fragmentation Needed but DF set) – A packet was too large to forward without fragmentation; this is the message Path MTU Discovery (PMTUD) depends on.

Is Enabling ICMP Type 3 Recommended?

Yes, in most cases, ICMP Type 3 should be enabled because:
  • Helps with troubleshooting – Without it, connectivity issues become harder to diagnose (e.g., "Request timed out" instead of "Destination unreachable").
  • Supports Path MTU Discovery (PMTUD) – Code 4 (Fragmentation Needed) is critical for TCP performance; blocking it can cause broken connections for large packets.
  • Prevents "black holes" – Without ICMP Type 3, a sender may keep retransmitting packets indefinitely, unaware that the destination is unreachable.

Default Value
  • Most firewalls and operating systems allow ICMP Type 3 by default since it is essential for proper network operation.
  • Some restrictive security policies may block it, but this can cause network issues.

  


Key Benefits

  • Prevents silent packet drops: Devices adjust MSS/MTU proactively.
  • Eliminates guesswork: No more arbitrary MSS fallbacks (e.g., 1380 → 1250).
  • Improves performance: Reduces TCP retransmissions and latency.

What to Avoid

  • Blocking all ICMP (breaks PMTUD and troubleshooting).
  • Filtering ICMP Type 3, Code 4 (causes "black hole" connections).

Real-World Use Case

Client-side TCP MSS values ranged across the following: 1243, 1250, 1259, 1261, 1273, 1291, 1322, 1323, 1331, 1343, 1380.

The server always responds with MSS=1460, as MTU is 1500. However, the SD-WAN tunnel has an effective MTU of 1383 bytes due to 117 bytes of encapsulation overhead.

When a client advertises MSS=1380, the corresponding packet size becomes:

  • 1380 MSS + 40 (TCP/IP headers) = a 1420-byte IP packet
    • This exceeds the SD-WAN tunnel MTU (1383), leading to packet drops.

Connections where the client MSS was set to 1380 consistently failed to complete the TLS handshake. The root cause is that the TLS ServerHello segment sent by the server was dropped in the SD-WAN tunnel, preventing it from reaching the client. As a result, the client receives only out-of-order segments, while the first segment (which is required to continue the TLS exchange) is missing, triggering retransmissions and eventually connection failure.

The server is advertising MSS correctly in the SYN/ACK (typically 1460), which aligns with our internal MTU configuration of 1500 bytes. However, the issue we're observing is that the client is not adhering to this in many cases and continues to send MSS = 1380 (MTU=1420), which causes packet drops due to overshoot beyond the SD-WAN tunnel MTU of 1383.

Step | Direction       | TCP Flags | MSS Value Advertised | What It Means
1    | Client → Server | SYN       | MSS = 1380           | "Hey server, I can receive up to 1380-byte TCP payloads from you."
2    | Server → Client | SYN-ACK   | MSS = 1460           | "Okay, I acknowledge. I can receive up to 1460-byte TCP payloads from you."
3    | Client → Server | ACK       | –                    | Connection established. Now both sides know each other's limits.

⚠️ Thumb Rule for SD-WAN Tunnel Compatibility

  • TCP Payload (MSS) + IP Header (20B) + TCP Header (20B) ≤ SD-WAN Tunnel MTU
    • If SD-WAN tunnel MTU = 1383 bytes, then: Max Safe MSS = 1383 – 20 (IP) – 20 (TCP) = 1343 bytes
    • If the client advertises MSS = 1380, total packet size becomes: 1380 (MSS) + 40 (headers) = 1420 bytes
      • This exceeds the tunnel MTU (1383) → packet will be dropped unless fragmented (which doesn't happen with DF=1).
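The thumb rule can be probed directly against the 1383-byte tunnel MTU (the peer address is a placeholder; ping payload = MTU - 28):

    ping -M do -s 1355 <tunnel-peer>   # 1355 + 28 = 1383 bytes: the largest packet that should pass
    ping -M do -s 1356 <tunnel-peer>   # 1384 bytes: should be dropped with DF set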

Recommended Solutions

To resolve this, the client-side MSS must be clamped to a value that results in packets smaller than the tunnel MTU.

 Option 1: MSS/MTU Clamping at the Network Edge (POS Client)  

Approach | Description                          | Feasibility
1.a      | Clamp MSS/MTU on the client OS       | Is this doable at every client OS endpoint?
1.b      | Clamp MSS/MTU on the router/firewall | Is this manageable across the network infrastructure?

If feasible, MSS should be clamped such that:

  • TCP segment size + headers ≤ 1383
    • Example: clamp the MTU to 1290–1383 bytes, or clamp the MSS to 1250–1343 (a client-side sketch follows below).
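For option 1.a, a hypothetical Linux client-side sketch (the interface eth0 and gateway 192.0.2.1 are placeholders):

    # Clamp the whole interface so every egress packet fits the tunnel:
    ip link set dev eth0 mtu 1383
    # Or clamp only TCP's advertised MSS on a given route:
    ip route replace default via 192.0.2.1 dev eth0 advmss 1343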

Option 2: Increase SD-WAN Tunnel MTU

If edge control is not viable, another option is:

  • Increase the SD-WAN/MPLS tunnel MTU from 1383 to 1420
    • This reduces tunnel overhead from 117 bytes to 80 bytes

   

Visual Flow Summary:

Frame  | Direction       | TCP Seq → Ack | Payload Length | What Happens
107711 | Client → Server | SYN           | 0    | 🔹 TCP handshake initiation (MSS = 1380)
107712 | Server → Client | SYN, ACK      | 0    | 🔹 Server responds (MSS = 1460)
107713 | Client → Server | ACK           | 0    | TCP 3-way handshake completed
107714 | Client → Server | Seq=1 Ack=1   | 216  | 🚀 TLS ClientHello
107715 | Server → Client | Seq=1 Ack=217 | 0    | 🔄 ACK for ClientHello
107716 | Server → Client | Seq=1         | 2820 | TLS ServerHello + Certificate — likely dropped (MSS/MTU overshoot)
107717 | Server → Client | Seq=2761      | 1336 | TLS Certificate chunk — received
107718 | Server → Client | Seq=4097      | 1575 | ServerKeyExchange + ServerHelloDone — received
107719 | Client → Server | Ack=1         | 0    | 🛑 Dup ACK #1 — SACK: 2761–4097
107720 | Client → Server | Ack=1         | 0    | 🛑 Dup ACK #2 — SACK: 2761–4097, 5477–5672
107721 | Server → Client | Seq=1         | 1380 | Fast retransmit of the missing segment (MTU-safe)


Key Observations:

  • Frame 107716 contains the first TLS ServerHello response from the server, starting at TCP sequence number Seq=1.
    • This segment is critical for initiating the TLS handshake and must be received by the client before any further TLS processing can occur.
  • However, the client's ACKs indicate it never received this segment:
    • In Frame 107719, the client responds with Ack=1, meaning it is still waiting for the segment starting at Seq=1.
  • The client does include SACK (Selective Acknowledgment) blocks in its duplicate ACKs, such as:
    • SLE=2761, SRE=4097, which means:
      • “I did not receive your segment starting at Seq=1, but I have received the out-of-order segment from 2761 to 4096.”

This behavior is consistent with a packet drop of the ServerHello segment, likely due to an MSS/MTU overshoot.

 Diagnostic Approach 

Tools

  • Wireshark: Inspect TCP MSS values, observe SYN/SYN-ACK, identify dropped packets or retransmissions.
  • ping -M do -s [size] [host]: Manually probe for MTU.
  • tracepath / traceroute --mtu: discover where along the path the MTU drops and fragmentation occurs (an example capture query follows below).
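For bulk analysis, a hedged tshark one-liner (Wireshark's CLI companion) that lists the MSS advertised in every SYN of a capture; the file name is illustrative:

    tshark -r capture.pcap -Y "tcp.flags.syn == 1" -T fields -e ip.src -e ip.dst -e tcp.options.mss_val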

Recommendations 

Immediate

  •  Enable MSS clamping on all WAN/tunnel-facing interfaces, e.g., on Cisco IOS:
    •  ip tcp adjust-mss 1360
  • Verify that ICMP Type 3, Code 4 is allowed through firewalls and middleboxes.

Long-Term
  • Perform regular MTU path testing across all critical network paths.
  • Document MTU constraints for each WAN circuit, tunnel, and overlay path.
  • Avoid blindly increasing interface MTU without end-to-end validation.

Proper MTU management and MSS optimization are essential for reliable network communication, especially across complex SD-WAN and VPN architectures. By understanding and mitigating PMTUD black holes, enabling ICMP feedback, and applying MSS clamping, organizations can prevent silent failures and ensure stable connectivity.

Monday, June 16, 2025

Jumbo Frames are a LAN-side performance feature, while MTU clamping in SD-WAN is a WAN-side compatibility safeguard.

 Jumbo Frames are completely separate from standard MTU and not affected by our standard MTU clamping — unless we explicitly allow and use Jumbo Frames end-to-end. When we clamp standard MTU from 1420 to 1300, we're only affecting regular Ethernet frames (standard traffic).

Jumbo Frames (LAN-side Optimization)

  • Ethernet frames larger than the standard 1500 bytes, typically up to 9000 bytes.
  • Primarily used within high-speed internal networks (e.g., data centers) to reduce CPU overhead and improve throughput.
  • Jumbo frames require explicit support and configuration end-to-end across all network devices — not used over the public internet or SD-WAN circuits by default.
Aspect           | Standard MTU                                         | Jumbo Frames
Typical Size     | 1500 (often clamped to 1300–1420)                    | 9000 (custom, non-standard)
Used by          | Most Internet and enterprise traffic                 | Only high-speed internal LANs / data centers
Needs Clamping?  | Yes, for tunnel overheads (e.g., IPsec, GRE, SD-WAN) | No (unless Jumbo runs over the same constrained path, which is rare)
Negotiated?      | Yes (Path MTU Discovery, MSS)                        | No — must be manually configured and supported end-to-end
TSO / LSO Impact | Still applies                                        | Still applies; a jumbo frame simply increases the physical wire frame size

Clamping standard MTU does not impact Jumbo Frames, because:

  • Jumbo Frames work only inside a Local Area Network (LAN) or similarly controlled environments like: Data centers, Storage networks (NAS/iSCSI), High-performance computing clusters.
  • Operate in separate L2 environments (e.g., within LAN, not over WAN tunnels)
  • Are already constrained to trusted, known MTU paths.
  • Jumbo Frames are not “clamped” in the same sense, because:
    • They must be explicitly allowed and configured on the interface (NIC, switch, router).
    • They don’t rely on PMTUD — they’re rejected outright if any hop doesn't support them.

Here are three related technical concepts:

Concept                        | Scope                   | Purpose
PMTUD (Path MTU Discovery)     | 🌐 WAN                  | Dynamically discovers the smallest MTU along a path to avoid fragmentation
TSO (TCP Segmentation Offload) | 🌐 WAN + LAN (host NIC) | Offloads TCP segmentation to the NIC to boost performance for large buffers
Jumbo Frames                   | 🏢 LAN                  | Increases the Ethernet frame size to reduce per-packet overhead in the LAN

1. PMTUD (Path MTU Discovery)

By default, our traffic runs with DF (Don't Fragment) set, and captures show TCP segments of more than 25,000 bytes sent without any IP fragmentation anywhere. This is possible because most operating systems support TCP Segmentation Offload (TSO), also called Large Segment Offload (LSO): the oversized segments exist only inside the host and are split to MTU size by the NIC.

Endpoints exchange the MSS during the TCP 3-way handshake, but the MSS is advisory, not enforced; the sender is not strictly obliged to obey it.

The OS kernel or TLS layer may hand down larger chunks for performance or protocol reasons. The sender relies on PMTUD (with DF set) to discover whether large packets are acceptable along the path; TCP stacks may initially send large segments, assuming a large MTU, unless PMTUD fails.

Path MTU Discovery (PMTUD) is a mechanism that helps a sender find the maximum IP packet size (MTU) that can traverse the network without fragmentation. Each link in a network path can have a different MTU.  PMTUD helps avoid:

  • Sending packets too big, which would get dropped if DF (Don't Fragment) is set
  • The overhead of IP fragmentation, which can hurt performance

How PMTUD Works:

  • The sender sets the DF (Don't Fragment) bit in IP packets and starts sending large packets (e.g., 1500 bytes).
  • If a router encounters a too-large packet and can't fragment it:
    • It drops the packet.
    • It sends back ICMP Type 3, Code 4: “Fragmentation Needed but DF Set”.
  • The sender receives this ICMP and reduces its packet size accordingly.

2. TCP Segmentation Offload (TSO) 

Technically, this is a TCP segment whose payload is larger than the path MTU, constructed by the OS kernel and handed to the network card for segmentation. The mechanism is called TCP Segmentation Offload (TSO) or Large Segment Offload (LSO).

  • TSO is a hardware-level optimization, handled by the NIC, not part of the IP protocol stack.
  • If TSO is enabled on a host's network interface, it is used for all outbound TCP traffic (LAN or WAN).
  • TSO lets the OS hand a huge TCP buffer (e.g., 64KB or more) to the NIC, which splits it into MTU-sized packets (e.g., 1500 bytes) for transmission over the network.
  • Instead of the CPU breaking the data into small packets, the NIC takes over the segmentation task, reducing CPU overhead and improving throughput.

Will TSO Still Work After MTU Clamping?

Yes, TSO/LSO will still function as expected, but the individual MTU-sized frames generated by the NIC will be sized to the clamped MTU. We can safely clamp MTU/MSS at the edge or adjust tunnel overhead without breaking TSO; just ensure firewalls/routers don't apply hard limits below the clamped MTU.

TCP Segmentation Offload (TSO) — also known as Large Segment Offload (LSO) — is a network performance optimization technique used in modern operating systems and network interface cards (NICs).

With TSO enabled:

  • The OS creates a large TCP segment (say 64KB).
  • The NIC is responsible for breaking it into packets that match the effective MTU.
  • If you've clamped the MTU to 1290 or 1383 (instead of 1500), the NIC will now segment based on the new lower MTU.
  • This increases the number of packets per large segment but does not break functionality.

📦 Without TSO:

  • OS creates and segments each TCP packet.
  • CPU is involved in calculating checksums, headers, segmentation.
  • More CPU cycles used, especially at high throughput.

📊 Visual Representation:

Layer          | Without TSO                          | With TSO
OS (Kernel)    | Segments into 1460-byte TCP segments | Sends a 64KB chunk to the NIC
NIC (Hardware) | Just transmits                       | Splits it into MTU-sized packets (e.g., 1460-byte payloads)
Result         | More CPU work, more interrupts       | Less CPU work, higher performance
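To check or toggle TSO on a Linux host, a small sketch (eth0 is an assumed interface name):

    ethtool -k eth0 | grep -i segmentation   # shows tcp-segmentation-offload: on/off
    ethtool -K eth0 tso off                  # disable, e.g., to see true on-wire sizes in captures
    ethtool -K eth0 tso on                   # re-enable

Disabling TSO temporarily is also handy when a capture shows giant TCP segments that never actually exist on the wire.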

✅ Benefits of TSO:

  • Reduces CPU load for high-volume TCP traffic.
  • Enables higher throughput without additional CPU consumption.
  • Particularly useful for web servers, file transfers, high-performance applications.

 

Term              | Layer     | Description
PMTUD             | Network   | Discovers the largest IP packet that can be sent without fragmentation
MSS (TCP)         | Transport | Max TCP payload advertised by each endpoint
MTU (IP)          | Network   | Max IP packet size supported by each link
TSO/LSO           | NIC       | Offloads large TCP segments to be split at the NIC level
Jumbo Frames      | Ethernet  | Frames > 1500 bytes (e.g., 9000B) — only supported on compatible networks
Jumbo TCP Segment | TCP/OS    | Large segments seen in captures due to TSO — not actual on-wire sizes

 

  Impact of Lowered MTU:
  • More packets per TSO segment → slightly higher CPU/network processing overhead.
  • Possible minor reduction in throughput if NIC offloading is limited or CPU is under pressure.
  • But TLS handshake reliability improves significantly, which is more critical in your case.
Parameter                     | Before Clamping (MTU=1500) | After Clamping (e.g., MTU=1290–1383)
TSO Segment Size (e.g., 64KB) | 64KB                       | 64KB
NIC Segments per TSO Segment  | Fewer (64KB ÷ 1460)        | More (64KB ÷ ~1250)

🚫 TSO Could Break Only If:

  • TSO/LSO is explicitly disabled at OS or NIC level.
  • The network path drops fragmented packets (rare with TSO since fragmentation doesn’t occur at IP layer).
  • There's a misconfigured firewall/router that drops larger packet bursts or doesn’t respect the new clamped MTU.

 ✅ Recommendation:

You can safely clamp MTU/MSS at the edge or adjust tunnel overhead without breaking TSO. Just ensure:

  • NIC and OS TSO settings remain enabled.
  • Firewalls/routers don’t apply hard limits below your clamped MTU.
  • Your monitoring considers potential rise in packet counts per request (not errors).

 

3. Jumbo Frames

Jumbo Frames are Ethernet frames that exceed the standard Ethernet MTU of 1500 bytes. They are typically configured to carry up to 9000 bytes of payload (sometimes more), allowing more data per packet on high-speed networks. They must be manually enabled and matched end-to-end — and are not suitable for WAN or tunneled traffic.

Common jumbo frame sizes are 9000 bytes, but they can range from 1500 to 9600 bytes depending on the device. 

  • Standard Ethernet MTU: 1500 bytes (payload)
  • Jumbo Frame MTU: Typically 9000 bytes (payload)
  • Jumbo Frame Ethernet Frame Size: ~9018 bytes (includes Ethernet headers and CRC)

🎯 Why Use Jumbo Frames?

  • Lower CPU overhead: Fewer packets for the same amount of data = fewer interrupts and TCP/IP stack processing per byte.
  • Improved throughput: Larger payloads per packet improve the efficiency of high-speed networks (e.g. 10GbE, 40GbE).
  • Better for large transfers: Especially beneficial in storage networks (iSCSI, NFS), data center replication, and video streaming.

  ⚙️ How They Interact with TSO

  • When TSO and Jumbo Frames are both enabled:
    • The OS sends a large segment (e.g., 64 KB) to the NIC (via TSO).
    • The NIC splits the segment into larger packets, matching the higher MTU.

Example:

Setting          | Packet Size on Wire       | Packets for 64 KB
MTU 1500         | ~1460 bytes (TCP payload) | ~44 packets
MTU 9000 (Jumbo) | ~8960 bytes (TCP payload) | ~8 packets

So with Jumbo Frames:

  • Fewer segments need to be generated.
  • Less per-packet overhead.
  • Faster total data transfer with less CPU load.

 When to Use Jumbo Frames

In controlled environments like data centers, HPC clusters, and backend storage networks, when all devices between sender and receiver are:

  • In the same broadcast domain or VLAN
  • Configured with matching Jumbo MTU (e.g., 9000)
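An illustrative Linux setup and end-to-end check for a 9000-byte jumbo path (the interface and peer are placeholders; 8972 = 9000 - 28 bytes of IP/ICMP headers):

    ip link set dev eth0 mtu 9000    # must match on every NIC/switch in the L2 path
    ping -M do -s 8972 <peer>        # succeeds only if the whole path accepts 9000-byte frames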

🔬 When Not to Use Jumbo Frames

  • Not all network devices support it:
    • Switches, routers, firewalls, VPN appliances, etc. must all support and be configured for the same jumbo MTU.
  • Path MTU Discovery (PMTUD) can fail if ICMP is blocked:
    • Leading to black holes or silent packet drops if an intermediate hop can't handle jumbo frames.
  • Debugging becomes harder if partial support exists along the path.
  • Increased latency per packet in some workloads (e.g., small request/response).
  • Avoid over the Internet and over VPN / SD-WAN / IPsec / GRE tunnels, where intermediate devices may silently drop large frames.
  • Avoid on consumer-grade or mixed-vendor networks.

How MTU Works in Jumbo Frames

  • Sender checks MTU: If sender NIC is configured for Jumbo MTU (e.g., 9000), and the application generates large enough data, the OS/driver builds Jumbo Frames.
  • No fragmentation needed — if the entire path (Layer 2 switches, NICs) supports the larger MTU.

📊 Jumbo Frames and TSO

Feature         | Effect
Jumbo Frame     | Increases the wire MTU (e.g., 9000); fewer packets needed
TSO             | Sends large 64KB+ buffers to the NIC; the NIC segments them
Jumbo + TSO     | The NIC splits large TSO buffers into large MTU-sized packets (e.g., 9000 bytes)
Combined Result | Lower CPU use, fewer packets, better throughput

🛠️ Steps to Identify Jumbo Frames in Wireshark

✅ 1. Look at "Frame" and "Ethernet II" Lengths

Each captured packet will show you:

  • Frame length on the wire (includes L2 headers)
  • Payload length (Ethernet MTU)

Typical sizes:

  • Standard Ethernet frame: ~1514 bytes (MTU 1500 + 14-byte Ethernet header)
  • Jumbo Frame: >1518 bytes (e.g., 9014 for MTU 9000)

2. Filter for large frames

Use the following Wireshark display filter to see only Jumbo Frames:

frame.len > 1514
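The same check from the command line with tshark (the capture file name is illustrative):

    tshark -r capture.pcap -Y "frame.len > 1514" -T fields -e frame.number -e frame.len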