Slow upload speed for VMWare virtual machines working via pfSense



We have ProLiant DL360 Gen8 and Gen9 servers running VMware ESXi 6.0 with virtual machines under various versions of Windows that are routed via pfSense 2.3.4-RELEASE (64-bit) with the Open-VM-Tools package 10.1.0,1.

The virtual machines that work via pfSense demonstrate very low upload speed, for example: ping 2 ms, download 134 Mbps, upload 0.25 Mbps. (In theory, 0.25 Mbps should be acceptable for Remote Desktop connections, but in practice RDP barely works: the client frequently stalls for a few seconds, the screen refreshes in squares, taking 5-10 seconds per refresh, the connection is unstable, and it sometimes even reconnects, making work via RDP practically impossible.)

Tweaks on the affected Windows machines like “netsh interface tcp set global autotuninglevel=highlyrestricted” didn’t change anything.
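For reference, the receive-window auto-tuning level can be inspected and changed from an elevated command prompt on the affected Windows guests. This is a sketch using standard `netsh` commands; `highlyrestricted` was just one of the values tried:

```shell
rem Show current global TCP parameters, including the auto-tuning level
netsh interface tcp show global

rem Restrict receive-window auto-tuning (the tweak tried above)
netsh interface tcp set global autotuninglevel=highlyrestricted

rem Revert to the Windows default
netsh interface tcp set global autotuninglevel=normal
```

As noted, none of these values made a difference, which already hints that the bottleneck is not on the Windows side.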

The virtual machines that have direct connection, bypassing pfSense, don’t have these issues – they have about the same upload and download speed.

All virtual machines (pfSense, Windows, etc – all) are using VMXNET3 adapter.

The following options are all unchecked in the pfSense:

[ ] Disable hardware checksum offload
[ ] Disable hardware TCP segmentation offload
[ ] Disable hardware large receive offload

There is no traffic shaping on pfSense. What can be the reason?

If I CHECK the option “Disable hardware large receive offload”, it becomes fast again, but I don’t want to disable it, I want pfSense to use hardware large receive offload with VMWare VMXNET3.

Update: I have upgraded VMware to the latest 6.5 with all patches and pfSense to 2.3.5 BETA, updated the firmware to the latest versions, and it didn’t help.

I can absolutely confirm the same scenario: running pfSense on VMware, the upload bandwidth was painfully slow while download was just fine. For us, it happened ONLY if the pfSense VM and the guest VMs were on the same host; when they were on different hosts, the problem went away. Disabling the offloads on the pfSense VM (checking the boxes ON) instantly fixed the problem. I am not sure whether it is limited to VMXNET3 NICs, but that is how our pfSense VMs are also configured. I hope this helps others, as this is not documented anywhere. I will try to get pfSense to update the VMware configuration page on their site.

I have solved the issue by disabling “Hardware Large Receive Offloading” in the pfSense settings (System / Advanced / Networking | Network Interfaces).

There is a checkbox “Disable hardware large receive offload” and I have turned it to “Checked” (ON).

The description says the following on this option:

Checking this option will disable hardware large receive offloading (LRO). This offloading is broken in some hardware drivers, and may impact performance with some specific NICs. This will take effect after a machine reboot or re-configure of each interface.

Other options are unchecked. So now the options in the “Network Interfaces” are the following:

[ ] Disable hardware checksum offload
[ ] Disable hardware TCP segmentation offload
[✓] Disable hardware large receive offload
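For a quick test before committing the GUI change, the same offload can be toggled from the pfSense (FreeBSD) shell. This is a sketch; `vmx0` is a placeholder for the actual VMXNET3 interface name, and the change made this way does not persist across reboots, so the GUI checkbox is still the proper fix:

```shell
# Show the interface's current options; an LRO-enabled NIC lists LRO
ifconfig vmx0

# Disable large receive offload on the interface for a quick test
ifconfig vmx0 -lro

# Re-enable it later if desired
ifconfig vmx0 lro
```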

According to HP documentation, the network adapters on Gen8/Gen9 (model 331 based on the Broadcom BCM5719 chipset) support standard TCP/IP offloading techniques including:
– TCP/IP, UDP checksum offload (TCO) (moves the TCP and IP checksum offloading from the CPU to the network adapter).
– Large send offload (LSO) or TCP segmentation offload (TSO) (allows the TCP segmentation to be handled by the adapter rather than the CPU).

That’s what pfSense writes about these features:

The settings for Hardware TCP Segmentation Offload (TSO) and Hardware Large Receive Offload (LRO) under System > Advanced on the Networking tab default to checked (disabled) for good reason. Nearly all hardware/drivers have issues with these settings, and they can lead to throughput issues. Ensure the options are checked. Sometimes disabling via sysctl is also necessary.

In fact, it was not a hardware/driver issue but a misconfiguration: LRO and TSO should never be enabled on a router. These options may be enabled only if pfSense is acting as an endpoint (e.g. a DNS server).

Let me quote from the FreeBSD bugtracking entry:

From my testing this is not a bug and everything is working as designed. I am seeing a large decrease in performance when LRO is turned on and using pfSense as a gateway. This is due to the originating packets having the IP DF (don’t fragment) flag set which then gets combined into larger packets via LRO. When this (larger) packet needs to be fragmented to match the other NIC the FreeBSD kernel sees the DF flag, drops the packet, and then sends back an ICMP “unreachable – need to frag” message to the sender. The reason it works at all is due to other traffic which disallows the LRO to occur and some packets get forwarded. One test I did was turning LRO on and using scp to put a file onto the pfSense appliance which resulted in good performance (not seeing the same drop in performance). I would be interested if you 1) see good performance with LRO turned on and scp a large file to the appliance and 2) see ICMP “need to frag” with LRO turned on and scp to a machine on the remote side. Since the pfSense appliance is being used as a gateway you should leave LRO turned off.
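The behaviour described in that report can be observed directly. The following is a sketch of a `tcpdump` invocation (the interface name `vmx0` is an assumption) that watches for the ICMP “unreachable – need to frag” messages the kernel emits when an LRO-merged, DF-flagged packet cannot be forwarded:

```shell
# Capture ICMP destination-unreachable messages on the interface;
# "need to frag" is ICMP unreachable code 4 (fragmentation needed, DF set)
tcpdump -ni vmx0 'icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'
```

If these messages appear during an upload test with LRO enabled and disappear once LRO is disabled, that confirms the mechanism the FreeBSD entry describes.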

I’ve experienced this problem sometimes, and the quick solution is to reboot the machine. Windows memory management is not the best, and a reboot is sometimes needed.

If a reboot doesn’t work, narrow down the problem. Is it the servers or the client? Are the servers in Terminal Server mode, or TS for administration only? Are you connecting to the console or to a standard remote session?

Also consider that if they’re all “new” machines (servers, supported ones), they can all receive the same update. You may need an update on the client to work with the changes to the Terminal Server service.

As a direct response: I’ve been administering a group of 15 servers for more than 6 years, from Windows 2000 to Windows 2012 R2. I have this problem sometimes, but 90% of the time it is solved with a reboot; the other 10%, with an update of the client.

My recommendation: use the WSUS service and manage the approval of all updates installed on the servers.

P.s. If you cannot get the problem solved, you can use the System Restore utility to roll the machine back to a point a week ago, before the updates were installed. Uninstallation doesn’t undo configuration changes, but System Restore reverts the whole system to a past state (uninstalling the app and undoing config changes, but possibly also removing your documents or other things on the machine).
