5 Replies Latest reply on Jan 13, 2014 10:59 PM by alexey2k2

    Unstable operation of WinOFED 4.40.0 with HCA fw 2.30.3200 on Win2012 Server

    alexey2k2

      Hi,

       

      I've run into unstable operation of WinOFED 4.40.0 with HCA firmware 2.30.3200 on Windows Server 2012.

      I've spent the last week going mad trying to get it working properly.

       

      I have a system of two Mellanox InfiniScale IV IS5023 switches, three hosts running Windows Server 2012 / Windows Server 2012 R2, and two hosts running Ubuntu Linux 12.04 LTS. All hosts are equipped with Mellanox ConnectX-3 VPI network cards (MCX354A-QCBT). Each host is plugged into both switches; the switches are connected to each other and share a single fabric. All hosts are based on SuperMicro X9DRW-7TPF mainboards with Intel Xeon E5-2667 v2 CPUs and DDR3-1866 memory.

      WinOFED 4.40.0 is installed on Win Server 2012, and WinOFED 4.55 on Win Server 2012 R2. Both Linux hosts are routers with the MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64 package installed; their IB interfaces are joined into an active-backup bond using ifenslave. The following modules are loaded on the Linux routers: mlx4_core, mlx4_ib, ib_umad, ib_mad, ib_ipoib, ib_uverbs. All HCAs are burned with firmware 2.30.3200.
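
For anyone reproducing this setup, the module list above can be checked against what the kernel actually has loaded. This is only a minimal sketch: the function name `missing_modules` is made up for illustration, and it assumes the text passed in has the usual `/proc/modules` layout (module name in the first column).

```python
# Modules listed in the post as loaded on the Linux routers.
REQUIRED = ("mlx4_core", "mlx4_ib", "ib_umad", "ib_mad", "ib_ipoib", "ib_uverbs")


def missing_modules(proc_modules_text, required=REQUIRED):
    """Return the required modules that do not appear in the given text.

    proc_modules_text is expected to look like the contents of
    /proc/modules, i.e. one module per line with the name first.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in required if name not in loaded]
```

On a router this could be called as `missing_modules(open("/proc/modules").read())`; an empty list means every module from the list is loaded.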

      As a test I use L3 ICMP ping. Linux-to-Linux communication is fine: a flood ping across the InfiniBand network gives amazing results: rtt min/avg/max/mdev = 0.011/0.013/1.682/0.002 ms. But things look different when Windows is involved.

      A flood ping from a Linux host to Win2012 gives almost equally good latency (rtt min/avg/max/mdev = 0.022/0.024/2.492/0.021 ms), but the packet loss rate is always about 1-2%. At the same time, ibping shows no packet loss at all, and ibdiagnet on Linux shows no warnings or errors, so I conclude that IB itself works well and the issue lies above L2.

      So I decided to try Win2012R2 with WinOFED 4.55. That resolved the packet loss but increased latency: rtt min/avg/max/mdev = 0.093/0.102/17.550/0.101 ms. Digging into this, I found that another system of mine has no such issues, and the difference is the HCA firmware version: Win2012 with WinOFED 4.40.0 works fine with firmware 2.11.500. But downgrading to firmware 2.11.500 on my servers didn't help; even with all firmware and software versions identical, I still see packet loss.
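
To put the three ping summaries quoted above side by side, they can be parsed with a short script. This is a rough sketch rather than a proper benchmark tool: `parse_rtt` and the run labels are names invented here, and the only input is the `rtt min/avg/max/mdev` lines copied from this post.

```python
import re

# Matches the summary line printed by Linux ping, e.g.
# "rtt min/avg/max/mdev = 0.011/0.013/1.682/0.002 ms"
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(summary):
    """Extract min/avg/max/mdev (in ms) from a ping summary line."""
    match = RTT_RE.search(summary)
    if match is None:
        raise ValueError("no rtt summary found")
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, map(float, match.groups())))


# Summary lines copied from the post; the labels are illustrative only.
runs = {
    "Linux - Linux": "rtt min/avg/max/mdev = 0.011/0.013/1.682/0.002 ms",
    "Win2012, WinOFED 4.40.0": "rtt min/avg/max/mdev = 0.022/0.024/2.492/0.021 ms",
    "Win2012R2, WinOFED 4.55": "rtt min/avg/max/mdev = 0.093/0.102/17.550/0.101 ms",
}

baseline_avg = parse_rtt(runs["Linux - Linux"])["avg"]
for label, line in runs.items():
    stats = parse_rtt(line)
    ratio = stats["avg"] / baseline_avg
    print(f"{label}: avg {stats['avg']:.3f} ms ({ratio:.1f}x baseline), "
          f"max {stats['max']:.3f} ms")
```

The ratio is taken against the Linux - Linux average simply because that is the best result reported here.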

      I still want it all together: low latency, no packet loss, and the latest software and firmware versions.

       

       

      I'm running out of ideas, so any comments and advice are appreciated.