1 Reply Latest reply on May 23, 2017 5:21 AM by inatec

    100GB CX4 + MSN2100 Switch -> slow speed (2-9Gb/s) (solved)

    inatec

      Hello,

      I'm new to the whole Mellanox stuff and managed to get the card and the switch working, in the sense that the "network just works" :-)

      We have per node:

      • Dual-port MT27700 Family [ConnectX-4] (PCIe Gen3 x16)
      • MSN2100 switch
      • Connected with QSFP28 (100Gb DAC cables)
      • 2 x E5-2620 v4 @ 2.10GHz
      • 2 x 16 GB DDR4 RAM
      • Supermicro X10DRi
      • Debian Jessie (Proxmox 4.x with kernel 4.4.59-1-pve)
      • Latest firmware for the CX4 and the switch
      • Module version 4.0-2.0.0
      • Installed packages: mlnx-en-dkms / mlnx-en-eth-only / mlnx-en-utils
      • Basic network configuration with MTU 9000 (NIC + switch ports)

      I tested the raw throughput with iperf 2.x and iperf 3.x:

      • iperf2
      [  4]  0.0-10.2 sec  16.1 GBytes  13.6 Gbits/sec
      [ 11]  0.0-10.2 sec  7.68 GBytes  6.49 Gbits/sec
      [ 10]  0.0-12.9 sec  3.50 MBytes  2.28 Mbits/sec
      [  6]  0.0-14.3 sec  7.25 MBytes  4.25 Mbits/sec
      [  9]  0.0-15.4 sec  8.49 GBytes  4.75 Gbits/sec
      [  8]  0.0-19.2 sec  8.00 MBytes  3.49 Mbits/sec
      [  5]  0.0-26.0 sec  4.12 MBytes  1.33 Mbits/sec
      [  7]  0.0-26.0 sec  3.12 MBytes  1.01 Mbits/sec
      [SUM]  0.0-26.0 sec  32.3 GBytes  10.7 Gbits/sec
      
      • iperf3
      [ ID] Interval           Transfer     Bandwidth       Retr
      [  4]   0.00-10.00  sec  5.28 GBytes  4.53 Gbits/sec  1213             sender
      [  4]   0.00-10.00  sec  5.26 GBytes  4.52 Gbits/sec                  receiver
      [  6]   0.00-10.00  sec  5.47 GBytes  4.70 Gbits/sec  1029             sender
      [  6]   0.00-10.00  sec  5.46 GBytes  4.69 Gbits/sec                  receiver
      [  8]   0.00-10.00  sec  4.70 GBytes  4.04 Gbits/sec  1064             sender
      [  8]   0.00-10.00  sec  4.70 GBytes  4.04 Gbits/sec                  receiver
      [ 10]   0.00-10.00  sec  5.64 GBytes  4.85 Gbits/sec  927             sender
      [ 10]   0.00-10.00  sec  5.62 GBytes  4.83 Gbits/sec                  receiver
      [ 12]   0.00-10.00  sec  3.60 GBytes  3.10 Gbits/sec  716             sender
      [ 12]   0.00-10.00  sec  3.59 GBytes  3.08 Gbits/sec                  receiver
      [ 14]   0.00-10.00  sec  4.80 GBytes  4.12 Gbits/sec  1240             sender
      [ 14]   0.00-10.00  sec  4.78 GBytes  4.11 Gbits/sec                  receiver
      [ 16]   0.00-10.00  sec  5.26 GBytes  4.52 Gbits/sec  1154             sender
      [ 16]   0.00-10.00  sec  5.26 GBytes  4.52 Gbits/sec                  receiver
      [ 18]   0.00-10.00  sec  5.98 GBytes  5.14 Gbits/sec  969             sender
      [ 18]   0.00-10.00  sec  5.98 GBytes  5.14 Gbits/sec                  receiver
      [SUM]   0.00-10.00  sec  40.7 GBytes  35.0 Gbits/sec  8312             sender
      [SUM]   0.00-10.00  sec  40.7 GBytes  34.9 Gbits/sec                  receiver
      

      iperf was started with iperf -c <ip> -P8 (8 parallel streams).
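      The iperf2 [SUM] line is easy to misread: 32.3 GBytes over 0.0-26.0 sec gives 10.7 Gbits/sec, i.e. the bytes of all streams divided by the full 26-second interval, so the streams that stalled after a few MBytes pull the aggregate down even though stream [4] alone ran at 13.6 Gbits/sec. The numbers are consistent with iperf printing bytes in binary units (1 GByte = 2^30 bytes) and rates in decimal units. A minimal Python sketch that re-derives the aggregate from the per-stream lines under that assumption; parse_iperf2_line and aggregate are hypothetical helper names, and the regex only covers iperf2's line format:

```python
import re

# Byte units as iperf prints them (binary: 1 GByte = 2**30 bytes). This is an
# assumption, but it matches the numbers in the output above.
BYTE_UNITS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30}

# Matches iperf2 stream lines such as
# "[  4]  0.0-10.2 sec  16.1 GBytes  13.6 Gbits/sec"
LINE_RE = re.compile(
    r"\[\s*(\w+)\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+([\d.]+)\s+([KMG]?)Bytes"
)


def parse_iperf2_line(line):
    """Return (stream_id, start_s, end_s, n_bytes), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    sid, start, end, amount, unit = m.groups()
    return sid, float(start), float(end), float(amount) * BYTE_UNITS[unit]


def aggregate(lines):
    """Sum the bytes of all streams (ignoring [SUM]) and divide by the overall interval.

    Returns (total_bytes, gbit_per_s), with the rate in decimal units (1 Gbit = 1e9 bits).
    """
    streams = [s for s in map(parse_iperf2_line, lines) if s and s[0] != "SUM"]
    total_bytes = sum(s[3] for s in streams)
    duration = max(s[2] for s in streams) - min(s[1] for s in streams)
    return total_bytes, total_bytes * 8 / duration / 1e9


# Lines from the first iperf2 run above.
OUTPUT = """\
[  4]  0.0-10.2 sec  16.1 GBytes  13.6 Gbits/sec
[ 11]  0.0-10.2 sec  7.68 GBytes  6.49 Gbits/sec
[ 10]  0.0-12.9 sec  3.50 MBytes  2.28 Mbits/sec
[  6]  0.0-14.3 sec  7.25 MBytes  4.25 Mbits/sec
[  9]  0.0-15.4 sec  8.49 GBytes  4.75 Gbits/sec
[  8]  0.0-19.2 sec  8.00 MBytes  3.49 Mbits/sec
[  5]  0.0-26.0 sec  4.12 MBytes  1.33 Mbits/sec
[  7]  0.0-26.0 sec  3.12 MBytes  1.01 Mbits/sec
[SUM]  0.0-26.0 sec  32.3 GBytes  10.7 Gbits/sec
"""

total, gbits = aggregate(OUTPUT.splitlines())
print(f"{total / 2**30:.1f} GBytes, {gbits:.1f} Gbits/sec")  # 32.3 GBytes, 10.7 Gbits/sec
```

      It reproduces the [SUM] line. It is meant for iperf2 output only: iperf3 prints separate sender and receiver lines per stream, which this sketch would count twice.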

      So it is much slower than the Mellanox examples, which reach over 11 Gb/s. When I apply the sysctl examples, the speed mostly goes down. So I'm looking for the bottleneck.

      ~# ethtool eth4
      Settings for eth4:
          Supported ports: [ FIBRE Backplane ]
          Supported link modes:   1000baseKX/Full
                                  10000baseKR/Full
                                  40000baseKR4/Full
                                  40000baseCR4/Full
                                  40000baseSR4/Full
                                  40000baseLR4/Full
          Supported pause frame use: Symmetric Receive-only
          Supports auto-negotiation: Yes
          Advertised link modes:  1000baseKX/Full
                                  10000baseKR/Full
                                  40000baseKR4/Full
                                  40000baseCR4/Full
                                  40000baseSR4/Full
                                  40000baseLR4/Full
          Advertised pause frame use: No
          Advertised auto-negotiation: Yes
          Link partner advertised link modes:  Not reported
          Link partner advertised pause frame use: No
          Link partner advertised auto-negotiation: Yes
          Speed: 100000Mb/s
          Duplex: Full
          Port: Direct Attach Copper
          PHYAD: 0
          Transceiver: internal
          Auto-negotiation: on
          Supports Wake-on: d
          Wake-on: d
          Current message level: 0x00000004 (4)
                         link
          Link detected: yes
      

      What is a bit strange: I installed only the DEB packages and could not find mlxconfig, only mstconfig. With that tool, I switched the port to the Ethernet protocol:

      mstconfig -y -d 02:00.0 set LINK_TYPE_P1=2

      It would be nice if someone could help me get over 10 Gb/s :-)

      cu denny

        • Re: 100GB CX4 + MSN2100 Switch -> slow speed (2-9Gb/s)
          inatec

          Hello,

          I solved the problem by using two DIMMs per CPU socket. The per-stream transfer in iperf jumped from ~6 GB up to 13 GB (~11 Gb/s). Next, I rearranged the PCI slots so that the CX4 is now attached to CPU2 and some other cards are handled by CPU1. In the first test runs (all iperf2) the per-stream transfer fluctuated between ~9 and 14 GB; after changing the PCI slots, the values are more consistently between 11 and 13 GB (~10-11 Gb/s).
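          To check which socket the CX4 hangs off after moving it to another slot, Linux exposes the NUMA node and the local CPUs of a PCI device in sysfs (/sys/class/net/<iface>/device/numa_node and /sys/class/net/<iface>/device/local_cpulist; numa_node reads -1 when the kernel has no NUMA information for the device). On a two-socket board, CPU1 and CPU2 usually correspond to NUMA nodes 0 and 1. A small Python sketch; the helper names are hypothetical:

```python
from pathlib import Path


def _device_attr(iface, attr, sysfs_root="/sys"):
    """Read one sysfs attribute of the PCI device behind a network interface."""
    # /sys/class/net/<iface>/device is a symlink to the PCI device directory.
    path = Path(sysfs_root, "class", "net", iface, "device", attr)
    return path.read_text().strip()


def nic_numa_node(iface, sysfs_root="/sys"):
    """NUMA node of the NIC's PCI device (-1 if the kernel reports none)."""
    return int(_device_attr(iface, "numa_node", sysfs_root))


def nic_local_cpus(iface, sysfs_root="/sys"):
    """CPU list local to the NIC's PCI device, e.g. "8-15,24-31"."""
    return _device_attr(iface, "local_cpulist", sysfs_root)
```

          For example, nic_numa_node("eth4") should return 1 once the card sits in a CPU2 slot (assuming the usual node numbering), and the result can be used to keep the benchmark on the same socket, e.g. numactl --cpunodebind=1 --membind=1 iperf -c <ip> -P8.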

          With the settings:

          ## MLXNET tuning parameters ##
          net.core.rmem_max = 2147483647
          net.core.wmem_max = 2147483647

          net.ipv4.tcp_rmem = 4096 87380 2147483647
          net.ipv4.tcp_wmem = 4096 87380 2147483647
          ## END MLXNET ##

          I get:

          [ 10]  0.0-10.0 sec  10.6 GBytes  9.11 Gbits/sec
          [  4]  0.0-10.0 sec  12.1 GBytes  10.4 Gbits/sec
          [  5]  0.0-10.0 sec  12.5 GBytes  10.8 Gbits/sec
          [  6]  0.0-10.0 sec  13.2 GBytes  11.4 Gbits/sec
          [  3]  0.0-10.0 sec  15.0 GBytes  12.9 Gbits/sec
          [  7]  0.0-10.0 sec  15.0 GBytes  12.9 Gbits/sec
          [  8]  0.0-10.0 sec  12.0 GBytes  10.3 Gbits/sec
          [  9]  0.0-10.0 sec  12.9 GBytes  11.1 Gbits/sec
          [SUM]  0.0-10.0 sec   103 GBytes  88.8 Gbits/sec
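          To confirm that the MLXNET settings above are actually active when running such a test (for example after loading them with sysctl -p), the values can be read back from /proc/sys, where each dotted key maps to a path (net.core.rmem_max is /proc/sys/net/core/rmem_max). A minimal Python sketch; read_sysctl and check_settings are hypothetical helper names:

```python
from pathlib import Path

# Values from the MLXNET tuning block above.
EXPECTED = {
    "net.core.rmem_max": "2147483647",
    "net.core.wmem_max": "2147483647",
    "net.ipv4.tcp_rmem": "4096 87380 2147483647",
    "net.ipv4.tcp_wmem": "4096 87380 2147483647",
}


def read_sysctl(key, proc_root="/proc/sys"):
    """Read a sysctl value; the dots in the key become path separators."""
    path = Path(proc_root, *key.split("."))
    # Multi-value entries such as tcp_rmem may be tab-separated, so normalise whitespace.
    return " ".join(path.read_text().split())


def check_settings(expected=EXPECTED, proc_root="/proc/sys"):
    """Return {key: (expected, actual)} for every setting that differs."""
    mismatches = {}
    for key, want in expected.items():
        got = read_sysctl(key, proc_root)
        if got != want:
            mismatches[key] = (want, got)
    return mismatches
```

          check_settings() returns an empty dict when every value matches; otherwise it lists the expected and the running value for each key that differs.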

          cu denny