1 Reply Latest reply on Jul 14, 2016 6:21 AM by karen

    Slow speeds and PortXmitWait - Any tips?

    thildemar

      Hi Everyone,

      I have a new setup with three Windows Server hosts running WinOF, each with a dual-port ConnectX-2 card.  Each host connects to two 4036 switches (one port to each switch), and the switches are joined by two inter-switch links.  Here is the ibnetdiscover output:

      #

      # Topology file: generated on Thu Jun 30 10:32:28 2016

      #

      # Initiated from node 0008f10500203b28 port 0008f10500203b28

       

       

      vendid=0x8f1

      devid=0x5a5a

      sysimgguid=0x8f10500109553

      switchguid=0x8f10500109552(8f10500109552)

      Switch  36 "S-0008f10500109552"        # "Mellanox 4036 # 4036-SW2" enhanced port 0 lid 6 lmc 0

      [1]    "S-0008f10500203b28"[1]        # "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

      [2]    "S-0008f10500203b28"[2]        # "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

      [34]    "H-0002c903004e445a"[1](2c903004e445b)          # "IGA-S2D1" lid 2 4xQDR

      [35]    "H-0008f104039a3c1c"[2](8f104039a3c1e)          # "IGA-S2D2" lid 5 4xQDR

      [36]    "H-0008f104039a4e3c"[2](8f104039a4e3e)          # "IGA-S2D3" lid 8 4xQDR

       

       

      vendid=0x8f1

      devid=0x5a5a

      sysimgguid=0x8f10500203b29

      switchguid=0x8f10500203b28(8f10500203b28)

      Switch  36 "S-0008f10500203b28"        # "Mellanox 4036 # 4036-SW1" enhanced port 0 lid 1 lmc 0

      [1]    "S-0008f10500109552"[1]        # "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

      [2]    "S-0008f10500109552"[2]        # "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

      [34]    "H-0002c903004e445a"[2](2c903004e445c)          # "IGA-S2D1" lid 3 4xQDR

      [35]    "H-0008f104039a3c1c"[1](8f104039a3c1d)          # "IGA-S2D2" lid 4 4xQDR

      [36]    "H-0008f104039a4e3c"[1](8f104039a4e3d)          # "IGA-S2D3" lid 7 4xQDR

       

       

      vendid=0x2c9

      devid=0x673c

      sysimgguid=0x8f104039a4e3f

      caguid=0x8f104039a4e3c

      Ca      2 "H-0008f104039a4e3c"          # "IGA-S2D3"

      [1](8f104039a4e3d)      "S-0008f10500203b28"[36]                # lid 7 lmc 0 "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

      [2](8f104039a4e3e)      "S-0008f10500109552"[36]                # lid 8 lmc 0 "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

       

       

      vendid=0x2c9

      devid=0x673c

      sysimgguid=0x8f104039a3c1f

      caguid=0x8f104039a3c1c

      Ca      2 "H-0008f104039a3c1c"          # "IGA-S2D2"

      [1](8f104039a3c1d)      "S-0008f10500203b28"[35]                # lid 4 lmc 0 "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

      [2](8f104039a3c1e)      "S-0008f10500109552"[35]                # lid 5 lmc 0 "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

       

       

      vendid=0x2c9

      devid=0x673c

      sysimgguid=0x2c903004e445d

      caguid=0x2c903004e445a

      Ca      2 "H-0002c903004e445a"          # "IGA-S2D1"

      [1](2c903004e445b)      "S-0008f10500109552"[34]                # lid 2 lmc 0 "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

      [2](2c903004e445c)      "S-0008f10500203b28"[34]                # lid 3 lmc 0 "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

      4036-SW1(utilities)#

      Everything is up and transmitting data, but I have two concerns.  First, ntttcp tests are only reaching 1500-1800 MB/s; I would expect a bit more from QDR, even with overhead.  Can anyone tell me whether this is normal, or what could be tuned to improve it?  Second, ibqueryerrors shows a steadily increasing PortXmitWait count on any sending interface while I test.  From what I can see, some increase here is normal, but these values seem high.

      PS C:\> ibqueryerrors

      Errors for "IGA-S2D3"

         GUID 0x8f104039a4e3d port 1: [PortXmitWait == 1]

         GUID 0x8f104039a4e3e port 2: [PortXmitWait == 541647202]

      Errors for 0x8f10500203b28 "Mellanox 4036 # 4036-SW1"

         GUID 0x8f10500203b28 port ALL: [PortXmitWait == 86266712]

         GUID 0x8f10500203b28 port 1: [PortXmitWait == 33430055]

         GUID 0x8f10500203b28 port 34: [PortXmitWait == 52836657]

      Errors for 0x8f10500109552 "Mellanox 4036 # 4036-SW2"

         GUID 0x8f10500109552 port ALL: [PortXmitWait == 2344169726]

         GUID 0x8f10500109552 port 0: [PortXmitWait == 261]

         GUID 0x8f10500109552 port 34: [PortXmitWait == 2344169465]

      Errors for "IGA-S2D1"

         GUID 0x2c903004e445b port 1: [PortXmitWait == 59]
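
      Since the absolute PortXmitWait values matter less than how fast they grow during a test, here is a minimal Python sketch that pulls the per-port counters out of saved ibqueryerrors text so two snapshots can be compared. The function names and regular expressions are my own, and it assumes the output looks exactly like the listing above (a few lines of which are embedded as a sample); other ibqueryerrors versions may format things differently.

```python
import re

# A few lines copied from the ibqueryerrors listing above.
SAMPLE = """\
Errors for "IGA-S2D3"
   GUID 0x8f104039a4e3d port 1: [PortXmitWait == 1]
   GUID 0x8f104039a4e3e port 2: [PortXmitWait == 541647202]
Errors for 0x8f10500109552 "Mellanox 4036 # 4036-SW2"
   GUID 0x8f10500109552 port ALL: [PortXmitWait == 2344169726]
   GUID 0x8f10500109552 port 34: [PortXmitWait == 2344169465]
"""

# Assumed line shapes, based only on the listing in this post.
NODE_RE = re.compile(r'^Errors for .*"(?P<name>[^"]+)"\s*$')
PORT_RE = re.compile(r'GUID 0x[0-9a-fA-F]+ port (?P<port>\w+):')
WAIT_RE = re.compile(r'\[PortXmitWait == (?P<value>\d+)\]')


def parse_xmit_wait(text):
    """Return {(node_name, port): PortXmitWait} for each numbered port."""
    counts = {}
    node = None
    for raw in text.splitlines():
        line = raw.strip()
        header = NODE_RE.match(line)
        if header:
            node = header.group("name")
            continue
        port = PORT_RE.search(line)
        wait = WAIT_RE.search(line)
        # Skip the "port ALL" summary rows so totals are not double counted.
        if not port or not wait or port.group("port") == "ALL":
            continue
        counts[(node, port.group("port"))] = int(wait.group("value"))
    return counts


def xmit_wait_delta(before, after):
    """Growth of each counter between two parsed snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}


if __name__ == "__main__":
    snapshot = parse_xmit_wait(SAMPLE)
    # Print ports from the largest counter to the smallest.
    for (node, port), value in sorted(snapshot.items(), key=lambda kv: -kv[1]):
        print(f"{node} port {port}: {value}")
```

      Saving the output before and after an ntttcp run and passing both parsed snapshots to xmit_wait_delta would show which ports are actually accumulating wait ticks during the test, rather than since the counters were last cleared.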

       

      1500 MB/s is not terrible, but that is only about 12 Gb/s, much closer to a 15 Gb/s connection than to the 32 Gb/s that 4x QDR should deliver.  Does anyone have any hints?
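
      For reference, the arithmetic behind that comparison can be sketched in a few lines of Python. It assumes the ntttcp figures are decimal megabytes per second (if the tool reports 2^20-byte units the numbers shift by about 5%), and it uses the standard 4x QDR figures of 10 Gb/s signaling per lane with 8b/10b encoding. The names are illustrative only.

```python
# Standard 4x QDR link figures: 4 lanes, 10 Gb/s signaling each, 8b/10b encoding.
LANES = 4
SIGNAL_GBPS_PER_LANE = 10.0

# Usable data rate after 8b/10b encoding (8 data bits per 10 line bits).
QDR_DATA_RATE_GBPS = LANES * SIGNAL_GBPS_PER_LANE * 8 / 10


def mbytes_to_gbits(mb_per_s):
    """Convert MB/s (assumed 10^6 bytes) to Gb/s (10^9 bits)."""
    return mb_per_s * 8 / 1000


if __name__ == "__main__":
    for measured in (1500, 1800):  # range reported by the ntttcp tests
        gbps = mbytes_to_gbits(measured)
        share = gbps / QDR_DATA_RATE_GBPS
        print(f"{measured} MB/s = {gbps:.1f} Gb/s, {share:.0%} of the QDR data rate")
```

      This only compares against the link's data rate; it says nothing about host-side limits such as the PCIe slot or the TCP/IP stack, which may cap throughput well below the link rate.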

       

      Thanks!