1 Reply Latest reply on Sep 29, 2017 3:54 PM by martijn@mellanox.com

    Trouble making Infiniband running udaddy

    maverickjin8

      Hello, I am currently testing with a Mellanox ConnectX-3 adapter.

      The pingpong test included in the Mellanox install package (ibv_rc_pingpong) works.

       

      However, none of the other tests mentioned in the post HowTo Enable, Verify and Troubleshoot RDMA, such as rping and udaddy, will run:

      https://community.mellanox.com/docs/DOC-2086#jive_content_id_4_rping

      The error output is below.

      Client (c1n15):

      sungho@c1n15:~$ udaddy -s 172.23.10.30

      udaddy: starting client

      udaddy: connecting

      udaddy: event: RDMA_CM_EVENT_ADDR_ERROR, error: -19

      test complete

      return status -19

      Server (c1n14):

      sungho@c1n14:~$ udaddy

      udaddy: starting server

        

       

       

      I have two servers connected through a switch.

      The InfiniBand interfaces can all ping each other, and all the network interfaces are installed and running.

       

      However, I have doubts about the ARP table, because it does not look like the entries are set up properly (listed below).

       

      Here is the information for the two servers.

      Do you think I need to add ARP entries statically, or is there something fundamentally wrong?

       

      server (A)

      sungho@c1n14:/usr/bin$ ibstat

      CA 'mlx4_0'

              CA type: MT4099

              Number of ports: 1

              Firmware version: 2.42.5000

              Hardware version: 1

              Node GUID: 0x7cfe9003009a7c30

              System image GUID: 0x7cfe9003009a7c33

              Port 1:

                      State: Active

                      Physical state: LinkUp

                      Rate: 56

                      Base lid: 3

                      LMC: 0

                      SM lid: 3

                      Capability mask: 0x0251486a

                      Port GUID: 0x7cfe9003009a7c31

                      Link layer: InfiniBand

      Kernel IP routing table

      Destination     Gateway         Genmask         Flags Metric Ref    Use Iface

      0.0.0.0         172.23.1.1      0.0.0.0         UG    0      0        0 enp1s0f0

      172.23.0.0      0.0.0.0         255.255.0.0     U     0      0        0 enp1s0f0

      172.23.0.0      0.0.0.0         255.255.0.0     U     0      0        0 ib0

      sungho@c1n14:/usr/bin$ arp -n

      Address                  HWtype  HWaddress           Flags Mask            Iface

      172.23.10.1              ether   0c:c4:7a:3a:35:88   C                     enp1s0f0

      172.23.10.15             ether   0c:c4:7a:3a:35:72   C                     enp1s0f0

      172.23.1.1               ether   00:1b:21:5b:6a:a8   C                     enp1s0f0

      enp1s0f0  Link encap:Ethernet  HWaddr 0c:c4:7a:3a:35:70

                inet addr:172.23.10.14  Bcast:172.23.255.255  Mask:255.255.0.0

                inet6 addr: fe80::ec4:7aff:fe3a:3570/64 Scope:Link

                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

                RX packets:12438 errors:0 dropped:5886 overruns:0 frame:0

                TX packets:5861 errors:0 dropped:0 overruns:0 carrier:0

                collisions:0 txqueuelen:1000

                RX bytes:2356740 (2.3 MB)  TX bytes:836306 (836.3 KB)

       

      ib0       Link encap:UNSPEC  HWaddr A0-00-02-20-FE-80-00-00-00-00-00-00-00-00-00-00

                inet addr:172.23.10.30  Bcast:172.23.255.255  Mask:255.255.0.0

                inet6 addr: fe80::7efe:9003:9a:7c31/64 Scope:Link

                UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1

                RX packets:0 errors:0 dropped:0 overruns:0 frame:0

                TX packets:8 errors:0 dropped:0 overruns:0 carrier:0

                collisions:0 txqueuelen:256

                RX bytes:0 (0.0 B)  TX bytes:616 (616.0 B)

       

      lo        Link encap:Local Loopback

                inet addr:127.0.0.1  Mask:255.0.0.0

                inet6 addr: ::1/128 Scope:Host

                UP LOOPBACK RUNNING  MTU:65536  Metric:1

                RX packets:189 errors:0 dropped:0 overruns:0 frame:0

                TX packets:189 errors:0 dropped:0 overruns:0 carrier:0

                collisions:0 txqueuelen:1

                RX bytes:13912 (13.9 KB)  TX bytes:13912 (13.9 KB)

       

      server (B) 

      sungho@c1n15:~$ ibstat

      CA 'mlx4_0'

              CA type: MT4099

              Number of ports: 1

              Firmware version: 2.42.5000

              Hardware version: 1

              Node GUID: 0x7cfe9003009a6360

              System image GUID: 0x7cfe9003009a6363

              Port 1:

                      State: Active

                      Physical state: LinkUp

                      Rate: 56

                      Base lid: 1

                      LMC: 0

                      SM lid: 3

                      Capability mask: 0x02514868

                      Port GUID: 0x7cfe9003009a6361

                      Link layer: InfiniBand

      sungho@c1n15:~$ route -n

      Kernel IP routing table

      Destination     Gateway         Genmask         Flags Metric Ref    Use Iface

      0.0.0.0         172.23.1.1      0.0.0.0         UG    0      0        0 enp1s0f0

      172.23.0.0      0.0.0.0         255.255.0.0     U     0      0        0 enp1s0f0

      172.23.0.0      0.0.0.0         255.255.0.0     U     0      0        0 ib0

      sungho@c1n15:~$ arp -n

      Address                  HWtype  HWaddress           Flags Mask            Iface

      172.23.10.14             ether   0c:c4:7a:3a:35:70   C                     enp1s0f0

      172.23.10.1              ether   0c:c4:7a:3a:35:88   C                     enp1s0f0

      172.23.10.30             ether   0c:c4:7a:3a:35:70   C                     enp1s0f0

      172.23.1.1               ether   00:1b:21:5b:6a:a8   C                     enp1s0f0

       

      enp1s0f0  Link encap:Ethernet  HWaddr 0c:c4:7a:3a:35:72

                inet addr:172.23.10.15  Bcast:172.23.255.255  Mask:255.255.0.0

                inet6 addr: fe80::ec4:7aff:fe3a:3572/64 Scope:Link

                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

                RX packets:19432 errors:0 dropped:5938 overruns:0 frame:0

                TX packets:8783 errors:0 dropped:0 overruns:0 carrier:0

                collisions:0 txqueuelen:1000

                RX bytes:8246898 (8.2 MB)  TX bytes:1050793 (1.0 MB)

       

      ib0       Link encap:UNSPEC  HWaddr A0-00-02-20-FE-80-00-00-00-00-00-00-00-00-00-00

                inet addr:172.23.10.31  Bcast:172.23.255.255  Mask:255.255.0.0

                inet6 addr: fe80::7efe:9003:9a:6361/64 Scope:Link

                UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1

                RX packets:0 errors:0 dropped:0 overruns:0 frame:0

                TX packets:16 errors:0 dropped:0 overruns:0 carrier:0

                collisions:0 txqueuelen:256

                RX bytes:0 (0.0 B)  TX bytes:1232 (1.2 KB)

       

      lo        Link encap:Local Loopback

                inet addr:127.0.0.1  Mask:255.0.0.0

                inet6 addr: ::1/128 Scope:Host

                UP LOOPBACK RUNNING  MTU:65536  Metric:1

                RX packets:109 errors:0 dropped:0 overruns:0 frame:0

                TX packets:109 errors:0 dropped:0 overruns:0 carrier:0

                collisions:0 txqueuelen:1

                RX bytes:7992 (7.9 KB)  TX bytes:7992 (7.9 KB)

        • Re: Trouble making Infiniband running udaddy
          martijn@mellanox.com

          Hi Sungho,

           

          Thank you for posting your question on the Mellanox Community.

           

          In your environment, where multiple interfaces are in the same address range, please bind explicitly to the address on which you want to run udaddy, rping, and/or ib_send_bw.

           

          For example:

          rping - Server

          # rping -d -s -a <ip-address-of-ib0>

          rping - Client

          # rping -d -c -a <ip-address-of-server>

           

          udaddy - Server

          # udaddy -b <ip-address-of-ib0>

          udaddy - Client

          # udaddy -b <ip-address-of-ib0> -s <ip-address-of-server>
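
Filling in the placeholders with the ib0 addresses from the ifconfig output in the question gives the sketch below. It assumes the server stays on c1n14 (172.23.10.30) and the client on c1n15 (172.23.10.31). The variable names are illustrative only, and the script just prints the command to run on each host rather than running udaddy itself.

```shell
#!/bin/sh
# ib0 addresses taken from the ifconfig output posted in the question.
SERVER_IB=172.23.10.30   # ib0 on c1n14 (udaddy server)
CLIENT_IB=172.23.10.31   # ib0 on c1n15 (udaddy client)

# -b binds udaddy to the local IPoIB address; -s names the server to connect to.
server_cmd="udaddy -b $SERVER_IB"
client_cmd="udaddy -b $CLIENT_IB -s $SERVER_IB"

echo "on c1n14: $server_cmd"
echo "on c1n15: $client_cmd"
```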

           

          ib_send_bw - Server

          # ib_send_bw -d <ib-dev> -p <port> --report_gbits -R -a -F

          Example: # ib_send_bw -d mlx5_0 -p 1 --report_gbits -R -a -F

          ib_send_bw - Client

          # ib_send_bw -d <ib-dev> -p <port> <IPoIB-of-server> --report_gbits -a -R -F

          Example: # ib_send_bw -d mlx4_0 -p1 1.1.1.101 --report_gbits -a -R -F
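
To see the overlap that makes explicit binding necessary, the routing table can be scanned for subnets that are reachable through more than one interface. This is a minimal sketch, assuming the `route -n` rows posted for c1n14 are pasted into a variable; the variable names and the awk filter are illustrative and not part of any Mellanox tool.

```shell
#!/bin/sh
# Routing table rows as posted for c1n14 (the two header lines are omitted).
routes='0.0.0.0         172.23.1.1      0.0.0.0         UG    0      0        0 enp1s0f0
172.23.0.0      0.0.0.0         255.255.0.0     U     0      0        0 enp1s0f0
172.23.0.0      0.0.0.0         255.255.0.0     U     0      0        0 ib0'

# Group rows by destination/netmask and print every subnet that is
# reachable through more than one interface.
overlaps=$(printf '%s\n' "$routes" | awk '
    { key = $1 "/" $3; n[key]++; ifs[key] = ifs[key] " " $8 }
    END { for (k in n) if (n[k] > 1) print k ":" ifs[k] }')

echo "$overlaps"
```

On a live host the same filter could be fed from `route -n | tail -n +3`. For the table above it reports that 172.23.0.0/255.255.0.0 is reachable through both enp1s0f0 and ib0, so without binding, an ib0 address may be resolved through the Ethernet port. That would be consistent with the 172.23.10.30 entry on enp1s0f0 in the c1n15 ARP table, although linking it to the -19 error is an assumption.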

           

          In our lab, we have seen no issues running the above tests. All tests established and confirmed RDMA connectivity.

           

          If you are still experiencing issues after running the provided examples, we recommend that you open a support case with Mellanox Technical Support.

           

          Thanks.

           

          Cheers,

          ~Martijn