2 Replies Latest reply on Apr 5, 2018 7:43 AM by sijisaula

    Pinging IPoIB address fails


      Hi Folks,


      I've got a strange issue: I can't ping other IP addresses over IB from this machine. This is a live environment, so I know the issue is with this machine or its configuration. Where might my issue lie?


      [root@transfer ~]# ping -c 3

      PING ( 56(84) bytes of data.

      From icmp_seq=1 Destination Host Unreachable

      From icmp_seq=2 Destination Host Unreachable

      From icmp_seq=3 Destination Host Unreachable


      --- ping statistics ---

      3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms


      I recently installed the MLNX_OFED_LINUX-4.3- driver on a CentOS 6.9 system, and here is my environment in a nutshell:



      "ip addr" shows this:

      10: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 256

          link/infiniband a0:00:02:20:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1e:05:b1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

          inet brd scope global ib0


      And the interface file looks like this:











      However, the port is up according to IB:


      [root@transfer ~]# ibv_devinfo

      hca_id: mlx4_0

        transport: InfiniBand (0)

        fw_ver: 2.36.5000

        node_guid: e41d:2d03:001e:05b0

        sys_image_guid: e41d:2d03:001e:05b3

        vendor_id: 0x02c9

        vendor_part_id: 4099

        hw_ver: 0x1

        board_id: DEL1090001019

        phys_port_cnt: 2

        Device ports:

        port: 1

        state: PORT_DOWN (1)

        max_mtu: 4096 (5)

        active_mtu: 4096 (5)

        sm_lid: 0

        port_lid: 0

        port_lmc: 0x00

        link_layer: InfiniBand


        port: 2

        state: PORT_ACTIVE (4)

        max_mtu: 4096 (5)

        active_mtu: 4096 (5)

        sm_lid: 1

        port_lid: 29

        port_lmc: 0x00

        link_layer: InfiniBand
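
      One cross-check that might help here: a 20-byte IPoIB hardware address ends with the 8-byte port GUID, so the "link/infiniband" value from "ip addr" can be compared with the GUIDs that ibv_devinfo reports. The sketch below is only an illustration; the function name hwaddr_to_port_guid is made up for this post, and the address is the one copied from the ib0 output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MLNX_OFED or iproute2):
# print the port GUID embedded in a 20-byte IPoIB hardware address.
# The last 8 colon-separated bytes are the interface ID of the port GID.
hwaddr_to_port_guid() {
    printf '%s\n' "$1" | awk -F: '
        NF != 20 { exit 1 }          # expect exactly 20 bytes
        {
            out = ""
            # join bytes 13..20 pairwise, in the style ibv_devinfo uses
            for (i = 13; i <= 20; i += 2) {
                out = out (i > 13 ? ":" : "") $i $(i + 1)
            }
            print out
        }'
}

# Address copied from the "ip addr" output for ib0
hwaddr_to_port_guid "a0:00:02:20:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1e:05:b1"
# prints e41d:2d03:001e:05b1
```

      The result differs from the node_guid (e41d:2d03:001e:05b0) by one. Assuming the common convention that port 1 is node GUID + 1 and port 2 is node GUID + 2, ib0 would sit on port 1, which ibv_devinfo shows as PORT_DOWN. That is a hypothesis, not a confirmed diagnosis; the per-port GID in /sys/class/infiniband/mlx4_0/ports/1/gids/0 (and ports/2) should show which port actually carries this GUID.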


      I notice the MTU size from "ibv_devinfo" (4096) seems to conflict with the "ip addr" output (1500).


      Some excerpts from "lspci -vvv | grep -i mell":


      04:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

        Subsystem: Mellanox Technologies Device 0065


        Product Name: CX354A - ConnectX-3 QSFP

        Read-only fields:

        [PN] Part number: 01T7NW              

        [EC] Engineering changes: A00

        [V0] Vendor specific: PCIe Gen3 x8   


        Capabilities: [18c v1] #19

        Kernel driver in use: mlx4_core

        Kernel modules: mlx4_core




      [root@transfer ~]# ibhosts

      src/query_smp.c:195; umad (DR path slid 0; dlid 0; 0,2,1,4,26 Attr 0x11:0) bad status 110; Connection timed out

      Ca : 0x001e67030068e310 ports 1 "prod-0034 HCA-1"

      Ca : 0x001e67030068e3f8 ports 1 "prod-0026 HCA-1"

      Ca : 0x001e67030068e3a0 ports 1 "prod-0044 HCA-1"

      Ca : 0x001e67030068caf0 ports 1 "prod-0043 HCA-1"

      Ca : 0x001e67030068c9a8 ports 1 "prod-0023 HCA-1"

      Ca : 0x001e67030066de04 ports 1 "prod-0022 HCA-1"

      Ca : 0x001e670300670844 ports 1 "prod-0021 HCA-1"

      Ca : 0x001e67030068ccc0 ports 1 "prod-0020 HCA-1"

      Ca : 0x001e67030068d230 ports 1 "prod-0019 HCA-1"

      Ca : 0x001e67030067257c ports 1 "prod-0047 HCA-1"

      . . .