2 Replies Latest reply on Apr 5, 2018 7:43 AM by sijisaula

    Pinging IPoIB address fails

    sijisaula

      Hi Folks,

       

      I've got a strange issue: I can't ping other IP addresses over IB from this machine. This is a live environment, so I know the issue is with this machine or its configuration. Where might my issue lie?

       

      [root@transfer ~]# ping -c 3 10.12.0.1

      PING 10.12.0.1 (10.12.0.1) 56(84) bytes of data.

      From 10.12.200.17 icmp_seq=1 Destination Host Unreachable

      From 10.12.200.17 icmp_seq=2 Destination Host Unreachable

      From 10.12.200.17 icmp_seq=3 Destination Host Unreachable

       

      --- 10.12.0.1 ping statistics ---

      3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms

       

      I recently installed the MLNX_OFED_LINUX-4.3-1.0.1.0-rhel6.9-x86_64 driver on a CentOS 6.9 system. Here is my environment in a nutshell:

       

       

      "ip addr" shows this:

      10: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 256

          link/infiniband a0:00:02:20:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1e:05:b1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

          inet 10.12.200.17/16 brd 10.12.255.255 scope global ib0

       

      And the interface file looks like this:

      =========

      DEVICE=ib0

      ONBOOT=yes

      NM_CONTROLLED=no

      BOOTPROTO=none

      IPADDR=10.12.200.17

      PREFIX=16

      MTU=1500

      ==========

       

      However, "ibv_devinfo" reports port 2 as active (port 1 is down):

       

      [root@transfer ~]# ibv_devinfo

      hca_id: mlx4_0

        transport: InfiniBand (0)

        fw_ver: 2.36.5000

        node_guid: e41d:2d03:001e:05b0

        sys_image_guid: e41d:2d03:001e:05b3

        vendor_id: 0x02c9

        vendor_part_id: 4099

        hw_ver: 0x1

        board_id: DEL1090001019

        phys_port_cnt: 2

        Device ports:

        port: 1

        state: PORT_DOWN (1)

        max_mtu: 4096 (5)

        active_mtu: 4096 (5)

        sm_lid: 0

        port_lid: 0

        port_lmc: 0x00

        link_layer: InfiniBand

       

        port: 2

        state: PORT_ACTIVE (4)

        max_mtu: 4096 (5)

        active_mtu: 4096 (5)

        sm_lid: 1

        port_lid: 29

        port_lmc: 0x00

        link_layer: InfiniBand
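
      To make the per-port state easier to compare against the NO-CARRIER flag on ib0, the output above can be condensed to one line per port. This is only a minimal sketch: port_states is a hypothetical helper name (not part of MLNX_OFED), and it assumes the "port:" / "state:" field layout shown in the output above.

```shell
# Hypothetical helper (not part of MLNX_OFED): condense `ibv_devinfo`
# output read on stdin to one "port N: STATE" line per port.
# Assumes each port section has a "port:" line followed by a "state:" line.
port_states() {
    awk '
        $1 == "port:"  { port = $2 }                      # remember current port number
        $1 == "state:" { print "port " port ": " $2 }     # emit state for that port
    '
}

# Usage on a host with the OFED tools installed:
#   ibv_devinfo | port_states
```

      For the output above this would list port 1 as PORT_DOWN and port 2 as PORT_ACTIVE. Note also that the last eight bytes of the ib0 link-layer address (e4:1d:2d:03:00:1e:05:b1) are a port GUID, which may help tell which physical port ib0 is bound to.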

       

      I notice the MTU size from "ibv_devinfo" (4096) seems to conflict with the "ip addr" output (1500).

       

      Some excerpts from "lspci -vvv | grep -i mell":

       

      04:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

        Subsystem: Mellanox Technologies Device 0065

      ...

        Product Name: CX354A - ConnectX-3 QSFP

        Read-only fields:

        [PN] Part number: 01T7NW              

        [EC] Engineering changes: A00

        [V0] Vendor specific: PCIe Gen3 x8   

      ...

        Capabilities: [18c v1] #19

        Kernel driver in use: mlx4_core

        Kernel modules: mlx4_core

       

       

       

      [root@transfer ~]# ibhosts

      src/query_smp.c:195; umad (DR path slid 0; dlid 0; 0,2,1,4,26 Attr 0x11:0) bad status 110; Connection timed out

      Ca : 0x001e67030068e310 ports 1 "prod-0034 HCA-1"

      Ca : 0x001e67030068e3f8 ports 1 "prod-0026 HCA-1"

      Ca : 0x001e67030068e3a0 ports 1 "prod-0044 HCA-1"

      Ca : 0x001e67030068caf0 ports 1 "prod-0043 HCA-1"

      Ca : 0x001e67030068c9a8 ports 1 "prod-0023 HCA-1"

      Ca : 0x001e67030066de04 ports 1 "prod-0022 HCA-1"

      Ca : 0x001e670300670844 ports 1 "prod-0021 HCA-1"

      Ca : 0x001e67030068ccc0 ports 1 "prod-0020 HCA-1"

      Ca : 0x001e67030068d230 ports 1 "prod-0019 HCA-1"

      Ca : 0x001e67030067257c ports 1 "prod-0047 HCA-1"

      . . .