3 Replies Latest reply on Apr 19, 2016 7:53 AM by mpoleg

    Joining + Leaving multicast group leads to hanging on ib_close_al

    mpoleg

      Hi

      Operations ib_join_mcast and ib_leave_mcast in my application return success but ib_close_al hangs.

      If I do not use ib_join_mcast and ib_leave_mcast, then ib_close_al does not hang and the application exits successfully.

      The join was verified to be successful because messages sent to the multicast group could be received.

      But even if we do not try to receive any messages and just call ib_join_mcast and ib_leave_mcast one after another, the application hangs indefinitely on ib_close_al.

       

      Are there any suggestions?

       

      Output from join callback:

      ---------------------------------------------------

      Callback: ib_pfn_mcast_cb:

       

      status=IB_SUCCESS

      error_status=0

      h_mcast =0x0000000000340FC0

       

      p_member_rec:

      mgid=0xFF12401BFFFF0000AB000000000000;

      mlid=49160;

      qkey=2843;

      pkey=65535;

      port_gid=0xFE80000000000000E41D2D03007536D1

       

       

      Handle from ib_join_mcast :

      ---------------------------------------------------

      handleMcast=0x0000000000340FC0

       

       

      Output from ib_leave_mcast :

      ---------------------------------------------------

      Leaving mcast group...0x0000000000340FC0

      OK
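
For readers checking the callback output above, here is a minimal sketch that sanity-checks the decimal values it prints. It is not part of the application; the helper names are hypothetical. It assumes the standard InfiniBand conventions that multicast LIDs lie in the range 0xC000 to 0xFFFE and that the top bit of a P_Key marks full membership.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the LID lies in the InfiniBand
 * multicast LID range 0xC000..0xFFFE. */
static bool is_multicast_lid(uint16_t lid)
{
    return lid >= 0xC000 && lid <= 0xFFFE;
}

/* Hypothetical helper: true if the P_Key has its top (membership)
 * bit set, which indicates full membership in the partition. */
static bool is_full_member_pkey(uint16_t pkey)
{
    return (pkey & 0x8000) != 0;
}

/* Hypothetical helper: the 15-bit partition number without the
 * membership bit. */
static uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t)(pkey & 0x7FFF);
}
```

With the values from the callback, mlid=49160 is 0xC008 (inside the multicast range), pkey=65535 is 0xFFFF (default partition, full member), and qkey=2843 is 0x0B1B.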

        • Re: Joining + Leaving multicast group leads to hanging on ib_close_al
          mpoleg

          Additional info:

           

          Init data for ib_join_mcast

          (joinInit holds data from the config; the values are correct because messages were received):

           

              ib_mcast_req_t mcast_req;
              memset(&mcast_req, 0, sizeof(mcast_req));

              // Request parameters
              mcast_req.create = 1;
              mcast_req.mcast_context = this;
              mcast_req.pfn_mcast_cb = &ib_pfn_mcast_cb;
              mcast_req.timeout_ms = (uint32_t)-1;   // largest 32-bit value
              mcast_req.retry_cnt = 3;
              mcast_req.flags = IB_FLAGS_SYNC;
              mcast_req.port_guid = joinInit.recvPort_.guid_;
              mcast_req.pkey_index = 0;

              // Member record: group identity and keys (from config)
              mcast_req.member_rec.mgid = joinInit.mcastGroupGid_;
              mcast_req.member_rec.pkey = joinInit.pkey_.net_;
              mcast_req.member_rec.qkey = joinInit.qkey_;
              mcast_req.member_rec.rate = joinInit.rate_;

              // Member record: local port and path parameters (from config)
              mcast_req.member_rec.port_gid = joinInit.recvPort_.gid_;
              mcast_req.member_rec.mtu = joinInit.mtu_;
              mcast_req.member_rec.tclass = joinInit.serviceLevel_;

              // Member record: hard-coded fields
              mcast_req.member_rec.pkt_life = 0x81;
              mcast_req.member_rec.sl_flow_hop = 0;
              mcast_req.member_rec.scope_state = 0x01;
              mcast_req.member_rec.proxy_join = 0;
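
The two hard-coded bytes pkt_life = 0x81 and scope_state = 0x01 are packed fields. The sketch below unpacks them. It is an illustration only, with hypothetical helper names, and assumes the MCMemberRecord layout as I understand it: a 2-bit selector above a 6-bit packet lifetime value, and the scope in the high nibble with the join state in the low nibble. It also assumes the usual InfiniBand encoding of packet lifetime as 4.096 us times 2 to the power of the value.

```c
#include <stdint.h>

/* Hypothetical helpers; the bit layout is an assumption (see text). */

/* Upper 2 bits of pkt_life: the selector. */
static uint8_t pkt_life_selector(uint8_t pkt_life)
{
    return (uint8_t)(pkt_life >> 6);
}

/* Lower 6 bits of pkt_life: the lifetime exponent. */
static uint8_t pkt_life_value(uint8_t pkt_life)
{
    return (uint8_t)(pkt_life & 0x3F);
}

/* Packet lifetime in nanoseconds: 4096 ns * 2^value. */
static uint64_t pkt_life_ns(uint8_t pkt_life)
{
    return (uint64_t)4096 << pkt_life_value(pkt_life);
}

/* High nibble of scope_state: the multicast scope. */
static uint8_t mcast_scope(uint8_t scope_state)
{
    return (uint8_t)(scope_state >> 4);
}

/* Low nibble of scope_state: the join state bits. */
static uint8_t mcast_join_state(uint8_t scope_state)
{
    return (uint8_t)(scope_state & 0x0F);
}
```

Under these assumptions, 0x81 decodes to selector 2 with value 1 (8192 ns), and 0x01 decodes to scope 0 with join state 1, which would correspond to a full-member join.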

            • Re: Joining + Leaving multicast group leads to hanging on ib_close_al
              mpoleg

              Environment:

               

              IB driver:

              Mellanox OFED for Windows - WinOF VPI Rev 5.10.50000

              Windows Client 8.1

               

              Hardware (output from vstat):

              hca_idx=0

              uplink={BUS=PCI_E Gen3, SPEED=8.0 Gbps, WIDTH=x8, CAPS=8.0*x8}

              MSI-X={ENABLED=1, SUPPORTED=128, GRANTED=6, ALL_MASKED=N}

              vendor_id=0x02c9

              vendor_part_id=4099

              hw_ver=0x0

              fw_ver=2.35.5100

              PSID=MT_1060110018

              node_guid=e41d:2d03:0075:36d0

              num_phys_ports=1

                      port=1

                      port_guid=e41d:2d03:0075:36d1

                      port_state=PORT_ACTIVE (4)

                      link_speed=10.00 Gbps

                      link_width=4x (2)

                      rate=40.00 Gbps

                      real_rate=32.00 Gbps (QDR)

                      port_phys_state=LINK_UP (5)

                      active_speed=10.00 Gbps

                      sm_lid=0x0003

                      port_lid=0x0001

                      port_lmc=0x0

                      transport=IB

                      max_mtu=4096 (5)

                      active_mtu=4096 (5)

                      GID[0]=fe80:0000:0000:0000:e41d:2d03:0075:36d1
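
As a side note on the vstat figures above, rate=40.00 Gbps and real_rate=32.00 Gbps are consistent with a 4x link at 10 Gbps per lane using 8b/10b encoding, which is what QDR uses. A small sketch of the arithmetic, with hypothetical function names:

```c
/* Hypothetical helper: raw signalling rate of a link in Gbps. */
static double raw_rate_gbps(unsigned lanes, double lane_gbps)
{
    return lanes * lane_gbps;
}

/* Hypothetical helper: usable data rate under 8b/10b encoding,
 * where every 10 bits on the wire carry 8 bits of data. */
static double data_rate_8b10b_gbps(double raw_gbps)
{
    return raw_gbps * 8.0 / 10.0;
}
```

For this port, raw_rate_gbps(4, 10.0) gives the 40 Gbps signalling rate and data_rate_8b10b_gbps(40.0) gives the 32 Gbps data rate that vstat reports.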

               

              Network:

              2 desktops with 1 HCA adapter each

               

              OpenSM (its own output):

              -------------------------------------------------

              OpenSM 3.3.11 UMAD

              Command Line Arguments:

              verbose option -D = 0xb

              d level = 0x2

              Debug mode: Force Log Flush

              Creating new log file

              Log file max size is 1 MBytes

              -------------------------------------------------

              OpenSM 3.3.11 UMAD

              Entering DISCOVERING state

              Using default GUID 0xe41d2d03006f0cb1

              Entering MASTER state

              SUBNET UP

              osm_log: log file exceeds the limit 1048576. Truncating.

              ...

              -------------------------------------------------

            • Re: Joining + Leaving multicast group leads to hanging on ib_close_al
              mpoleg

              Actually, the application does not hang indefinitely on ib_close_al as described above; it times out after a minute or so.