12 Replies Latest reply on May 17, 2018 8:20 AM by pasokan

    How to stop and start rdma on CentOS 7?

    pasokan


      My testing requires pausing RDMA connectivity and then resuming it. Running ifdown ib0 doesn't stop the communication once it is established.

        • Re: How to stop and start rdma on CentOS 7?
          stepheny

          Hi there,

             Have you tried unloading and reloading the RDMA modules with rmmod/insmod?
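
             Before trying that, it may help to see which RDMA-related modules are loaded and whether anything is holding them, since rmmod refuses to unload a module with a non-zero use count. A rough sketch, not tested on your setup: rdma_modules is a name made up for this example, and the prefix list (ib_, rdma_, mlx4_, mlx5_) is only a guess at what counts as an RDMA module on your nodes.

```shell
# rdma_modules: read lsmod-style output on stdin and print "name use_count"
# for modules whose names look RDMA-related.
# Assumption: the prefix list below is a guess; adjust it for your stack.
rdma_modules() {
    # NR > 1 skips the "Module Size Used by" header line.
    awk 'NR > 1 && $1 ~ /^(ib_|rdma_|mlx4_|mlx5_)/ { print $1, $3 }'
}
```

             Typical use would be lsmod | rdma_modules. Any module listed with a use count above 0 will not unload until whatever depends on it is released first.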

          • Re: How to stop and start rdma on CentOS 7?
            pasokan

            I tried ibportstate instead. It disables the port, but I couldn't re-enable it afterwards:

             

            [hmarne@cn37 ~]$ ibstat

            CA 'mlx4_0'

                    CA type: MT4099

                    Number of ports: 1

                    Firmware version: 2.30.8000

                    Hardware version: 1

                    Node GUID: 0x002590fffff76c70

                    System image GUID: 0x002590fffff76c73

                    Port 1:

                            State: Active

                            Physical state: LinkUp

                            Rate: 56

                            Base lid: 94

                            LMC: 0

                            SM lid: 4

                            Capability mask: 0x02514868

                            Port GUID: 0x002590fffff76c71

                            Link layer: InfiniBand

            [hmarne@cn37 ~]$ sudo /usr/sbin/ibportstate 94 1 disable

            Initial CA PortInfo:

            # Port info: Lid 94 port 1

            LinkState:.......................Active

            PhysLinkState:...................LinkUp

            Lid:.............................94

            SMLid:...........................4

            LMC:.............................0

            LinkWidthSupported:..............1X or 4X

            LinkWidthEnabled:................1X or 4X

            LinkWidthActive:.................4X

            LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps

            LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps

            LinkSpeedActive:.................10.0 Gbps

            LinkSpeedExtSupported:...........14.0625 Gbps

            LinkSpeedExtEnabled:.............14.0625 Gbps

            LinkSpeedExtActive:..............14.0625 Gbps

            Mkey:............................<not displayed>

            MkeyLeasePeriod:.................0

            ProtectBits:.....................0

            # MLNX ext Port info: Lid 94 port 1

            StateChangeEnable:...............0x00

            LinkSpeedSupported:..............0x01

            LinkSpeedEnabled:................0x01

            LinkSpeedActive:.................0x00

            Disable may be irreversible

             

            After PortInfo set:

            # Port info: Lid 94 port 1

            LinkState:.......................Active

            PhysLinkState:...................LinkUp

            Lid:.............................94

            SMLid:...........................4

            LMC:.............................0

            LinkWidthSupported:..............1X or 4X

            LinkWidthEnabled:................1X or 4X

            LinkWidthActive:.................4X

            LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps

            LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps

            LinkSpeedActive:.................Extended speed

            LinkSpeedExtSupported:...........14.0625 Gbps

            LinkSpeedExtEnabled:.............14.0625 Gbps

            LinkSpeedExtActive:..............14.0625 Gbps

            Mkey:............................<not displayed>

            MkeyLeasePeriod:.................0

            ProtectBits:.....................0

            [hmarne@cn37 ~]$ ibstat

            CA 'mlx4_0'

                    CA type: MT4099

                    Number of ports: 1

                    Firmware version: 2.30.8000

                    Hardware version: 1

                    Node GUID: 0x002590fffff76c70

                    System image GUID: 0x002590fffff76c73

                    Port 1:

                            State: Down

                            Physical state: Disabled

                            Rate: 10

                            Base lid: 94

                            LMC: 0

                            SM lid: 4

                            Capability mask: 0x02514868

                            Port GUID: 0x002590fffff76c71

                            Link layer: InfiniBand

            [hmarne@cn37 ~]$ sudo /usr/sbin/ibportstate 94 1 query | grep -i state

            ibwarn: [11055] mad_rpc_open_port: can't open UMAD port ((null):0)

            /usr/sbin/ibportstate: iberror: failed: Failed to open '(null)' port '0'

            [hmarne@cn37 ~]$
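
            For checking whether the port really went down (and, later, whether it came back), the State and Physical state fields of ibstat are enough, and ibstat still worked above where the ibportstate query did not. A rough sketch of a helper that pulls out those two fields; port_state is just a name made up here, and it assumes the ibstat layout shown in this post.

```shell
# port_state [portnum]: read ibstat-style output on stdin and print
# "<State>/<Physical state>" for the given port (default: port 1).
# Returns non-zero if the port is not found in the input.
port_state() {
    port="${1:-1}"
    awk -v port="$port" '
        # "Port 1:" starts a port section; "Port GUID:" does not match.
        /^[ \t]*Port [0-9]+:/ { cur = $2; sub(/:$/, "", cur) }
        cur == port && /^[ \t]*State:/          { state = $2 }
        cur == port && /^[ \t]*Physical state:/ { phys = $3 }
        END { if (state != "") print state "/" phys; else exit 1 }
    '
}
```

            With the second ibstat output above, ibstat mlx4_0 | port_state 1 would print Down/Disabled.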

            • Re: How to stop and start rdma on CentOS 7?
              halr

              I think the openibd script exists on CentOS 7. Is it at /etc/init.d/openibd? If it does exist, you can run restart, or stop followed by start.

               

              /etc/init.d/openibd restart

               

              or

               

              service openibd restart

               

              This should do everything needed (including module reloading) for restarting.
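
              If you are not sure whether the script is there, something along these lines would check before restarting. This is only a sketch: INIT_DIR, have_openibd and restart_openibd are names made up for the example, and it assumes openibd lives under /etc/init.d as guessed above.

```shell
# Directory holding init scripts. Overridable so the check can be tried
# against a scratch directory (INIT_DIR is a name made up for this sketch).
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# Succeeds only if an executable openibd script is present.
have_openibd() {
    [ -x "$INIT_DIR/openibd" ]
}

# Restart the stack through openibd, or report that it is missing.
# Needs root when pointed at the real script.
restart_openibd() {
    if have_openibd; then
        "$INIT_DIR/openibd" restart
    else
        echo "openibd not found in $INIT_DIR" >&2
        return 1
    fi
}
```

              Calling restart_openibd as root then either runs the restart or tells you the script is missing.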

               

              -- Hal

                • Re: How to stop and start rdma on CentOS 7?
                  pasokan

                  Stopping openibd requires unloading the modules, and we can't do that while Lustre is mounted. We want Lustre and FUSE to stay mounted during the operation; they just shouldn't be able to perform any I/O.

                   

                  Do you mean that disabling the corresponding switch port instead would let me re-enable it later?

                   

                   

                   


                    • Re: How to stop and start rdma on CentOS 7?
                      halr

                      Yes, as long as you do this from a CA other than the one being disabled, since the switch will still be accessible through its other ports. All this does is disable the egress switch port that is the peer of the remote CA. You should then be able to re-enable it when desired.

                       

                      I don't know if this will accomplish what you need, as I'm not sure of all the Lustre interactions.

                       

                      Can you try it and see what happens?

                        • Re: How to stop and start rdma on CentOS 7?
                          pasokan

                          Hi Hal,

                           

                          May I know how to do this? I need to bring down the peer switch port, i.e. port 17, of the remote CA (cn41).

                           

                          Switch: 0xf452140300f83e50 MF0;ime-mlx216-ib-sw-01:SX6512/L12/U1:

                                    41    1[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>      97    1[  ] "sn31 HCA-2" ( )

                                    41    2[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>     168    1[  ] "sn52 HCA-1" ( )

                                    41    3[  ] ==(                Down/ Polling)==>             [  ] "" ( )

                                    41    4[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>     100    1[  ] "sn31 HCA-1" ( )

                                    41    5[  ] ==(                Down/ Polling)==>             [  ] "" ( )

                                    41    6[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>     167    1[  ] "sn53 HCA-2" ( )

                                    41    7[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>     170    1[  ] "cn43 HCA-1" ( )

                                    41    8[  ] ==(                Down/ Polling)==>             [  ] "" ( )

                                    41    9[  ] ==(                Down/ Polling)==>             [  ] "" ( )

                                    41   10[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>     166    1[  ] "sn52 HCA-2" ( )

                                    41   11[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>     117    1[  ] "cn42 HCA-1" ( )

                                    41   12[  ] ==(                Down/ Polling)==>             [  ] "" ( )

                                    41   13[  ] ==(                Down/ Polling)==>             [  ] "" ( )

                                    41   14[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>     151    1[  ] "sn08 HCA-2" ( )

                                    41   15[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>     141    1[  ] "sn22 HCA-4" ( )

                                    41   16[  ] ==(                Down/ Polling)==>             [  ] "" ( )

                                    41   17[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>     120    1[  ] "cn41 HCA-1" ( )

                           

                           

                          [root@cn41 ~]# ibstat

                          CA 'mlx4_0'

                              CA type: MT4099

                              Number of ports: 1

                              Firmware version: 2.30.8000

                              Hardware version: 1

                              Node GUID: 0x002590fffff76da4

                              System image GUID: 0x002590fffff76da7

                              Port 1:

                                  State: Active

                                  Physical state: LinkUp

                                  Rate: 56

                                  Base lid: 120

                                  LMC: 0

                                  SM lid: 1

                                  Capability mask: 0x02514868

                                  Port GUID: 0x002590fffff76da5

                                  Link layer: InfiniBand

                          [root@cn41 ~]#
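
                          Going by the listing above, the switch is LID 41 and port 17 is the one facing cn41, so the suggestion would presumably come down to pointing ibportstate at the switch instead of at the CA. An untested sketch with those numbers filled in; set_switch_port, SWITCH_LID, SWITCH_PORT and DRY_RUN are names made up for this example, and by default it only prints the command instead of running it.

```shell
# Values read from the link listing above: switch LID 41, port 17 faces cn41.
SWITCH_LID=41
SWITCH_PORT=17

# DRY_RUN=1 (default) only prints the command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

# set_switch_port <enable|disable|query>
set_switch_port() {
    case "$1" in
        enable|disable|query) ;;
        *) echo "usage: set_switch_port enable|disable|query" >&2; return 2 ;;
    esac
    if [ "$DRY_RUN" = 1 ]; then
        echo "ibportstate $SWITCH_LID $SWITCH_PORT $1"
    else
        # Must run as root on a node other than cn41.
        ibportstate "$SWITCH_LID" "$SWITCH_PORT" "$1"
    fi
}
```

                          This would have to run as root from a node other than cn41, for example cn42 or cn43, which hang off the same switch in the listing. set_switch_port disable would pause the link, and set_switch_port enable should bring it back.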