1 Reply Latest reply on Aug 8, 2016 8:31 AM by alkx

    Kernel panic while booting Linux


      The kernel panics while booting Linux if the Mellanox card is connected to the network. It boots fine if I disconnect the card.

      (After it boots successfully, I can connect it to the network. However, it sometimes — though not always — causes the host to hang when I run ping over the network; I don't have many details to post about that yet.)
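      One workaround I may try (untested on my side, just a sketch): blacklist the mlx4 modules so the host can boot with the cable attached, then load the driver manually once the system is up.

      ```
      # /etc/modprobe.d/blacklist-mlx4.conf -- keep the driver out of early boot
      blacklist mlx4_en
      blacklist mlx4_core

      # Rebuild the initramfs so the blacklist takes effect at boot:
      #   update-initramfs -u
      # After the system is up, load the driver manually:
      #   modprobe mlx4_en
      ```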


      Here are the details of the system:

      # uname -a

      Linux <hostname> 4.2.0-35-generic #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


      # mlxup

      Querying Mellanox devices firmware ...

      Device #1:


        Device Type:      ConnectX3Pro

        Part Number:      MCX312B-XCC_Ax

        Description:      ConnectX-3 Pro EN network interface card; 10GigE; dual-port SFP+; PCIe3.0 x8 8GT/s; RoHS R6

        PSID:             MT_1200111023

        PCI Device Name:  0000:02:00.0

        Port1 MAC:        e41d2db25040

        Port2 MAC:        e41d2db25041

        Versions:         Current        Available   

           FW             2.36.5000      2.36.5000   

           PXE            3.4.0718       3.4.0718    

        Status:           Up to date


      Stack dump from crash (the dmesg file is attached):


            KERNEL: /usr/lib/debug/boot/vmlinux-4.2.0-35-generic
          DUMPFILE: ../201607301001/dump.201607301001  [PARTIAL DUMP]
              CPUS: 8
              DATE: Sat Jul 30 10:01:52 2016
            UPTIME: 00:00:14
      LOAD AVERAGE: 1.19, 0.25, 0.08
             TASKS: 584
          NODENAME: <hostname>
           RELEASE: 4.2.0-35-generic
           VERSION: #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016
           MACHINE: x86_64  (3409 Mhz)
            MEMORY: 16 GB
             PANIC: "BUG: unable to handle kernel paging request at 0000001100000002"
               PID: 1625
           COMMAND: "docker"
              TASK: ffff8803e1f5a940  [THREAD_INFO: ffff8803de0e8000]
               CPU: 4
      crash> bt
      PID: 1625   TASK: ffff8803e1f5a940  CPU: 4   COMMAND: "docker"
      #0 [ffff88041ed033f0] machine_kexec at ffffffff8105913b
      #1 [ffff88041ed03460] crash_kexec at ffffffff81109bf2
      #2 [ffff88041ed03530] oops_end at ffffffff81018ead
      #3 [ffff88041ed03560] no_context at ffffffff810682a5
      #4 [ffff88041ed035d0] __bad_area_nosemaphore at ffffffff81068570
      #5 [ffff88041ed03620] bad_area_nosemaphore at ffffffff810686f3
      #6 [ffff88041ed03630] __do_page_fault at ffffffff810689d7
      #7 [ffff88041ed03690] do_page_fault at ffffffff81068d42
      #8 [ffff88041ed036b0] page_fault at ffffffff817fabc8
          [exception RIP: __netdev_pick_tx+102]
          RIP: ffffffff816e64e6  RSP: ffff88041ed03768  RFLAGS: 00010202
          RAX: ffff88040c2d97f0  RBX: 0000000000000000  RCX: ffffffff816e6480
          RDX: 000000000000000c  RSI: ffff8803d4359b00  RDI: ffff8803fb440000
          RBP: ffff88041ed037a8   R8: ffff88041ed19b00   R9: ffff8803d4359b00
          R10: 0000000000000000  R11: 0000000000000150  R12: ffff8803fb440000
          R13: 0000000000000000  R14: 00000000ffffffff  R15: 0000001100000002
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
      #9 [ffff88041ed037b0] mlx4_en_select_queue at ffffffffc0187a7f [mlx4_en]
      #10 [ffff88041ed037d0] netdev_pick_tx at ffffffff816edac1
      #11 [ffff88041ed03800] __dev_queue_xmit at ffffffff816edc07
      #12 [ffff88041ed03860] dev_queue_xmit_sk at ffffffff816ee0e3
      #13 [ffff88041ed03870] netdev_send at ffffffffc04de305 [openvswitch]
      #14 [ffff88041ed038b0] ovs_vport_send at ffffffffc04ddc28 [openvswitch]
      #15 [ffff88041ed038d0] do_output at ffffffffc04d0289 [openvswitch]
      #16 [ffff88041ed038f0] do_execute_actions at ffffffffc04d0874 [openvswitch]
      #17 [ffff88041ed039a0] ovs_execute_actions at ffffffffc04d177f [openvswitch]
      #18 [ffff88041ed039d0] ovs_dp_process_packet at ffffffffc04d4f04 [openvswitch]
      #19 [ffff88041ed03a60] ovs_vport_receive at ffffffffc04dd38b [openvswitch]
      #20 [ffff88041ed03c10] netdev_frame_hook at ffffffffc04de5d0 [openvswitch]
      #21 [ffff88041ed03c40] __netif_receive_skb_core at ffffffff816eb2d4
      #22 [ffff88041ed03ce0] __netif_receive_skb at ffffffff816eb988
      #23 [ffff88041ed03d00] netif_receive_skb_internal at ffffffff816eba02
      #24 [ffff88041ed03d40] napi_gro_frags at ffffffff816ec4a7
      #25 [ffff88041ed03d70] mlx4_en_process_rx_cq at ffffffffc0189870 [mlx4_en]
      #26 [ffff88041ed03e10] mlx4_en_poll_rx_cq at ffffffffc0189db6 [mlx4_en]
      #27 [ffff88041ed03e60] net_rx_action at ffffffff816ebf09
      #28 [ffff88041ed03ef0] __do_softirq at ffffffff81081131
      #29 [ffff88041ed03f60] irq_exit at ffffffff81081433
      #30 [ffff88041ed03f70] do_IRQ at ffffffff817fb878
      --- <IRQ stack> ---
      #31 [ffff8803de0ebf58] ret_from_intr at ffffffff817f97eb
          RIP: 000000000088d618  RSP: 000000c82024d118  RFLAGS: 00000202
          RAX: 0000000073f84770  RBX: 0000000000000400  RCX: 0000000054423aca
          RDX: 0000000089ecd45f  RSI: 000000c820542940  RDI: 000000c820544000
          RBP: 00000000e4458357   R8: 000000008847594a   R9: 0000000039eb6dc2
          R10: 00000000d57b5eff  R11: 00000000fa36c492  R12: 0000000000000004
          R13: 0000000000dd5c19  R14: 0000000000000002  R15: 0000000000000008
          ORIG_RAX: ffffffffffffff3d  CS: 0033  SS: 002b
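
      For anyone digging into this: the faulting address 0000001100000002 is exactly R15, so __netdev_pick_tx appears to be dereferencing a corrupted pointer (my guess — and it is only a guess — is the skb's socket or queue-mapping state). From the same crash session, something like the following should narrow it down to a source line; the register values below are taken from the exception frame above:

      ```
      crash> dis -l __netdev_pick_tx+102         # map the faulting RIP to file:line
      crash> struct sk_buff ffff8803d4359b00     # RSI at the fault looks like the skb
      crash> struct net_device ffff8803fb440000  # RDI: the outgoing net_device
      ```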


      Has anyone seen a similar issue?