8 Replies Latest reply on Sep 28, 2017 3:19 AM by yairi

    Mellanox Grid Director 4036E won't boot.

    switchman

      I have been asked to look at the 4036E mentioned in the title. This is my first time working with Mellanox switches.

       

      No warning LEDs; everything is green. Power supplies and fans are OK.

       

      It boots, then crashes at different points in the boot sequence.

      I see a 'Warning - bad CRC' message before the switch decides to boot from the secondary flash.

      The boot sequence creates 9 MTD partitions on the NOR flash.

       

      When it reaches the NAND device line, it scans for bad blocks.

      Then it creates 1 MTD partition on the NAND.

       

      Later the kernel reports "Kernel access of bad area".

      Then we just get a call trace and an instruction dump, and the loading process halts.

      The console line no longer responds.

       

      I suspect a bad/faulty NAND flash chip.
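One way to check that suspicion against the log in my reply below: Linux numbers MTD devices in the order their partitions are registered, so (assuming nothing else registers an MTD device first) the nine NOR partitions should be mtd0 to mtd8 and the single NAND "log" partition should be mtd9. The thread that oopses first is 'jffs2_gcd_mtd9', the JFFS2 garbage-collector thread for mtd9. The short Python sketch below rebuilds that mapping from the partition lines in the log; the function name `mtd_map` is just for illustration, and the numbering is an assumption based on registration order, not something the log prints directly.

```python
import re

# Partition lines copied verbatim from the boot log (NOR flash first, then NAND).
BOOT_LOG = """\
Creating 9 MTD partitions on "4cc000000.nor_flash":
0x00000000-0x001e0000 : "kernel"
0x001e0000-0x00200000 : "dtb"
0x00200000-0x01dc0000 : "ramdisk"
0x01dc0000-0x01fa0000 : "safe-kernel"
0x01fa0000-0x01fc0000 : "safe-dtb"
0x01fc0000-0x03b80000 : "safe-ramdisk"
0x03b80000-0x03f60000 : "config"
0x03f60000-0x03fa0000 : "u-boot env"
0x03fa0000-0x04000000 : "u-boot"
Creating 1 MTD partitions on "4e0000000.ndfc.nand":
0x00000000-0x10000000 : "log"
"""

HEADER = re.compile(r'Creating \d+ MTD partitions on "([^"]+)"')
PART = re.compile(r'0x([0-9a-f]+)-0x([0-9a-f]+) : "([^"]+)"')


def mtd_map(log):
    """Return (index, name, device, size_bytes) tuples.

    Assumes each partition gets the next free mtdN index in the order
    it appears in the log (hypothetical helper, for illustration only).
    """
    device = None
    table = []
    for line in log.splitlines():
        header = HEADER.search(line)
        if header:
            device = header.group(1)  # flash chip the following partitions live on
            continue
        part = PART.search(line)
        if part:
            start = int(part.group(1), 16)
            end = int(part.group(2), 16)
            table.append((len(table), part.group(3), device, end - start))
    return table


for idx, name, device, size in mtd_map(BOOT_LOG):
    print(f"mtd{idx}: {name:<13} {size // 1024:>7} KiB on {device}")
```

If the assumption holds, the last line printed is mtd9, the 256 MiB "log" partition on the NAND, which fits with the JFFS2 crash pointing at the NAND rather than the NOR flash.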

       

      Does anyone have any suggestions? Is the NAND chip replaceable? Should I try reflashing the firmware?

       

      I am not currently at that site; I will visit on Sunday, copy the full configuration, and post it back here.

      I would appreciate any suggestions or ideas.

       

      Many thanks.

      Switchman.

        • Re: Mellanox Grid Director 4036E won't boot.
          switchman

          I had saved a portion of the end of the output; the second boot attempt is at the bottom:

          ============================================================================

          Intel/Sharp Extended Query Table at 0x010A
          Intel/Sharp Extended Query Table at 0x010A
          Intel/Sharp Extended Query Table at 0x010A
          Intel/Sharp Extended Query Table at 0x010A
          Using buffer write method
          Using auto-unlock on power-up/resume
          cfi_cmdset_0001: Erase suspend on write enabled
          cmdlinepart partition parsing not available
          RedBoot partition parsing not available
          Creating 9 MTD partitions on "4cc000000.nor_flash":
          0x00000000-0x001e0000 : "kernel"
          0x001e0000-0x00200000 : "dtb"
          0x00200000-0x01dc0000 : "ramdisk"
          0x01dc0000-0x01fa0000 : "safe-kernel"
          0x01fa0000-0x01fc0000 : "safe-dtb"
          0x01fc0000-0x03b80000 : "safe-ramdisk"
          0x03b80000-0x03f60000 : "config"
          0x03f60000-0x03fa0000 : "u-boot env"
          0x03fa0000-0x04000000 : "u-boot"
          NAND device: Manufacturer ID: 0x20, Chip ID: 0xda (ST Micro NAND 256MiB 3,3V 8-bit)
          Scanning device for bad blocks
          Creating 1 MTD partitions on "4e0000000.ndfc.nand":
          0x00000000-0x10000000 : "log"
          i2c /dev entries driver
          IBM IIC driver v2.1
          ibm-iic(/plb/opb/i2c@ef600700): using standard (100 kHz) mode
          ibm-iic(/plb/opb/i2c@ef600800): using standard (100 kHz) mode
          i2c-2: Virtual I2C bus (Physical bus i2c-0, multiplexer 0x70 port 0)
          i2c-3: Virtual I2C bus (Physical bus i2c-0, multiplexer 0x70 port 1)
          i2c-4: Virtual I2C bus (Physical bus i2c-0, multiplexer 0x70 port 2)
          i2c-5: Virtual I2C bus (Physical bus i2c-0, multiplexer 0x70 port 3)
          rtc-ds1307 6-0068: rtc core: registered ds1338 as rtc0
          rtc-ds1307 6-0068: 56 bytes nvram
          i2c-6: Virtual I2C bus (Physical bus i2c-0, multiplexer 0x70 port 4)
          i2c-7: Virtual I2C bus (Physical bus i2c-0, multiplexer 0x70 port 5)
          i2c-8: Virtual I2C bus (Physical bus i2c-0, multiplexer 0x70 port 6)
          i2c-9: Virtual I2C bus (Physical bus i2c-0, multiplexer 0x70 port 7)
          pca954x 0-0070: registered 8 virtual busses for I2C switch pca9548
          TCP cubic registered
          NET: Registered protocol family 10
          lo: Disabled Privacy Extensions
          IPv6 over IPv4 tunneling driver
          sit0: Disabled Privacy Extensions
          ip6tnl0: Disabled Privacy Extensions
          NET: Registered protocol family 17
          RPC: Registered udp transport module.
          RPC: Registered tcp transport module.
          rtc-ds1307 6-0068: setting system clock to 2000-01-18 01:06:09 UTC (948157569)
          RAMDISK: Compressed image found at block 0
          VFS: Mounted root (ext2 filesystem) readonly.
          Freeing unused kernel memory: 172k init
          init started: BusyBox v1.12.2 (2011-01-03 14:13:22 IST)
          starting pid 15, tty '': '/etc/rc.d/rcS'
          mount: no /proc/mounts
          Mounting /proc and /sys
          Mounting filesystems
          Loading module Voltaire
          Empty flash at 0x0cdcf08c ends at 0x0cdcf800
          Starting crond:
          Starting telnetd:
          ibsw-init.sh start...
          Tue Jan 18 01:06:42 UTC 2000
          INSTALL FLAG  0x0
          starting syslogd & klogd ...
          Starting ISR:                   Unable to handle kernel paging request for data at address 0x0000001e
          Faulting instruction address: 0xc00ec934
          Oops: Kernel access of bad area, sig: 11 [#1]
          Voltaire
          Modules linked in: ib_is4(+) ib_umad ib_sa ib_mad ib_core memtrack Voltaire
          NIP: c00ec934 LR: c00ec930 CTR: 00000000
          REGS: d7bdfd10 TRAP: 0300   Not tainted  (2.6.26)
          MSR: 00029000 <EE,ME>  CR: 24000042  XER: 20000000
          DEAR: 0000001e, ESR: 00000000
          TASK = d7b9c800[49] 'jffs2_gcd_mtd9' THREAD: d7bde000
          GPR00: 00000001 d7bdfdc0 d7b9c800 00000000 000000d0 00000003 df823040 0000007f
          GPR08: 22396d59 d9743920 c022de58 00000000 24000024 102004bc c026b9a0 c026b910
          GPR16: c026b954 c026b630 c026b694 c022b790 d8938150 d8301000 c022b758 d7bdfe30
          GPR24: 00000000 0000037c d8301400 00000abf d9743d80 00000000 d8938158 df823000
          NIP [c00ec934] jffs2_get_inode_nodes+0xb6c/0x1020
          LR [c00ec930] jffs2_get_inode_nodes+0xb68/0x1020
          Call Trace:
          [d7bdfdc0] [c00ec758] jffs2_get_inode_nodes+0x990/0x1020 (unreliable)
          [d7bdfe20] [c00ece28] jffs2_do_read_inode_internal+0x40/0x9e8
          [d7bdfe90] [c00ed838] jffs2_do_crccheck_inode+0x68/0xa4
          [d7bdff00] [c00f1ed8] jffs2_garbage_collect_pass+0x160/0x664
          [d7bdff50] [c00f36c8] jffs2_garbage_collect_thread+0xf0/0x118
          [d7bdfff0] [c000bdb8] kernel_thread+0x44/0x60
          Instruction dump:
          7f805840 409c000c 801d0004 48000008 801d0008 2f800000 409effdc 2f9d0000
          40be0010 48000180 4802ba05 7c7d1b78 <a01d001e> 7fa3eb78 2f800000 409effec
          ---[ end trace b57e19dd3d61c6af ]---
          ib_is4 0000:81:00.0: ep0_dev_name 0000:81:00.0
          Unable to handle kernel paging request for data at address 0x00000034
          Faulting instruction address: 0xc002f3b0
          Oops: Kernel access of bad area, sig: 11 [#2]
          Voltaire
          Modules linked in: is4_cmd_driver ib_is4 ib_umad ib_sa ib_mad ib_core memtrack Voltaire
          NIP: c002f3b0 LR: c002fb00 CTR: c00f3a10
          REGS: df8a3de0 TRAP: 0300   Tainted: G      D    (2.6.26)
          MSR: 00021000 <ME>  CR: 24544e88  XER: 20000000
          DEAR: 00000034, ESR: 00000000
          TASK = df88e800[8] 'pdflush' THREAD: df8a2000
          GPR00: c002fb00 df8a3e90 df88e800 00000001 d7b9c800 d7b9c800 00000000 00000001
          GPR08: 00000001 00000000 24544e22 00000002 00004b1a 67cfb19f 1ffef400 00000000
          GPR16: 1ffe42d8 00000000 1ffebfa4 00000000 00000000 00000004 c0038778 c0261ac4
          GPR24: 00000001 c02f0000 00000000 d7b9c800 00000001 d7b9c800 00000000 d8301400
          NIP [c002f3b0] prepare_signal+0x1c/0x1a4
          LR [c002fb00] send_signal+0x28/0x214
          Call Trace:
          [df8a3e90] [c0021bb8] check_preempt_wakeup+0xd8/0x110 (unreliable)
          [df8a3eb0] [c002fb00] send_signal+0x28/0x214
          [df8a3ed0] [c002fe40] send_sig_info+0x28/0x48
          [df8a3ef0] [c00f35c4] jffs2_garbage_collect_trigger+0x3c/0x50
          [df8a3f00] [c00f3a40] jffs2_write_super+0x30/0x5c
          [df8a3f10] [c007340c] sync_supers+0x80/0xd0
          [df8a3f30] [c0054dc8] wb_kupdate+0x48/0x150
          [df8a3f90] [c0055434] pdflush+0x104/0x1a4
          [df8a3fe0] [c00387c4] kthread+0x4c/0x88
          [df8a3ff0] [c000bdb8] kernel_thread+0x44/0x60
          Instruction dump:
          80010034 bb810020 7c0803a6 38210030 4e800020 9421ffe0 7c0802a6 bf810010
          90010024 7c9d2378 83c4034c 7c7c1b78 <801e0034> 70090008 40820100 2f83001f
          ---[ end trace b57e19dd3d61c6af ]---
          ------------[ cut here ]------------
          Badness at kernel/exit.c:965
          NIP: c00273f0 LR: c000a03c CTR: c013b2b4
          REGS: df8a3cb0 TRAP: 0700   Tainted: G      D    (2.6.26)
          MSR: 00021000 <ME>  CR: 24544e22  XER: 20000000
          TASK = df88e800[8] 'pdflush' THREAD: df8a2000
          GPR00: 00000001 df8a3d60 df88e800 0000000b 00002d73 ffffffff c013e13c c02eb620
          GPR08: 00000001 00000001 00002d73 00000000 24544e84 67cfb19f 1ffef400 00000000
          GPR16: 1ffe42d8 00000000 1ffebfa4 00000000 00000000 00000004 c0038778 c0261ac4
          GPR24: 00000001 c02f0000 00000000 d7b9c800 df8a3de0 0000000b df88e800 0000000b
          NIP [c00273f0] do_exit+0x24/0x5ac
          LR [c000a03c] kernel_bad_stack+0x0/0x4c
          Call Trace:
          [df8a3d60] [00002d41] 0x2d41 (unreliable)
          [df8a3da0] [c000a03c] kernel_bad_stack+0x0/0x4c
          [df8a3dc0] [c000ef90] bad_page_fault+0xb8/0xcc
          [df8a3dd0] [c000c4c8] handle_page_fault+0x7c/0x80
          [df8a3e90] [c0021bb8] check_preempt_wakeup+0xd8/0x110
          [df8a3eb0] [c002fb00] send_signal+0x28/0x214
          [df8a3ed0] [c002fe40] send_sig_info+0x28/0x48
          [df8a3ef0] [c00f35c4] jffs2_garbage_collect_trigger+0x3c/0x50
          [df8a3f00] [c00f3a40] jffs2_write_super+0x30/0x5c
          [df8a3f10] [c007340c] sync_supers+0x80/0xd0
          [df8a3f30] [c0054dc8] wb_kupdate+0x48/0x150
          [df8a3f90] [c0055434] pdflush+0x104/0x1a4
          [df8a3fe0] [c00387c4] kthread+0x4c/0x88
          [df8a3ff0] [c000bdb8] kernel_thread+0x44/0x60
          Instruction dump:
          bb61000c 38210020 4e800020 9421ffc0 7c0802a6 bf010020 90010044 7c7f1b78
          7c5e1378 800203e0 3160ffff 7d2b0110 <0f090000> 54290024 8009000c 5409012f

          U-Boot 1.3.4.32 (Feb  6 2011 - 10:18:30)

          CPU:   AMCC PowerPC 460EX Rev. B at 666.666 MHz (PLB=166, OPB=83, EBC=83 MHz)
                 Security/Kasumi support
                 Bootstrap Option E - Boot ROM Location EBC (16 bits)
                 Internal PCI arbiter disabled
                 32 kB I-Cache 32 kB D-Cache
          Board: 4036QDR - Voltaire 4036 QDR Switch Board
          I2C:   ready
          DRAM:  512 MB (ECC enabled, 333 MHz, CL3)
          FLASH: 64 MB
          NAND:  256 MiB
          *** Warning - bad CRC, using default environment

          MAC Address: 00:08:F1:20:52:E8
          PCIE1: successfully set as root-complex
          PCIE:   Bus Dev VenId DevId Class Int
                  01  00  15b3  bd34  0c06  00
          Net:   ppc_4xx_eth0

          Type run flash_nfs to mount root filesystem over NFS

          Hit any key to stop autoboot:  0
          => run flash_nfs
          ## Booting kernel from Legacy Image at fc000000 ...
             Image Name:   Linux-2.6.26
             Image Type:   PowerPC Linux Kernel Image (gzip compressed)
             Data Size:    1406000 Bytes =  1.3 MB
             Load Address: 00000000
             Entry Point:  00000000
             Verifying Checksum ... OK
             Uncompressing Kernel Image ... OK