0 Replies Latest reply on Oct 8, 2014 12:50 PM by zpodlovics

    HSA and InfiniBand - ConnectX-2 adapter (MT26428) AMD-Vi IO_PAGE_FAULT problems with MLNX_OFED_LINUX-2.3-1.0.1-ubuntu14.04-x86_64

      I have an ASUS A88XM-A motherboard with an AMD A10-7850K Radeon R7 (BIOS version: 1301, BIOS date: 04/01/2014), and I would like to use this HSA (Heterogeneous System Architecture) enabled system with the MT26428 adapter. I have successfully installed the HSA Ubuntu kernel from HSAFoundation/HSA-Drivers-Linux-AMD · GitHub and the MLNX_OFED_LINUX-2.3-1.0.1-ubuntu14.04-x86_64 distribution, but there seem to be IOMMU issues with this card. The exact card type is HP InfiniBand 4X QDR CX-2 PCI-E G2 2-port 592520-B21.

       

      I have tried almost every possible boot option, but most of the time the boot process hangs and/or crashes. The kernel log contains many messages like this:

       

      AMD-Vi: Event logged [IO_PAGE_FAULT device=00:07.2 domain=0x0000 address=0x0000000000000040 flags=0x0050]
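
      To see which devices trigger the faults and how often, lines like the one above can be tallied from a saved dmesg dump. The following is only a minimal sketch, assuming every fault line follows exactly the format shown above; the function names (parse_io_page_fault, count_faults_by_device) are made up for illustration and are not part of any Mellanox or AMD tool.

```python
import re
from collections import Counter

# Matches AMD-Vi IO_PAGE_FAULT lines in the format quoted in this post.
# Assumption: other kernel versions may print these events differently.
FAULT_RE = re.compile(
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"device=(?P<device>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"domain=(?P<domain>0x[0-9a-fA-F]+) "
    r"address=(?P<address>0x[0-9a-fA-F]+) "
    r"flags=(?P<flags>0x[0-9a-fA-F]+)\]"
)


def parse_io_page_fault(line):
    """Return the fields of one fault line as a dict, or None if no match."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "domain": int(match.group("domain"), 16),
        "address": int(match.group("address"), 16),
        "flags": int(match.group("flags"), 16),
    }


def count_faults_by_device(lines):
    """Count how many IO_PAGE_FAULT events each PCI device produced."""
    counts = Counter()
    for line in lines:
        fault = parse_io_page_fault(line)
        if fault is not None:
            counts[fault["device"]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "AMD-Vi: Event logged [IO_PAGE_FAULT device=00:07.2 "
        "domain=0x0000 address=0x0000000000000040 flags=0x0050]"
    )
    print(parse_io_page_fault(sample))
    print(count_faults_by_device([sample, "unrelated kernel message"]))
```

      The device field printed by the kernel is in bus:device.function form, so it can be compared with the output of lspci to identify the faulting function.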

       

      I contacted the HSA Foundation / AMD developers on GitHub, and they provided a lot of valuable help toward resolving the issue. Thanks! They also contacted AMD Research, who work with national labs; they too provided valuable help and shared their experience:

       

      "I talked to our Research team, who are using InfiniBand in the Kaveri systems. We have HSA-enabled Kaveri systems with Mellanox ConnectIB HCAs that are working in the lab. They are known to work and have run with the Pre-Alpha, Alpha, and upcoming Beta HSA releases. For the InfiniBand drivers, we are using MLNX_OFED_LINUX-2.3-1.0.1-ubuntu14.04-x86_64 as well. They did not have to do anything special after the Mellanox installation to get things to work. Their generic kernel level is 3.14.0-031499-generic. One delta: they are using ConnectIB cards, whereas you're using ConnectX-2 HCAs. We know that the two cards use different kernel driver modules. What I suspect is that there is a bug in the Mellanox kernel driver module. I had seen other issues with this card and Intel's IOMMU."

       

      More details, including kernel crash screenshots, kernel dmesg output, and other logs, are available in the GitHub issue:

      HSA and High speed RDMA supported network devices (eg.: Mellanox ConnectX-2 VPI PCIe 2.0 5GT/s - IB QDR / 10GigE (MT2642…


      Please note that the IOMMU is required for HSA support, so turning it off is not an option. Could anybody help resolve this issue?


      Thanks for your help,

      Zoltan