1 Reply Latest reply on Oct 6, 2015 11:34 AM by rage@mellanox.com

    Why does disabling IRQs on Linux cause RDMA read and RDMA write to fail?


      I have two host machines connected by Mellanox InfiniBand HCAs. I'm running a simple RDMA application that performs RDMA write and RDMA read operations from one machine (the client) on the other machine (the server). To find out which interrupts belong to the HCA on each machine, I ran the command less /proc/interrupts:


        67: 475880 50253     0     0   PCI-MSI-edge mlx4-async@pci:0000:01:00.0
        68: 399002     0    73     0   PCI-MSI-edge mlx4_0-0
        69:     3264    23     0   PCI-MSI-edge mlx4_0-1
        70:     0     0     0     0   PCI-MSI-edge mlx4_0-2
        71:     0     0     0     0   PCI-MSI-edge mlx4_0-3
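
      The same filtering can be done programmatically instead of paging through the file by hand. The following is only a minimal Python sketch: the names parse_interrupts and read_mlx4_irqs are hypothetical, and it assumes the usual /proc/interrupts layout (IRQ number, one counter per CPU, then the controller and device names). The SAMPLE text reuses lines from the output above.

```python
SAMPLE = """\
 67:475880 50253     0     0   PCI-MSI-edge mlx4-async@pci:0000:01:00.0
 68:399002     0    73     0   PCI-MSI-edge mlx4_0-0
 70:     0     0     0     0   PCI-MSI-edge mlx4_0-2
"""


def parse_interrupts(text, pattern="mlx4"):
    """Return {irq_number: (per_cpu_counts, description)} for matching rows.

    Assumption: each numbered row looks like
    "<irq>: <count> ... <count> <controller> <device>".
    """
    result = {}
    for line in text.splitlines():
        # Split on the first ':' only; device names may contain ':' too.
        head, sep, rest = line.strip().partition(":")
        if not sep or not head.isdigit():
            continue  # skip the CPU header and named rows such as NMI or LOC
        fields = rest.split()
        counts = []
        # Leading numeric fields are the per-CPU interrupt counters.
        while fields and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        description = " ".join(fields)
        if pattern in description:
            result[int(head)] = (counts, description)
    return result


def read_mlx4_irqs(path="/proc/interrupts"):
    """Parse the live file; only meaningful on a Linux host."""
    with open(path) as f:
        return parse_interrupts(f.read())


if __name__ == "__main__":
    for irq, (counts, description) in sorted(parse_interrupts(SAMPLE).items()):
        print(irq, sum(counts), description)
```

      Comparing the summed counters before and after a test run shows which of these vectors actually fire while the client issues its RDMA operations.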


      On the server machine, I found that calling __disable_irq() on those 4 interrupts causes all RDMA read/write operations performed by the client to fail with the error message "transport retry counter exceeded".


      My question is: why and when can RDMA read/write operations generate IRQs on the remote machine? I thought that they don't involve the remote CPU, so they should not trigger any kind of IRQ.

      So why does disabling those interrupts cause these operations to fail?


      Message was edited by: FOPA Léon Constantin