HowTo Increase Memory Size used by Mellanox Adapters

Version 10

    This post discusses the memory registration aspects of Mellanox ConnectX adapters and how to increase the amount of memory that can be registered.

     


    Mellanox ConnectX adapters have two parameters that dictate the amount of memory that can be registered:

    1. log_num_mtt - The number of Memory Translation Table (MTT) segments per HCA. The default typically ranges between 20 and 30, depending on the driver version. The value is the log2 of the number of segments.

    2. log_mtts_per_seg - The number of MTT entries per segment. The default value is 0. The value is the log2 of the number of entries.
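
    As a quick check, both parameters (and their descriptions) can be listed with modinfo. This is only a minimal sketch; it assumes the mlx4_core driver used throughout this post:

    # modinfo mlx4_core | grep -i mtt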

     

     

    When running applications that are expected to consume a significant amount of memory, the application might fail because not enough memory can be registered for RDMA.

    MPI jobs may fail to run with the error message “MTT allocation error”. This error is caused by the HCA’s inability to register additional memory.

    Note that increasing the MTT size might increase the number of cache misses and therefore the latency; applications that require lower latency may achieve it by reducing the MTT size.

     

    The formula to compute the maximum amount of memory that can be registered for RDMA (max_reg_mem) is:

     

    max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE
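
    As an illustration, the current max_reg_mem can be computed directly from the running module parameters. This is only a sketch; it assumes the mlx4_core driver, a 4 kB page size, and bash arithmetic:

    # compute the current max_reg_mem in GB from the running parameters (bash)
    mtt=$(cat /sys/module/mlx4_core/parameters/log_num_mtt)
    seg=$(cat /sys/module/mlx4_core/parameters/log_mtts_per_seg)
    echo "max_reg_mem = $(( (2**mtt * 2**seg * 4096) / 2**30 )) GB"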

     

    For example, if the physical memory on the server is 64GB, it is recommended to set max_reg_mem to twice that size (2 x 64GB = 128GB). Assuming log_mtts_per_seg is set to 1 and a 4 kB page size:

     

    max_reg_mem = (2^log_num_mtt) * (2^1) * (4 kB)

    128GB = (2^log_num_mtt) * (2^1) * (4 kB)

    2^37 = (2^log_num_mtt) * (2^1) * (2^12)

    2^24 = 2^log_num_mtt

    log_num_mtt = 24
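
    The arithmetic can be double-checked in the shell (a sketch; it assumes a 4 kB page size and bash arithmetic):

    # 2^24 segments * 2^1 entries per segment * 4096 bytes = 128 GB
    echo "$(( (2**24 * 2**1 * 4096) / 2**30 )) GB"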

     

    To view the current parameter values:

     

    # cat /sys/module/mlx4_core/parameters/log_num_mtt

    23

    # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

    0

     

    To set the log_num_mtt parameter of the mlx4_core module (to 24, for example), change the file /etc/modprobe.d/mlx4_core.conf to contain:

    options mlx4_core log_num_mtt=24

     

    The same procedure applies to the log_mtts_per_seg parameter.
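
    For example, both parameters can be set on a single options line in the same file (a sketch; adjust the values to match the calculation above):

    options mlx4_core log_num_mtt=24 log_mtts_per_seg=1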

     

    Restart the driver after this change:

    # /etc/init.d/openibd restart
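
    After the restart, the new value can be verified in sysfs, for example:

    # cat /sys/module/mlx4_core/parameters/log_num_mtt
    24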

     

    Additional updated information regarding the above and a suggested workaround:

     

    - This is commonly seen when running HPC applications with MLNX_OFED 1.5.3.

    - There is an ibv_reg_mr() failure like the following (this example comes from ANSYS Fluent, which uses Platform MPI; similar messages from ibv_reg_mr() are seen when using other MPIs):

     

    fluent_mpi.14.0.0: Rank 0:0: MPI_Send: ibv_reg_mr() failed: addr 0x2ae4879a4e58, len 1057200

    fluent_mpi.14.0.0: Rank 0:0: MPI_Send: Internal MPI error

     

    - For RHEL/CentOS 5.x, the fix/workaround is to add the line below to the /etc/modprobe.conf file and restart the openibd service.

    - For RHEL/CentOS 6.x, the fix/workaround is to create a file (e.g. /etc/modprobe.d/mofed.conf), add the line below, and restart the openibd service.

     

    options mlx4_core log_num_mtt=24
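
    For example, on RHEL/CentOS 6.x this can be done as follows (a sketch using the file name suggested above):

    # echo "options mlx4_core log_num_mtt=24" > /etc/modprobe.d/mofed.conf
    # /etc/init.d/openibd restart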

     

    For other kernel parameters and more examples, refer to the MLNX_OFED User Manual.