Kubernetes RDMA (InfiniBand) shared HCA with ConnectX-4/ConnectX-5


    This post shows how to share a single Mellanox ConnectX-4/ConnectX-5 InfiniBand HCA among multiple Pods in a Kubernetes cluster.

    You must use Kubernetes version 1.10.3 or higher.

     

    Overview

    One RDMA device (HCA) can be shared among multiple Pods running on a Kubernetes worker node. In this mode, IP networking connectivity for each Pod is provided by virtual networking devices (vxlan or veth interfaces), while all Pods share the node's physical RDMA device.

     

    Configuration and setup involve the following steps.

    1. HCA device plugin configuration and installation
      1. Create device plugin configuration
      2. Install device plugin
    2. Pod configuration
    3. Check configuration
      1. Check device plugin configuration
      2. Check sriov cni installation

     

    Configuration

    1. HCA device plugin configuration and installation

    Apply the HCA device plugin configuration:

    # kubectl create -f https://cdn.rawgit.com/Mellanox/k8s-rdma-sriov-dev-plugin/7b27f8cf/example/hca/rdma-hca-node-config.yml

    This applies the HCA device plugin configuration as a Kubernetes ConfigMap.
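
    For reference, the applied object is a small ConfigMap that switches the device plugin into shared-HCA mode. A minimal sketch is shown below; the ConfigMap name and the exact JSON keys are taken from the linked example file and may differ in newer versions of the repository.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: rdma-devices
      namespace: kube-system
    data:
      config.json: |
        {
          "mode": "hca"
        }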

     

    2. Deploy HCA device plugin

    # kubectl create -f https://raw.githubusercontent.com/Mellanox/k8s-rdma-sriov-dev-plugin/master/example/device-plugin.yaml
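
    Once the device plugin is running on a worker node, that node should advertise the rdma/hca resource in its Capacity and Allocatable fields. One way to confirm this (the node name is a placeholder and the grep filter is just a convenience) is:

    # kubectl describe node <node-name> | grep rdma/hca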

     

    3. Pod Configuration

    Each Pod’s container configuration needs a resource limit to indicate that it requires access to the shared RDMA device.

    resources:
      limits:
        rdma/hca: 1

    An example Pod configuration can be found below.

    https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin/blob/master/example/hca/test-hca-pod.yml
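
    For orientation, a minimal Pod spec along the lines of the linked example is sketched below. The Pod name and image are placeholders; the linked example also adds the IPC_LOCK capability, which RDMA applications typically need in order to pin memory.

    apiVersion: v1
    kind: Pod
    metadata:
      name: rdma-test-pod
    spec:
      containers:
      - name: rdma-app
        image: <your-rdma-capable-image>      # placeholder: any image with RDMA user-space libraries
        securityContext:
          capabilities:
            add: [ "IPC_LOCK" ]               # lets the RDMA application lock (pin) memory
        resources:
          limits:
            rdma/hca: 1                       # request one shared HCA from the device plugin
        command: [ "sh", "-c", "sleep 3600" ] # keep the Pod running for testing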

     

    4. Check configuration

    Check that the device plugin DaemonSet is deployed:

    # kubectl get ds --namespace=kube-system
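
    Once the example Pod is running, you can additionally verify that the HCA is visible from inside the container, assuming the container image ships the InfiniBand user-space utilities (ibv_devices/ibv_devinfo); the Pod name below is a placeholder.

    # kubectl exec -it <pod-name> -- ibv_devices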