HowTo Efficiently Utilize Multiple Cores with TGT Block Storage

Version 9

    This post focuses on improving the performance of TGT block storage in a multi-core environment by running multiple tgt processes.

     


    Spawning multiple tgt processes

    To use more than one core in a multi-core environment, and thereby increase performance, spawn several tgt processes in parallel. One tgt process per available core is recommended.

    To do that, each tgt instance must be bound to its own control port and its own data port. For example:

    # tgtd -C 1001 --iscsi portal=*:2001 --iser port=2001

    # tgtd -C 1002 --iscsi portal=*:2002 --iser port=2002

    # tgtd -C 1003 --iscsi portal=*:2003 --iser port=2003
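
    The per-instance commands above follow a fixed pattern, so they can be generated rather than typed by hand. The following is a minimal sketch and not part of the tgt package: the helper name tgtd_commands is hypothetical, and it only prints the command lines (a dry run), using the 1001+ and 2001+ port numbering from the example.

```shell
# Print one tgtd command line per instance (dry run, nothing is started).
# Hypothetical helper: control ports count up from 1001 and data ports
# from 2001, matching the example above.
tgtd_commands() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        ctl=$((1000 + i))
        data=$((2000 + i))
        echo "tgtd -C $ctl --iscsi portal=*:$data --iser port=$data"
        i=$((i + 1))
    done
}

# Three instances, as in the example.
tgtd_commands 3
```

    To size the list by core count, pass the output of nproc as the argument, and review the printed commands before running them.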

          

     

    Each instance is controlled by passing its control port to tgtadm or tgt-setup-lun. For example:

    # tgt-setup-lun -C 1001 -n tg1 -d /dev/ram0 -t iser

          

    Creating LUNs and connecting initiators

    The Admin should distribute the LUNs and the initiator connections as evenly as possible across the tgt instances.

    Take special care not to expose the same backing store device through different LUNs or different tgt instances, unless the Admin knows that this configuration is required. Normally it is not.

    To connect to the correct tgt instance, the initiator must use that instance's iSCSI data port (2001-2003 in the example above).
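
    One simple way to spread LUNs evenly is round-robin assignment of backing devices to instances. The sketch below is an illustration only: the helper name lun_commands, the target names tg1, tg2, ... and the device paths are placeholders, and the control ports are assumed to start at 1001 as in the earlier example. It prints tgt-setup-lun commands without running them, and each device is listed exactly once so that no backing store is exposed twice.

```shell
# Assign each backing device to a tgtd instance in round-robin order and
# print the matching tgt-setup-lun command (dry run).
# Assumptions: instance control ports are 1001, 1002, ...; target names
# and device paths are placeholders.
lun_commands() {
    instances=$1
    shift
    n=0
    for dev in "$@"; do
        ctl=$((1001 + n % instances))
        n=$((n + 1))
        echo "tgt-setup-lun -C $ctl -n tg$n -d $dev -t iser"
    done
}

# Four devices over three instances: the fourth wraps back to port 1001.
lun_commands 3 /dev/ram0 /dev/ram1 /dev/ram2 /dev/ram3
```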

     

    CPU Affinity

    In case you are new to this term, refer to What is CPU Affinity?

    Each tgt process can be bound to a specific CPU or to a set of CPUs, any one of which may then run it. This is done with the taskset command, for example:

     

    Bind tgtd to CPU 5:

    # taskset -c 5 tgtd -C 1001 --iscsi portal=*:2001 --iser port=2001

          

     

    Bind tgtd to one of the CPUs 0-7:

    # taskset -c 0-7 tgtd -C 1001 --iscsi portal=*:2001 --iser port=2001

          

     

    It is also possible to change the binding of an already running process, for example:

    Change the binding of process ID 8889 to CPUs 0-3:

    # taskset -c -p 0-3 8889
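
    Combining the two ideas, each tgtd instance can be started already pinned to its own CPU. The following sketch assumes the same port numbering as the earlier examples. The helper name pinned_tgtd_commands is hypothetical, and it only prints the taskset command lines so they can be reviewed first.

```shell
# Print one taskset-wrapped tgtd command per CPU given on the command line
# (dry run). Instance i gets control port 1000+i and data port 2000+i.
pinned_tgtd_commands() {
    i=1
    for cpu in "$@"; do
        port=$((2000 + i))
        echo "taskset -c $cpu tgtd -C $((1000 + i)) --iscsi portal=*:$port --iser port=$port"
        i=$((i + 1))
    done
}

# Three instances pinned to CPUs 4, 5 and 6.
pinned_tgtd_commands 4 5 6
```

    Passing the CPU list explicitly makes it easy to skip CPUs reserved for other work, such as those handling the HCA interrupts.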

          

     

    IRQ Affinity

    With the iSER transport, interrupts can be received on specific IRQs. By running multiple tgtd instances, the Admin can spread the interrupts across multiple IRQs and then bind each IRQ to a specific CPU. For example:

    # tgtd -C 1001 --iscsi portal=*:2001 --iser port=2001 cq_vector=0

    # tgtd -C 1002 --iscsi portal=*:2002 --iser port=2002 cq_vector=1

    # tgtd -C 1003 --iscsi portal=*:2003 --iser port=2003 cq_vector=2

          

    In this example, the tgtd instances report interrupts to the first, second and third IRQs owned by the HCA.

     

    It is possible to check which interrupts are being used by looking in /proc/interrupts, for example:

    # cat /proc/interrupts

         CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15

    155:  0    0    0    0    0    0    0    0    0    0    0     0     0     0     0     0  IR-PCI-MSI-edge      mlx5_comp0@pci:0000:05:00.0

    156:  0    0    0    0    0    0    0    0    0    0    0     0     0     0     0     0  IR-PCI-MSI-edge      mlx5_comp1@pci:0000:05:00.0

    157:  0    0    0    0    0    0    0    0    0    0    0     0     0     0     0     0  IR-PCI-MSI-edge      mlx5_comp2@pci:0000:05:00.0

          

     

    The Admin can then bind each interrupt so that it is handled by a different CPU. In the commands below, 0, 1 and 2 are the CPU numbers assigned to each IRQ; a range such as 0-3 is also accepted:

    # echo 0 > /proc/irq/155/smp_affinity_list

    # echo 1 > /proc/irq/156/smp_affinity_list

    # echo 2 > /proc/irq/157/smp_affinity_list
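
    When there are many completion vectors, the IRQ numbers can be extracted from /proc/interrupts instead of being read by eye. The sketch below assumes the vectors are named mlx5_comp<N>@... as in the listing above (other drivers use different names), and assigns CPUs 0, 1, 2, ... in the order the IRQs appear. The helper name irq_bind_commands is hypothetical, and it only prints the echo commands.

```shell
# Read /proc/interrupts-style text on stdin and print one
# "echo CPU > /proc/irq/IRQ/smp_affinity_list" line per mlx5 completion IRQ.
# CPUs are handed out as 0, 1, 2, ... in order of appearance (dry run).
# Assumption: the last column looks like mlx5_comp<N>@pci:...
irq_bind_commands() {
    awk '$NF ~ /^mlx5_comp[0-9]+@/ {
        irq = $1
        sub(/:$/, "", irq)    # strip the trailing colon from "155:"
        printf "echo %d > /proc/irq/%s/smp_affinity_list\n", cpu++, irq
    }'
}

# Demonstration on two sample lines; on a live system use:
#   irq_bind_commands < /proc/interrupts
printf '%s\n' \
    '155:  0  0  IR-PCI-MSI-edge  mlx5_comp0@pci:0000:05:00.0' \
    '156:  0  0  IR-PCI-MSI-edge  mlx5_comp1@pci:0000:05:00.0' |
    irq_bind_commands
```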

          

     

    IRQ affinity can be set quickly with the set_irq_affinity.sh script that comes with the MLNX_OFED package. For example:

    # set_irq_affinity.sh eth5
    ---------------------------------------
    Optimizing IRQs for Single port traffic
    ---------------------------------------
    Discovered irqs for eth5: 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
    Assign irq 79 to its affinity_hint 00000001
    Assign irq 80 to its affinity_hint 00000004
    Assign irq 81 to its affinity_hint 00000010
    Assign irq 82 to its affinity_hint 00000040
    Assign irq 83 to its affinity_hint 00000100
    Assign irq 84 to its affinity_hint 00000400
    Assign irq 85 to its affinity_hint 00001000
    Assign irq 86 to its affinity_hint 00004000
    Assign irq 87 to its affinity_hint 00000002
    Assign irq 88 to its affinity_hint 00000008
    Assign irq 89 to its affinity_hint 00000020
    Assign irq 90 to its affinity_hint 00000080
    Assign irq 91 to its affinity_hint 00000200
    Assign irq 92 to its affinity_hint 00000800
    Assign irq 93 to its affinity_hint 00002000
    Assign irq 94 to its affinity_hint 00008000


    done.
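
    Each affinity_hint value in this output is a hexadecimal CPU bitmask, where bit N set means CPU N. For example, 00000004 selects CPU 2 and 00008000 selects CPU 15. The following sketch decodes such a mask. The helper name mask_to_cpus is hypothetical, and it assumes a single mask word without the comma separators that the kernel uses on systems with more than 32 CPUs.

```shell
# Convert a hexadecimal affinity mask to the list of CPU numbers it selects.
# Assumption: one mask word only (no comma-separated groups).
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    list=""
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            list="${list:+$list,}$cpu"   # append, comma-separated
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$list"
}

mask_to_cpus 00000004   # irq 80 in the output above
mask_to_cpus 00008000   # irq 94 in the output above
```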