HowTo Configure SMB Direct (RoCE) over PFC on Windows 2012 Server

Version 37

    This post shows how to configure Priority Flow Control (PFC) on a ConnectX-3 adapter card with the WinOF driver installed on the Windows Server 2012 operating system.

     

    References

     

    Powershell Commands

     

    Required Setup


     

    Prerequisites

    1. Make sure that the server is equipped with Mellanox ConnectX-3 adapter and that Windows 2012 Server is installed.

     

    2. Install the latest Mellanox WinOF driver.

     

    3. Enable required interface.

     

    4. Configure a single VLAN on the required interface

    Note 1: In case teaming is enabled on the interface, RDMA cannot be used.

    Note 2: To configure multiple VLANs for SMB high availability with teaming, use Virtual Ethernet Adapter (VEA). Please refer to WinOF User Manual.

    Note 3: VLAN ID valid range is 1-4094

    98.PNG

     

    5. (Optional) Disable regular Flow Control (global pause). Priority Flow Control (PFC) and global pause cannot operate together on the same interface.

    Strictly speaking, there is no need to disable Flow Control on the port, because the PFC configuration overrides it. The value of the Flow Control field therefore has no effect in this case.

    However, to avoid confusion and simplify debugging, it is recommended to disable Flow Control.

    1.PNG.png

    6. The "Data Center Bridging" feature must be installed on the server to enable Quality of Service (QoS) and PFC.

    Data Center Bridging can be installed either via the GUI or via PowerShell.

     

    Via GUI:

    Go to Server Manager -> Add Roles and Features -> Select the server and click on Features -> Select Data Center Bridging and install.

    2.PNG.png

    It is also possible to perform the same operation via PowerShell using the following command:

     

    PS C:\> Install-WindowsFeature Data-Center-Bridging

     

    Success Restart Needed Exit Code      Feature Result

    ------- -------------- ---------      --------------

    True    No             Success        {Data Center Bridging}
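
    To confirm that the feature is present before continuing, a check along the following lines may help. This is a minimal sketch that assumes the ServerManager cmdlets shipped with Windows Server 2012; it installs the feature only when it is missing and then prints the resulting state.

```powershell
# Sketch: install Data Center Bridging only if it is not already installed.
$feature = Get-WindowsFeature -Name Data-Center-Bridging
if (-not $feature.Installed) {
    Install-WindowsFeature -Name Data-Center-Bridging
}

# Show the resulting state (Installed is expected to be True).
Get-WindowsFeature -Name Data-Center-Bridging | Format-Table Name, Installed -AutoSize
```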

     

    7. Set the required RoCE mode. Refer to HowTo Configure RoCE in Windows Environment (Global Pause).

    A few notes:

    a. Make sure you have the same RoCE mode on both servers.

    b. RoCE v2 (routable) works on ConnectX-3 Pro adapter cards and should not be enabled on the ConnectX-3 cards (not supported).

    c. If there is no need for routable RoCE v2, leave the default configuration (RoCE v1) for WinOF version 4.90 or older.

    d. As of WinOF 5.00, RoCE is disabled by default (unlike WinOF 4.90), so the RoCE mode must be set manually.

     

    8. Make sure the switch used in the setup is configured with PFC (on the desired priority, e.g. 3) and enabled on the relevant interfaces.

     

    In this case, when using Mellanox Ethernet switches (e.g. SX1036), refer to HowTo Enable PFC on Mellanox Switches (SwitchX).

     

    For other switch vendors, you may refer to the following sources:

    PFC and QoS Configuration for SMB Direct

     

    High Level

    PFC and QoS configuration for RoCE requires three configuration steps, plus an optional fourth:

    1. Enable PFC on a specific priority (e.g. priority 3)

    2. Enable QoS on the required port (e.g. Ethernet 15)

    3. Map SMB traffic to the configured priority (e.g. priority 3)

    Note: You can map additional traffic to a different priority (e.g. priority 1) or leave the traffic unmapped, depending on what you wish to run on this interface.

    4. (Optional) Configure the egress scheduler (ETS, SP) with specific bandwidth to each flow (e.g. SMB, TCP, UDP ...)

     

    To configure PFC and QoS for SMB Direct, follow the steps below:

     

    1. Get the Current PFC Configuration. All priorities are disabled by default.

    PS C:\> Get-NetQosFlowControl

     

    Priority   Enabled

    --------   -------

    0          False

    1          False

    2          False

    3          False

    4          False

    5          False

    6          False

    7          False

     

    PS C:\>

     

    2. Enable one of the priorities (in this example, priority 3). Run:

    PS C:\> Enable-NetQosFlowControl -Priority 3

    PS C:\> Get-NetQosFlowControl

     

    Priority   Enabled

    --------   -------

    0          False

    1          False

    2          False

    3          True

    4          False

    5          False

    6          False

    7          False

     

    PS C:\>
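
    Rather than reading the table by eye, the result can also be checked in a script. The following is a minimal sketch; the variable $expected is introduced here for illustration and assumes priority 3, as in the example above.

```powershell
# Sketch: confirm that PFC is enabled on exactly one priority.
$expected = 3   # assumption: the priority chosen for SMB Direct in this post

# Collect the priorities on which PFC is currently enabled.
$enabled = @(Get-NetQosFlowControl |
    Where-Object { $_.Enabled } |
    ForEach-Object { $_.Priority })

if ($enabled.Count -eq 1 -and $enabled[0] -eq $expected) {
    "PFC is enabled only on priority $expected"
} else {
    "Unexpected set of PFC-enabled priorities: $($enabled -join ', ')"
}
```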

     

     

    3. Get the Current QoS configuration:

     

    PS C:\> Get-NetAdapterQoS

     

    ...

     

    Name         : Ethernet 15

    Enabled      : False

    Capabilities :                       Hardware     Current

                                         --------     -------

                   MacSecBypass        : NotSupported NotSupported

                   DcbxSupport         : None         None

                   NumTCs(Max/ETS/PFC) : 8/8/8        0/0/0

     

    PS C:\>

     

    4. Enable QoS on the relevant interface.

    Note: QoS cannot be enabled on a specific VLAN interface, only on the physical interface.

    PS C:\> Enable-NetAdapterQos -InterfaceAlias "Ethernet 15"

    PS C:\> Get-NetAdapterQoS

    ...

     

    Name                       : Ethernet 15

    Enabled                    : True

    Capabilities               :                       Hardware     Current

                                                       --------     -------

                                 MacSecBypass        : NotSupported NotSupported

                                 DcbxSupport         : None         None

                                 NumTCs(Max/ETS/PFC) : 8/8/8        8/8/8

     

     

    OperationalTrafficClasses  : TC TSA    Bandwidth Priorities

                                 -- ---    --------- ----------

                                  0 ETS    100%      0-7

     

     

    OperationalFlowControl     : Priority 3 Enabled

    OperationalClassifications : Not Available

     

    PS C:\>
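
    The two fields that matter at this point can be read directly from the adapter object. This is a small sketch that assumes the interface name "Ethernet 15" from the example; replace it with the name of your physical interface.

```powershell
# Sketch: read the QoS state of one adapter.
$nic = "Ethernet 15"   # assumption: interface name used in this post

$qos = Get-NetAdapterQos -Name $nic

# Expected after this step: Enabled is True and the operational
# flow control reports the PFC priority enabled earlier.
$qos.Enabled
$qos.OperationalFlowControl
```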

     

    5. Get the current NetQoSPolicy:

    PS C:\Users\Administrator> Get-NetQosPolicy -PolicyStore ActiveStore

    PS C:\Users\Administrator>

     

     

    6. Create specific QoSPolicyMap to map all traffic that reaches NetDirect port 445 (the default SMB port) to the proper priority (in this case priority 3):

     

    PS C:\Users\Administrator> New-NetQosPolicy "My SMB Direct QoS Map" -PolicyStore ActiveStore -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
    Name           : My SMB Direct QoS Map
    Owner          : PowerShell / WMI
    NetworkProfile : All
    Precedence     : 127
    NetDirectPort  : 445
    PriorityValue  : 3

     

    7. Map other flow types to different priority.

     

    In this example, TCP and UDP flows are mapped to priority 0 and the default traffic types (the rest) are mapped to priority 3.

    Note: Priority 3 for RDMA and priority 0 for TCP are only examples. You can choose different priorities depending on the network design and the QoS required.

    PS C:\> New-NetQosPolicy "DEFAULT" -Default -PolicyStore ActiveStore -PriorityValue8021Action 3

     

    Name           : DEFAULT

    Owner          : PowerShell / WMI

    NetworkProfile : All

    Precedence     : 127

    Template       : Default

    PriorityValue  : 3

     

    PS C:\> New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PolicyStore ActiveStore -PriorityValue8021Action 0

     

    Name           : TCP

    Owner          : PowerShell / WMI

    NetworkProfile : All

    Precedence     : 127

    IPProtocol     : TCP

    PriorityValue  : 0

     

    PS C:\> New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PolicyStore ActiveStore -PriorityValue8021Action 0

     

    Name           : UDP

    Owner          : PowerShell / WMI

    NetworkProfile : All

    Precedence     : 127

    IPProtocol     : UDP

    PriorityValue  : 0

     

    PS C:\> Get-NetQosPolicy -PolicyStore ActiveStore

     

    Name           : DEFAULT

    Owner          : PowerShell / WMI

    NetworkProfile : All

    Precedence     : 127

    Template       : Default

    PriorityValue  : 3

     

    Name           : My SMB Direct QoS Map

    Owner          : PowerShell / WMI
    NetworkProfile : All
    Precedence     : 127
    NetDirectPort  : 445
    PriorityValue  : 3

     

    Name           : TCP

    Owner          : PowerShell / WMI

    NetworkProfile : All

    Precedence     : 127

    IPProtocol     : TCP

    PriorityValue  : 0

     

    Name           : UDP

    Owner          : PowerShell / WMI

    NetworkProfile : All

    Precedence     : 127

    IPProtocol     : UDP

    PriorityValue  : 0

     

    PS C:\>

     

    8. (Optional) Configure ETS parameters for all traffic types (SMB, TCP, UDP, etc.)

    For example, allocate 50% of the bandwidth to SMB traffic (priority 3) using the ETS algorithm.

     

    PS C:\> Get-NetQosTrafficClass

     

    Name                      Algorithm Bandwidth(%) Priority

    ----                      --------- ------------ --------

    [Default]                 ETS       100          0-7

     

    PS C:\> New-NetQosTrafficClass -name "SMB class" -priority 3 -bandwidthPercentage 50 -Algorithm ETS

     

    Name                      Algorithm Bandwidth(%) Priority

    ----                      --------- ------------ --------

    SMB class                 ETS       50           3

     

    PS C:\> Get-NetQosTrafficClass

     

    Name                      Algorithm Bandwidth(%) Priority

    ----                      --------- ------------ --------

    [Default]                 ETS       50           0-2,4-7

    SMB class                 ETS       50           3
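
    As the output above suggests, the [Default] class keeps whatever bandwidth is not assigned to user-defined classes. When planning several classes, the arithmetic can be sketched as below. The function name Get-DefaultClassBandwidth is hypothetical and only illustrates the calculation; it does not query or change the host.

```powershell
# Sketch: compute the bandwidth left for the [Default] ETS class.
# Get-DefaultClassBandwidth is a hypothetical helper, not a built-in cmdlet.
function Get-DefaultClassBandwidth {
    param([int[]]$ClassPercentages)

    # Sum the percentages requested by the user-defined classes.
    $used = ($ClassPercentages | Measure-Object -Sum).Sum
    if ($used -gt 100) {
        throw "ETS classes request $used percent, which exceeds 100."
    }
    return 100 - $used
}

# One class at 50%, as in the example above, leaves 50 for [Default].
Get-DefaultClassBandwidth -ClassPercentages 50
```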

     

    9. Create a startup QoS script and add it to the list of startup scripts.

    This script will run upon each system boot and will make sure that the QoS is reconfigured properly.

     

    10. Create a PowerShell script file (e.g. qos.ps1) in C:\Windows\System32\GroupPolicy\Machine\Scripts\Startup\qos.ps1 (or any other location) and add the script below, adjusted for your setup.

    Below is an example of the startup script. Review the parameters and change them as needed (priority, interface names, etc.).

    Remove-NetQosTrafficClass

    Remove-NetQosPolicy -Confirm:$False

    Set-NetQosDcbxSetting -Willing 0

    New-NetQosPolicy "SMB" -PolicyStore ActiveStore -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

    New-NetQosPolicy "DEFAULT" -PolicyStore Activestore -Default -PriorityValue8021Action 3

    New-NetQosPolicy "TCP" -PolicyStore ActiveStore -IPProtocolMatchCondition TCP -PriorityValue8021Action 0

    New-NetQosPolicy "UDP" -PolicyStore ActiveStore -IPProtocolMatchCondition UDP -PriorityValue8021Action 0

    Disable-NetQosFlowControl 0,1,2,4,5,6,7

    Enable-NetQosFlowControl -Priority 3

    Enable-NetAdapterQos -InterfaceAlias "Ethernet 15"

    New-NetQosTrafficClass -name "SMB class" -priority 3 -bandwidthPercentage 50 -Algorithm ETS

     

    11. To add the file to the list of startup scripts, perform the following steps:

    a. Run gpedit from PowerShell:

    gpedit

    b. In the pop-up window, under the 'Computer Configuration' section, perform the following:

      - Select Windows Settings

      - Select Scripts (Startup/Shutdown)

      - Double click Startup to open the Startup Properties

      - Select the PowerShell Scripts tab and click Add

      - Browse to find the file you just created (e.g. qos.ps1)

    13 - script.PNG

     

    Verification

    There are various ways to test network performance. It is recommended to use RamDisk and not HDD to eliminate the HDD latency.

     

    To test network performance, follow the steps below:

    1. Refer to Ram Disk Application for Windows Environment (imdisk, sqlio) to create a RAM disk on one of the servers (e.g. server 2) and to run sqlio on the other server (e.g. server 1):

    PS C:\Users\Administrator> sqlio2_15 -BYRT -e200 -b16 -fsequential -T0 -t32 -o4 -s5 -LS \\11.11.11.2\S\1.txt

    sqlio v2.15. 64bit_SG

    32 threads writing for 5 secs to file \\11.11.11.2\S\1.txt

            using a 0/100 read/write ratio

            using 16KB sequential IOs

            enabling multiple I/Os per thread with 4 outstanding

            buffering set to use both file and disk caches

            buffering will occur on the remote server, not locally

            software buffer cache will defer writeback, honoring temporary attribute

    Ensuring that  file \\11.11.11.2\S\1.txt, as requested,

    is at least 209715200 bytes in size.

    size of file \\11.11.11.2\S\1.txt needs to be: 209715200 bytes

    current file size:      0 bytes

    need to expand by:      209715200 bytes

    expanding \\11.11.11.2\S\1.txt ... done.

    using current size: 200 MB for file: \\11.11.11.2\S\1.txt

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 173698.56

    MBs/sec:  2714.04

    latency metrics:

    Min_Latency(ms): 0

    Avg_Latency(ms): 0

    Max_Latency(ms): 4

    histogram:

    ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 83 16  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

     

    2. Check that SMB Direct (RDMA) is running:

     

    a. Launch Performance Monitor using the "perfmon" command from PowerShell:

     

    10 - Perfmon.PNG

     

    b. Add "RDMA Activity" counters of the proper interface:

    10 - RDMA activity.PNG

     

    c. Repeat the sqlio test and check that the RDMA counters are increasing:

    11 - Performance test.png
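
    The same check can be approached from PowerShell instead of the Performance Monitor GUI. The following is a minimal sketch that assumes the interface name "Ethernet 15" from this post and uses the NetAdapter and SMB cmdlets included in Windows Server 2012; run it on the client while the test is in progress.

```powershell
# Sketch: inspect RDMA state from PowerShell.
$nic = "Ethernet 15"   # assumption: interface name used in this post

# Is RDMA enabled on the adapter?
Get-NetAdapterRdma -Name $nic | Format-List Name, Enabled

# Which client interfaces does SMB consider RDMA capable?
Get-SmbClientNetworkInterface

# Active SMB Multichannel connections (populated only while SMB traffic is running).
Get-SmbMultichannelConnection

# List the counter paths available in the "RDMA Activity" counter set.
(Get-Counter -ListSet "RDMA Activity").Paths
```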

     

    3. Check PFC counters. For instance, if you're using Mellanox Ethernet Switches (MLNX-OS), run the command below to check the PFC counters.

    Note: Make sure that pause packets are being sent or received on the proper priority. In normal cases, you may expect pauses to be sent from the receiver side via the switch and back to the sender side.

    In the example below, server 1 is the sender and is connected via switch port number 1, while server 2 is the receiver and is connected to switch port number 2.

    You can see that server 2 is sending pause frames to the switch, and the switch passes them on via port 1 to server 1.

     

    switch (config) # show interfaces ethernet 1/1 counters priority 3

     

    Rx

      11809280             packets

      11809280             unicast packets

      0                    multicast packets

      0                    broadcast packets

      12238754633          bytes

      0                    pause packets

      0                    pause duration milliseconds

     

     

     

    Tx

      2046842              packets

      2046841              unicast packets

      0                    multicast packets

      1                    broadcast packets

      287974440            bytes

      53558                pause packets

     

     

    switch (config) # show interfaces ethernet 1/2 counters priority 3

     

     

     

    Rx

      2046842              packets

      2046841              unicast packets

      0                    multicast packets

      1                    broadcast packets

      247037600            bytes

      56966                pause packets

      1621                 pause duration milliseconds

     

     

     

    Tx

      11809283             packets

      11809280             unicast packets

      3                    multicast packets

      0                    broadcast packets

      12474940827          bytes

      0                    pause packets

    switch (config) #

     

    Troubleshooting

    • In some cases, New-NetQosPolicy commands must be executed with the "-PolicyStore ActiveStore" flag, which makes the configuration non-persistent across reboots. In such cases, an additional startup script is needed (refer to the WinOF User Manual for additional information). Make sure that the configuration is saved in the active store as shown in the example above.
    • Expected RDMA performance depends on many parameters, such as the CPU and the disk/SSD size and speed.
    • If you need to experiment with the configuration, run the following to remove the network QoS traffic classes and policies:

    PS C:\> Remove-NetQosTrafficClass

    PS C:\> Remove-NetQosPolicy -Confirm:$False

    • In case RDMA is running, no traffic will be seen in the Task Manager->Performance on the selected interface. The reason is that the RDMA traffic bypasses the OS and goes directly to the adapter card.
      In the figure below, you can see RDMA traffic running on the Performance Monitoring tool and 0Kbps on the Task Manager->Performance.

     

    12 - troubleshooting.PNG

     

    • In case RDMA Activity counters do not show up, delete the counters and re-add them.
    • Make sure that PFC is configured correctly on the switch. If the host is configured with global pause, the "RX unknown control opcode" counter will increase on the switch (MLNX-OS). Refer to HowTo Troubleshoot Mellanox Ethernet Switches via Port Counters for more information.
    • In some cases, a NetBIOS (TCP port 139) session is established instead of SMB (TCP port 445). To make sure that SMB is always used, it is recommended to turn off the NetBIOS option.
      To do that, open the properties of the interface (right click), select "Internet Protocol Version 4 (TCP/IPv4)", click on Properties and then click on "Advanced...". In the new window, select the WINS tab, and then select "Disable NetBIOS over TCP/IP".
      For more information, see https://support.microsoft.com/en-us/kb/204279
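
    The NetBIOS setting can also be changed from PowerShell through WMI, which may be convenient when several hosts need the same change. This is a sketch under assumptions: the interface name "Ethernet 15" is taken from this post (with a VLAN configured, the name of the interface that carries the IP address may differ), and the value 2 passed to SetTcpipNetbios selects "Disable NetBIOS over TCP/IP".

```powershell
# Sketch: disable NetBIOS over TCP/IP on one adapter through WMI.
$nic = "Ethernet 15"   # assumption: interface name used in this post

# Find the adapter by its connection name, then its IP configuration object.
$adapter = Get-WmiObject Win32_NetworkAdapter -Filter "NetConnectionID = '$nic'"
$config  = Get-WmiObject Win32_NetworkAdapterConfiguration -Filter "Index = $($adapter.Index)"

# 0 = use DHCP setting, 1 = enable, 2 = disable NetBIOS over TCP/IP.
$result = $config.SetTcpipNetbios(2)
$result.ReturnValue   # 0 indicates success
```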

    netbios.PNG

     

     

    Mellanox Adapter Counters

     

    For additional debugging purposes, it is recommended to turn on the following counters on the Performance Monitor tool:
    - Mellanox Adapter Diagnostic Counters
    - Mellanox Adapter QoS Counters
    - Mellanox Adapter Traffic Counters

     

    Note: In general, when working with counters, it is recommended to reset them and start fresh.

    To reset the counters, restart the driver.

     

    a. Mellanox Adapter Diagnostic Counters

    Add the following counters of the proper interface (in normal situations, expect them to be equal to zero):

        - Responder CQE Errors

        - Responder Duplicate Request Received

        - Responder Out-Of-Order Sequence Received

     

    In the example below, the Responder CQE Errors and Responder Duplicate Request Received counters are not equal to zero, which implies a problem with the RDMA connectivity.

     

    You may want to re-check your configuration.

     

     

     

     

    14 - counters 1.PNG

     

    Note: The "Responder CQE Errors" counter may increase at the end of an SMB Direct session tear-down. Details: RDMA receivers need to post receive WQEs to handle incoming messages. If the application does not know how many messages to expect (e.g. by maintaining high-level message credits), it may post more receive WQEs than will actually be used. On tear-down, if the application did not use up all of its receive WQEs, the device issues a completion with error for these WQEs to indicate that the hardware does not plan to use them. This is reported with a clear syndrome indication of "Flushed with error".

     

    b. Mellanox Adapter QoS Counters

    These counters are reported per priority, per interface. You can monitor each priority that is in use (for example, priority 3 for SMB traffic and priority 0 for other traffic).

    Make sure that the PFC is working on the SMB traffic (e.g. priority 3) via the counters:

       -  Rcv Pause Duration (The total duration that far-end port was requested to pause for the transmission of packets in microseconds).

       -  Rcv Pause Frames (The total number of pause frames received).

     

    In this example, you can see that those counters are used in priority 3 (PFC is enabled) and not in priority 0 (PFC is disabled).

    14 - counters 2.PNG

     

    c. Mellanox Adapter Traffic Counters

    These counters are per interface.

    You can monitor the total amount of ingress frames and egress frames, as well as discard frames and errors.

    Make sure that the following counters are equal to zero:

    - Packet Received Symbol Error

    - Packet Received Errors

    Those counters imply a physical link error. It is recommended to replace the cable and retry.

    The Packet Received Discarded counter implies that good packets were dropped due to insufficient resources. If you run only SMB traffic (RDMA), make sure that PFC is working well (check that pause frames are being sent).

    14 - counters 3.PNG

     

    PowerShell commands to get counters:

    The following examples show how to read counters from PowerShell:

    1. Get specific counter:

    PS C:\Users\Administrator> get-counter -counter "\\localhost\Mellanox Adapter Diagnostic Counters(_total)\responder cqe errors" -SampleInterval 1 -MaxSamples 3

     

     

    Timestamp                 CounterSamples

    ---------                 --------------

    2/4/2015 11:08:22 PM      \\localhost\mellanox adapter diagnostic counters(_total)\responder cqe errors :                           21792

    2/4/2015 11:08:23 PM      \\localhost\mellanox adapter diagnostic counters(_total)\responder cqe errors :                           21792

    2/4/2015 11:08:24 PM      \\localhost\mellanox adapter diagnostic counters(_total)\responder cqe errors :                           21792

    2.  Get all counters:

    PS C:\Users\Administrator> get-counter -counter "\\localhost\Mellanox Adapter Diagnostic Counters(*)\*" -SampleInterval 1 -MaxSamples 3
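
    Since most of the diagnostic counters are expected to stay at zero, it can be easier to list only the ones that are not. The following is a minimal sketch that relies on the CounterSamples, Path and CookedValue properties of the Get-Counter output and on the counter set name used in this post.

```powershell
# Sketch: show only the Mellanox diagnostic counters that are non-zero.
$samples = Get-Counter -Counter "\Mellanox Adapter Diagnostic Counters(*)\*" -MaxSamples 1

$samples.CounterSamples |
    Where-Object { $_.CookedValue -ne 0 } |
    Sort-Object Path |
    Format-Table Path, CookedValue -AutoSize
```

    An empty result means that every counter in the set is currently zero.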

     

    3. More examples:

     

    PS C:\Users\Administrator> get-counter

     

    Timestamp                 CounterSamples

    ---------                 --------------

    2/4/2015 11:15:44 PM      \\gen-l-vrt-001\network interface(intel[r] i350 gigabit network connection)\bytes total/sec    :            0

                              \\gen-l-vrt-001\network interface(intel[r] i350 gigabit network connection _2)\bytes total/sec :            2729.76796928876

                              \\gen-l-vrt-001\network interface(mellanox connectx-3 pro ethernet adapter _3)\bytes total/sec :            0

                              \\gen-l-vrt-001\network interface(mellanox connectx-3 pro ethernet adapter _4)\bytes total/sec :            0

                              \\gen-l-vrt-001\network interface(isatap.mtl.labs.mlnx labs.mlnx lab.mtl.com mtl.com)\bytes total/sec :     0

                              \\gen-l-vrt-001\network interface(teredo tunneling pseudo-interface)\bytes total/sec :                      0

                              \\gen-l-vrt-001\network interface(isatap.{d0dd06f0-6d83-4bd0-b3e7-3db11d3620d7})\bytes total/sec :          0

                              \\gen-l-vrt-001\processor(_total)\% processor time :                                                        0.799322693526816

                              \\gen-l-vrt-001\memory\% committed bytes in use :                                                           2.9524232498405

                              \\gen-l-vrt-001\memory\cache faults/sec :                                                                   0

                              \\gen-l-vrt-001\physicaldisk(_total)\% disk time :                                                          0

                              \\gen-l-vrt-001\physicaldisk(_total)\current disk queue length :                                            0

     

    4. For help run:

    PS C:\Users\Administrator> get-help get-counter

     

    NAME

        Get-Counter

     

    SYNOPSIS

        Gets performance counter data from local and remote computers.

     

    SYNTAX

        Get-Counter [[-Counter] <String[]>] [-ComputerName <String[]>] [-Continuous] [-MaxSamples <Int64>]

        [-SampleInterval <Int32>] [<CommonParameters>]

     

        Get-Counter [-ListSet] <String[]> [-ComputerName <String[]>] [<CommonParameters>]

     

    DESCRIPTION

        The Get-Counter cmdlet gets live, real-time performance counter data directly from the performance monitoring

        instrumentation in Windows.  You can use it to get performance data from the local or remote computers at the

        sample interval that you specify.

        Without parameters, a "Get-Counter" command gets counter data for a set of system counters.

        You can use the parameters of Get-Counter to specify one or more computers, to list the performance counter sets

        and the counters that they contain, and to set the sample size and interval.

     

    RELATED LINKS

        Online Version: http://go.microsoft.com/fwlink/p/?linkid=289625

        Export-Counter

        Import-Counter

     

    REMARKS

        To see the examples, type: "get-help Get-Counter -examples".

        For more information, type: "get-help Get-Counter -detailed".

        For technical information, type: "get-help Get-Counter -full".

        For online help, type: "get-help Get-Counter -online"