HowTo Configure SMB Direct over IP networks (RoCEv2) on Windows 2012 Server

Version 8

    This post is based on HowTo Configure SMB Direct (RoCE) over PFC on Windows 2012 Server, but focuses on the network setup and the configuration required to enable SMB Direct connections over RoCE v2 across IP networks.

    There are several ways to build a lossless network when the network is routed (L3). This post focuses on preserving the L2 priority between networks while using PFC on all links.

    This is an advanced post. It is recommended to first read about and gain experience with basic RoCE configuration, PFC configuration for SMB Direct, and BGP/OSPF. References are listed below.

     

    References

     

    Setup

    The setup tested in this post is based on the setup in HowTo Configure BGP on Mellanox Switches. It can also be tested with a single switch configured as a router.

    In this setup, four Mellanox Ethernet switches (SX1036) are configured in a VMS-like topology with BGP enabled. Each server is connected to a different leaf via one port.

     

    The four switches and two servers are arranged as follows:

    Mellanox SX1036 ToR switches - sx03, sx04

    Mellanox SX1036 Spine switches - sx05, sx06

    Two servers (S1, S2), each running Windows Server 2012 and equipped with a Mellanox ConnectX-3 adapter card.

     

    Before starting, make sure that L3 connectivity exists between the servers, which are configured on different networks/subnets.


     

    Technical Information

    Here are the high-level network and host plans for preserving the L2 priority between the networks while using PFC.

     

    Host plan for lossless network (high level):

    1. Enable VLAN tag on the interface.

    2. Enable RoCE v2.

    3. Enable PFC and QoS on the interface.

    4. Mark SMB traffic with L2 priority 3 for the egress traffic.

     

    Network plan for lossless network (high level):

    1. To pass the L2 priority, all related interfaces must be configured with a VLAN tag (trunks).

    2. Enable PFC on all related interfaces for priority 3.

    3. The routers are configured to preserve the L2 priority from one network and to pass it to the next network (mark it on the packet).

     

    L2/L3 Network Configuration

    L3 OSPF or BGP configuration is outside the scope of this document. Refer to HowTo Configure BGP on Mellanox Switches or HowTo Configure OSPF on Mellanox Switches (Running-Config) for that.

     

    1. All interfaces should be configured as trunks, with the proper VLAN tag (depends on the network).

    For example:

    switch (config) # interface ethernet 1/1 switchport mode trunk

    switch (config) # interface ethernet 1/1 switchport trunk allowed-vlan 1

     

    2. PFC should be enabled globally on the switch and on all relevant interfaces. Refer to HowTo Enable PFC on Mellanox Switches (SwitchX) for more information.

    For example:

    switch (config) # dcb priority-flow-control enable force

    switch (config) # dcb priority-flow-control priority 3 enable

    switch (config) # interface ethernet 1/1 dcb priority-flow-control mode on force

     

    3. Configure the router to preserve the PCP between networks. This is a global configuration.

    switch (config) # qos map dscp-to-pcp preserve-pcp

    Note: If this command is not set, the PCP value is not preserved across networks. Instead, the egress router interface maps the DSCP field of the ingress packet to the PCP value in the egress VLAN tag. This DSCP-to-PCP mapping is the default behavior.

     

    L2/L3 Host Configuration

    The hosts should be installed with Windows Server 2012 and the latest WinOF driver.

     

    1. The relevant adapter port should be configured with an IP address and a VLAN tag. This is basic configuration; refer to HowTo Configure RoCE in Windows Environment (Global Pause) for examples.

    Make sure to configure a route to the far-end server. For example:

    # route -p add 12.12.0.0 mask 255.255.0.0 12.12.5.2

    Note: the '-p' flag is used to preserve the route after restart.

     

    2. Enable RoCE v2 by setting the RoCE mode with the following command:

    PS c:\> Set-MlnxDriverCoreSetting -RoceMode 2

     

    3. Set QoS and PFC as mentioned in HowTo Configure SMB Direct (RoCE) over PFC on Windows 2012 Server for the relevant interface.
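
    The three host steps above can be sketched in PowerShell roughly as follows. This is a minimal sketch, not the exact procedure from the referenced post. The adapter name "Ethernet 16", VLAN 5, the name "smb", priority 3 and the 50% bandwidth share are taken from the Server S1 output shown later in this post. The choice of the standard Windows DCB cmdlets for the QoS and PFC part is an assumption.

    ```powershell
    # Assumption: adapter name and VLAN ID as on Server S1 in this post.
    $nic = "Ethernet 16"

    # Step 1: tag the interface with its VLAN
    # (VlanID is the registry keyword listed by Get-NetAdapterAdvancedProperty).
    Set-NetAdapterAdvancedProperty -Name $nic -RegistryKeyword "VlanID" -RegistryValue 5

    # Step 2: switch the Mellanox driver to RoCE v2 (WinOF cmdlet).
    Set-MlnxDriverCoreSetting -RoceMode 2

    # Step 3: QoS and PFC (assumed cmdlet sequence, standard Windows DCB module).
    Install-WindowsFeature Data-Center-Bridging

    # Mark SMB Direct traffic (NetDirect port 445) with L2 priority 3.
    New-NetQosPolicy -Name "smb" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

    # Enable PFC for priority 3 only.
    Enable-NetQosFlowControl -Priority 3
    Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

    # Apply QoS on the adapter and reserve bandwidth for the SMB class.
    Enable-NetAdapterQos -Name $nic
    New-NetQosTrafficClass -Name "smb" -Priority 3 -Algorithm ETS -BandwidthPercentage 50
    ```

    Priority 3 on the host must match the priority enabled for PFC on the switches, otherwise pause frames are generated for a class the network does not treat as lossless.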

     

    Verification

    Verification test procedures can be taken from the verification chapter in HowTo Configure SMB Direct (RoCE) over PFC on Windows 2012 Server.
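
    As a quick first check before following the referenced chapter, the standard Windows cmdlets below report whether RDMA is enabled on the adapter and whether SMB actually uses it. This is a sketch only: the adapter name "Ethernet 16" is taken from the Server S1 output in this post, and the last command returns entries only after the client has opened a share on the far-end server.

    ```powershell
    # RDMA (NetworkDirect) must be enabled on the adapter.
    Get-NetAdapterRdma -Name "Ethernet 16"

    # The SMB client should list the interface as RDMA capable.
    Get-SmbClientNetworkInterface

    # With an SMB session open to the far-end server, the connection
    # should be reported as RDMA capable on both client and server side.
    Get-SmbMultichannelConnection
    ```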

     

    Troubleshooting

    In general, if something is not working, it is recommended to break the problem down and add complexity step by step.

     

    Problem: There is no L3 ping between the hosts.

    1. Re-check your L2/L3 switch configuration.

    2. All ports should be configured in trunk mode.

    3. BGP (or OSPF or just static routing) should be configured properly and verified. Check BGP neighbors.

    4. IP addresses and routes should be configured on the hosts.

    5. Refer also to the troubleshooting section of HowTo Configure BGP on Mellanox Switches.
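
    On the host side, these checks can be run hop by hop, for example from Server S1. The addresses come from the running configuration in this post (12.12.5.2 is the gateway on sx03, 12.12.6.12 is Server S2). The adapter name is an assumption based on the S1 output.

    ```powershell
    # Confirm the local IPv4 address on the RoCE interface.
    Get-NetIPAddress -InterfaceAlias "Ethernet 16" -AddressFamily IPv4

    # Confirm that the route to 12.12.0.0/16 points at the local gateway.
    route print -4

    # Test the first hop, then the far-end server.
    ping 12.12.5.2
    ping 12.12.6.12

    # If the far end is unreachable, show where along the path traffic stops.
    tracert -d 12.12.6.12
    ```

    If the gateway answers but the far-end server does not, the problem is more likely in the switch routing (BGP/OSPF) than on the host.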

     

    Problem: There is a ping, but RDMA fails to create a connection.

    1. Make sure PFC is configured properly across the network.

    2. Re-check port priority counters in the switches. Refer to HowTo Enable PFC on Mellanox Switches (SwitchX).

    3. Make sure that the server is configured properly with PFC. Refer to HowTo Configure SMB Direct (RoCE) over PFC on Windows 2012 Server.

    4. Check the performance monitoring counters. Refer to HowTo Configure SMB Direct (RoCE) over PFC on Windows 2012 Server.
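
    The host-side part of these checks might look like the sketch below. The expected values in the comments are the ones shown in the Server S1 output in this post. The adapter name and the use of the Windows "RDMA Activity" counter set are assumptions, and the counters available can differ by driver version.

    ```powershell
    # PFC should be enabled for priority 3 only.
    Get-NetQosFlowControl

    # The adapter should report "Priority 3 Enabled" and the NetDirect 445 classification.
    Get-NetAdapterQos -Name "Ethernet 16"

    # The active QoS policy should map NetDirect port 445 to priority 3.
    Get-NetQosPolicy -PolicyStore ActiveStore

    # Both servers must run the same RoCE mode (2.0 in this post).
    (Get-MlnxDriverCoreSetting).RoceMode

    # List the RDMA counters available on this host, then sample them once.
    (Get-Counter -ListSet "RDMA Activity").Paths
    Get-Counter -Counter "\RDMA Activity(*)\*"
    ```

    A RoCE mode mismatch between the two servers is worth ruling out early, since RoCE v1 frames are not routable and ping works regardless of the RoCE mode.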

     

     

    Running Configuration

     

    SX03

    Connectivity:

    - SX03 is connected to server S1 on port 1/31

    - SX03 is connected to switch sx05  on port 1/1

    - SX03 is connected to switch sx06  on port 1/2

     

    ## DCBX PFC configuration (enabled on all ports)

       dcb priority-flow-control enable force

       dcb priority-flow-control priority 3 enable

       interface ethernet 1/1 dcb priority-flow-control mode on force

       interface ethernet 1/2 dcb priority-flow-control mode on force

       interface ethernet 1/31 dcb priority-flow-control mode on force

     

    ## Interface Ethernet configuration

       interface ethernet 1/1 switchport mode trunk

       interface ethernet 1/2 switchport mode trunk

       interface ethernet 1/31 switchport mode trunk

     

    ## VLAN configuration

       vlan 2-6

       interface ethernet 1/1 switchport trunk allowed-vlan none

       interface ethernet 1/2 switchport trunk allowed-vlan none

       interface ethernet 1/31 switchport trunk allowed-vlan none

       interface ethernet 1/1 switchport trunk allowed-vlan add 1

       interface ethernet 1/2 switchport trunk allowed-vlan 2

       interface ethernet 1/31 switchport trunk allowed-vlan 5

     

    ## General - STP and LLDP

    no spanning-tree

    lldp

     

     

    ## L3 configuration

       ip routing

       interface vlan 1

       interface vlan 2

       interface vlan 5

       interface vlan 1 ip address 12.12.1.1 255.255.255.0

       interface vlan 2 ip address 12.12.2.1 255.255.255.0

       interface vlan 5 ip address 12.12.5.2 255.255.255.0

     

    ## QoS switch configuration

       qos map dscp-to-pcp preserve-pcp

     

    ## BGP configuration (peer group example)

       protocol bgp

       router bgp 100

       router bgp 100 bgp fast-external-fallover

       router bgp 100 maximum-paths 2

       router bgp 100 bestpath as-path multipath-relax

       router bgp 100 neighbor test peer-group

       router bgp 100 neighbor test allowas-in 1

       router bgp 100 network 12.12.5.0 /24

       router bgp 100 bgp listen range 12.12.0.0 /16 peer-group test remote-as 200

     

     

    SX04

    Connectivity:

    - SX04 is connected to server S2 on port 1/31

    - SX04 is connected to switch sx05  on port 1/1

    - SX04 is connected to switch sx06  on port 1/2

     

    ## DCBX PFC configuration

       dcb priority-flow-control enable force

       dcb priority-flow-control priority 3 enable

       interface ethernet 1/1 dcb priority-flow-control mode on force

       interface ethernet 1/2 dcb priority-flow-control mode on force

       interface ethernet 1/31 dcb priority-flow-control mode on force

     


    ## Interface Ethernet configuration

       interface ethernet 1/1 switchport mode trunk

       interface ethernet 1/2 switchport mode trunk

       interface ethernet 1/31 switchport mode trunk

       interface ethernet 1/30 shutdown

     

    ## VLAN configuration

       vlan 2-6

       interface ethernet 1/1 switchport trunk allowed-vlan none

       interface ethernet 1/2 switchport trunk allowed-vlan none

       interface ethernet 1/31 switchport trunk allowed-vlan none

       interface ethernet 1/1 switchport trunk allowed-vlan 3

       interface ethernet 1/2 switchport trunk allowed-vlan 4

       interface ethernet 1/31 switchport trunk allowed-vlan 6

     

    ## General - STP and LLDP configuration

    no spanning-tree

    lldp

     

    ## L3 configuration

       ip routing

       interface vlan 3

       interface vlan 4

       interface vlan 6

       interface vlan 3 ip address 12.12.3.1 255.255.255.0

       interface vlan 4 ip address 12.12.4.1 255.255.255.0

       interface vlan 6 ip address 12.12.6.2 255.255.255.0

     

    ## QoS switch configuration

       qos map dscp-to-pcp preserve-pcp

     

    ## BGP configuration (static neighbor configuration example)

       protocol bgp

       router bgp 100

       router bgp 100 bgp fast-external-fallover

       router bgp 100 maximum-paths 2

       router bgp 100 bestpath as-path multipath-relax

       router bgp 100 neighbor 12.12.3.2 remote-as 200

       router bgp 100 neighbor 12.12.4.2 remote-as 200

       router bgp 100 neighbor 12.12.3.2 allowas-in 1

       router bgp 100 neighbor 12.12.4.2 allowas-in 1

       router bgp 100 network 12.12.6.0 /24

     

    SX05

    Connectivity:

    - SX05 is connected to switch sx03  on port 1/1

    - SX05 is connected to switch sx04  on port 1/2

     


    ## DCBX PFC configuration

       dcb priority-flow-control enable force

       dcb priority-flow-control priority 3 enable

       interface ethernet 1/1 dcb priority-flow-control mode on force

       interface ethernet 1/2 dcb priority-flow-control mode on force

     

    ## Interface Ethernet configuration

       interface ethernet 1/1 switchport mode trunk

       interface ethernet 1/2 switchport mode trunk

     

    ## VLAN configuration

       vlan 2-4

       interface ethernet 1/1 switchport trunk allowed-vlan none

       interface ethernet 1/2 switchport trunk allowed-vlan none

       interface ethernet 1/1 switchport trunk allowed-vlan 1

       interface ethernet 1/2 switchport trunk allowed-vlan 3

     

    ## STP and LLDP configuration

    no spanning-tree

    lldp

     

    ## L3 configuration

       ip routing

       interface vlan 1

       interface vlan 3

       interface vlan 1 ip address 12.12.1.2 255.255.255.0

       interface vlan 3 ip address 12.12.3.2 255.255.255.0

     

    ## QoS switch configuration

       qos map dscp-to-pcp preserve-pcp

     

    ## BGP configuration

       protocol bgp

       router bgp 200

       router bgp 200 bgp fast-external-fallover

       router bgp 200 neighbor 12.12.1.1 remote-as 100

       router bgp 200 neighbor 12.12.3.1 remote-as 100

     

    SX06

    Connectivity:

    - SX06 is connected to switch sx03  on port 1/1

    - SX06 is connected to switch sx04  on port 1/2

     

    ## DCBX PFC configuration

       dcb priority-flow-control enable force

       dcb priority-flow-control priority 3 enable

       interface ethernet 1/1 dcb priority-flow-control mode on force

       interface ethernet 1/2 dcb priority-flow-control mode on force

     

    ## Interface Ethernet configuration

       interface ethernet 1/1 switchport mode trunk

       interface ethernet 1/2 switchport mode trunk

     

     

    ## VLAN configuration

       vlan 2-4

       interface ethernet 1/1 switchport trunk allowed-vlan none

       interface ethernet 1/2 switchport trunk allowed-vlan none

       interface ethernet 1/1 switchport trunk allowed-vlan 2

       interface ethernet 1/2 switchport trunk allowed-vlan 4

     

    ## STP and LLDP configuration

    no spanning-tree

    lldp

     

    ## L3 configuration

       ip routing

       interface vlan 2

       interface vlan 4

       interface vlan 2 ip address 12.12.2.2 255.255.255.0

       interface vlan 4 ip address 12.12.4.2 255.255.255.0

     

    ## QoS switch configuration

       qos map dscp-to-pcp preserve-pcp

     

    ## BGP configuration

       protocol bgp

       router bgp 200

       router bgp 200 bgp fast-external-fallover

       router bgp 200 neighbor 12.12.2.1 remote-as 100

       router bgp 200 neighbor 12.12.4.1 remote-as 100

     

    Server S1 and S2 Configuration:

     

    Both servers share the same configuration except for the IP address, VLAN and default gateway used for the route.

    Server 1 uses 12.12.5.11/24 on VLAN 5, default gateway 12.12.5.2

    Server 2 uses 12.12.6.12/24 on VLAN 6, default gateway 12.12.6.2
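
    The addressing above could be applied as in the following sketch, with each half run on its own server. The addresses, prefix length and gateways are the ones listed above. The adapter name "Ethernet 16" on both servers and the use of New-NetIPAddress are assumptions.

    ```powershell
    # Server S1: address on VLAN 5, route to the 12.12.0.0/16 fabric via sx03.
    New-NetIPAddress -InterfaceAlias "Ethernet 16" -IPAddress 12.12.5.11 -PrefixLength 24
    route -p add 12.12.0.0 mask 255.255.0.0 12.12.5.2

    # Server S2: address on VLAN 6, route to the 12.12.0.0/16 fabric via sx04.
    New-NetIPAddress -InterfaceAlias "Ethernet 16" -IPAddress 12.12.6.12 -PrefixLength 24
    route -p add 12.12.0.0 mask 255.255.0.0 12.12.6.2
    ```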

     

    1. Net QoS Policy

    PS C:\Users\Administrator> Get-Netqospolicy -PolicyStore ActiveStore

     

    Name           : smb

    Owner          : PowerShell / WMI

    NetworkProfile : All

    Precedence     : 127

    NetDirectPort  : 445

    PriorityValue  : 3

    2. Net QoS Flow Control

    PS C:\Users\Administrator> Get-NetQosFlowControl

     

     

    Priority   Enabled

    --------   -------

    0          False

    1          False

    2          False

    3          True

    4          False

    5          False

    6          False

    7          False

    3. Net Adapter QoS

    PS C:\Users\Administrator> Get-NetAdapterQoS

     

    Name                       : Ethernet 16

    Enabled                    : True

    Capabilities               :                       Hardware     Current

                                                       --------     -------

                                 MacSecBypass        : NotSupported NotSupported

                                 DcbxSupport         : None         None

                                 NumTCs(Max/ETS/PFC) : 8/8/8        8/8/8

     

     

    OperationalTrafficClasses  : TC TSA    Bandwidth Priorities

                                 -- ---    --------- ----------

                                  0 ETS    50%       0-2,4-7

                                  1 ETS    50%       3

     

    OperationalFlowControl     : Priority 3 Enabled

    OperationalClassifications : Protocol  Port/Type Priority

                                 --------  --------- --------

                                 NetDirect 445       3

    4. Mellanox Driver Core Setting

    PS C:\Users\Administrator> Get-MlnxDriverCoreSetting

     

    Caption               : DriverCoreSettingData 'mlx4_bus'

    Description           : Mellanox Driver Option Settings

    ElementName           : mlx4_bus

    InstanceID            : mlx4_bus

    Name                  : mlx4_bus

    Source                : 3

    SystemName            : GEN-L-VRT-001

    LogMttsPerSeg         : 3

    LogNumCq              : 16

    LogNumMac             : 7

    LogNumMcg             : 13

    LogNumMpt             : 19

    LogNumMtt             : 20

    LogNumQp              : 21

    LogNumRdmaRc          : 4

    LogNumSrq             : 16

    LogNumVlan            : 7

    MaximumWorkingThreads : 4

    RoceMode              : 2.0

    Set4kMtu              : True

    SriovEnable           : False

    SriovPort1NumVFs      :

    SriovPort2NumVFs      :

    SriovPortMode         :

    PSComputerName        :

    5. Get Net Adapter:

    PS C:\Users\Administrator> Get-NetAdapter

     

    Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed

    ----                      --------------------                    ------- ------       ----------             ---------

     

    Ethernet 16               Mellanox ConnectX-3 Pro Ethernet A...#4      27 Up           F4-52-14-17-1F-92        40 Gbps

     

    6. Net Adapter Advanced Properties (for Ethernet 16)

    Note: The output here is from Server S1 (VLAN ID = 5).

    PS C:\Users\Administrator> Get-NetAdapterAdvancedProperty "Ethernet 16"

     

    Name                      DisplayName                    DisplayValue                   RegistryKeyword RegistryValue

    ----                      -----------                    ------------                   --------------- -------------

    Ethernet 16               Encapsulated Task Offload      Enabled                        *Encapsulate... {1}

    Ethernet 16               Flow Control                   Rx & Tx Enabled                *FlowControl    {3}

    Ethernet 16               Header Data Split              Disabled                       *HeaderDataS... {0}

    Ethernet 16               Interrupt Moderation           Enabled                        *InterruptMo... {1}

    Ethernet 16               IPV4 Checksum Offload          Rx & Tx Enabled                *IPChecksumO... {3}

    Ethernet 16               Jumbo Packet                   1514                           *JumboPacket    {1514}

    Ethernet 16               Large Send Offload V2 (IPv4)   Enabled                        *LsoV2IPv4      {1}

    Ethernet 16               Large Send Offload V2 (IPv6)   Enabled                        *LsoV2IPv6      {1}

    Ethernet 16               Maximum number of RSS Proce... 8                              *MaxRssProce... {8}

    Ethernet 16               NetworkDirect Functionality    Enabled                        *NetworkDirect  {1}

    Ethernet 16               Preferred NUMA node            Default Settings               *NumaNodeId     {65535}

    Ethernet 16               Maximum Number of RSS Queues   8                              *NumRSSQueues   {8}

    Ethernet 16               Priority & Vlan Tag            Priority & VLAN Enabled        *PriorityVLA... {3}

    Ethernet 16               Quality Of Service             Enabled                        *QOS            {1}

    Ethernet 16               Receive Buffers                512                            *ReceiveBuffers {512}

    Ethernet 16               Receive Side Scaling           Enabled                        *RSS            {1}

    Ethernet 16               RSS Base Processor Number      0                              *RssBaseProc... {0}

    Ethernet 16               RSS Maximum Processor Number   63                             *RssMaxProcN... {63}

    Ethernet 16               RSS load balancing Profile     ClosestProcessor               *RSSProfile     {1}

    Ethernet 16               SR-IOV                         Enabled                        *Sriov          {1}

    Ethernet 16               TCP/UDP Checksum Offload (I... Rx & Tx Enabled                *TCPUDPCheck... {3}

    Ethernet 16               TCP/UDP Checksum Offload (I... Rx & Tx Enabled                *TCPUDPCheck... {3}

    Ethernet 16               Send Buffers                   2048                           *TransmitBuf... {2048}

    Ethernet 16               Virtual Machine Queues         Enabled                        *VMQ            {1}

    Ethernet 16               VMQ VLAN Filtering             Enabled                        *VMQVlanFilt... {1}

    Ethernet 16               Locally Administered Address   --                             NetworkAddress  {--}

    Ethernet 16               Transmit Control Blocks        16                             NumTcb          {16}

    Ethernet 16               Receive Completion Method      Adaptive                       RecvCompleti... {1}

    Ethernet 16               R/RoCE Max Frame Size          Auto                           RoceMaxFrame... {0}

    Ethernet 16               Rx Interrupt Moderation Type   Adaptive                       RxIntModeration {2}

    Ethernet 16               Rx Interrupt Moderation Pro... Moderate                       RxIntModerat... {1}

    Ethernet 16               Number of Polls on Receive     10000                          ThreadPoll      {10000}

    Ethernet 16               Tx Throughput Port Arbiter     Best Effort (Default)          TxBwPrecedence  {0}

    Ethernet 16               Tx Interrupt Moderation Pro... Moderate                       TxIntModerat... {1}

    Ethernet 16               VLAN ID                        5                              VlanID          {5}

    7. Route:

    Make sure you see the static route via the default gateway.

    This example was taken from Server S1, where the default gateway is 12.12.5.2 on sx03.

    PS C:\Users\Administrator> route print -4

    ...

    IPv4 Route Table

    ===========================================================================

    Active Routes:

    Network Destination        Netmask          Gateway       Interface  Metric

    ...  

            12.12.0.0      255.255.0.0        12.12.5.2       12.12.5.11      6

            12.12.5.0    255.255.255.0         On-link        12.12.5.11    261

           12.12.5.11  255.255.255.255         On-link        12.12.5.11    261

          12.12.5.255  255.255.255.255         On-link        12.12.5.11    261

    ...

    Persistent Routes:

      Network Address          Netmask  Gateway Address  Metric

            12.12.0.0      255.255.0.0        12.12.5.2       1

              0.0.0.0          0.0.0.0        12.12.5.2  Default

    ===========================================================================