Reference Deployment Guide for a Windows Server 2016 Hyper-Converged Cluster over the Mellanox Ethernet Solution

Version 19

    Purpose

    This document provides guidelines on how to evaluate the Hyper-Converged cluster solution available with Windows Server 2016. In particular, it focuses on using System Center Virtual Machine Manager (SCVMM) 2016 to deploy a Hyper-Converged cluster with All-Flash Storage Spaces Direct on four physical servers.

    Scope

    This document demonstrates the deployment procedure for a Microsoft Hyper-Converged virtualization cluster based on Storage Spaces Direct and a Mellanox end-to-end Ethernet solution.

    Definitions/Abbreviations

    Table 1: Abbreviations

    Abbreviation    Description
    RoCE            RDMA over Converged Ethernet
    S2D             Storage Spaces Direct

     

    Related Documentation

    Table 2: Related Documentation

    Document Title: Hyper-converged solution using Storage Spaces Direct in Windows Server 2016
    Link: https://technet.microsoft.com/windows-server-docs/storage/storage-spaces/hyper-converged-solution-using-storage-spaces-direct

    Document Title: Deploy and manage SCVMM 2016
    Link: https://technet.microsoft.com/en-us/system-center-docs/vmm/vmm

     

    Introduction

    This document provides guidelines on how to evaluate the Hyper-Converged cluster solution available with Windows Server 2016. In particular, it focuses on using System Center Virtual Machine Manager (SCVMM) 2016 to deploy a Hyper-Converged cluster with All-Flash Storage Spaces Direct on four physical servers.

    This guide includes instructions on how to install and configure the components of a Hyper-Converged system using SCVMM 2016. Deploying a Hyper-Converged system can be divided into three high-level phases:

      • Deploy Windows Server
      • Configure the network with SCVMM 2016
      • Configure Storage Spaces Direct Cluster

    For further information on what a Hyper-Converged cluster with Storage Spaces Direct is, and how to plan your infrastructure and deployment, please refer to the following link:
    Hyper-converged solution using Storage Spaces Direct in Windows Server 2016
    Deployment of SCVMM 2016 is not described in this guide. For information on how to deploy and manage SCVMM 2016, please refer to the following link: https://technet.microsoft.com/en-us/system-center-docs/vmm/vmm.

     

    Solution overview

    In the Hyper-Converged configuration described in this guide, Storage Spaces Direct seamlessly integrates with the features you know today that make up the Windows Server software defined storage stack, including Clustered Shared Volume File System (CSVFS), Storage Spaces and Failover Clustering.

    The Hyper-Converged deployment scenario has the Hyper-V (compute) and Storage Spaces Direct (storage) components on the same cluster, and the virtual machine files are stored on local CSVs. For compute, each node is a two-socket server; for storage, each server contains eight NVMe drives, four connected to each NUMA node.

    Mellanox ConnectX-4/ConnectX-4 Lx adapter cards and SN2700 switches enable the solution with a high-performance, low-latency Ethernet network with RoCE enabled. This dramatically improves S2D performance, VM live migration, and data transfer rates between virtual machines.

    RoCE technology doubles the throughput, cuts the latency by 50%, and increases the CPU efficiency by more than 33% (https://www.mellanox.com/blog/2015/05/storage-spaces-direct-if-not-rdma-then-what-if-not-mellanox-then-who/)

      

    Figure 1: Hyper-converged cluster configured for Storage Spaces Direct and the hosting of virtual machines

     

    Solution Configuration

    Network Components

    Network Design

    The following illustration shows an example configuration. This example does not cover the router’s configuration and its connectivity to TOR switches.

     

    Figure 2: Solution high-level design

     

    Each server is connected to both SN2700 switches by copper cables to allow for redundant network links in the event of a network port or external switch failure.

    Switch port connectivity in our case:

        • Port 1 – connected to the router
        • Port 16 – connected to the Management server
        • Ports 22, 24, 26 and 28 – connected to each node in the cluster
        • Ports 31 and 32 – IPL interconnect between the two switches

     

    Detailed switch configuration examples are provided in Appendix A.

    Logical Networks Configuration

    The table below shows the configuration of the networks, with their VLAN IDs, for the SN2700 switches.

    Table 3: Cluster Networks

    Network Name     Subnet        Mask    VLAN ID         Gateway           DNS server
    Deploy           172.21.0.0    16      Native (621)    172.21.1.254      172.16.1.251
    MGMT             172.16.0.0    16      616             172.16.254.253    172.16.1.251
    Cluster          172.17.0.0    16      617             172.17.1.253      No
    LiveMigration    172.18.0.0    16      618             172.18.1.253      No
    Storage          172.19.0.0    16      619             172.19.1.253      No

     

    Servers Network Configuration

    The table below shows the server names and their network configuration.

    Table 4: Server names and network configuration

    Server type              Server name    Deploy network              MGMT network
    DC (AD DS, DNS, DHCP)    clx-wrd-dc     Not part of this network    IP: 172.16.1.251/16, GW: 172.16.254.253, DNS: 172.16.1.251
    SCVMM                    clx-wrd-vmm    Not part of this network    IP: 172.16.1.250/16, GW: 172.16.254.253, DNS: 172.16.1.251
    Compute, Storage         clx-wrd-s1     By DHCP from DC             From Pool via SCVMM
    Compute, Storage         clx-wrd-s2     By DHCP from DC             From Pool via SCVMM
    Compute, Storage         clx-wrd-s3     By DHCP from DC             From Pool via SCVMM
    Compute, Storage         clx-wrd-s4     By DHCP from DC             From Pool via SCVMM

     

    DHCP Service Configuration

    The Domain Controller (DC) provides the DHCP service for the Deploy network. The DHCP Deploy scope should be configured according to the information below; a PowerShell sketch of this scope configuration follows the list:

        • IP: 172.21.1.1-100/16
        • GW: 172.21.1.254
        • DNS: 172.16.1.251
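
    If you prefer to script the scope on the DC instead of using the DHCP console, a minimal sketch using the DhcpServer module could look like the following (the scope name and the -Force/-State flags are assumptions; adjust to your environment):

    # Run on the Domain Controller (clx-wrd-dc) that hosts the DHCP role
    Add-DhcpServerv4Scope -Name "Deploy" -StartRange 172.21.1.1 -EndRange 172.21.1.100 -SubnetMask 255.255.0.0 -State Active
    # Scope options: default gateway (router) and DNS server
    Set-DhcpServerv4OptionValue -ScopeId 172.21.0.0 -Router 172.21.1.254 -DnsServer 172.16.1.251 -Force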

     

    Rack Design and Hardware components

             

    Figure 3: Solution rack configuration using Mellanox SN2700 switches

         The figure above shows the rack configuration and the components of the 4-node cluster.

    • IPMI switch
    • Management server: a two-socket server with 500GB of local storage to host VMs
      • One Mellanox ConnectX-4 dual-port adapter
      • DC VM with AD DS, DNS and DHCP services installed and configured
      • SCVMM VM in a standalone configuration
    • Two Mellanox SN2700 switches with IPL interconnect
      • 32 QSFP28 ports of 100Gbps each
    • Four Compute Nodes, each containing:
      • Two Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
      • 128 GB memory
      • One Mellanox ConnectX-4 dual-port adapter
      • Storage components:
        • Two SSD drives in RAID-1 for the OS
        • Eight NVMe 1.2TB SSDs for S2D storage services
    • Mellanox LinkX MCP1600-Cxxx series cables to connect the servers to the switches

     

    System Deployment

    Configure Domain Firewall Policy

    Open GPMC and edit the Default Domain Policy (Default Domain Policy -> Computer Configuration -> Policies -> Administrative Templates -> Network -> Network Connections -> Windows Firewall -> Domain Profile) and enable "Windows Firewall: Allow inbound file and printer sharing exception".
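
    After the compute nodes are deployed and joined to the domain, you can verify the effect, or apply the equivalent setting locally instead of through the GPO, by enabling the built-in "File and Printer Sharing" firewall rule group on each node. A minimal sketch (rule group display names are locale-dependent, so treat this as an assumption):

    $nodes = ("clx-wrd-S1", "clx-wrd-S2", "clx-wrd-S3", "clx-wrd-S4")
    # Enable the inbound File and Printer Sharing rules locally on every node
    Invoke-Command $nodes {Enable-NetFirewallRule -DisplayGroup "File and Printer Sharing"}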

     

     

    Deploy Windows Server on Compute nodes

    On each Compute Node (clx-wrd-S1, clx-wrd-S2, clx-wrd-S3 and clx-wrd-S4), install Windows Server 2016 using the Setup wizard with the "Windows Server 2016 (Server with Desktop Experience)" option selected.

    Install the latest vendor and Mellanox WinOF-2 drivers.

    Join all hosts to the domain (wrd.clx in our case).
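
    If you prefer to script the domain join from each node, a minimal sketch could look like this (the credential prompt and the automatic restart are assumptions; adjust to your process):

    # Run locally on each compute node after the OS and drivers are installed
    Add-Computer -DomainName "wrd.clx" -Credential (Get-Credential "WRD\Administrator") -Restart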

    Install Windows features on the compute hosts from the Domain Controller by running the following PowerShell script.

     

    $nodes = ("clx-wrd-S1", "clx-wrd-S2", "clx-wrd-S3", "clx-wrd-S4")

    Invoke-Command $nodes {Install-WindowsFeature Data-Center-Bridging}

    Invoke-Command $nodes {Install-WindowsFeature Multipath-IO}

    Invoke-Command $nodes {Install-WindowsFeature Failover-Clustering -IncludeAllSubFeature -IncludeManagementTools}

    Invoke-Command $nodes {Install-WindowsFeature -Name Hyper-V -IncludeManagementTools -Restart}

     

    Enable Network Quality of Service (QoS) on the compute nodes by running the following PowerShell script from the Domain Controller.

     

    $nodes = ("clx-wrd-S1", "clx-wrd-S2", "clx-wrd-S3", "clx-wrd-S4")

    Invoke-Command $nodes {Get-NetAdapter  | ? InterfaceDescription -Match "Mellanox*" | Sort-Object number |% {$_ | Set-NetAdapterAdvancedProperty -RegistryKeyword "*JumboPacket" -RegistryValue 9000}}

    Invoke-Command $nodes {New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3}

    Invoke-Command $nodes {Enable-NetQosFlowControl -Priority 3}

    Invoke-Command $nodes {Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7}

    Invoke-Command $nodes {Get-NetAdapter | ? InterfaceDescription -Match "Mellanox" | Enable-NetAdapterQos}

    Invoke-Command $nodes {New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS}
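
    To confirm the QoS settings landed on every node, a quick read-only verification sketch might be:

    # Check the SMB policy, PFC on priority 3 and the ETS traffic class on each node
    Invoke-Command $nodes {Get-NetQosPolicy; Get-NetQosFlowControl; Get-NetQosTrafficClass}
    # Confirm QoS/DCB is enabled on the Mellanox adapters
    Invoke-Command $nodes {Get-NetAdapter | ? InterfaceDescription -Match "Mellanox" | Get-NetAdapterQos}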

     

     

    Configure the network with SCVMM 2016

    Setup Network Settings in SCVMM

    Open the SCVMM Console and go to Settings -> Network Settings. Uncheck "Create logical networks automatically" and press "Finish".

     

    Set up host group

    Create a dedicated host group for the Hyper-V hosts in order to simplify management in the future.
    Open the VMM Console, click "Fabric" and then click "Servers > All Hosts". Right-click "All Hosts", select "Create Host Group" and type the name "S2D".
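
    The same host group can be created from a VMM PowerShell session; a minimal sketch, assuming the default "All Hosts" parent group:

    # Create the S2D host group under the root "All Hosts" group
    New-SCVMHostGroup -Name "S2D" -ParentHostGroup (Get-SCVMHostGroup -Name "All Hosts")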

    Add compute hosts to SCVMM

    Right click the “All Hosts” and select “Add Hyper-V Hosts and Clusters”

     

    Select “Windows Server computer in a trusted Active Directory domain”.

     

        

     

    On the Credentials page, select "Use an existing Run As account", click Browse, and then click "Create Run As Account" to create a "Domain Admin" account – WRD\Administrator in our case.

          

    Type "CLX" in the "Computer names" text field, then click "Next" to search for the hosts.

    You will see clx-wrd-s1 to clx-wrd-s4 displayed on the “Target Resources” page.

    Select hosts clx-wrd-s1 to clx-wrd-s4 then click “Next”.

          

    Click “Next”

    On Summary page confirm the settings and click “Finish”.

    All 4 hosts will be added to the "S2D" group when the jobs are completed, as shown below.
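
    The same hosts can also be added from a VMM PowerShell session. A minimal sketch, assuming the Run As account created above is named "Domain Admin":

    # Add each compute node to the S2D host group using the Domain Admin Run As account
    $hostGroup = Get-SCVMHostGroup -Name "S2D"
    $runAs = Get-SCRunAsAccount -Name "Domain Admin"
    "clx-wrd-s1", "clx-wrd-s2", "clx-wrd-s3", "clx-wrd-s4" | % {
        Add-SCVMHost -ComputerName "$($_).wrd.clx" -VMHostGroup $hostGroup -Credential $runAs
    }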

          

     

    Creating logical networks (MGMT, Cluster, LiveMigration and Storage)

    Below we explain two ways to create logical networks with the SCVMM server.

    Automated Logical Networks Creation by PowerShell Script

    You can use the attached PowerShell script (see the bottom of this post).

    Change the script parameters according to your infrastructure and run it in a PowerShell session on the SCVMM server.

     

    Manual Logical Networks Creation with SCVMM GUI

    Repeat the logical network and IP address pool creation for all logical networks according to the example below:

         Create logical network

    Click Fabric > Networking. Right-click Logical Networks > Create Logical Network.

      1. Specify "MGMT" as the Name and an optional Description.
      2. In Settings, select One Connected Network. All management networks need to have routing and connectivity between all hosts in that network. Select "Create a VM network with the same name to allow virtual machines to access this logical network directly" to automatically create a VM network for your management network.
      3. Click Network Site > Add. Select the host group (S2D) containing the hosts that this network will service. Insert your management network IP subnet details. This network should already exist and be configured on your physical switch.
      4. Review the Summary information and click "Finish" to complete the wizard; a PowerShell sketch of the same steps follows below.
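
    For reference, a minimal VMM PowerShell sketch of the same steps for the MGMT network (the definition name "MGMT_0" is an assumption; the attached script in Appendix B covers all networks):

    # Create the MGMT logical network, scope it to the S2D host group with VLAN 616,
    # and create a VM network with the same name
    $ln = New-SCLogicalNetwork -Name "MGMT" -LogicalNetworkDefinitionIsolation $false -EnableNetworkVirtualization $false -IsPVLAN $false
    $subnetVlan = New-SCSubnetVLan -Subnet "172.16.0.0/16" -VLanID 616
    New-SCLogicalNetworkDefinition -Name "MGMT_0" -LogicalNetwork $ln -VMHostGroup (Get-SCVMHostGroup -Name "S2D") -SubnetVLan $subnetVlan
    New-SCVMNetwork -Name "MGMT" -LogicalNetwork $ln -IsolationType "NoIsolation"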

     

    Create an IP address pool for the logical network

    In order to allocate static IP addresses to compute hosts, create an IP address pool in the management logical network. If you're using DHCP you can skip this step.

      1. In the VMM console, right-click the management logical network and click Create IP Pool.
      2. Provide "MGMT_Pool" as the Name and an optional description for the pool, and ensure that the management network is selected as the logical network.
      3. In the Network Site panel, select the subnet that this IP address pool will service.
      4. In the IP Address range panel, type the starting and ending IP addresses.
      5. To use an IP as the Cluster IP, type one of the IP addresses from the specified range in the "IP addresses to be reserved for other uses" box.
        • Note: Don't use the first three IP addresses of your available subnet. For example, if your available subnet is from .1 to .254, start your range at .4 or greater.

      6. Specify the default gateway address and DNS address, and optionally configure WINS settings.
      7. On the Summary page, review the settings and click Finish to complete the wizard; a PowerShell sketch of the same pool creation follows below.
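
    For reference, a minimal VMM PowerShell sketch of the same pool for the MGMT network (the address range and the reserved cluster IP are assumptions; substitute your own values):

    # Create a static IP pool on the MGMT network site, reserving one address for the cluster IP
    $ln  = Get-SCLogicalNetwork -Name "MGMT"
    $lnd = Get-SCLogicalNetworkDefinition -LogicalNetwork $ln
    $gw  = New-SCDefaultGateway -IPAddress "172.16.254.253" -Automatic
    New-SCStaticIPAddressPool -Name "MGMT_Pool" -LogicalNetworkDefinition $lnd -Subnet "172.16.0.0/16" -IPAddressRangeStart "172.16.1.10" -IPAddressRangeEnd "172.16.1.100" -IPAddressReservedSet "172.16.1.99" -DefaultGateway $gw -DNSServer "172.16.1.251"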

     

        After the completion of the logical networks and IP pools, you should see the following:

     

    Create a logical switch

      • Click Create Logical Switch on the ribbon in the VMM Console.
      • Review the Getting Started information and click Next.
      • Provide "MLNX-LSW" as the name and an optional Description. For the Uplink mode, be sure to select Embedded Team. Click Next.

             

      • For Minimum Bandwidth mode, choose the Weight option. Click Next.
      • Uncheck all the switch extensions on the Extensions page.
      • On the Virtual Port tab add port classifications.

             

      • On the Uplinks tab, click Add and then either create a new Uplink Port Profile or use an existing one if you already have it configured. For this example, create a new Uplink Port Profile. For the new Uplink Port Profile:
        1. Use the defaults for the Load Balancing algorithm and Teaming Mode.
        2. Select the Uplink Port Profile you created and click New virtual network adapter. This adds a host virtual network adapter (vNIC) to your logical switch and uplink port profile, so when you add the logical switch to your hosts the vNICs get added automatically.
        3. Provide a name for the vNIC. Verify that the Management VM network is listed under the Connectivity section.
        4. Select "This network adapter will be used for host management" (only for the MGMT vNIC).
        5. Select a port classification and virtual port profile.
        6. Click Next.
        7. Repeat steps 2-6 for the Cluster, LiveMigration, Storage01 and Storage02 vNICs.
      • Review the Summary information and click “Finish” to complete the wizard.

     

         Since Windows Server 2016, we can add two SMB (Storage) vNICs to the same network. This is possible thanks to the new Simplified SMB Multichannel and Multi-NIC Cluster Networks feature.

     

    Deploying a logical switch on compute hosts

     

      • Click VMs and Services.
      • Right-click a host machine under the "All Hosts" group and click "Properties".
      • Click "Virtual Switches" on the left, click "New Virtual Switch" and choose the Logical Switch you just created.

             

      • Add both ports of the Mellanox NIC under "Physical Adapters", using the Uplink Port Profile created earlier.
      • Review the information provided on the screen and click "OK" to start the logical switch deployment.

     

    Repeat these steps for all hosts.
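
    To confirm the logical switch landed on every node as a SET team with the expected host vNICs, a read-only verification sketch (assuming the deployed vSwitch takes the logical switch name "MLNX-LSW" from the example above):

    $nodes = ("clx-wrd-S1", "clx-wrd-S2", "clx-wrd-S3", "clx-wrd-S4")
    # The vSwitch should be embedded-teamed over both Mellanox ports
    Invoke-Command $nodes {Get-VMSwitchTeam -Name "MLNX-LSW"}
    # The host vNICs (MGMT, Cluster, LiveMigration, Storage01, Storage02) should exist
    Invoke-Command $nodes {Get-VMNetworkAdapter -ManagementOS | Select-Object Name, SwitchName}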

     

    Disabling regular Flow Control and setting affinity between a vNIC and a pNIC

    After deploying the logical switch on all hosts, disable regular (global) Flow Control on the Mellanox adapters, since Priority Flow Control (PFC) and regular Flow Control cannot operate simultaneously on the same interface.
    Creating an affinity between a vNIC and a pNIC ensures that the traffic from a given host storage vNIC is sent through a particular pNIC, so that it takes the shortest path.
    Execute the following PowerShell script from the Domain Controller to make all these changes quickly and consistently.

     

    $nodes = ("clx-wrd-S1", "clx-wrd-S2", "clx-wrd-S3", "clx-wrd-S4")
    Invoke-Command $nodes {Set-NetAdapterAdvancedProperty -InterfaceDescription "*Mellanox*" -RegistryKeyword "*FlowControl" -RegistryValue 0}
    Invoke-Command $nodes {(Get-NetAdapter -InterfaceDescription 'Mellanox ConnectX-4 VPI Adapter').name | % {Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName 'Storage01' -ManagementOS -PhysicalNetAdapterName $_}}
    Invoke-Command $nodes {(Get-NetAdapter -InterfaceDescription 'Mellanox ConnectX-4 VPI Adapter #2').name | % {Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName 'Storage02' -ManagementOS -PhysicalNetAdapterName $_}}

    Invoke-Command -ComputerName $nodes {Restart-Computer -Force}
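
    A quick, read-only check after the nodes come back up might be (this only verifies the two changes above; it does not modify anything):

    $nodes = ("clx-wrd-S1", "clx-wrd-S2", "clx-wrd-S3", "clx-wrd-S4")
    # FlowControl should now report 0 (disabled) on the Mellanox ports
    Invoke-Command $nodes {Get-NetAdapterAdvancedProperty -InterfaceDescription "*Mellanox*" -RegistryKeyword "*FlowControl"}
    # Each storage vNIC should be mapped to its own physical port
    Invoke-Command $nodes {Get-VMNetworkAdapterTeamMapping -ManagementOS}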

     

    Create and Configure Storage Spaces Direct Cluster

    The following steps are run in a PowerShell session on the SCVMM server with administrative permissions.

      • Run cluster validation

              

    $nodes = ("clx-wrd-S1", "clx-wrd-S2", "clx-wrd-S3", "clx-wrd-S4")
    Test-Cluster -Node $nodes -Include "Storage Spaces Direct","Inventory","Network","System Configuration"

      • Create a cluster

    It is important to add the -NoStorage parameter to the cmdlet; otherwise, disks may be automatically added to the cluster and you will need to remove them before enabling Storage Spaces Direct, or they will not be included in the Storage Spaces Direct storage pool.

             

    $ClusterIP = (Get-SCStaticIPAddressPool -LogicalNetworkDefinition (Get-SCLogicalNetworkDefinition -LogicalNetwork "MGMT")).IPAddressReservedSet
    New-Cluster -Name CLX-S2D -Node $nodes -NoStorage -IgnoreNetwork 172.17.0.0/16,172.19.0.0/16,172.18.0.0/16 -StaticAddress $ClusterIP

      • Enable Storage Spaces Direct in all-flash mode. It is important to add the -CacheState Disabled parameter to the cmdlet because we are using only a single tier of devices (all NVMe).

               Enable-ClusterStorageSpacesDirect -CacheState Disabled -AutoConfig:0 -SkipEligibilityChecks 

      • Show the clustered disks and create the pool "S2D-Pool".

    Get-StorageSubsystem *cluster* | Get-PhysicalDisk | ? bustype -eq NVMe
    New-StoragePool -StorageSubSystemFriendlyName *Cluster* -FriendlyName S2D-Pool -ProvisioningTypeDefault Fixed -PhysicalDisk (Get-PhysicalDisk | ? BusType -eq NVMe)
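
    To sanity-check the pool before handing it to SCVMM, a read-only sketch (in this configuration we expect 32 NVMe devices, 8 per node):

    # The pool should be healthy and contain all 32 NVMe devices
    Get-StoragePool -FriendlyName S2D-Pool
    (Get-StoragePool -FriendlyName S2D-Pool | Get-PhysicalDisk).Count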

      • Add Storage Provider for Spaces Direct in SCVMM.

    Open the VMM console and navigate to the Fabric workspace, select Add Resources | Storage Devices and select "Windows-based File Server" as shown in the following screenshot. Click Next.

             

    Provide the IP address or the cluster FQDN, and check the checkbox if the cluster is in another domain. Provide the Run As Account and click Next.
    On the next screen, select the Storage Device and click Next.
    To finish the storage pool configuration, a Pool Classification must be provided.
    To create a Pool Classification, open the SCVMM console, go to Fabric | Storage | Classification and Pools, and click the "Create Storage Classification" icon on the ribbon.

             

    Provide a name and click "Add".

    • Specify a classification for the created pool.

    Open the VMM console, go to Fabric | Storage | Arrays, right-click the Storage Spaces Direct cluster created earlier, and then click Manage Pools.
    Select the storage pool and click Edit.
    Choose the Pool classification, then click OK.

     

    • Create a Cluster Shared Volume

           To create a Cluster Shared Volume, right-click the cluster that has been created, click Properties, select the Shared Volumes option, click Add and then follow the steps to create the volume.
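
    Alternatively, the volume can be created directly with the storage cmdlets, in which case it is added to Cluster Shared Volumes automatically. A minimal sketch, assuming a hypothetical volume name "CSV01" and size (pick a resiliency setting and size that fit your pool):

    # Run on one of the cluster nodes (or remotely with -CimSession)
    New-Volume -StoragePoolFriendlyName "S2D-Pool" -FriendlyName "CSV01" -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 2TB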

    Appendix A. Switch configuration examples

     

    • TOR-1 configuration example

     

          The switch configuration is provided in the TOR-1C.txt.zip file below.
     

    • TOR-2 configuration example

         The switch configuration is provided in the TOR-2C.txt.zip file below.

    Appendix B. PowerShell script for creating logical networks in SCVMM

     

          The script is provided in the create_S2D_networks.ps1.zip file below.

     

     

    Appendix C. VM Fleet Cluster Performance Test

     

    For the cluster performance tests, the VMFleet utility was used. The results provided below are based on a four-node Hyper-Converged cluster where each node was equipped with a Mellanox ConnectX-4 VPI 100GbE dual-port adapter card. Detailed NIC information is provided in the screenshot below.

     

              

     

    VMFleet was run with 36 virtual machines per node, for a total of 144 virtual machines. Each virtual machine was configured with 1 vCPU and 2GB of RAM. VMFleet was then used to run DISKSPD in each of the 144 virtual machines for two performance tests:

    • First test - 1 thread, 4KiB sequential reads with 32 outstanding IOs.

             

         This test enabled us to hit over 1.8M IOPS in aggregate throughput into the Virtual Machines.

     

    • Second test - 1 thread, 512KiB sequential reads with 4 outstanding IOs.

     

               

        This test enabled us to hit about 69GB/s in aggregate throughput into the Virtual Machines.
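
    For reference, the per-VM DISKSPD invocations would look roughly like the following (a sketch of the I/O profiles only; the target file path, duration and cache flags are assumptions, and in practice VMFleet generates the command lines itself):

    # Test 1: 1 thread, 4KiB sequential reads, 32 outstanding I/Os per thread
    diskspd.exe -b4K -t1 -o32 -w0 -Sh -d60 C:\run\testfile.dat

    # Test 2: 1 thread, 512KiB sequential reads, 4 outstanding I/Os per thread
    diskspd.exe -b512K -t1 -o4 -w0 -Sh -d60 C:\run\testfile.dat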