
Supported Clouds

This section describes the cloud environments supported for DIGIT services. It explains where and how DIGIT is deployed, and offers guidelines for estimating the infrastructure requirements on each cloud.

Supported Cloud List

Google Cloud
Azure
AWS
vSphere
SDC
NIC

Google Cloud

Compute Engine API

The Compute Engine API must be enabled in the Google APIs console before it can be accessed.

User Roles

The Google Service Account to be created must have three roles:

Compute Admin: roles/compute.admin
Service Account User: roles/iam.serviceAccountUser
Viewer: roles/viewer

If the gcloud CLI is installed, a service account can be created as follows:

# create new service account
gcloud iam service-accounts create k8c-cluster-provisioner

# get your service account id
gcloud iam service-accounts list
# get your project id
gcloud projects list

# create policy binding
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID --member 'serviceAccount:YOUR_SERVICE_ACCOUNT_ID' --role='roles/compute.admin'
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID --member 'serviceAccount:YOUR_SERVICE_ACCOUNT_ID' --role='roles/iam.serviceAccountUser' 
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID --member 'serviceAccount:YOUR_SERVICE_ACCOUNT_ID' --role='roles/viewer'
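The three bindings above differ only in the role, so they can be generated in a loop. The following is a minimal sketch, assuming a POSIX shell. The helper name bind_provisioner_roles and the DRY_RUN variable are illustrative assumptions and are not part of gcloud.

```shell
# Roles required by the provisioning service account (from the list above).
ROLES="roles/compute.admin roles/iam.serviceAccountUser roles/viewer"

# bind_provisioner_roles PROJECT_ID SERVICE_ACCOUNT_ID
# Hypothetical helper: grants each role in ROLES to the service account.
# With DRY_RUN=1 it only prints the gcloud commands instead of running them.
bind_provisioner_roles() {
    project_id=$1
    sa_id=$2
    for role in $ROLES; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "gcloud projects add-iam-policy-binding $project_id --member=serviceAccount:$sa_id --role=$role"
        else
            gcloud projects add-iam-policy-binding "$project_id" \
                --member="serviceAccount:$sa_id" \
                --role="$role"
        fi
    done
}

# Example (prints the three commands without contacting Google Cloud):
# DRY_RUN=1; bind_provisioner_roles YOUR_PROJECT_ID YOUR_SERVICE_ACCOUNT_ID
```

Reviewing the dry-run output first is a cheap way to catch a wrong project or account ID before any policy is changed.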

Google Service Account

A Google Service Account has to be created for the platform; see Creating and managing service accounts. The result is a JSON file containing the following fields:

type
project_id
private_key_id
private_key
client_email
client_id
auth_uri
token_uri
auth_provider_x509_cert_url
client_x509_cert_url

The private key is Base64 text that contains its newlines as non-escaped "\n" strings. To avoid the problems this causes, the machine controller expects the whole service account file to be Base64-encoded.

# create a new json key for your service account
gcloud iam service-accounts keys create --iam-account YOUR_SERVICE_ACCOUNT k8c-cluster-provisioner-sa-key.json
# create base64 encoded secret
base64 -w 0 ./k8c-cluster-provisioner-sa-key.json
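Before pasting the encoded secret anywhere, it can help to confirm that it decodes back to the original key file. The sketch below assumes a POSIX shell with base64 and cmp available. The helper names are illustrative, and the fallback branch is an assumption for base64 builds that do not accept the GNU -w option.

```shell
# encode_key FILE
# Prints FILE as a single-line Base64 string. `-w 0` is a GNU coreutils
# option; the fallback strips newlines for base64 builds that lack it.
encode_key() {
    base64 -w 0 "$1" 2>/dev/null || base64 < "$1" | tr -d '\n'
}

# verify_key_roundtrip FILE
# Returns 0 when decoding the encoded string reproduces FILE byte for byte.
verify_key_roundtrip() {
    encode_key "$1" | base64 --decode | cmp -s - "$1"
}

# Example:
# verify_key_roundtrip ./k8c-cluster-provisioner-sa-key.json && echo "encoding is lossless"
```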

Passing the Google Service Account

The Base64-encoded secret of the service account is passed in the field serviceAccount of the cloudProviderSpec of the machine deployment. The encoded secret can be entered in the UI field Service Account.
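As a rough illustration of where the value ends up, the sketch below prints only the fragment named in the text. The helper name render_cloud_provider_spec is hypothetical, and the rest of the machine deployment manifest is deliberately omitted because its layout is not described here.

```shell
# render_cloud_provider_spec ENCODED_SECRET
# Prints only the cloudProviderSpec fragment with its serviceAccount field.
# The surrounding machine deployment structure must come from your own manifest.
render_cloud_provider_spec() {
    cat <<EOF
cloudProviderSpec:
  serviceAccount: "$1"
EOF
}

# Example:
# render_cloud_provider_spec "$(base64 -w 0 ./k8c-cluster-provisioner-sa-key.json)"
```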

Azure

Prepare Azure Environment

To provision Kubernetes clusters with the Azure cloud provider, Kubermatic needs a service account with (at least) the Azure role Contributor. Follow the steps below to create a matching service account.

Log in to Azure with the Azure CLI by running az login. This command opens a window in your default browser where you can authenticate. After you have successfully logged in, get your subscription ID:

az account show --query id -o json

********-****-****-****-************

Get your tenant ID:

az account show --query tenantId -o json

********-****-****-****-************

Create a new app with:

az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/********-****-****-****-************"

Retrying role assignment creation: 1/36
Retrying role assignment creation: 2/36
Retrying role assignment creation: 3/36
{
  "appId": "********-****-****-****-************",
  "displayName": "azure-cli-2018-11-25-08-01-39",
  "name": "http://azure-cli-2018-11-25-08-01-39",
  "password": "********-****-****-****-************",
  "tenant": "********-****-****-****-************"
}

Enter the provider credentials in the Kubermatic Dashboard, using the values from the step “Prepare Azure Environment”:

Client ID: Take the value of appId
Client Secret: Take the value of password
Tenant ID: your tenant ID
Subscription ID: your subscription ID
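The mapping above can be sketched as a small script that reads the saved output of az ad sp create-for-rbac. This is an illustration only: json_field and print_azure_credentials are hypothetical helpers, the sed lookup works only for flat output like the sample shown earlier, and a real JSON parser would be more robust. SUBSCRIPTION_ID is assumed to be set by the caller from az account show.

```shell
# json_field KEY FILE
# Extracts the string value of KEY from a flat JSON object, one key per line.
json_field() {
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2"
}

# print_azure_credentials FILE
# Maps the service principal fields to the four dashboard fields.
print_azure_credentials() {
    echo "Client ID: $(json_field appId "$1")"
    echo "Client Secret: $(json_field password "$1")"
    echo "Tenant ID: $(json_field tenant "$1")"
    echo "Subscription ID: ${SUBSCRIPTION_ID:-<unset>}"
}

# Example (sp.json holds the JSON printed by `az ad sp create-for-rbac`):
# SUBSCRIPTION_ID=$(az account show --query id -o tsv)
# print_azure_credentials sp.json
```

Because the output contains the client secret, it is best run in a private terminal and the saved JSON file removed afterwards.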

AWS

Overview

This page discusses provisioning the Kubernetes cluster, which is an abstracted infrastructure requirement for deploying DIGIT. Learn how to provision infrastructure as code on AWS using Terraform.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetInstanceProfile",
                "iam:ListInstanceProfiles"
            ],
            "Resource": "arn:aws:iam::YOUR_ACCOUNT_ID:instance-profile/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:DeleteRolePolicy",
                "iam:GetRole",
                "iam:ListAttachedRolePolicies",
                "iam:ListRolePolicies",
                "iam:PassRole",
                "iam:PutRolePolicy"
            ],
            "Resource": "arn:aws:iam::YOUR_ACCOUNT_ID:role/kubernetes-*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:AddRoleToInstanceProfile",
                "iam:CreateInstanceProfile",
                "iam:DeleteInstanceProfile",
                "iam:GetInstanceProfile",
                "iam:RemoveRoleFromInstanceProfile"
            ],
            "Resource": "arn:aws:iam::YOUR_ACCOUNT_ID:instance-profile/kubernetes-*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:*",
                "elasticloadbalancing:CreateListener",
                "elasticloadbalancing:CreateRule",
                "elasticloadbalancing:CreateTargetGroup",
                "elasticloadbalancing:CreateLoadBalancer",
                "elasticloadbalancing:ConfigureHealthCheck",
                "elasticloadbalancing:DeleteListener",
                "elasticloadbalancing:DeleteRule",
                "elasticloadbalancing:DeleteTargetGroup",
                "elasticloadbalancing:DeleteLoadBalancer",
                "elasticloadbalancing:DeregisterTargets",
                "elasticloadbalancing:DescribeListeners",
                "elasticloadbalancing:DescribeRules",
                "elasticloadbalancing:DescribeTargetGroupAttributes",
                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeTargetHealth",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeLoadBalancerAttributes",
                "elasticloadbalancing:ModifyListener",
                "elasticloadbalancing:ModifyRule",
                "elasticloadbalancing:ModifyTargetGroup",
                "elasticloadbalancing:ModifyTargetGroupAttributes",
                "elasticloadbalancing:ModifyLoadBalancerAttributes",
                "elasticloadbalancing:RegisterTargets",
                "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
                "elasticloadbalancing:RemoveListenerCertificates",
                "elasticloadbalancing:SetIpAddressType",
                "elasticloadbalancing:SetRulePriorities",
                "elasticloadbalancing:SetSecurityGroups",
                "elasticloadbalancing:SetSubnets",
                "elasticloadbalancing:SetWebAcl",
                "sts:GetFederationToken"
            ],
            "Resource": "*"
        }
    ]
}
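The policy above contains the placeholder YOUR_ACCOUNT_ID in several ARNs. A minimal sketch for filling it in is shown below, assuming the policy has been saved to a local file. The helper render_policy, the file names and the policy name in the usage comments are hypothetical.

```shell
# render_policy TEMPLATE ACCOUNT_ID
# Prints TEMPLATE with every YOUR_ACCOUNT_ID placeholder replaced.
# Rejects anything that is not purely numeric to avoid a malformed ARN.
render_policy() {
    case $2 in
        ''|*[!0-9]*) echo "account ID must be numeric" >&2; return 1 ;;
    esac
    sed "s/YOUR_ACCOUNT_ID/$2/g" "$1"
}

# Example usage (assumes the AWS CLI is installed and configured):
# account_id=$(aws sts get-caller-identity --query Account --output text)
# render_policy policy-template.json "$account_id" > policy.json
# aws iam create-policy --policy-name digit-cluster-provisioner --policy-document file://policy.json
```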

vSphere

Overview

The Kubernetes vSphere driver contains bugs related to detaching volumes from offline nodes. See the Volume Detach Bug section for more details.

VM Images

When creating worker nodes for a user cluster, the user can specify an existing image. Defaults may be set in the datacenters.yaml.

Supported operating systems

Ubuntu 18.04 ova
CoreOS ova
CentOS 7 qcow2

Importing the OVA

Go to the vSphere WebUI, select your data centre, right-click it and choose “Deploy OVF Template”
Fill in the “URL” field with the appropriate URL
Click through the dialogue until “Select storage”
Select the same storage you want to use for your machines
Select the same network you want to use for your machines
Leave everything in the “Customize Template” and “Ready to complete” dialogues as it is
Wait until the VM is fully imported and the “Snapshots” => “Create Snapshot” button is no longer greyed out
The template VM must have the disk.enableUUID flag set to 1. This can be done using the govc tool with the following command:

govc vm.change -e="disk.enableUUID=1" -vm='/PATH/TO/VM'

Importing the QCOW2

Convert it to vmdk: qemu-img convert -f qcow2 -O vmdk CentOS-7-x86_64-GenericCloud.qcow2 CentOS-7-x86_64-GenericCloud.vmdk
Upload it to a datastore of your vSphere installation
Create a new virtual machine that uses the uploaded vmdk as its root disk.
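The conversion step above can be wrapped so that the output name is always derived from the input name. This is a small sketch, assuming qemu-img is installed; the helper names vmdk_name and convert_qcow2 are illustrative.

```shell
# vmdk_name QCOW2_PATH
# Derives the output file name by swapping the .qcow2 suffix for .vmdk.
vmdk_name() {
    printf '%s\n' "${1%.qcow2}.vmdk"
}

# convert_qcow2 QCOW2_PATH
# Runs the same qemu-img conversion as in the first step and prints the
# path of the resulting vmdk on success.
convert_qcow2() {
    out=$(vmdk_name "$1")
    qemu-img convert -f qcow2 -O vmdk "$1" "$out" && printf '%s\n' "$out"
}

# Example:
# convert_qcow2 CentOS-7-x86_64-GenericCloud.qcow2
```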

Modifications

Modifications such as network and disk size must be made in the OVA template before a worker node is created from it. If user clusters have dedicated networks, each user cluster therefore needs a custom template.

VM Folder

During the creation of a user cluster, Kubermatic creates a dedicated VM folder in the root path on the datastore (defined in datacenters.yaml). That folder will contain all worker nodes of the user cluster.

Credentials / Cloud-Config

Kubernetes needs to talk to vSphere to enable storage inside the cluster. For this, Kubernetes needs a config called cloud-config. This config contains all the details needed to connect to a vCenter installation, including credentials.

Because this config must also be deployed onto each worker node of a user cluster, it is recommended to have individual credentials for each user cluster.

Permissions

The vSphere user must have the following permissions on the correct resources:

Seed Cluster

Role k8c-storage-vmfolder-propagate
- Granted at VM Folder and Template Folder, propagated
- Permissions
  - Virtual machine
    Change Configuration
    Add existing disk
    Add new disk
    Add or remove device
    Remove disk
  - Folder
    Create folder
    Delete folder
Role k8c-storage-datastore-propagate
- Granted at Datastore, propagated
- Permissions
  - Datastore
    Allocate space
    Low-level file operations
Role Read-only (predefined)
- Granted at …, not propagated
  - Datacenter

User Cluster

Role k8c-user-vcenter
- Granted at vCenter level, not propagated
- Needed to customize VM during provisioning
- Permissions
  - VirtualMachine
    Provisioning
    Modify customization specification
    Read customization specifications
Role k8c-user-datacenter
- Granted at datacenter level, not propagated
- Needed for cloning the template VM (obviously this is not done in a folder at this time)
- Permissions
  - Datastore
    Allocate space
    Browse datastore
    Low-level file operations
    Remove file
  - vApp
    vApp application configuration
    vApp instance configuration
  - Virtual Machine
    Change CPU count
    Memory
    Settings
  - Inventory
    Create from existing
Role k8c-user-cluster-propagate
- Granted at the cluster level, propagated
- Needed for upload of cloud-init.iso (Ubuntu and CentOS) or defining the Ignition config into Guestinfo (CoreOS)
- Permissions
  - Host
    Configuration
    System Management
    Local operations
    Reconfigure virtual machine
  - Resource
    Assign virtual machine to resource pool
    Migrate powered off virtual machine
    Migrate powered-on virtual machine
  - vApp
    vApp application configuration
    vApp instance configuration
Role k8s-network-attach
- Granted for each network that should be used
- Permissions
  - Network
    Assign network
Role k8c-user-datastore-propagate
- Granted at datastore/datastore cluster level, propagated
- Permissions
  - Datastore
    Allocate space
    Browse datastore
    Low-level file operations
Role k8c-user-folder-propagate
- Granted at VM Folder and Template Folder level, propagated
- Needed for managing the node VMs
- Permissions
  - Folder
    Create folder
    Delete folder
  - Global
    Set custom attribute
  - Virtual machine
    Change Configuration
    Edit Inventory
    Guest operations
    Interaction
    Provisioning
    Snapshot management

The described permissions have been tested with vSphere 6.7 and might be different for other vSphere versions.
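Roles like these can also be scripted with govc rather than created by hand. The sketch below only prints the commands for one role (k8c-user-datastore-propagate) so they can be reviewed first. The privilege IDs and flags shown are assumptions to verify against govc role.ls and govc permissions.set -h on your installation, and the helper name, principal and datastore path are hypothetical.

```shell
# Assumed privilege IDs for "Allocate space", "Browse datastore" and
# "Low-level file operations"; confirm them with `govc role.ls Admin`.
DATASTORE_PRIVS="Datastore.AllocateSpace Datastore.Browse Datastore.FileManagement"

# datastore_role_commands PRINCIPAL DATASTORE_PATH
# Prints (does not run) the govc commands that would create the role and
# grant it, propagated, on the given datastore.
datastore_role_commands() {
    echo "govc role.create k8c-user-datastore-propagate $DATASTORE_PRIVS"
    echo "govc permissions.set -principal $1 -role k8c-user-datastore-propagate -propagate=true $2"
}

# Example:
# datastore_role_commands k8c-user@vsphere.local /dc1/datastore/ds1
```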

Volume Detach Bug

After a node is powered off, the Kubernetes vSphere driver doesn’t detach the disks associated with PVCs mounted on that node. This makes it impossible to reschedule pods using these PVCs until the disks are manually detached in vCenter.

Upstream Kubernetes has been working on the issue for a long time now and is tracking it in several upstream tickets.

SDC

State Data Centres with On-Premise Kubernetes Clusters

What to know when deploying Kubernetes on SDC

Running Kubernetes on-premise gives a cloud-native experience; in other words, the SDC becomes cloud-agnostic when it comes to the experience of deploying DIGIT.

Whether States have their own on-premise data centre or have decided to forgo the various managed cloud solutions, there are a few things one should know when getting started with on-premise K8s.

One should be familiar with Kubernetes and know that the control plane consists of the kube-apiserver, kube-scheduler, kube-controller-manager and an etcd datastore. For managed cloud solutions like Google Kubernetes Engine (GKE) or Azure Kubernetes Service (AKS), it also includes the cloud-controller-manager. This is the component that connects the cluster to external cloud services to provide networking, storage, authentication, and other feature support.

To successfully deploy a bespoke Kubernetes cluster and achieve a cloud-like experience on SDC, one needs to replicate all the features you get with a managed solution. At a high level, this means that we probably want to:

Automate the deployment process
Choose a networking solution
Choose a storage solution
Handle security and authentication

Let us look at each of these challenges individually, and we’ll try to provide enough of an overview to aid you in getting started.

Automating the deployment process

Using a tool like Ansible can make deploying Kubernetes clusters on-premise trivial.

When deciding to manage your own Kubernetes clusters, you need to set up a few proof-of-concept (PoC) clusters to learn how everything works, run performance and conformance tests, and try out different configuration options.

After this phase, automating the deployment process is an important, if not necessary, step to ensure consistency across any clusters you build. For this, you have a few options, but the most popular are:

kubeadm: a low-level tool that helps you bootstrap a minimum viable Kubernetes cluster that conforms to best practices
kubespray: an Ansible playbook that helps deploy production-ready clusters

If you are already using Ansible, kubespray is a great option; otherwise, we recommend writing automation around kubeadm using your preferred playbook tool after using it a few times. This will also increase your confidence and knowledge in the tooling surrounding Kubernetes.
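As a starting point for automation around kubeadm, the sketch below checks that the expected tools are installed and builds the init command for the first control-plane node. It is a sketch under stated assumptions: the helper names are hypothetical, the endpoint and pod CIDR are placeholder values, and the chosen flags are one common combination rather than a recommendation for every cluster.

```shell
# missing_tools TOOL...
# Prints every named tool that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# kubeadm_init_command ENDPOINT POD_CIDR
# Prints the kubeadm invocation so it can be reviewed or embedded in other
# automation before being run.
kubeadm_init_command() {
    printf 'kubeadm init --control-plane-endpoint=%s --pod-network-cidr=%s --upload-certs\n' "$1" "$2"
}

# Example (placeholder endpoint and CIDR):
# missing_tools kubeadm kubelet kubectl
# kubeadm_init_command k8s.example.internal:6443 10.244.0.0/16
```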

Choosing a network solution

When designing clusters, choosing the right container networking interface (CNI) plugin can be the hardest part, because it is tough to find a CNI that works well with an existing network topology. Do you need BGP peering capabilities? Do you want an overlay network using VXLAN? How close to bare-metal performance are you trying to get?

Many articles compare the various CNI provider solutions (Calico, Weave, Flannel, kube-router, etc.); must-reads include the benchmark results of Kubernetes network plugins article. We usually recommend Project Calico for its maturity, continued support, and large feature set, or Flannel for its simplicity.

For ingress traffic, you’ll need to pick a load-balancer solution. For a simple configuration, you can use MetalLB, but if you’re lucky enough to have F5 hardware load-balancers available, we recommend checking out the K8s F5 BIG-IP Controller. The controller supports connecting your network plugin to the F5 through either VXLAN or BGP peering. This gives the controller full visibility into pod health and provides the best performance.

Choosing a storage solution

Kubernetes provides a number of included storage volume plugins. If you’re going on-premise, you’ll probably want to use a network-attached storage (NAS) option to avoid forcing pods to be pinned to specific nodes.

For a cloud-like experience, you’ll need to add a plugin to dynamically create persistent volume objects that match the user’s persistent volume claims. You can use dynamic provisioning to reclaim these volume objects after a resource has been deleted.
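To make the idea of a persistent volume claim concrete, the sketch below renders a claim that a dynamic provisioner could satisfy. The helper name render_pvc and the storage class name nas-dynamic are hypothetical; the storage class must match one that your chosen provisioner actually defines.

```shell
# render_pvc NAME SIZE STORAGE_CLASS
# Prints a PersistentVolumeClaim manifest. A dynamic provisioner watching
# STORAGE_CLASS is expected to create the matching persistent volume.
render_pvc() {
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $2
  storageClassName: $3
EOF
}

# Example:
# render_pvc demo-data 10Gi nas-dynamic | kubectl apply -f -
```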

Pure Storage has a great example Helm chart, the Pure Service Orchestrator (PSO), that provides smart provisioning, although it only works for Pure Storage products.

Handle security and authentication

As anyone familiar with security knows, this is a rabbit hole. You can always make your infrastructure more secure and should be investing in continual improvements.

Including different Kubernetes plugins can help build a secure, cloud-like experience for your users.

When designing on-premise clusters you’ll have to decide where to draw the line. To really harden your cluster’s security you can add plugins like:

Istio: provides the underlying secure communication channel and manages authentication, authorization, and encryption of service communication at scale
gVisor: a user-space kernel, written in Go, that implements a substantial portion of the Linux system surface
Vault: secures, stores and tightly controls access to tokens, passwords, certificates and encryption keys for protecting secrets and other sensitive data

For user authentication, we recommend checking out guard, which integrates with an existing authentication provider. If you’re already using GitHub teams, then this could be a no-brainer.

Other Considerations

We hope this has given you a good idea of the deployment, networking, storage, and security considerations involved, so that you can take the leap into deploying your own on-premise Kubernetes clusters. As mentioned above, the team will want to build proof-of-concept clusters, run conformance and performance tests, and really become experts on Kubernetes if you’re going to use it to run DIGIT in production.

We’ll leave you with a few other things the team should be thinking of:

Externally backing up Kubernetes YAML, namespaces, and configuration files
Running applications across clusters in an active-active configuration to allow for zero-downtime updates
Running game days like deleting the CNI to measure and improve time-to-recovery
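The first item, backing up Kubernetes YAML, can be sketched as a per-namespace export. This is a minimal sketch, assuming kubectl is configured for the cluster. The helper name backup_namespaces is hypothetical and the resource list is only a starting point, since "all" does not include every resource kind.

```shell
# backup_namespaces OUT_DIR
# Writes one YAML file per namespace into OUT_DIR. "all" covers only the
# common workload kinds, so ConfigMaps are listed explicitly; extend the
# list (Secrets, Ingresses, custom resources) to match your cluster.
backup_namespaces() {
    out_dir=$1
    mkdir -p "$out_dir" || return 1
    for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
        kubectl get all,configmaps -n "$ns" -o yaml > "$out_dir/$ns.yaml" || return 1
    done
}

# Example:
# backup_namespaces "./k8s-backup-$(date +%Y%m%d)"
```

For the backup to count as external, the output directory still has to be copied to storage outside the cluster, for example a versioned repository or object store.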

NIC

National Informatics Centre Cloud

Details coming soon...
