Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
For provisioning Kubernetes clusters with the Azure cloud provider Kubermatic needs a service account with (at least) the Azure role Contributor
. Please follow the following steps to create a matching service account.
Login to Azure with Azure CLI az
.
This command will open in your default browser a window where you can authenticate. After you succefully logged in get your subscription ID.
Get your Tenant ID
create a new app with
Enter provider credentials using the values from step “Prepare Azure Environment” into Kubermatic Dashboard:
Client ID
: Take the value of appId
Client Secret
: Take the value of password
Tenant ID
: your tenant ID
Subscription ID
: your subscription ID
State Data Centres with On-Premise Kubernetes Clusters
Running Kubernetes on-premise gives a cloud-native experience or SDC becomes cloud-agnostic when it comes to the experience of Deploying DIGIT.
Whether States have their own on-premise data centre, have decided to forego the various managed cloud solutions, there are few things one should know when getting started with on-premise K8s.
One should be familiar with Kubernetes and one should know that the control plane consists of the Kube-apiserver, Kube-scheduler, Kube-controller-manager and an ETCD datastore. For managed cloud solutions like Google’s Kubernetes Engine (GKE) or Azure’s Kubernetes Service (AKS) it also includes the cloud-controller-manager. This is the component that connects the cluster to the external cloud services to provide networking, storage, authentication, and other feature support.
To successfully deploy a bespoke Kubernetes cluster and achieve a cloud-like experience on SDC, one need to replicate all the same features you get with a managed solution. At a high-level this means that we probably want to:
Automate the deployment process
Choose a networking solution
Choose a storage solution
Handle security and authentication
Let us look at each of these challenges individually, and we’ll try to provide enough of an overview to aid you in getting started.
Using a tool like an ansible can make deploying Kubernetes clusters on-premise trivial.
When deciding to manage your own Kubernetes clusters, we need to set up a few proof-of-concept (PoC) clusters to learn how everything works, perform performance and conformance tests, and try out different configuration options.
After this phase, automating the deployment process is an important if not necessary step to ensure consistency across any clusters you build. For this, you have a few options, but the most popular are:
****kubeadm: a low-level tool that helps you bootstrap a minimum viable Kubernetes cluster that conforms to best practices
kubespray: an ansible playbook that helps deploy production- ready clusters
If you already using ansible, kubespray is a great option otherwise we recommend writing automation around kubeadm using your preferred playbook tool after using it a few times. This will also increase your confidence and knowledge in the tooling surrounding Kubernetes.
When designing clusters, choosing the right container networking interface (CNI) plugin can be the hardest part. This is because choosing a CNI that will work well with an existing network topology can be tough. Do you need BGP peering capabilities? Do you want an overlay network using vxlan? How close to bare-metal performance are you trying to get?
There are a lot of articles that compare the various CNI provider solutions (calico, weave, flannel, kube-router, etc.) that are must-reads like the benchmark results of Kubernetes network plugins article. We usually recommend Project Calico for its maturity, continued support, and large feature set or flannel for its simplicity.
For ingress traffic, you’ll need to pick a load-balancer solution. For a simple configuration, you can use MetalLB, but if you’re lucky enough to have F5 hardware load-balancers available we recommend checking out the K8s F5 BIG-IP Controller. The controller supports connecting your network plugin to the F5 either through either vxlan or BGP peering. This gives the controller full visibility into pod health and provides the best performance.
Kubernetes provides a number of included storage volume plugins. If you’re going on-premise you’ll probably want to use network-attached storage (NAS) option to avoid forcing pods to be pinned to specific nodes.
For a cloud-like experience, you’ll need to add a plugin to dynamically create persistent volume objects that match the user’s persistent volume claims. You can use dynamic provisioning to reclaim these volume objects after a resource has been deleted.
Pure Storage has a great example helm chart, the Pure Service Orchestrator (PSO), that provides smart provisioning although it only works for Pure Storage products.
As anyone familiar with security knows, this is a rabbit-hole. You can always make your infrastructure more secure and should be investing in continual improvements.
Including different Kubernetes plugins can help build a secure, cloud-like experience for your users
When designing on-premise clusters you’ll have to decide where to draw the line. To really harden your cluster’s security you can add plugins like:
istio: provides the underlying secure communication channel, and manages authentication, authorization, and encryption of service communication at scale
gVisor: is a user-space kernel, written in Go, that implements a substantial portion of the Linux system surface
vault: secure, store and tightly control access to tokens, passwords, certificates, encryption keys for protecting secrets and other sensitive data
For user authentication, we recommend checking out guard which will integrate with an existing authentication provider. If you’re already using Github teams to then this could be a no-brainer.
Hope this has given you a good idea of deploying, networking, storage, and security for you to take the leap into deploying your own on-premise Kubernetes clusters. Like we mentioned above, the team will want to build proof-of-concept clusters, run conformance and performance tests, and really become experts on Kubernetes if you’re going to be using it to run DIGIT on production.
We’ll leave you with a few other things the team should be thinking of:
Externally backing up Kubernetes YAML, namespaces, and configuration files
Running applications across clusters in an active-active configuration to allow for zero-downtime updates
Running game days like deleting the CNI to measure and improve time-to-recovery
National Informatica Cloud
Details coming soon...
For access to the Compute Engine API, it has to be enabled at the Google APIs console.
The user for the Google Service Account that has to be created has to have three roles:
Compute Admin: roles/compute.admin
Service Account User: roles/iam.serviceAccountUser
Viewer: roles/viewer
If the gcloud
CLI is installed, a service account can be created like follow:
A Google Service Account for the platform has to be created, see Creating and managing service accounts. The result is a JSON file containing the fields
type
project_id
private_key_id
private_key
client_email
client_id
auth_uri
token_uri
auth_provider_x509_cert_url
client_x509_cert_url
The private key is BASE64 containing the newlines as non-escaped strings "\n”. So to avoid the resulting troubles the machine controller expects the whole service account encoded in BASE64.
The base64 encoded secret of the service account will be passed in the field serviceAccount
of the cloudProviderSpec
of the machine deployment. The encoded secret can be entered in the UI field Service Account
:
The Kubernetes vSphere driver contains bugs related to detaching volumes from offline nodes. See the Volume detach bug section for more details.
When creating worker nodes for a user cluster, the user can specify an existing image. Defaults may be set in the datacenters.yaml.
Supported operating systems
Go into the VSphere WebUI, select your data centre, right-click onto it and choose “Deploy OVF Template”
Fill in the “URL” field with the appropriate URL
Click through the dialogue until “Select storage”
Select the same storage you want to use for your machines
Select the same network you want to use for your machines
Leave everything in the “Customize Template” and “Ready to complete” dialogue as it is
Wait until the VM got fully imported and the “Snapshots” => “Create Snapshot” button is not greyed out anymore.
The template VM must have the disk.enable UUID flag set to 1, this can be done using the govc tool with the following command:
Convert it to vmdk: qemu-img convert -f qcow2 -O vmdk CentOS-7-x86_64-GenericCloud.qcow2 CentOS-7-x86_64-GenericCloud.vmdk
Upload it to a Datastore of your vSphere installation
Create a new virtual machine that uses the uploaded vmdk as rootdisk.
Modifications like Network, disk size, etc. must be done in the ova template before creating a worker node from it. If user clusters have dedicated networks, all user clusters, therefore, need a custom template.
During the creation of a user cluster Kubermatic creates a dedicated VM folder in the root path on the Datastore (Defined in the datacenters.yaml). That folder will contain all worker nodes of a user cluster.
Kubernetes needs to talk to the vSphere to enable Storage inside the cluster. For this, kubernetes needs a config called cloud-config
. This config contains all details to connect to a vCenter installation, including credentials.
As this Config must also be deployed onto each worker node of a user cluster, its recommended to have individual credentials for each user cluster.
The VSphere user must have the following permissions on the correct resources
Role k8c-storage-vmfolder-propagate
Granted at VM Folder and Template Folder, propagated
Permissions
Virtual machine
Change Configuration
Add existing disk
Add new disk
Add or remove the device
Remove disk
Folder
Create folder
Delete folder
Role k8c-storage-datastore-propagate
Granted at Datastore, propagated
Permissions
Datastore
Allocate space
Low-level file operations
Role Read-only
(predefined)
Granted at …, not propagated
Datacenter
Role k8c-user-vcenter
Granted at vcentre level, not propagated
Needed to customize VM during provisioning
Permissions
VirtualMachine
Provisioning
Modify customization specification
Read customization specifications
Role k8c-user-datacenter
Granted at datacentre level, not propagated
Needed for cloning the template VM (obviously this is not done in a folder at this time)
Permissions
Datastore
Allocate space
Browse datastore
Low-level file operations
Remove file
vApp
vApp application configuration
vApp instance configuration
Virtual Machine
Change CPU count
Memory
Settings
Inventory
Create from existing
Role k8c-user-cluster-propagate
Granted at the cluster level, propagated
Needed for upload of cloud-init.iso
(Ubuntu and CentOS) or defining the Ignition config into Guestinfo (CoreOS)
Permissions
Host
Configuration
System Management
Local operations
Reconfigure virtual machine
Resource
Assign virtual machine to the resource pool
Migrate powered off the virtual machine
Migrate powered-on virtual machine
vApp
vApp application configuration
vApp instance configuration
Role k8s-network-attach
Granted for each network that should be used
Permissions
Network
Assign network
Role k8c-user-datastore-propagate
Granted at datastore/datastore cluster level, propagated
Permissions
Datastore
Allocate space
Browse datastore
Low-level file operations
Role k8c-user-folder-propagate
Granted at VM Folder and Template Folder level, propagated
Needed for managing the node VMs
Permissions
Folder
Create folder
Delete folder
Global
Set custom attribute
Virtual machine
Change Configuration
Edit Inventory
Guest operations
Interaction
Provisioning
Snapshot management
The described permissions have been tested with vSphere 6.7 and might be different for other vSphere versions.
After a node is powered-off, the Kubernetes vSphere driver doesn’t detach disks associated with PVCs mounted on that node. This makes it impossible to reschedule pods using these PVCs until the disks are manually detached in vCenter.
Upstream Kubernetes has been working on the issue for a long time now and tracking it under the following tickets: