This page discusses the infrastructure requirements for DIGIT services. It also explains why DIGIT services are containerised and deployed on Kubernetes.
DIGIT infra is abstracted with **[Kubernetes](https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/)**, an open-source container orchestration platform that abstracts the variety of infra types available across each state, such as physical machines, VMs, on-premise clouds (VMware, OpenStack, Nutanix, etc.), commercial clouds (Google, AWS, Azure, etc.), SDC and NIC, into a standard infra type. Essentially, it unifies various infra types into a single, standard type of infrastructure, which makes DIGIT multi-cloud supported, portable, extensible, high-performant and scalable for containerized workloads and services. This facilitates both declarative configuration and automation. Kubernetes services, eco-system, support and tools are widely available.
Kubernetes itself is a set of components that carry out the designated jobs of scheduling, controlling and monitoring containerized workloads across a set of machines. The minimum machine requirements are listed below.
- 3 or more machines running one of:
  - Ubuntu 16.04+
  - Debian 9
  - CentOS 7
  - RHEL 7
  - Container Linux (tested with 1576.4.0)
- 4 GB or more of RAM per machine (any less will leave little room for your apps)
- 2 CPUs or more

- 3 or more machines running one of:
  - Ubuntu 16.04+
  - Debian 9
  - CentOS 7
  - RHEL 7
  - Container Linux (tested with 1576.4.0)
- 2 GB or more of RAM per machine (any less will leave little room for your apps)
- 2 CPUs or more

- Full network connectivity between all machines in the cluster (public or private network is fine)
- Unique hostname, MAC address, and product_uuid for every node (see below for details)
- Certain ports open on your machines (see the port tables below)
- Swap disabled. You MUST disable swap in order for the kubelet to work properly.
Verify that the MAC address and product_uuid are unique for every node. You can get the MAC address of the network interfaces using the command `ip link` or `ifconfig -a`. The product_uuid can be checked by using the command `sudo cat /sys/class/dmi/id/product_uuid`.
It is very likely that hardware devices will have unique addresses, although some virtual machines may have identical values. Kubernetes uses these values to uniquely identify the nodes in the cluster. If these values are not unique to each node, the installation process may fail.
If you have more than one network adapter, and your Kubernetes components are not reachable on the default route, we recommend you add IP route(s) so Kubernetes cluster addresses go via the appropriate adapter.
**Control-plane (master) node(s)**

| Protocol | Direction | Port Range | Purpose |
| --- | --- | --- | --- |
| TCP | Inbound | 6443* | Kubernetes API server |
| TCP | Inbound | 2379-2380 | etcd server client API |
| TCP | Inbound | 10250 | kubelet API |
| TCP | Inbound | 10251 | kube-scheduler |
| TCP | Inbound | 10252 | kube-controller-manager |
| TCP | Inbound | 10255 | Read-only kubelet API |
**Worker node(s)**

| Protocol | Direction | Port Range | Purpose |
| --- | --- | --- | --- |
| TCP | Inbound | 10250 | kubelet API |
| TCP | Inbound | 10255 | Read-only kubelet API |
| TCP | Inbound | 30000-32767 | NodePort Services** |
** Default port range for NodePort Services.
Any port numbers marked with * are overridable, so you will need to ensure any custom ports you provide are also open.
For provisioning Kubernetes clusters with the Azure cloud provider, Kubermatic needs a service account with (at least) the Azure role `Contributor`. Follow the steps below to create a matching service account.
1. Log in to Azure with the Azure CLI `az login`. This command will open a window in your default browser where you can authenticate.
2. After you have successfully logged in, get your subscription ID.
3. Get your tenant ID.
4. Create a new app (service principal) with the Contributor role, as shown in the sketch below.
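A minimal sketch of these commands, assuming an illustrative service-principal name `kubermatic-sa`:

```bash
# 1. Log in (opens a browser window for authentication)
az login

# 2. Get your subscription ID
az account show --query id -o tsv

# 3. Get your tenant ID
az account show --query tenantId -o tsv

# 4. Create a new app (service principal) with the Contributor role
az ad sp create-for-rbac --name kubermatic-sa \
  --role "Contributor" \
  --scopes "/subscriptions/<SUBSCRIPTION_ID>"
```

The output of the last command contains the `appId` and `password` values referenced below.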
Enter the provider credentials using the values from the step "Prepare Azure Environment" into the Kubermatic Dashboard:

- Client ID: take the value of `appId`
- Client Secret: take the value of `password`
- Tenant ID: your tenant ID
- Subscription ID: your subscription ID
This page explains why Kubernetes is required. It deep dives into the key benefits of using Kubernetes to run a large containerized platform like DIGIT in production environments.
The Kubernetes project started in the year 2014. Kubernetes has now become the de facto standard for deploying containerized applications at scale in private, public and hybrid cloud environments. The largest public cloud platforms now provide managed services for Kubernetes. A few years back, RedHat, Mesosphere, Pivotal, VMware and Nutanix redesigned their implementations around Kubernetes and collaborated with the Kubernetes community to build the next-generation container platform, incorporating key features of Kubernetes such as container grouping, overlay networking, layer 4 routing, secrets, etc. Today many organizations and technology providers are adopting Kubernetes at a rapid pace.
One of the fundamental design decisions which have been taken by this impeccable cluster manager is its ability to deploy existing applications that run on VMs without any changes to the application code. On the high level, any application that runs on VMs can be deployed on Kubernetes by simply containerizing its components. This is achieved by its core features; container grouping, container orchestration, overlay networking, container-to-container routing with layer 4 virtual IP based routing system, service discovery, support for running daemons, deploying stateful application components, and most importantly the ability to extend the container orchestrator for supporting complex orchestration requirements.
On very high-level Kubernetes provides a set of dynamically scalable hosts for running workloads using containers and uses a set of management hosts called masters for providing an API for managing the entire container infrastructure.
That's just a glimpse of what Kubernetes provides out of the box. The next few sections go through its core features and explain how they help applications to be deployed on it in no time.
A containerized application can be deployed on Kubernetes using a deployment definition by executing a simple CLI command as follows:
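For example, a minimal sketch (the deployment name `hello-app` and the image are illustrative):

```bash
# Create a deployment from a container image
kubectl create deployment hello-app --image=nginx:1.21

# Scale it out to three replicas
kubectl scale deployment/hello-app --replicas=3
```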
One of the key features of Kubernetes is its service discovery and internal routing model provided using SkyDNS and layer 4 virtual IP based routing system. These features provide internal routing for application requests using services. A set of pods created via a replica set can be load balanced using a service within the cluster network. The services get connected to pods using selector labels. Each service will get assigned a unique IP address, a hostname derived from its name and route requests among the pods in round-robin manner. The services will even provide IP-hash based routing mechanism for applications which may require session affinity. A service can define a collection of ports and the properties defined for the given service will apply to all the ports in the same way. Therefore, in a scenario where session affinity is only needed for a given port where all the other ports required to use round-robin based routing, multiple services may need to be used.
Kubernetes services have been implemented using a component called kube-proxy. A kube-proxy instance runs in each node and provides three proxy modes: Userspace, iptables and IPVS. The current default is iptables.
In the first proxy mode: userspace, kube-proxy itself will act as a proxy server and delegate requests accepted by an iptable rule to the backend pods. In this mode, kube-proxy will operate in the userspace and will add an additional hop to the message flow.
In the second proxy mode: iptables, the kube-proxy will create a collection of iptable rules for forwarding incoming requests from the clients directly to the ports of backend pods on the network layer without adding an additional hop in the middle. This proxy mode is much faster than the first mode because of operating in the kernel space and not adding an additional proxy server in the middle.
Kubernetes services can be exposed to external networks in two main ways. The first is using node ports by exposing dynamic ports on the nodes that forward traffic to the service ports. The second is using a load balancer configured via an ingress controller which can delegate requests to the services by connecting to the same overlay network. An ingress controller is a background process which may run in a container which listens to the Kubernetes API, dynamically configure and reloads a given load balancer according to a given set of ingresses. An ingress defines the routing rules based on hostnames and context paths using services.
Once an application is deployed on Kubernetes using the `kubectl run` command, it can be exposed to the external network via a load balancer as follows:
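A minimal sketch, assuming a deployment named `hello-app` serving on port 80:

```bash
# Expose the deployment through a LoadBalancer-type service
kubectl expose deployment hello-app --type=LoadBalancer --port=80 --target-port=80
```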
The above command will create a service of load balancer type and map it to the pods using the same selector label created when the pods were created. As a result, depending on how the Kubernetes cluster has been configured a load balancer service on the underlying infrastructure will get created for routing requests for the given pods either via the service or directly.
Applications that require persisting data on the filesystem may use volumes for mounting storage devices to ephemeral containers, similar to how volumes are used with VMs. Kubernetes has properly designed this concept by loosely coupling physical storage devices with containers, introducing an intermediate resource called persistent volume claims (PVCs). A PVC defines the disk size and access mode (ReadWriteOnce, ReadOnlyMany, ReadWriteMany) and dynamically links a storage device to a volume defined against a pod. The binding process can either be done in a static way using PVs or dynamically by using a persistent storage provider. In both approaches, a volume will get linked to a PV one to one, and depending on the configuration, the given data will be preserved even if the pods get terminated. Depending on the access mode used, multiple pods will be able to connect to the same disk and read/write.
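A minimal sketch of a PVC that a pod can then mount as a volume (the name, access mode and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce        # only one node may mount this volume read/write
  resources:
    requests:
      storage: 10Gi        # requested disk size
```

A pod then references the claim through `spec.volumes[].persistentVolumeClaim.claimName: app-data`.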
Kubernetes provides a resource called DaemonSets for running a copy of a pod in each Kubernetes node as a daemon. Some of the use cases of DaemonSets are as follows:
- A cluster storage daemon such as `glusterd` or `ceph` to be deployed on each node for providing persistent storage.
- A log collection daemon such as `fluentd` or `logstash` to be run on every node for collecting container and Kubernetes component logs.
- An ingress controller pod to be run on a collection of nodes for providing external routing.
One of the most difficult tasks of containerizing applications is designing the deployment architecture of stateful distributed components. Stateless components can be easily containerized as they may not have a predefined startup sequence, clustering requirements, point-to-point TCP connections, unique network identifiers, graceful startup and termination requirements, etc. Systems such as databases, big data analysis systems, distributed key/value stores, and message brokers may have complex distributed architectures that require the above features. Kubernetes introduced the StatefulSets resource for supporting such complex requirements.
On a high level, StatefulSets are similar to ReplicaSets except that they provide the ability to handle the startup sequence of pods and uniquely identify each pod for preserving its state, while providing the following characteristics:

- Stable, unique network identifiers.
- Stable, persistent storage.
- Ordered, graceful deployment and scaling.
- Ordered, graceful deletion and termination.
- Ordered, automated rolling updates.
Containers generally use environment variables for parameterizing their runtime configurations. However, typical enterprise applications use a considerable number of configuration files for providing the static configurations required for a given deployment. Kubernetes provides a fabulous way of managing such configuration files using a simple resource called ConfigMaps, without bundling them into the container images. ConfigMaps can be created using directories, files or literal values using the following CLI command:
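A minimal sketch (the ConfigMap name, directory path and literal key are illustrative):

```bash
# From a directory of configuration files
kubectl create configmap app-config --from-file=./config/

# Or from literal key/value pairs
kubectl create configmap app-config --from-literal=LOG_LEVEL=info
```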
Once a ConfigMap is created, it can be mounted to a pod using a volume mount. With this loosely coupled architecture, configurations of an already running system can be updated seamlessly just by updating the relevant ConfigMap and executing a rolling update process, which I will explain in one of the next sections. It might be important to note that currently ConfigMaps do not support nested folders; therefore, if there are configuration files available in a nested directory structure of the application, a ConfigMap would need to be created for each directory level.
Similar to ConfigMaps, Kubernetes provides another valuable resource called Secrets for managing sensitive information such as passwords, OAuth tokens, and SSH keys. Otherwise, updating that information on an already running system might require rebuilding the container images.
A secret can be created for managing basic auth credentials using the following way:
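A minimal sketch (the secret name, username and password values are illustrative):

```bash
kubectl create secret generic basic-auth \
  --from-literal=username=admin \
  --from-literal=password='S3cr3tP@ssw0rd'
```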
Once a secret is created, it can be read by a pod either using environment variables or volume mounts. Similarly, any other type of sensitive information can be injected into pods using the same approach.
Application updates can be rolled out for an already running application using the blue/green deployment method without having to take system downtime. This is another invaluable feature of Kubernetes which allows applications to seamlessly roll out security updates and backwards-compatible changes without much effort. If the changes are not backwards compatible, a manual blue/green deployment might need to be executed using a separate deployment definition.
This approach allows a rollout to be executed for updating a container image using a simple CLI command:
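A minimal sketch, assuming a deployment and a container both named `hello-app`:

```bash
kubectl set image deployment/hello-app hello-app=nginx:1.22
```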
Once a rollout is executed, the status of the rollout process can be checked as follows:
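For example, continuing with the hypothetical `hello-app` deployment:

```bash
kubectl rollout status deployment/hello-app
```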
Using the same `kubectl set image deployment` CLI command, an update can be rolled back to a previous state.
Figure 10: Kubernetes Pod Autoscaling Model
Kubernetes allows pods to be manually scaled either using ReplicaSets or Deployments. The following CLI command can be used for this purpose:
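For example, continuing with the hypothetical `hello-app` deployment:

```bash
kubectl scale deployment/hello-app --replicas=5
```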
Figure 11: Helm and Kubeapps Hub
The Kubernetes community initiated a separate project for implementing a package manager for Kubernetes called Helm. It allows Kubernetes resources such as deployments, services, config maps, ingresses, etc. to be templated and packaged using a resource called a chart, and allows them to be configured at installation time using input parameters. More importantly, it allows existing charts to be reused when implementing installation packages using dependencies. Helm repositories can be hosted in public and private cloud environments for managing application charts. Helm provides a CLI for installing applications from a given Helm repository into a selected Kubernetes environment.
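A minimal sketch of installing a chart from a public repository (the repository, chart and release names are illustrative):

```bash
# Add a chart repository and refresh the local index
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Install a chart as a named release, overriding a value at install time
helm install my-release bitnami/nginx --set service.type=ClusterIP
```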
Kubernetes has been designed with over a decade of experience of running containerized applications at scale at Google. It has already been adopted by the largest public cloud vendors and technology providers, and is currently being embraced by most software vendors and enterprises as this article is written. It has even led to the inception of the Cloud Native Computing Foundation (CNCF) in the year 2015, was the first project to graduate under CNCF, and started streamlining the container ecosystem together with other container-related projects such as CNI, containerd, Envoy, Fluentd, gRPC, Jaeger, Linkerd, Prometheus, rkt and Vitess. The key reasons for its popularity and endorsement at such a level might be its flawless design, collaborations with industry leaders, being open source, and always being open to ideas and contributions.
The above figure illustrates the high-level application deployment model on Kubernetes. It uses a resource called a ReplicaSet for orchestrating containers. A ReplicaSet can be considered as a YAML or JSON based metadata file which defines the container images, ports, the number of replicas, activation health checks, liveness health checks, environment variables, volume mounts, security rules, etc. required for creating and managing the containers. Containers are always created on Kubernetes as groups called pods, which is again a Kubernetes metadata definition or resource. Each pod allows sharing the file system, network interfaces, operating system users, etc. among the containers using Linux namespaces, cgroups, and other kernel features. The ReplicaSets can be managed by another high-level resource called a Deployment, which provides features for rolling out updates and handling their rollbacks.
The third proxy mode is IPVS, which is quite similar to the second proxy mode; it makes use of an IPVS-based virtual server for routing requests without using iptable rules. IPVS is a transport layer load balancing feature available in the Linux kernel, based on Netfilter, and provides a collection of load balancing algorithms. The main reason for using IPVS over iptables is the performance overhead of syncing proxy rules when using iptables. When thousands of services are created, updating iptable rules takes a considerable amount of time compared to a few milliseconds with IPVS. Moreover, IPVS uses a hash table for looking up the proxy rules instead of the sequential scans used with iptables. More information on the introduction of the IPVS proxy mode can be found in the presentation done by Huawei at KubeCon 2017.
Disks that support ReadWriteOnce will only be able to connect to a single pod and cannot be shared among multiple pods at the same time. However, disks that support ReadOnlyMany can be shared among multiple pods at the same time in read-only mode. In contrast, as the name implies, disks with ReadWriteMany support can be connected to multiple pods for sharing data in read and write mode. Kubernetes provides volume plugins for supporting storage services available on public cloud platforms such as AWS EBS, GCE Persistent Disk, Azure File, Azure Disk, and many other well-known storage systems such as NFS, Glusterfs, iSCSI, Cinder, etc.
A node monitoring daemon to be run on every node for monitoring the container hosts.
In the above, stable refers to preserving the network identifiers and persistent storage across pod rescheduling. Unique network identifiers are provided by using headless services, as shown in the above figure. Kubernetes provides examples of StatefulSets for deploying stateful systems such as Kafka and ZooKeeper in a distributed manner.
In addition to ReplicaSets and StatefulSets, Kubernetes provides two additional controllers for running workloads in the background, called Jobs and CronJobs. The difference between Jobs and CronJobs is that Jobs execute once and terminate, whereas CronJobs get executed periodically at a given time interval, similar to standard Linux cron jobs.
Deploying databases on container platforms for production usage is a slightly more difficult task than deploying applications, due to their requirements for clustering, point-to-point connections, replication, sharding, managing backups, etc. As mentioned previously, StatefulSets have been designed specifically for supporting such complex requirements, and there are a couple of options for running database clusters on Kubernetes today. Vitess, YouTube's database clustering system which is now a CNCF project, would be a great option for running MySQL at scale on Kubernetes with sharding. That said, it is worth noting that those options are still at very early stages, and if an existing production-grade database system is available on the given infrastructure, such as RDS on AWS, Cloud SQL on GCP, or an on-premise database cluster, it might be better to choose one of those options considering the installation complexity and maintenance overhead.
As shown in the above figure, this functionality can be extended by adding another resource called a Horizontal Pod Autoscaler (HPA) against a deployment for dynamically scaling the pods based on their actual resource usage. The HPA monitors the resource usage of each pod via the resource metrics API and informs the deployment to change the replica count of the ReplicaSet accordingly. Kubernetes uses an upscale delay and a downscale delay for avoiding thrashing which could occur due to frequent resource usage fluctuations in some situations. Currently, HPA only provides support for scaling based on CPU usage. If needed, custom metrics can also be plugged in via the custom metrics API, depending on the nature of the application.
A wide range of stable Helm charts for well-known software applications can be found in its chart repository and also in the central Helm chart server.
For access to the Compute Engine API, it has to be enabled at the Google APIs console.
The Google Service Account that is created must have three roles:

- Compute Admin: `roles/compute.admin`
- Service Account User: `roles/iam.serviceAccountUser`
- Viewer: `roles/viewer`
If the `gcloud` CLI is installed, a service account can be created as follows:
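A minimal sketch, assuming an illustrative project ID `my-project` and service account name `kubermatic`:

```bash
# Create the service account
gcloud iam service-accounts create kubermatic --project my-project

# Grant the three required roles
for role in roles/compute.admin roles/iam.serviceAccountUser roles/viewer; do
  gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:kubermatic@my-project.iam.gserviceaccount.com" \
    --role "$role"
done

# Create and download a JSON key for the service account
gcloud iam service-accounts keys create service-account.json \
  --iam-account kubermatic@my-project.iam.gserviceaccount.com
```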
A Google Service Account for the platform has to be created; see Creating and managing service accounts. The result is a JSON file containing the fields `type`, `project_id`, `private_key_id`, `private_key`, `client_email`, `client_id`, `auth_uri`, `token_uri`, `auth_provider_x509_cert_url` and `client_x509_cert_url`.
The private key contains the newlines as non-escaped "\n" strings. To avoid the resulting troubles, the machine controller expects the whole service account file encoded in BASE64.
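A minimal sketch of encoding the downloaded key file (the file name is illustrative):

```bash
# GNU coreutils: -w0 disables line wrapping so the output is a single line
base64 -w0 service-account.json
# On macOS, use: base64 -i service-account.json
```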
The base64-encoded secret of the service account will be passed in the field `serviceAccount` of the `cloudProviderSpec` of the machine deployment. The encoded secret can be entered in the UI field `Service Account`.
Details coming soon...
An overview of the prerequisites to setup DIGIT and some of the key capabilities to understand before provisioning the infra and deploy DIGIT.
DIGIT is the largest urban governance platform, built to handle billions of transactions between citizens and state governments through various municipal services/integrations. The platform is built with key capabilities like scale, speed, integration, configurability, customizability, extensibility, multi-tenancy, security, etc. Here, we shall discuss the key requirements and capabilities.
Before proceeding to set up DIGIT, it is essential to know some of the key technical details about DIGIT, like architecture, tech stack, how it is packaged and deployed on various infrastructure. Some of these details are explained in the previous sections. Below are some of the key capabilities to know about DIGIT as a platform.
- DIGIT is a collection of services built as RESTful APIs following the OpenAPI standard.
- DIGIT is built on MSA (Microservices Architecture).
- DIGIT services are packaged as containers and deployed as Docker images.
- DIGIT is deployed on Kubernetes, which abstracts any cloud/infra suitable and standardised for DIGIT deployment.
- DIGIT deployment configuration and customization are done through Helm charts.
- Kubernetes cluster setup is done through code like Terraform/Ansible, as suitable.
The OpenAPI Specification (OAS) defines a standard, programming language-agnostic interface description for REST APIs, which allows both humans and computers to discover and understand the capabilities of a service without requiring access to source code, additional documentation, or inspection of network traffic. When properly defined via OpenAPI, a consumer can understand and interact with the remote service with a minimal amount of implementation logic. Similar to what interface descriptions have done for lower-level programming, the OpenAPI Specification removes guesswork in calling a service.
Microservices are nothing but breaking a big beast into smaller units that can be independently developed, enhanced and scaled as a categorized and layered stack. This gives better control over each component of an application, which exists in its own container and is independently managed and updated. This means that developers can build applications from multiple components and program each component in the language best suited to its function, rather than having to choose a single less-than-ideal language to use for everything. Optimizing software all the way down to the components of the application helps you increase the quality of your products. No time and resources are wasted managing the effects that updating one application has on another.
Comparatively the best infra choice for running a microservices application architecture is application containers. Containers encapsulate a lightweight runtime environment for the application, presenting a consistent environment that can follow the application from the developer's desktop to testing to final production deployment, and you can run containers on cloud infra with physical or virtual machines.
As most modern software developers can attest, containers have provided us with dramatically more flexibility for running cloud-native applications on physical and virtual infrastructure. Kubernetes allows you to deploy cloud-native applications anywhere and manage them exactly as you like everywhere. For more details refer the above link that explains various advantages of kubernetes.
To be successful in the DIGIT setup, here are certain requirements that need to be ascertained:
On-premise/private cloud accounts
Interface to access and provision required infra
In the case of SDC, NIC or private DC, it'll be VPN to an allocated VLAN
SSH access to the VMs/machines
Infra Skills
Public cloud
Managed Kubernetes service like AKS or EKS or GKE on Azure, AWS and GCP respectively
Private Clouds (SDC, NIC)
Clouds like VMware, OpenStack, Nutanix and more may or may not have Kubernetes as a managed service. If yes, we may have to estimate only the worker nodes, depending on the number of ULBs and the DIGIT municipal services that you opt for.
In the absence of the above, you have to provision Kubernetes cluster from the plain VMs as per the general Kubernetes setup instruction and add worker nodes.
Operations Skills
Understanding of Linux, containers, VM Instances, Load Balancers, Security Groups/Firewalls, nginx, DB Instance, Data Volumes
Experience of Kubernetes, Docker, Jenkins, Helm, Infra-as-code, Terraform
Experience in DevOps/SRE practices on microservices and modern infrastructure
ZooKeeper
Kafka
Elastic Search
Setting up the Postgres DB
On a public cloud, provision a Postgres RDS like instance.
On a private cloud, provision a Postgres DB on a VM with backup and HA/DRS
K8s Secrets
K8s ConfigMaps
Environment variables of each microservice
Deploy the stable released version of DIGIT and Required services
Setting up Jenkins Job to build, bake images and deploy the components for the rolling updates
This section discusses the supported cloud environment for DIGIT services. It provides information on where and how DIGIT is deployed. Further, it offers guidelines on estimating the infrastructural requirements for cloud support.
Supported Cloud List
The Kubernetes vSphere driver contains bugs related to detaching volumes from offline nodes. See the section for more details.
When creating worker nodes for a user cluster, the user can specify an existing image. Defaults may be set in the .
Supported operating systems
Ubuntu 18.04
CoreOS
CentOS 7
Go into the VSphere WebUI, select your data centre, right-click onto it and choose “Deploy OVF Template”
Fill in the “URL” field with the appropriate URL
Click through the dialogue until “Select storage”
Select the same storage you want to use for your machines
Select the same network you want to use for your machines
Leave everything in the “Customize Template” and “Ready to complete” dialogue as it is
Wait until the VM is fully imported and the "Snapshots" => "Create Snapshot" button is no longer greyed out.
The template VM must have the `disk.enableUUID` flag set to 1; this can be done with the following command:
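A minimal sketch, assuming the `govc` CLI is used against vCenter (the VM path is illustrative):

```bash
# Set the disk.enableUUID flag on the template VM
govc vm.change -e="disk.enableUUID=1" -vm='/DC0/vm/template-vm'
```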
Convert it to vmdk: qemu-img convert -f qcow2 -O vmdk CentOS-7-x86_64-GenericCloud.qcow2 CentOS-7-x86_64-GenericCloud.vmdk
Upload it to a Datastore of your vSphere installation
Create a new virtual machine that uses the uploaded vmdk as rootdisk.
Modifications like Network, disk size, etc. must be done in the ova template before creating a worker node from it. If user clusters have dedicated networks, all user clusters, therefore, need a custom template.
Kubernetes needs to talk to vSphere to enable storage inside the cluster. For this, Kubernetes needs a config called `cloud-config`. This config contains all the details required to connect to a vCenter installation, including credentials.
As this config must also be deployed onto each worker node of a user cluster, it is recommended to have individual credentials for each user cluster.
The vSphere user must have the following permissions on the correct resources:

**Role `k8c-storage-vmfolder-propagate`**
- Granted at: VM Folder and Template Folder, propagated
- Permissions:
  - Virtual machine → Change Configuration: Add existing disk, Add new disk, Add or remove the device, Remove disk
  - Folder: Create folder, Delete folder

**Role `k8c-storage-datastore-propagate`**
- Granted at: Datastore, propagated
- Permissions:
  - Datastore: Allocate space, Low-level file operations

**Role `Read-only` (predefined)**
- Granted at: …, not propagated
- Datacenter

**Role `k8c-user-vcenter`**
- Granted at: vCenter level, not propagated
- Needed to customize the VM during provisioning
- Permissions:
  - VirtualMachine → Provisioning: Modify customization specification, Read customization specifications

**Role `k8c-user-datacenter`**
- Granted at: datacentre level, not propagated
- Needed for cloning the template VM (obviously this is not done in a folder at this time)
- Permissions:
  - Datastore: Allocate space, Browse datastore, Low-level file operations, Remove file
  - vApp: vApp application configuration, vApp instance configuration
  - Virtual Machine: Change CPU count, Memory, Settings
  - Inventory: Create from existing

**Role `k8c-user-cluster-propagate`**
- Granted at: the cluster level, propagated
- Needed for upload of `cloud-init.iso` (Ubuntu and CentOS) or defining the Ignition config into Guestinfo (CoreOS)
- Permissions:
  - Host → Configuration: System Management
  - Host → Local operations: Reconfigure virtual machine
  - Resource: Assign virtual machine to the resource pool, Migrate powered-off virtual machine, Migrate powered-on virtual machine
  - vApp: vApp application configuration, vApp instance configuration

**Role `k8s-network-attach`**
- Granted for each network that should be used
- Permissions:
  - Network: Assign network

**Role `k8c-user-datastore-propagate`**
- Granted at: datastore/datastore cluster level, propagated
- Permissions:
  - Datastore: Allocate space, Browse datastore, Low-level file operations

**Role `k8c-user-folder-propagate`**
- Granted at: VM Folder and Template Folder level, propagated
- Needed for managing the node VMs
- Permissions:
  - Folder: Create folder, Delete folder
  - Global: Set custom attribute
  - Virtual machine: Change Configuration, Edit Inventory, Guest operations, Interaction, Provisioning, Snapshot management
The described permissions have been tested with vSphere 6.7 and might be different for other vSphere versions.
After a node is powered-off, the Kubernetes vSphere driver doesn’t detach disks associated with PVCs mounted on that node. This makes it impossible to reschedule pods using these PVCs until the disks are manually detached in vCenter.
Upstream Kubernetes has been working on the issue for a long time now and tracking it under the following tickets:
Kubernetes, the popular container orchestration system, is used extensively. However, it can become complex: you have to handle all of the objects (ConfigMaps, pods, etc.), and would also have to manage the releases. Both can be accomplished with Helm, a Kubernetes package manager designed to easily package, configure, and deploy applications and services onto Kubernetes clusters in a standard way; this helps the ecosystem adopt a standard way of deployment and customization.
in any of the
or
(SDC) or
(NIC)
Setting up the to attach to DIGIT backbone like
Preparing Deployment configuration for required DIGIT services using Templates from the like the following
Preparing DIGIT Service to deploy on Kubernetes cluster
Setup , ,
During the creation of a user cluster Kubermatic creates a dedicated VM folder in the root path on the Datastore (Defined in the ). That folder will contain all worker nodes of a user cluster.
DevOps
Scheduled Job Handling
API Gateway
Container Management
Resource and Storage Handling
Fault Tolerance
Load Balancing
Distributed Metrics
Application Runtime and Packaging
App Deployment
Configuration Management
Service Discovery
CI / CD
Virtualization
Hardware & Storage
OS & Networking
SSL Configuration
Infra-as-code
Docker
DNS Configuration
GitOps
SecOps
State Data Centres with On-Premise Kubernetes Clusters
Running Kubernetes on-premise gives a cloud-native experience, and SDC becomes cloud-agnostic when it comes to the experience of deploying DIGIT.
Whether states have their own on-premise data centre or have decided to forego the various managed cloud solutions, there are a few things one should know when getting started with on-premise K8s.
One should be familiar with Kubernetes and one should know that the control plane consists of the Kube-apiserver, Kube-scheduler, Kube-controller-manager and an ETCD datastore. For managed cloud solutions like Google’s Kubernetes Engine (GKE) or Azure’s Kubernetes Service (AKS) it also includes the cloud-controller-manager. This is the component that connects the cluster to the external cloud services to provide networking, storage, authentication, and other feature support.
To successfully deploy a bespoke Kubernetes cluster and achieve a cloud-like experience on SDC, one needs to replicate all the same features you get with a managed solution. At a high level, this means that we probably want to:
Automate the deployment process
Choose a networking solution
Choose a storage solution
Handle security and authentication
Let us look at each of these challenges individually, and we’ll try to provide enough of an overview to aid you in getting started.
Using a tool like Ansible can make deploying Kubernetes clusters on-premise trivial.
When deciding to manage your own Kubernetes clusters, you need to set up a few proof-of-concept (PoC) clusters to learn how everything works, perform performance and conformance tests, and try out different configuration options.
After this phase, automating the deployment process is an important if not necessary step to ensure consistency across any clusters you build. For this, you have a few options, but the most popular are:
- kubeadm: a low-level tool that helps you bootstrap a minimum viable Kubernetes cluster that conforms to best practices
- kubespray: an Ansible playbook that helps deploy production-ready clusters
If you are already using Ansible, kubespray is a great option; otherwise, we recommend writing automation around kubeadm using your preferred playbook tool after using it a few times. This will also increase your confidence and knowledge in the tooling surrounding Kubernetes.
When designing clusters, choosing the right container networking interface (CNI) plugin can be the hardest part. This is because choosing a CNI that will work well with an existing network topology can be tough. Do you need BGP peering capabilities? Do you want an overlay network using vxlan? How close to bare-metal performance are you trying to get?
There are a lot of articles that compare the various CNI provider solutions (calico, weave, flannel, kube-router, etc.) that are must-reads like the benchmark results of Kubernetes network plugins article. We usually recommend Project Calico for its maturity, continued support, and large feature set or flannel for its simplicity.
For ingress traffic, you’ll need to pick a load-balancer solution. For a simple configuration, you can use MetalLB, but if you’re lucky enough to have F5 hardware load-balancers available we recommend checking out the K8s F5 BIG-IP Controller. The controller supports connecting your network plugin to the F5 either through either vxlan or BGP peering. This gives the controller full visibility into pod health and provides the best performance.
Kubernetes provides a number of included storage volume plugins. If you're going on-premise, you'll probably want to use a network-attached storage (NAS) option to avoid forcing pods to be pinned to specific nodes.
For a cloud-like experience, you’ll need to add a plugin to dynamically create persistent volume objects that match the user’s persistent volume claims. You can use dynamic provisioning to reclaim these volume objects after a resource has been deleted.
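A minimal sketch of dynamic provisioning, assuming an NFS-backed provisioner registered as `example.com/nfs` is installed in the cluster (the provisioner and object names are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-dynamic
provisioner: example.com/nfs    # supplied by the storage provider's provisioner
reclaimPolicy: Delete           # reclaim the volume once the claim is deleted
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  storageClassName: nfs-dynamic
  accessModes: [ReadWriteMany]
  resources:
    requests:
      storage: 5Gi
```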
Pure Storage has a great example helm chart, the Pure Service Orchestrator (PSO), that provides smart provisioning although it only works for Pure Storage products.
As anyone familiar with security knows, this is a rabbit-hole. You can always make your infrastructure more secure and should be investing in continual improvements.
Including different Kubernetes plugins can help build a secure, cloud-like experience for your users
When designing on-premise clusters you’ll have to decide where to draw the line. To really harden your cluster’s security you can add plugins like:
- istio: provides the underlying secure communication channel, and manages authentication, authorization, and encryption of service communication at scale
- gVisor: a user-space kernel, written in Go, that implements a substantial portion of the Linux system surface
- vault: secure, store and tightly control access to tokens, passwords, certificates and encryption keys for protecting secrets and other sensitive data
For user authentication, we recommend checking out guard, which will integrate with an existing authentication provider. If you're already using GitHub teams, then this could be a no-brainer.
Hope this has given you a good idea of deploying, networking, storage, and security for you to take the leap into deploying your own on-premise Kubernetes clusters. Like we mentioned above, the team will want to build proof-of-concept clusters, run conformance and performance tests, and really become experts on Kubernetes if you're going to be using it to run DIGIT in production.
We’ll leave you with a few other things the team should be thinking of:
Externally backing up Kubernetes YAML, namespaces, and configuration files
Running applications across clusters in an active-active configuration to allow for zero-downtime updates
Running game days like deleting the CNI to measure and improve time-to-recovery
National Informatics Cloud
Details coming soon...
In Kubernetes, an Ingress is an object that allows access to your Kubernetes services from outside the Kubernetes cluster. You configure access by creating a collection of rules that define which inbound connections reach which services.
This lets you consolidate your routing rules into a single resource. For example, you might want to send requests to example.com/api/v1/ to an api-v1 service, and requests to example.com/api/v2/ to the api-v2 service. With an Ingress, you can easily set this up without creating a bunch of LoadBalancers or exposing each service on the Node.
An API object that manages external access to the services in a cluster, typically HTTP.
Ingress may provide load balancing, SSL termination and name-based virtual hosting.
For clarity, this guide defines the following terms:
Node: A worker machine in Kubernetes, part of a cluster.
Cluster: A set of Nodes that run containerized applications managed by Kubernetes. For this example, and in most common Kubernetes deployments, nodes in the cluster are not part of the public internet.
Edge router: A router that enforces the firewall policy for your cluster. This could be a gateway managed by a cloud provider or a physical piece of hardware.
Ideally, all Ingress controllers should fit the reference specification. In reality, the various Ingress controllers operate slightly differently.
An Ingress resource example:
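A minimal sketch of an Ingress resource (the names, annotation and port are illustrative; the fields match the rule structure described below):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - http:
        paths:
          - path: /testpath
            pathType: Prefix
            backend:
              service:
                name: test        # service.name
                port:
                  number: 80      # service.port.number
```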
Each HTTP rule contains the following information:
A list of paths (for example, /testpath), each of which has an associated backend defined with a service.name and a service.port.name or service.port.number. Both the host and path must match the content of an incoming request before the load balancer directs traffic to the referenced Service.
A defaultBackend is often configured in an Ingress controller to service any requests that do not match a path in the spec.
This section contains architectural details about DIGIT deployment. It discusses the various activities in a sequence of steps to provision required infra and deploy DIGIT.
Every code commit is well-reviewed and squash merge to branches through Pull Requests.
Trigger the CI Pipeline that ensures code quality, vulnerability assessments, CI tests before building the artefacts.
Artefacts are version-controlled using semantic versioning based on the nature of the change.
After successful CI, Jenkins bakes the Docker Images with the versioned artefacts and pushes the baked docker image to Docker Registry.
Deployment Pipeline pulls the built Image and pushes to the corresponding Env.
DIGIT has built Helm charts using the standard Helm approach to ease managing the service-specific configs, customisations, switches/toggles, secrets, etc.
A Golang-based deployment script reads the values from the Helm chart templates and deploys them into the cluster.
Each env will have one master yaml template that will have the definition of all the services to be deployed, their dependencies like config, env, secrets, DB credentials, persistent volumes, manifest, routing rules, etc.
Kubernetes has changed the way organizations deploy and run their applications, and it has created a significant shift in mindsets. While it has already gained a lot of popularity and more and more organizations are embracing the change, running Kubernetes in production requires care.
Although Kubernetes is open source and has its share of vulnerabilities, making the right architectural decisions can prevent a disaster from happening.
You need to have a deep level of understanding of how Kubernetes works and how to enforce the best practices so that you can run a secure, highly available, production-ready Kubernetes cluster.
Although Kubernetes is a robust container orchestration platform, the sheer level of complexity with multiple moving parts can overwhelm administrators.
That is the reason why Kubernetes has a large attack surface, and, therefore, hardening of the cluster is an absolute must if you are to run Kubernetes in production.
There are a massive number of configurations in K8s, and while you can configure a few things correctly, the chances are that you might misconfigure a few things.
I will describe a few best practices that you can adopt if you are running Kubernetes in production. Let’s find out.
If you are running your Kubernetes cluster in the cloud, consider using a managed Kubernetes service such as GKE, EKS or AKS.
A managed cluster comes with some level of hardening already in place, and, therefore, there are fewer chances to misconfigure things. A managed cluster also makes upgrades easy, and sometimes automatic. It helps you manage your cluster with ease and provides monitoring and alerting out of the box.
Since Kubernetes is open source, vulnerabilities appear quickly and security patches are released regularly. You need to ensure that your cluster is up to date with the latest security patches and for that, add an upgrade schedule in your standard operating procedure.
Having a CI/CD pipeline that runs periodically for executing rolling updates for your cluster is a plus. You would not need to check for upgrades manually, and rolling updates would cause minimal disruption and downtime; also, there would be fewer chances to make mistakes.
That would make upgrades less of a pain. If you are using a managed Kubernetes cluster, your cloud provider can cover this aspect for you.
It goes without saying that you should patch and harden the operating system of your Kubernetes nodes. This would ensure that an attacker would have the least attack surface possible.
You should upgrade your OS regularly and ensure that it is up to date.
Kubernetes post version 1.6 has role-based access control (RBAC) enabled by default. Ensure that your cluster has this enabled.
You also need to ensure that legacy attribute-based access control (ABAC) is disabled. Enforcing RBAC gives you several advantages as you can now control who can access your cluster and ensure that the right people have the right set of permissions.
RBAC does not end with securing access to the cluster by Kubectl clients but also by pods running within the cluster, nodes, proxies, scheduler, and volume plugins.
Only provide the required access to service accounts and ensure that the API server authenticates and authorizes them every time they make a request.
Running your API server on plain HTTP in production is a terrible idea. It opens your cluster to a man in the middle attack and would open up multiple security holes.
Always use transport layer security (TLS) to ensure that communication between Kubectl clients and the API server is secure and encrypted.
Be aware of any non-TLS ports you expose for managing your cluster. Also ensure that internal clients such as pods running within the cluster, nodes, proxies, scheduler, and volume plugins use TLS to interact with the API server.
While it might be tempting to create all resources within your default namespace, it would give you tons of advantages if you use namespaces. Not only will it be able to segregate your resources in logical groups but it will also enable you to define security boundaries to resources in namespaces.
Namespaces logically behave as a separate cluster within Kubernetes. You might want to create namespaces based on teams, or based on the type of resources, projects, or customers depending on your use case.
After that, you can do clever stuff like defining resource quotas, limit ranges, user permissions, and RBAC on the namespace layer.
Avoid binding ClusterRoles to users and service accounts, instead provide them namespace roles so that users have access only to their namespace and do not unintentionally misconfigure someone else’s resources.
Cluster Role and Namespace Role Bindings
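A minimal sketch of a namespace-scoped Role and RoleBinding (the namespace, user and resource lists are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-developer
  namespace: team-a
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "deployments", "services"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-developer-binding
  namespace: team-a
subjects:
  - kind: User
    name: jane@example.org
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-developer
  apiGroup: rbac.authorization.k8s.io
```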
You can use Kubernetes network policies that work as firewalls within your cluster. That would ensure that an attacker who gained access to a pod (especially the ones exposed externally) would not be able to access other pods from it.
You can create Ingress and Egress rules to allow traffic from the desired source to the desired target and deny everything else.
Kubernetes Network Policy
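A minimal sketch of a policy that allows ingress traffic to a namespace only from pods labelled `role: frontend` (the namespace and labels are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
  namespace: team-a
spec:
  podSelector: {}              # applies to all pods in the namespace
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend   # only pods with this label may connect
```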
Do not share the admin kubeconfig file within your team; instead, create a separate user account for every user and only provide the right access to them. Bear in mind that Kubernetes does not maintain an internal user directory, and therefore, you need to ensure that you have the right solution in place to create and manage your users.
Once you create the user, you can generate a private key and a certificate signing request for the user, and Kubernetes would sign and generate a CA cert for the user.
You can then securely share the CA certificate with the user. The user can then use the certificate within kubectl to authenticate with the API server securely.
Configuring User Accounts
You can provide granular access to user and service accounts with RBAC. Let us consider a typical organization where you can have multiple roles, such as:
Application developers — These need access only to a namespace and not the entire cluster. Ensure that you provide them with access only to deploy their applications and troubleshoot them within their namespace. You might want application developers to have access to spin up only ClusterIP services, and you might wish to grant only network administrators permission to define ingresses for them.
Network administrators — You can provide network admins access to networking features such as ingresses, and privileges to spin up external services.
Cluster administrators — These are sysadmins whose main job is to administer the entire cluster. These are the only people that should have cluster-admin access and only the amount that is necessary for them to do their roles.
The above is not etched in stone, and you can have a different organization policy and different roles, but the only thing to keep in mind here is that you need to enforce the principle of least privilege.
That means that individuals and teams should have only the right amount of access they need to perform their job, nothing less and nothing more.
It does not stop with just issuing separate user accounts and using TLS to authenticate with the API server. It is an absolute must that you frequently rotate and issue credentials to your users.
Set up an automated system that periodically revokes the old TLS certificates and issues new ones to your user. That helps as you don’t want attackers to get hold of a TLS cert or a token and then make use of it indefinitely.
Imagine a scenario where an externally exposed web application is compromised, and someone has gained access to the pod. In that scenario, they would be able to access the secrets (such as private keys) and target the entire system.
The way to protect from this kind of attack is to have a sidecar container that stores the private key and responds to signing requests from the main container.
In case someone gets access to your login microservice, they would not be able to gain access to your private key, and therefore, it would not be a straightforward attack, giving you valuable time to protect yourself.
Partitioned Approach
The last thing you would want as a cluster-admin is a situation where poorly written microservice code with a memory leak takes over a cluster node, causing the Kubernetes cluster to crash. That is an extremely important and generally ignored area.
You can add a resource limit and requests on the pod level as a developer or the namespace as an administrator. You can use resource quotas to limit the amount of CPU, memory, or persistent disk a namespace can allocate.
It can also allow you to limit the number of pods, volumes, or services you can spin within a namespace. You can also make use of limit ranges that provide you with a minimum and maximum size of resources every unit of the cluster within the namespace can request.
That will limit users from seeking an unusually large amount of resources such as memory and CPU.
Specifying a default resource limit and request on a namespace level is generally a good idea as developers aren’t perfect. If they forget to specify a limit, then the default limit and requests would protect you from resource overrun.
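A minimal sketch of namespace-level defaults via a LimitRange (the namespace and all values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
    - type: Container
      default:              # applied when a container specifies no limits
        cpu: 500m
        memory: 256Mi
      defaultRequest:       # applied when a container specifies no requests
        cpu: 100m
        memory: 128Mi
```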
The ETCD datastore is the primary source of data for your Kubernetes cluster. That is where all cluster information and the expected configuration is stored.
If someone gains access to your ETCD database, all security measures will go down the drain. They will have full control of your cluster, and they can do what they want by modifying state in your ETCD datastore.
You should always ensure that only the API server can communicate with the ETCD datastore and only through TLS using a secure mutual auth. You can put your ETCD nodes behind a firewall and block all traffic except the ones originating from the API server.
Do not use the master ETCD for any other purpose but for managing your Kubernetes cluster and do not provide any other component access to the ETCD cluster.
Enable encryption of your secret data at rest. That is extremely important so that if someone gets access to your ETCD cluster, they should not be able to view your secrets by just doing a hex dump of your secrets.
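A minimal sketch of an API server EncryptionConfiguration that encrypts Secrets at rest (the key material shown is a placeholder and must be replaced with a locally generated 32-byte key):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: ["secrets"]
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>   # e.g. head -c 32 /dev/urandom | base64
      - identity: {}          # fallback provider for reading unencrypted data
```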
Containers run on nodes and therefore have some level of access to the host file system, however, the best way to reduce the attack surface is to architect your application in such a way that containers do not need to run as root.
Use pod security policies to restrict the pod to access HostPath volumes as that might result in getting access to the host filesystem. Administrators can use a restrictive pod policy so that anyone who gained access to one pod should not be able to access another pod from there.
Audit loggers are now a beta feature in Kubernetes, and I recommend you make use of it. That would help you troubleshoot and investigate what happened in case of an attack.
As a cluster-admin dealing with a security incident, the last thing you would want is that you are unaware of what exactly happened with your cluster and who has done what.
Remember that the above are just some general best practices and they are not exhaustive. You are free to adjust and make changes based on your use case and ways of working for your team.
Cluster network: A set of links, logical or physical, that facilitate communication within a cluster according to the Kubernetes networking model.
Service: A Kubernetes Service that identifies a set of Pods using label selectors. Unless mentioned otherwise, Services are assumed to have virtual IPs only routable within the cluster network.
An Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress resource.
An Ingress may be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL/TLS, and offer name-based virtual hosting. An Ingress controller is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
An Ingress does not expose arbitrary ports or protocols. Exposing services other than HTTP and HTTPS to the internet typically uses a service of type NodePort or LoadBalancer.
You must have an Ingress controller to satisfy an Ingress. Only creating an Ingress resource has no effect.
You may need to deploy an Ingress controller such as ingress-nginx. You can choose from a number of Ingress controllers.
As with all other Kubernetes resources, an Ingress needs apiVersion, kind, and metadata fields. The name of an Ingress object must be a valid DNS subdomain name. Ingress frequently uses annotations to configure some options depending on the Ingress controller, an example of which is the rewrite-target annotation. Different Ingress controllers support different annotations. Review the documentation for your choice of Ingress controller to learn which annotations are supported.
The Ingress has all the information needed to configure a load balancer or proxy server. Most importantly, it contains a list of rules matched against all incoming requests. An Ingress resource only supports rules for directing HTTP(S) traffic.
An optional host. In this example, no host is specified, so the rule applies to all inbound HTTP traffic through the IP address specified. If a host is provided, the rules apply to that host.
A backend is a combination of Service and port names as described in the Service definition, or a custom resource backend. HTTP (and HTTPS) requests to the Ingress that match the host and path of the rule are sent to the listed backend.
As all the DIGIT services that are containerized and deployed on Kubernetes, we need to prepare deployment manifests. The same can be found .
By default, when you boot your cluster through kubeadm, you get access to the `kubernetes-admin` config file, which is the superuser for performing all activities within your cluster.
A bootstrap token, for example, needs to be revoked as soon as you finish with your activity. You can also make use of a credential management system such as HashiCorp Vault, which can issue credentials when you need them and revoke them when you finish your work.
This page provides information on how to deploy DIGIT services on Kubernetes, prepare deployment manifests for various services along with its configurations, secrets. etc. It also discusses the maintenance of environment-specific changes.
Once the cluster is ready and healthy you can start deploying backbones services.
Deploy the configuration and deployment for the following list of services:
Backbone (Redis, ZooKeeper-v2, Kafka-v2, elasticsearch-data-v1, elasticsearch-client-v1, elasticsearch-master-v1)
Gateway (Zuul, nginx-ingress-controller)
Understanding of VM Instances, LoadBalancers, SecurityGroups/Firewalls, nginx, DB Instance, Data Volumes.
Experience of Kubernetes, Docker, Jenkins, helm, golang, Infra-as-code.
Deploy configuration and deployment backbone services:
Clone the git repo https://github.com/egovernments/eGov-infraOps. Copy the existing dev.yaml and dev-secrets.yaml to files named after your new environment (e.g. `<env>.yaml` and `<env>-secrets.yaml`).
Modify the global domain and set namespaces create to true
Make the below-mentioned changes for each backbone service:
E.g. for Kafka-v2, if you are using AWS as the cloud provider, change the respective volume IDs and zones.
(You will get the volume IDs and zone details from either the remote state bucket or from the AWS portal.)
E.g. for Kafka-v2, if you are using Azure as the cloud provider, change the diskName and diskUri.
(You will get these details from either the remote state bucket or from the Azure portal.)
E.g. for Kafka-v2, if you are using iSCSI, change the targetPortal and iqn.
Deploy the backbone services using the Go-based deployment command.
Replace the "dev" environment name with your respective environment name.
Flags:
- `e` --- Environment name
- `p` --- Print the manifest
- `c` --- Enable cluster configs
Check the Status of pods
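For example, assuming the services were deployed into an illustrative `backbone` namespace:

```bash
kubectl get pods -n backbone
# or watch all namespaces until everything is Running
kubectl get pods --all-namespaces -w
```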