Glossary

k8s : Kubernetes, the 8 letters "ubernete" replaced by the number 8
Rancher : an orchestrator of orchestrators (Kubernetes, Swarm, ...)
GKE / AKS / EKS : managed Kubernetes on the public clouds Google / Azure / AWS
CRI-O : alternative container runtime for Kubernetes http://cri-o.io/

minikube & kubectl CLI

Install via chocolatey (Windows), apt, yum... then "initialize" minikube

minikube start --vm-driver=virtualbox
minikube stop

Add some vCPUs to the minikube VM, from 2 to 6, via the VirtualBox GUI or CLI

Get comfy

sudo apt install -y zsh
minikube completion zsh >> ~/.zshrc
kubectl completion zsh >> ~/.zshrc

Get connected to the WebUI

minikube start
minikube dashboard # firefox http://$(minikube ip):30000

Get in the minikube VM via SSH, and check docker stuff

minikube ssh
docker ps
docker run --name whoami -d -p 80:8080 google/cadvisor
docker inspect whoami | jq '.[].NetworkSettings.Networks.bridge.IPAddress'

As a last resort, one can sort of tunnel the dashboard with kubectl proxy

Global command-line options : kubectl options

Enable metrics so Grafana isn't empty, then open Grafana. Note, there are multiple options to filter output for a specific value; -o json | jq seems decent

minikube addons enable heapster
kubens kube-system
firefox http://$(minikube ip):$(kubectl get svc monitoring-grafana -o json | jq '.spec.ports[0].nodePort')

Cluster infos

kubectl config get-clusters
kubectl cluster-info

Other useful kubectl commands

kubectl create / apply # apply's a better idea
kubectl convert # convert config files between k8s API versions
kubectl delete
kubectl get # -o wide to add columns IP & node name
kubectl describe # equivalent of docker inspect, ex kubectl describe pods --all-namespaces
kubectl exec
kubectl top
kubectl cordon / uncordon # block scheduling on a given node
kubectl drain # empty a node progressively

(Linux Kernel) Namespaces

The "Namespaces" used by (docker) containers are the kernel namespace. See man namespaces.

(image : linux namespaces)

Abusing Linux kernel namespaces : the following commands give a second container the same IPC (inter-process communication) namespace, network namespace and PID (process IDs) namespace as the first one.

docker run -d --name whoami emilevauge/whoami
docker run -d --name whoami-sidekick --net=container:whoami --ipc=container:whoami --pid=container:whoami centos:7 sleep infinity
docker exec -ti whoami-sidekick bash
[root@6546fa6203b7 /]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 13:18 ?        00:00:04 /usr/bin/cadvisor -logtostderr
root        21     0  0 13:34 ?        00:00:00 sleep infinity
root        43     0  0 13:37 pts/0    00:00:00 bash
root        57    43  0 13:37 pts/0    00:00:00 ps -ef

The ps -ef command displays processes from both containers (the --pid effect). Sharing the same network namespace (--net) also allows access to the other container's TCP ports; running traceroute to the whoami IP from the sidekick would go through localhost.

The above options "break" the isolation enforced by Docker by adding the sidekick container to the sidekicked container's namespaces.

Pods

A pod is a group of containers sharing kernel namespaces, by default the network namespace. Its containers are always scheduled together and located on the same node. They can access the same volumes. They share the same IP and the same ports, and can reach each other via localhost.

Start a (faulty) pod and inspect / debug with describe

kubectl run faulty-whoami --image=emilevauge/whoami:nil --port 80 --generator=run-pod/v1
watch kubectl get pods faulty-whoami
kubectl describe pods faulty-whoami

kubectl expose : expose the whoami pod via a NodePort Service
kubectl port-forward whoami 80 : proxify a port on kubectl's machine, a.k.a. forward a local port to the pod's port
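
For example, a minimal sketch of both (the whoami-svc name and the 8080 local port are ours) :

kubectl expose pod whoami --type=NodePort --port=80 --name=whoami-svc
kubectl get svc whoami-svc # note the allocated nodePort
kubectl port-forward whoami 8080:80 # then curl http://localhost:8080/api from the workstation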

In the previous chapter we abused Linux kernel namespaces via the docker CLI. Here's the same exercise via kubectl instead of minikube ssh / docker exec.

kubectl run whoami --image=emilevauge/whoami --port 80 --generator=run-pod/v1
kubectl describe pods whoami | grep IP
# answer IP: 172.17.0.8
kubectl run centos-shell --image=centos:7 --generator=run-pod/v1 --command -- sleep infinity
kubectl exec -ti centos-shell -- curl 172.17.0.8/api

One pod, one container ? Or more ?

Usually, one pod = one container. Otherwise you run into issues: say you have nginx + tomcat in one pod, scaling nginx scales tomcat and vice versa. Multiple containers in a pod are for sidecar/sidekick containers : service routing, metrics collection, configuration hot reload on an event trigger. Or a service mesh.

Yaml (json) pod definition

Here's how to generate a definition from the run CLI. Note, -o json is another possibility

kubectl run yaml-pod --image=emilevauge/whoami --port=80 --generator=run-pod/v1 --dry-run -o yaml > pod_definition.yaml
kubectl create -f pod_definition.yaml
kubectl create -f . # create every yaml in the current dir

Here's an example of pod definition with two containers : the app (whoami) and a sidecar (shell).

---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: whoami
  name: whoami
spec:
  containers:
  - image: emilevauge/whoami
    imagePullPolicy: IfNotPresent
    name: whoami
    ports:
    - containerPort: 80
    resources: {}
  - args:
    - sleep
    - infinity
    image: centos:7
    imagePullPolicy: Always
    name: shell
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Let's say you want to change imagePullPolicy : get the doc and allowed values from kubectl explain pods.spec.containers.imagePullPolicy

One can define multiple elements (pods, deployments, services...) in a single yaml file using the --- separator
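
For example, a minimal sketch of a single file holding the whoami pod plus a (hypothetical) whoami-svc Service, separated by --- :

---
apiVersion: v1
kind: Pod
metadata:
  name: whoami
  labels:
    run: whoami
spec:
  containers:
  - name: whoami
    image: emilevauge/whoami
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: whoami-svc
spec:
  selector:
    run: whoami
  ports:
  - port: 80
    targetPort: 80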

SSH in a pod's container

Here's an example of "logging in" to one of the pod's containers

➜  ~ kubectl exec -it whoami -c shell -- bash
[root@whoami /]# curl localhost/api
{"hostname":"whoami","ip":["127.0.0.1","172.17.0.6"],"headers":{"Accept":["*/*"],"User-Agent":["curl/7.29.0"]}}

kubectl create or apply ?

It might be a better idea to use kubectl apply from step 0 instead of kubectl create : create doesn't fill the kubectl.kubernetes.io/last-applied-configuration annotation, so updating the resource later might break things. If the annotation is there, apply will merge the configuration given on the CLI with the one from the annotation.

But when working with many people, merge after merge after merge, you don't know exactly what happened anymore. So using create and replace with a versioned pod.yaml might be the way to go.

Init container

They start sequentially before the pod's other containers, with a limited duration and no restart policy. They share volumes with the other containers. You'll use them to ship tools meant to initialize stuff, for example generating configuration via confd, or initializing a database before letting the app run.
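
A minimal sketch, assuming a hypothetical my-db service the app has to wait for :

---
apiVersion: v1
kind: Pod
metadata:
  name: whoami-with-init
spec:
  initContainers:
  - name: wait-for-db # hypothetical init step : block until the DB's DNS name resolves
    image: busybox
    command: ['sh', '-c', 'until nslookup my-db; do sleep 2; done']
  containers:
  - name: whoami
    image: emilevauge/whoami
    ports:
    - containerPort: 80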

HealthChecks

livenessProbe

Checked by kubelet from the node, doing httpGet (GET return code >= 200 and < 400 = OK), tcpSocket (tcp port open = OK) or exec (binary run in the container returning exit code 0 = OK)

---
apiVersion: v1
kind: Pod
metadata:
  name: kubia-liveness
  annotations:
    i-will-crash: after-5-calls-to-slash
spec:
  containers:
    - image: luksa/kubia-unhealthy
      name: kubia
      livenessProbe:
        httpGet:
          path: /
          port: 8080
        periodSeconds: 5

Warning : one can destroy a cluster with bad healthchecks. Use different ports, set a proper delay, timeout, etc. And monitor the pods' RESTART counter.

Without probes, k8s only knows about container health, a.k.a. running or not running, a.k.a. the cmd process is ok. Which won't be enough if one uses supervisord or the like.

Be careful to use health probes with a small CPU/RAM footprint, specifically targeted at the pod's functionality. Example : a working microservice doesn't need to be restarted because its neighbour is dead.

readinessProbe

Readiness as in "are you ready to process traffic ?". The Service's Endpoints won't route traffic to the pod until the readinessProbe is OK. But the ReplicaSet won't kill a container whose readinessProbe is KO.
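
A minimal sketch, same shape as the livenessProbe above, to drop under a container spec (path and port are assumptions) :

        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5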

Labels and annotations

Labels

Labels are short (63 characters max) key/value fields used to tag, search, filter, etc. resources, to organize stuff

(image : organisation with labels)

Display labels for a pod, all pods, all nodes

kubectl get pods whoami --show-labels
kubectl get pods --show-labels
kubectl get nodes --show-labels

Get pods matching specific labels : without the label release, with the label release, with release different from production, with release equal to production, with the label run not in (whoami, shell)

kubectl get pods -l '!release'
kubectl get pods -l 'release'
kubectl get pods -l 'release!=production'
kubectl get pods -l 'release=production'
kubectl get pods -l 'run notin (whoami, shell)'

Add labels to pods whoami + whoami-json

kubectl label pods whoami whoami-json release=stable env=int

Change a label value

kubectl label --overwrite pods whoami env=production

Add and remove labels to all pods

kubectl label pods --all release=stable env=int
kubectl label pods --all release- env-

Pod's nodeSelector

Use label to choose where the pod will end up. Add the following nodeSelector

spec:
  nodeSelector:
    container-runtime: docker

Run kubectl apply -f pod.yaml then kubectl get pods : it's pending. kubectl describe says Warning FailedScheduling 0/1 nodes are available: 1 node(s) didn't match node selector.

Add the matching label

kubectl label node minikube container-runtime=docker

Wait a bit, it'll end up deployed

Annotations

A bit like labels, but more characters are allowed and they can't be used to filter and so on. They're more about informational metadata.
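
A minimal sketch (the key and value are ours) :

kubectl annotate pods whoami description="contact: team-foo"
kubectl describe pods whoami # annotations show up here, not in --show-labels
kubectl annotate pods whoami description- # remove it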

(Kubernetes) Namespaces and Contexts

Kubernetes namespaces have nothing to do with the Linux kernel namespaces used by containers. It's just the same name. Yeah, debatable choice from the k8s developers

A context is the server : local minikube, Azure cluster, ... A context is usually created via another CLI, like the minikube CLI, AWS CLI, Azure CLI.

Get, create namespace. Configure kubectl context to use a namespace. Delete namespace.

kubectl get ns
kubectl create ns my-sandbox
kubectl config set-context (contextName) --namespace=(namespaceName)
kubectl delete ns my-sandbox

Deleting a namespace deletes all resources in the namespace

Here's a file produced with create --dry-run -o yaml plus a manual label addition

apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: null
  name: my-sandbox
  labels:
    run: sandbox
spec: {}
status: {}

Tip : use kubens and kubectx https://github.com/ahmetb/kubectx to switch easily between namespaces and contexts instead of kubectl config. Here's an installation script

sudo git clone https://github.com/ahmetb/kubectx /opt/kubectx
sudo ln -s /opt/kubectx/kubectx /usr/local/bin/kubectx
sudo ln -s /opt/kubectx/kubens /usr/local/bin/kubens
kubens kube-system # switch to namespace kube-system

Replica

ReplicaSets

Using labels, a ReplicaSet or RS makes sure enough pods matching a selector are running on the cluster.

---
apiVersion: apps/v1 # Use 'apps/v1' for k8s >= 1.9, otherwise apps/v1beta2
kind: ReplicaSet
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
      env: int
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
        env: int
    spec:
      containers:
        - name: nginx
          image: 'nginx:alpine'
          ports:
            - containerPort: 80

If no labels are defined on the RS, it uses the pod template's labels.

An RS will manage any pod matching its selector, even pods created before the RS, except pods managed by another RS. So one can manually create a new pod matching multiple RS selectors : the "fastest" RS catches it and most likely kills it... Yeah, that's stupid.

The usual commands create, apply, delete, get work. Plus the RS-specific scale

kubectl scale rs (rsname) --replicas=9

Removing an RS removes its pods, except when using --cascade=false

kubectl delete rs frontend --cascade=false

The RS selector's matchExpressions can use the operators In, NotIn, Exists and DoesNotExist.
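
A minimal sketch of such a selector, under the RS spec (the labels are the ones from the nginx example above) :

  selector:
    matchExpressions:
    - key: app
      operator: In
      values: [nginx]
    - key: env
      operator: NotIn
      values: [production]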

ReplicationController

~ Deprecated. Do not use.

Roughly the same as ReplicaSets, but : the selector can't use the new operators like "label in (bla,bli,blu)", the selector isn't mandatory, and a ReplicationController can deal with rolling updates, which isn't meant to be its job (ReplicaSets leave that to Deployments)

DaemonSet

Like a ReplicaSet, except it runs one and only one pod on each node, or on a selection of nodes. The use case is something like running a log and/or metrics collector, for example an SSD metrics collector on nodes with an SSD drive.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ssd-monitor
spec:
  selector:
    matchLabels:
      app: ssd-monitor
  template:
    metadata:
      labels:
        app: ssd-monitor
    spec:
      nodeSelector:
        disk: ssd
      containers:
      - name: main
        image: luksa/ssd-monitor

Job

It's about running a job ~ a batch, until it's finished. The Job and its pods aren't deleted afterwards so one can get the logs & co, but they need to be cleaned up manually

Amongst the options, completions is about running the job multiple times, parallelism about having more pods running the job in parallel, ...

Warning, you must set restartPolicy to OnFailure or Never ; with the default Always it's meaningless. And a job should be idempotent.
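
A minimal sketch, assuming a hypothetical batch-demo job :

---
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-demo
spec:
  completions: 3    # run the job 3 times in total
  parallelism: 2    # at most 2 pods working at the same time
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: main
        image: busybox
        command: ['sh', '-c', 'echo working && sleep 10']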

CronJob

Same as Job but on a regular basis.
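
A minimal sketch ; the apiVersion is batch/v1beta1 on the k8s versions these notes target, batch/v1 on recent clusters :

---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: batch-demo-cron
spec:
  schedule: "*/15 * * * *"    # every 15 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: main
            image: busybox
            command: ['sh', '-c', 'echo periodic work']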

Services

Uses a selector over labels to route traffic to pods. No dot allowed in the service name.

Cluster IP

By default, a service's type is ClusterIP, a.k.a. a virtual IP reachable by anyone in the cluster and no one outside. As in "internal".

One can create a gateway pod like centos:7 sleep infinity to be able to curl the IP from inside the cluster.

Here's an ugly scheme. Note the RS isn't really part of the connection, its only role is to create its pods.

                                                                         -->[RS Pod1]
 _o_                                                                     |
  |   --kubectl exec-->[Pod "Gateway"]--curl 80-->[Service]--8080--(RS)--|
 / \                                                                     |
                                                                         -->[RS Pod2]

A port can be named in the RS template, so that the numeric value can later be changed without editing the Services or anything else using it.

Service discovery : there's a DNS in the soup. A service's IP can be resolved using the FQDN [service name].[namespace].svc.cluster.local, or the short name, a.k.a. [service name]. A pod also gets the services' information as env variables at boot, but that information isn't updated afterwards, so better rely on DNS. In the above example, the gateway wouldn't be reliable without being restarted regularly.
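
For example, assuming the whoami-svc Service and the centos-shell pod from earlier (both names are ours), in the default namespace :

kubectl exec -ti centos-shell -- curl http://whoami-svc/api
kubectl exec -ti centos-shell -- curl http://whoami-svc.default.svc.cluster.local/api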

Note : there's an Endpoints object between the Service and the pods, which keeps the list of pod IPs matching the selector up to date. The above scheme should include

svc --> endpoint --> pods

External service

For stuff outside of the cluster. Either create a service without a selector and create the Endpoints manually with a hardcoded list of IPs, or create a service with type ExternalName specifying an FQDN ; k8s will keep the mapping up to date by itself.
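
A minimal sketch of the second option (name and FQDN are placeholders) :

---
apiVersion: v1
kind: Service
metadata:
  name: external-api
spec:
  type: ExternalName
  externalName: api.example.com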

Expose kubernetes's stuff to Internet

HostPorts

Expose a container's port directly on the node it runs on, hence to the internet.

NodePort Service

k8s will reserve a port on all nodes and forward it to the defined service.

Internet --> (node)nodePort --> (service)port --> (pod)targetPort

Maybe the traffic will hop from node 1 to node 2 to node 3 at each step... The nodePort range is defined at the k8s cluster level. If no nodePort is specified, one is randomly chosen.

Note, a ClusterIP will also be created for internal usage.
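
A minimal sketch mapping the three ports of the scheme above (names and values are ours) :

---
apiVersion: v1
kind: Service
metadata:
  name: whoami-nodeport
spec:
  type: NodePort
  selector:
    run: whoami
  ports:
  - port: 80          # (service)port
    targetPort: 80    # (pod)targetPort
    nodePort: 30080   # (node)nodePort, must sit in the cluster's NodePort range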

LoadBalancer Service

k8s will interact with the cloud provider's API (GKE, AKS, EKS...), which will provision its own LB (AWS's ELB/ALB, etc.)

Ingress

One IP to a node, to a node port given to the Ingress controller, to Services... The good point is we don't need to open tons of ports like we would with one NodePort Service per app.

An Ingress Controller needs to be deployed at the cluster level (haproxy, nginx, traefik, ...)
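
A minimal sketch of an Ingress, assuming a whoami-svc Service behind it ; on the k8s versions these notes target the apiVersion was extensions/v1beta1 with a slightly different backend syntax :

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whoami-ingress
spec:
  rules:
  - host: whoami.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: whoami-svc
            port:
              number: 80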

Volumes

emptyDir : creates a dir that'll be removed once the pod ends, so containers can talk via files (see the sketch below).

hostPath, gitRepo, nfs, glusterfs, flocker, cephfs, gcePersistentDisk (Google), awsElasticBlockStore (AW$), azureDisk (M$)...

And ConfigMaps, Secrets, downwardAPI
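
A minimal sketch of the emptyDir case, two containers talking via files (names are ours) :

---
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch
spec:
  volumes:
  - name: scratch
    emptyDir: {}
  containers:
  - name: writer
    image: busybox
    command: ['sh', '-c', 'while true; do date >> /data/out; sleep 5; done']
    volumeMounts:
    - name: scratch
      mountPath: /data
  - name: reader
    image: busybox
    command: ['sh', '-c', 'touch /data/out && tail -f /data/out']
    volumeMounts:
    - name: scratch
      mountPath: /data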

Persistent Volume

A PV is a volume definition, with quota, ACL, persistence...

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-hostpath-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/pv/pv003

Persistent Volume Claim

A pod uses a PVC, which finds a PV matching its requirements

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-first-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  storageClassName: ""

Note, with the above two snippets, my-first-claim will bind to my-hostpath-pv, so the pod ends up with 1Gi of storage while it only asked for 500Mi via the PVC
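
A minimal sketch of a pod consuming that claim (pod name and mount path are ours) :

---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-claim
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-first-claim
  containers:
  - name: main
    image: nginx:alpine
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html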

ConfigMap

About mapping keys and values so one doesn't need to redefine (env or other) variables for each pod.

Usage examples (see the sketch below) :
- inject all the ConfigMap values into the pod's env
- create a Volume from the ConfigMap
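
A minimal sketch of both usages, assuming a hypothetical app-conf ConfigMap :

kubectl create configmap app-conf --from-literal=LOG_LEVEL=debug

# in a container spec : inject every key as env variables
        envFrom:
        - configMapRef:
            name: app-conf

# or in the pod spec : expose the ConfigMap as a volume
  volumes:
  - name: conf
    configMap:
      name: app-conf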

Secrets

Not really useful since they aren't implemented properly : the values are just base64-encoded, not really encrypted. The only plus vs a ConfigMap is that the values aren't displayed by kubectl describe...
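
A minimal sketch showing how thin the protection is (the secret name and value are ours) :

kubectl create secret generic db-pass --from-literal=password=hunter2
kubectl get secret db-pass -o json | jq -r '.data.password' | base64 -d # back to clear text in one pipe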

Rolling update

Manual updates fiddling with container tags and RS labels are a bad idea, as always when the word manual is involved.

kubectl rolling-update frontend-v1 frontend-v2 --image=image:v2

It's okay but forget about it : use Deployment

Deployments

Kind of a template for ReplicaSets.

Start an nginx deployment (the default when --generator=run-pod/v1 isn't specified) using the nginx image from the CLI

kubectl run nginx --image=nginx
kubectl get deployment
kubectl describe deployment nginx
kubectl get pods -o wide | grep nginx

Write down the deployment yaml, which may help getting started on a configuration file

kubectl run nginx --image=nginx --dry-run -o yaml | tee nginx.yaml

Here's an example file

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubia
spec:
  replicas: 3
  minReadySeconds: 5
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: kubia
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: kubia
    spec:
      containers:
      - name: nodejs
        image: luksa/kubia:v1
        ports:
          - name: main
            containerPort: 8080
        livenessProbe:
          httpGet:
            path: /.healthcheck
            port: 8080
          periodSeconds: 5
          initialDelaySeconds: 2
        readinessProbe:
          httpGet:
            path: /.readicheck
            port: 8080
          periodSeconds: 5
          initialDelaySeconds: 5

Then kubectl apply -f deployment.yml --record

--record enables history for the deployment, hence reverting whenever a rolling update ends up badly.

A Deployment includes pod, ReplicaSet, probes, ... We still need a Service!

kind: Service
apiVersion: v1
metadata:
  name: kubia-svc
spec:
  selector:
    app: kubia
  ports:
    - name: principal
      port: 80
      targetPort: 8080

A kubectl edit deployment kubia to change v1 to v2 will trigger a rolling upgrade.
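
An alternative to kubectl edit is to change only the image from the CLI (the container name nodejs comes from the deployment above) :

kubectl set image deployment kubia nodejs=luksa/kubia:v2 --record
kubectl rollout status deployment kubia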

kubectl rollout history deployment kubia gets the rolling update history. kubectl rollout undo deployment kubia reverts to the previous revision.

Pod names will look like [name]-[deployment rollout number]-[random pod hash] instead of [name]-[random pod hash] as when pods are created via a bare ReplicaSet (which we shouldn't do anymore)

maxSurge option : max number of pods over the ReplicaSet's replicas count. If it's one, a rolling update from V1 to V2 will add one V2 pod, then remove one V1 pod, and so on until V2 is everywhere.

Architecture

All the master nodes form the control plane, which includes etcd, the API server, the scheduler and the controller manager.

On each worker node :

  • kubelet creates pods and their containers
  • the kubernetes proxy (kube-proxy) handles redirections
  • the container runtime is usually Docker, or rkt, or CRI-O

Check the status of the control plane components

kubectl get componentstatus

All components are plug-and-play and could be replaced. But most of the time it's the same base, the one "enforced" by kops and similar tools.

Requests and limits

Limitations for containers :

  • request is a soft limit : if the node has CPU or memory available, the container will be able to get it
  • limit is a hard limit : the cgroup throttles the CPU or kills (OOM) the container above it (see the sketch below)

Can be set at many scopes : per container, per namespace, ...
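
A minimal sketch to drop under a container spec (the values are arbitrary) :

        resources:
          requests:
            cpu: 100m        # soft : the scheduler reserves at least this
            memory: 128Mi
          limits:
            cpu: 500m        # hard : CPU is throttled above this
            memory: 256Mi    # hard : the container gets OOM-killed above this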

Note : (older) Java doesn't pay attention to cgroup limitations. It will try to take more than the limits allow, get killed via cgroup, ... You have to use recent Java versions with the proper options. See the BreizhCamp talk "JVM & conteneurisation" by Yoan Rousseau

Auto-scaling

For an RS, Deployment or ReplicationController : adjusts the replicas number depending on metrics. It's a resource named HorizontalPodAutoscaler

kubectl autoscale deployment autosc --cpu-percent=70 --min=1 --max=5
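
Behind the scenes this creates an HorizontalPodAutoscaler resource, which can be inspected like any other :

kubectl get hpa
kubectl describe hpa autosc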

VerticalPodAutoscaler is coming soon.

Kubernetes-Compatible Applications

Read https://kubernetes.io/blog/2018/03/principles-of-container-app-design/


Good practices

  • Use deployments
  • Set up ReadinessProbe and LivenessProbe
  • Use Init Containers
  • Think twice about ImagePullPolicy
  • Use Requests and Limits
  • Use Labels and Annotations
  • Use Kubernetes as soon as you can (aka use minikube or dev k8s on developers workstations)
  • Put yaml configurations files under code versioning.

Helm

k8s package manager
Bundles many resources into a single logical object named a chart
A release is an instance of a chart
Helm will pull in dependencies
Helm uses a template engine, so variables end up in values.yaml. The engine itself is golang's Sprig...
A repository is a bundle of charts
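
A minimal sketch with the Helm v2-era CLI (contemporary with these notes ; my-db is a made-up release name) :

helm init                                # v2 only : installs the Tiller server-side component
helm search mysql
helm install stable/mysql --name my-db
helm list
helm upgrade my-db stable/mysql
helm rollback my-db 1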

Questions

  • How to database ? Just don't : ask the DBAs to set up good old VM(s) with the DB (& replicas & co)
  • What about storage, where to locate volumes ? It depends, cf. the volume backend options.
  • Gitlab CI into minikube ? Never tried
  • ansible-container ? Not such a good idea. The real point is that the Dockerfile shouldn't be a mess : mount config files, use init containers for the bootstrap process, etc.
  • Overhead of k8s vs Swarm vs Compose ? Quite heavy for k8s (minikube =~ 10 containers), a bit less for Swarm, nothing beyond the containers themselves for Compose
  • Updating k8s ?
  • "docker" namespaces vs k8s namespaces (kubens) ? Same name but different things
  • k8s & hardware ? nodeSelector = hereIsMyHardware ? Never tried, but yes, for GPUs for example

TODO

Setup metrics before deployment

Setup k8s the hard way https://github.com/kelseyhightower/kubernetes-the-hard-way
Try kompose http://kompose.io/
Try confd http://www.confd.io/

Use https://editorconfig.org/

Oh my zsh
- install plugin alias-tips git clone https://github.com/djui/alias-tips ~/.oh-my-zsh/plugins/alias-tips
- use kubectl & minikube & alias-tips

Watch https://www.youtube.com/watch?v=bHjsmxN4iPk

Bookmark
http://xip.io/
http://nip.io/
http://www.jamesbowman.me/post/cdlandscape/ContinuousDeliveryToolLandscape-fullsize.jpeg

Script to get started on an AWS EC2 instance

#!/bin/bash


curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && chmod +x kubectl && sudo mv kubectl /usr/local/bin/

curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.27.0/minikube-linux-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/

export MINIKUBE_WANTUPDATENOTIFICATION=false
export MINIKUBE_WANTREPORTERRORPROMPT=false
export MINIKUBE_HOME=$HOME
export CHANGE_MINIKUBE_NONE_USER=true
mkdir $HOME/.kube || true
touch $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config

sudo -E minikube start --bootstrapper localkube --vm-driver=none

source <(kubectl completion bash)

kubectl completion bash > ~/.kube/completion.bash.inc
printf "
  # Kubectl shell completion
  source '$HOME/.kube/completion.bash.inc'
  " >> $HOME/.bash_profile
source $HOME/.bash_profile