k8s_study
1. Container Runtimes
How containers are wired up in Kubernetes, in two common setups:
- Docker as the k8s container runtime; the call chain is:
  - kubelet --> dockershim (inside the kubelet process) --> dockerd --> containerd
- containerd as the k8s container runtime; the call chain is:
  - kubelet --> CRI plugin (inside the containerd process) --> containerd
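In either case the kubelet reaches the runtime through a CRI socket. A minimal sketch of wiring the node tooling to containerd (the socket path assumes a default containerd install; adjust to your environment):

# tell crictl which CRI endpoint to talk to
cat <<'EOF' >/etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
# kubelet side: --container-runtime-endpoint=unix:///run/containerd/containerd.sock
crictl info   # should report the CRI plugin's status if the wiring is correct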
For the mapping between containerd versions and Kubernetes versions, see:
https://github.com/containerd/containerd/blob/main/RELEASES.md#deprecated-features
https://github.com/containerd/containerd/blob/main/docs/getting-started.md
1.1 Kubernetes Support
The Kubernetes version matrix represents the versions of containerd which are recommended for a Kubernetes release. Any actively supported version of containerd may receive patches to fix bugs encountered in any version of Kubernetes, however, our recommendation is based on which versions have been the most thoroughly tested. See the Kubernetes test grid for the list of actively tested versions. Kubernetes only supports n-3 minor release versions and containerd will ensure there is always a supported version of containerd for every supported version of Kubernetes.
| Kubernetes Version | containerd Version | CRI Version |
|---|---|---|
| 1.26 | 1.7.0+, 1.6.15+ | v1, v1alpha2 ** |
| 1.27 | 1.7.0+, 1.6.15+ | v1 |
| 1.28 | 1.7.0+, 1.6.15+ | v1 |
| 1.29 | 1.7.11+, 1.6.27+ | v1 |
| 1.30 | 2.0(wip), 1.7.13+, 1.6.28+ | v1 |
| 1.31 | 2.0(wip), 1.7.20+, 1.6.34+ | v1 |
2. Affinity
Goal: schedule Pods evenly across a set of nodes.
For the dedicated-node examples below, taints also need to be added to the nodes (see 2.1 and section 5).
template:
  metadata:
    creationTimestamp: null
    labels:
      app: topo-dep
  # Soft requirement:
  # even if a node already runs a Pod labeled app=topo-dep, this Pod may still land there;
  # the scheduler prefers other nodes first and falls back to this one if nothing else fits.
  spec:
    affinity:
      podAntiAffinity:          # Pod anti-affinity
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100           # preferences are ranked by weight
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: "app"      # prefer not to schedule next to Pods labeled app=topo-dep
                operator: In
                values:
                - topo-dep
            topologyKey: "kubernetes.io/hostname"
The above is a soft requirement; a hard requirement can be configured instead:
# Hard requirement:
# if a node already runs a Pod labeled app=nginx, this Pod must not be scheduled there.
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: "kubernetes.io/hostname"
2.1 food-match Case Study
First, label the nodes:
kubectl label nodes sanjiu-uat-0001 node_app=sanjiu
kubectl label nodes sanjiu-uat-0002 node_app=sanjiu
kubectl label nodes sanjiu-uat-0003 node_app=sanjiu
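A quick check that the labels landed (sketch; -L adds one column per label key):

kubectl get nodes -L node_app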
The workload itself also needs a label at deploy time, e.g. add app=food-match to its Deployment manifest.
1. Three new nodes join the cluster (sanjiu-uat-0001, sanjiu-uat-0002, sanjiu-uat-0003) and no other workloads may be scheduled onto them.
Approach: add taints:
kubectl taint nodes sanjiu-uat-0001 name=sanjiu:NoSchedule
kubectl taint nodes sanjiu-uat-0002 name=sanjiu:NoSchedule
kubectl taint nodes sanjiu-uat-0003 name=sanjiu:NoSchedule
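To confirm the taints took effect, something along these lines works:

kubectl describe nodes sanjiu-uat-0001 | grep -i taints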
Reference Deployment snippet:
tolerations:                # tolerate the taint
- key: name
  value: sanjiu
  effect: NoSchedule
nodeSelector:               # only nodes carrying this label
  node_app: sanjiu
affinity:
  podAntiAffinity:          # Pod anti-affinity
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: "app"      # anti-affinity against app=food-match, i.e. aim for one Pod per node
            operator: In
            values:
            - food-match
        topologyKey: "kubernetes.io/hostname"
2. Schedule one Pod each onto sanjiu-uat-0001 and sanjiu-uat-0002, but none onto sanjiu-uat-0003.
Approach: remove the label from sanjiu-uat-0003 (the trailing '-' deletes it): kubectl label nodes sanjiu-uat-0003 node_app=sanjiu-
Reference Deployment snippet:
tolerations:
- key: name
  value: sanjiu
  effect: NoSchedule
nodeSelector:
  node_app: sanjiu
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: "app"
            operator: In
            values:
            - food-match
        topologyKey: "kubernetes.io/hostname"
3. Schedule one Pod onto each of sanjiu-uat-0001, sanjiu-uat-0002, and sanjiu-uat-0003.
Approach: re-add the label on sanjiu-uat-0003 so it matches the nodeSelector again: kubectl label nodes sanjiu-uat-0003 node_app=sanjiu
Reference Deployment snippet:
tolerations:
- key: name
  value: sanjiu
  effect: NoSchedule
nodeSelector:
  node_app: sanjiu
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: "app"
            operator: In
            values:
            - food-match
        topologyKey: "kubernetes.io/hostname"
# Legacy Rancher scheduler equivalents, for reference:
# io.rancher.scheduler.global: 'true'
# io.rancher.scheduler.affinity:host_label: node=foodmatch
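After deploying, the spread can be verified with -o wide, which shows which node each Pod landed on:

kubectl get pods -l app=food-match -o wide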
2.2 Affinity and Anti-affinity Examples
Services: ai-api / ai-admin.
Spread evenly across nodes: Pod anti-affinity (soft).
spec:
  nodeSelector:
    node_app: sanjiu
  tolerations:
  - key: name
    value: sanjiu
    effect: NoSchedule
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - health-cloud-ai-activity-api
          topologyKey: kubernetes.io/hostname
Spread evenly across nodes: Pod anti-affinity (hard; if a node already runs a Pod labeled app=health-cloud-ai-activity-api, no further Pod may be placed there). The replica count must be lower than the number of dedicated nodes matching the taint.
spec:
  nodeSelector:
    node_app: sanjiu
  tolerations:
  - key: name
    value: sanjiu
    effect: NoSchedule
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - health-cloud-ai-activity-api
        topologyKey: "kubernetes.io/hostname"
If the replica count equals the number of eligible nodes, the first deployment succeeds, but a later upgrade needs spare matching nodes, otherwise the new Pods stay Pending: a rolling update does not delete old Pods first, so it has to find extra capacity to schedule onto. In that case, adjusting the update strategy solves it.
The example below keeps one Pod per tainted node at the desired replica count; during an update, up to 30% of the Pods are taken down first and then replaced one by one.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 30%
selector:
  matchLabels:
    app: topo-dep
template:
  metadata:
    creationTimestamp: null
    labels:
      app: topo-dep
  spec:
    nodeSelector:
      node_app: sanjiu
    tolerations:
    - key: name
      value: sanjiu
      effect: NoSchedule
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - topo-dep
          topologyKey: "kubernetes.io/hostname"
maxSurge: how many Pods may exist above the desired replica count during an upgrade; 0, a positive integer, or a percentage of the desired count. For example, with a desired count of 3 and a value of 1, the total number of Pods may not exceed 4.
maxUnavailable: how many Pods (old and new combined) may be unavailable during the upgrade; 0, a positive integer, or a percentage of the desired count; the default is 1. With a desired count of 3, that default means at least two Pods must stay in service throughout the upgrade.
maxSurge and maxUnavailable must not both be 0, otherwise the Pod count could never deviate from the desired value and the rolling update could make no progress.
On top of that, the Deployment's spec.minReadySeconds field controls the pace of an upgrade. By default a new Pod counts as available as soon as its readiness probe succeeds, and the next replacement starts immediately. minReadySeconds defines how long after creation a new Pod must stay ready before it is considered available; the rollout is blocked for that window. It therefore makes Kubernetes wait after each new Pod before starting the next round of replacements; ideally the wait is long enough for the application in the Pod to actually accept and serve traffic. A well-chosen wait combined with a readiness probe lets Kubernetes dodge some upgrade failures caused by application bugs.
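Putting the three knobs together, a minimal sketch (the replica count and timings are illustrative, not taken from the examples above):

spec:
  replicas: 3
  minReadySeconds: 10    # a new Pod must stay Ready for 10s before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # total Pods may briefly reach 4 (3 desired + 1 surge)
      maxUnavailable: 0  # at least 3 Pods stay available throughout the upgrade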
3. Image Operations
3.1 Pull an Image for a Specific Platform
docker pull --platform amd64 docker.io/calico/kube-controllers:v3.26.3
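To double-check which platform actually got pulled, docker's built-in inspect fields can be queried (a sketch):

docker image inspect --format '{{.Os}}/{{.Architecture}}' docker.io/calico/kube-controllers:v3.26.3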
# tarballs exported by docker can be imported with ctr
# clean up unused images
ctr -n k8s.io image prune --all
3.2 Image Export and Import
# export an image
ctr -n k8s.io image export pause.tar registry.k8s.io/pause:3.9
# import an image
ctr -n k8s.io image import feature.tar
[root@master nginx-demo]# kubectl patch deployments myapp-deployment -p '{"spec":{"minReadySeconds":8}}'
deployment.extensions/myapp-deployment patched
[root@master nginx-demo]# kubectl set image deployments myapp-deployment myapp=ikubernetes/myapp:v2
deployment.extensions/myapp-deployment image updated
During a rolling update, the myapp-deployment controller creates a new ReplicaSet to manage the new-version Pods; after the upgrade completes, the old ReplicaSet is kept in the rollout history, but the Pods it used to manage are deleted.
root@k8s-master:~# kubectl get replicasets -l app=topo-dep
NAME DESIRED CURRENT READY AGE
topo-dep-54b55fb587 2 2 2 105m
topo-dep-6b66c5b6c7 0 0 0 103m
root@k8s-master:~# kubectl get pods -l app=topo-dep
NAME READY STATUS RESTARTS AGE
topo-dep-54b55fb587-74vxp 1/1 Running 0 42m
topo-dep-54b55fb587-vc2cp 1/1 Running 0 42m
3.3 Image Export/Import Scripts
root@node4:~/img_node/img_node# cat img_export.sh
#!/bin/bash
# Export every non-digest image from containerd's k8s.io namespace,
# one tarball per image ('/' in the reference is replaced by '_').
images=$(ctr -n=k8s.io image ls -q | egrep -v 'sha256')
for image in $images; do
  echo "Exporting $image..."
  ctr -n=k8s.io image export "$(echo $image | tr '/' '_').tar" $image
done
root@node4:~/img_node/img_node# cat img_import.sh
#!/bin/bash
# Import every tarball in the current directory back into the k8s.io namespace.
files=(*.tar)
for file in "${files[@]}"; do
  echo "Importing $file..."
  ctr -n=k8s.io image import "$file"
done
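Typical usage, assuming passwordless SSH to a hypothetical target node node5 with the same scripts in place:

./img_export.sh                      # on the source node: one .tar per image
scp *.tar node5:/root/img_node/      # ship the tarballs over
ssh node5 'cd /root/img_node && ./img_import.sh'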
4. Service Scheduling Policies
4.1 Rollback Strategy
kubectl rollout history: inspect a Deployment's rollout history
kubectl rollout undo deployment/… --revision=2: roll back to a specific revision
Pause a rollout: kubectl rollout pause deployment/…
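Spelled out against a hypothetical Deployment named myapp-deployment:

kubectl rollout history deployment/myapp-deployment               # list revisions
kubectl rollout history deployment/myapp-deployment --revision=2  # inspect one revision
kubectl rollout undo deployment/myapp-deployment --revision=2     # roll back to it
kubectl rollout pause deployment/myapp-deployment                 # freeze further rollout steps
kubectl rollout resume deployment/myapp-deployment                # pick up where it left off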
5. Node Maintenance
5.1 Adding Taints
kubectl taint nodes sanjiu-uat-0001 name=sanjiu:NoSchedule
kubectl taint nodes sanjiu-uat-0002 name=sanjiu:NoSchedule
kubectl taint nodes sanjiu-uat-0003 name=sanjiu:NoSchedule
5.2 Removing a Taint (the trailing '-' deletes it)
kubectl taint nodes sanjiu-uat-0003 name=sanjiu:NoSchedule-
6. Rolling Restarts and Updates
6.1 rollout restart
New Pods are created first, then the old ones are deleted:
kubectl rollout restart deploy/tob-priv-xxxx -n tob-private --context=xxxxxxxxx
6.2 maxUnavailable
Set it to 0: zero Pods may be unavailable during the upgrade, so new Pods are created first and old ones deleted afterwards (this requires maxSurge > 0):
maxUnavailable: 0
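In context, a sketch of the full strategy stanza (the surge percentage is illustrative; some surge is mandatory here, since maxSurge and maxUnavailable cannot both be 0):

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%        # create replacements first...
    maxUnavailable: 0    # ...never take an old Pod down before a new one is Ready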
7. Storage
7.1 emptyDir
If you run TGI in Kubernetes, you can also add shared memory to the container by creating a volume like this:
- name: shm
  emptyDir:
    medium: Memory
    sizeLimit: 1Gi
Mount it at /dev/shm.
Alternatively, SHM sharing can be disabled entirely with the NCCL_SHM_DISABLE=1 environment variable.
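For completeness, a sketch of the container side (the container name and image tag are illustrative):

containers:
- name: tgi
  image: ghcr.io/huggingface/text-generation-inference:latest
  volumeMounts:
  - name: shm               # the emptyDir volume defined above
    mountPath: /dev/shm
  env:
  - name: NCCL_SHM_DISABLE  # alternative: skip the SHM transport instead of enlarging it
    value: "1"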