This file contains two plays:
Play 1 (localhost): preparation on the control machine only (the one you run ansible from): generate/read the SSH public key, write the ssh config, prepare the offline directory, and try to download the k8s apt keyring (optional; failure does not abort).
Play 2 (k8s_cluster): the actual "provision + Kubernetes init/join" flow on every k8s node: base system configuration, offline apt repo setup, installing containerd and the k8s components, VIP load balancing (haproxy + keepalived, masters only), kubeadm init (first master), join of the remaining master, worker join.
Play 1: Prepare controller SSH key (localhost)
Goal: make sure the control machine has an SSH key, and pass the public key content to the later play as a fact.
Key points:
1. Ensure ~/.ssh exists
file creates the directory with mode 0700.
2. Generate an SSH key if missing
ssh-keygen -t rsa -b 4096 -N "" -f "{{ controller_ssh_key }}"
creates: guarantees idempotence: if the file already exists, the command is skipped.
3. Read the controller public key and set_fact
slurp reads the file content back base64-encoded; b64decode turns it into plain text.
Stored as controller_pubkey, to be written into each node's authorized_keys later.
4. Optional: write the controller's ~/.ssh/config
For 192.168.30.* it uniformly sets:
User root
IdentityFile pointing at the key generated on the controller
StrictHostKeyChecking no / UserKnownHostsFile /dev/null (convenient for automation, at the cost of security)
This step does not affect node configuration; it just makes ssh from the control machine easier.
5. Ensure the files/offline directory exists
Offline packages and the keyring used later all live under files/offline.
6. Try to download the kubernetes apt keyring (best effort, never fails)
failed_when: false + changed_when: false
With network access it produces kubernetes-apt-keyring.gpg; without it the play just keeps going.
A stat afterwards checks whether the file exists and sets controller_has_k8s_keyring.
Note: the offline repo lists here use trusted=yes, so packages install even without the keyring. Keeping the keyring logic still pays off: it lets you switch back to an online source later, or drop trusted=yes for a more secure setup. You can inspect the downloaded keyring with gpg, as sketched below.
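If you want to sanity-check the keyring on the controller, plain gpg usage works; the relative path here is assumed from the playbook vars below:

gpg --show-keys files/offline/kubernetes-apt-keyring.gpg   # lists the embedded keys without importing them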
Play 2: Bootstrap all k8s nodes (the core)
2.1 vars: the parameter hub for the whole cluster
ssh_user / ssh_user_home: derived from ansible_user, resolving to /root or /home/<user>; used to write that user's authorized_keys.
VIP / keepalived / haproxy:
apiserver_vip=192.168.30.58
apiserver_vip_port=16443 (the port the VIP exposes)
apiserver_bind_port=6443 (the port kube-apiserver actually listens on)
apiserver_vip_iface: the default NIC name (taken from facts)
keepalived_virtual_router_id/auth_pass: VRRP parameters
Offline repos:
offline_tools_repo_tar, offline_containerd_repo_tar, offline_k8s_repo_tar, offline_lb_repo_tar
plus the matching extraction directories and apt list file paths
k8s version:
k8s_pkg_version: 1.30.14-1.1
kubeadm_kubernetes_version: v1.30.14
kubeadm_image_repository: the Aliyun mirror registry, suited to mainland-China / offline image-sync scenarios
containerd:
containerd_sandbox_image pins the pause image
SystemdCgroup=true
Kernel modules: overlay, br_netfilter, the ipvs set plus nf_conntrack
Cluster networking: pod/service subnets, domain, CRI socket
LB node selection logic (important):
lb_masters: "{{ (groups['k8s_cluster'] | select('search','master') | list | sort)[:2] }}"
It picks the hosts whose inventory name contains master, sorts them, and takes the first two as the pair that runs the VIP LB.
init_master: "{{ lb_masters[0] }}": the first master is the kubeadm init node.
is_lb_master/is_init_master: per-host flags that drive the conditional branches.
Caution: this selection logic hard-depends on your inventory hostnames containing master, and on there being at least two of them; otherwise the haproxy config, which references lb_masters[1], will break (see the sample inventory below).
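For reference, a minimal INI inventory that satisfies this selection logic; all hostnames and addresses here are made up for illustration:

[k8s_cluster]
k8s-master01 ansible_host=192.168.30.51
k8s-master02 ansible_host=192.168.30.52
k8s-worker01 ansible_host=192.168.30.61
k8s-worker02 ansible_host=192.168.30.62

With these names, lb_masters evaluates to ['k8s-master01', 'k8s-master02'] and init_master to k8s-master01.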
2.2 apt source cleanup: keep apt from hanging in an offline environment
Comments out the deb cdrom: lines in /etc/apt/sources.list and /etc/apt/sources.list.d/*.list.
The most common offline failure mode is apt update trying to reach a cdrom or an unreachable source and erroring out or hanging; this step is the fuse that prevents it.
2.3 Host basics: hostname, /etc/hosts, SSH trust
hostname is set to inventory_hostname
Builds k8s_hosts_block: every node's IP + hostname, assembled for blockinfile
and written into /etc/hosts (so the nodes can resolve each other by name)
Writes authorized_keys:
the controller's public key for both ansible_user and root (controller gets passwordless login to every node)
plus, for root, the keys of all other nodes (node<->node mutual trust)
Configures an sshd drop-in:
PermitRootLogin prohibit-password (root may log in with a public key, but not with a password)
PubkeyAuthentication yes; PasswordAuthentication yes (note: password login is deliberately left enabled, so a failed key rollout cannot lock you out)
and notifies the restart ssh handler
Risk note: if you later harden this to PasswordAuthentication no, push the public keys and test key-based login first, and be extra careful in production. A quick check is sketched below.
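A minimal manual check from the controller, using the illustrative hostname from the inventory above; BatchMode=yes makes ssh fail instead of falling back to a password prompt, so success proves key authentication works:

ssh -o BatchMode=yes root@k8s-master01 true && echo "key auth OK"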
2.4 swap & kernel parameters: Kubernetes prerequisites
swapoff -a plus commenting the swap lines in /etc/fstab (so swap stays off after reboot)
writes /etc/modules-load.d/k8s.conf and runs modprobe
writes /etc/sysctl.d/99-kubernetes-cri.conf and runs sysctl --system
covering bridged traffic, ip_forward, and nonlocal_bind (commonly needed for the VIP). The checks below confirm the result on a node.
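A few sanity commands on any node once this section has run (standard tooling, nothing playbook-specific):

lsmod | grep -E 'overlay|br_netfilter|ip_vs'
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.ipv4.ip_nonlocal_bind
swapon --show    # no output means swap is fully off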
2.5 Offline apt repos: unpack, auto-locate Packages.gz, write a file: source
The flow is the same for every repo:
ensure /opt/offline-repos exists
unpack the tar.gz into its directory
find Packages.gz and treat its parent directory as the repo root
write deb [trusted=yes] file:<repo_root> ./
apt update to refresh the cache
install the packages
trusted=yes tells apt to skip signature checks — handy offline, weaker security-wise; if you already have a keyring/signatures, you can drop trusted and configure the key properly. The resulting list file and a quick verification are shown below.
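For the k8s repo, the generated list file would look roughly like this (the exact path depends on where Packages.gz sits inside the tarball):

# /etc/apt/sources.list.d/offline-k8s-v1.30.14-1.1.list
deb [trusted=yes] file:/opt/offline-repos/k8s-v1.30.14-1.1 ./

apt-cache policy kubeadm should then show a file:/opt/offline-repos/... origin offering version 1.30.14-1.1.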
2.6 containerd: configure CRI, start the service
installs containerd/runc (from the offline repo)
writes /etc/containerd/config.toml with
sandbox_image pinned to the pause image
snapshotter=overlayfs
SystemdCgroup=true
a registry mirror entry pointing docker.io at registry-1 (if the offline environment has no egress, pulls from docker.io will still fail — typically you pre-load the images or run a private registry)
enables and starts the service via systemd
writes /etc/crictl.yaml so crictl talks to containerd by default. A quick runtime check follows.
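With /etc/crictl.yaml in place, a couple of commands confirm the runtime is healthy (all standard crictl/systemd usage):

systemctl is-active containerd
crictl info | head     # CRI status as JSON; errors here point at a socket/config problem
crictl ps -a           # an empty list is normal before kubeadm runs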
2.7 kubeadm/kubelet/kubectl: offline install + hold
optionally copies the kubernetes apt keyring to /etc/apt/keyrings on each node
installs pinned versions of kubeadm/kubelet/kubectl plus dependencies (kubernetes-cni, cri-tools, ...)
apt-mark hold locks the versions
starts kubelet (errors at this point are normal and persist until kubeadm init/join completes). The checks below confirm the install.
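To confirm the versions and the hold took effect (plain apt/kubeadm/journalctl commands):

kubeadm version -o short                  # expect v1.30.14
apt-mark showhold                         # should list kubeadm, kubelet, kubectl
journalctl -u kubelet -n 20 --no-pager    # crash-loop output here is expected pre-init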
2.8 VIP LB (the two LB masters only): haproxy + keepalived
installs haproxy and keepalived on both masters
haproxy:
listens on *:16443
forwards to both masters' :6443 backends
keepalived:
check_haproxy.sh: only checks that a haproxy process exists
both nodes use state BACKUP; priority decides who claims the VIP
virtual_ipaddress carries VIP/24
track_script hooks in the health check
enables and starts both, then waits for local port 16443 to come up
The resulting structure is: VIP(16443) -> haproxy -> master(6443).
kubeadm's controlPlaneEndpoint points at VIP:16443, so traffic both inside and outside the cluster goes through the VIP. The snippet below shows how to verify the chain.
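On whichever master currently holds the VIP (addresses from the vars above; on a default kubeadm cluster /healthz is usually reachable anonymously, but only after init has run):

ip addr show | grep 192.168.30.58             # the VIP should sit on exactly one master
ss -lntp | grep 16443                         # haproxy listening
curl -k https://192.168.30.58:16443/healthz   # prints "ok" once an apiserver is up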
2.9 kubeadm init (init master only)
first checks whether /etc/kubernetes/admin.conf exists — if it does, the cluster was already initialized and init is skipped
writes /root/kubeadm.yaml:
apiVersion: kubeadm.k8s.io/v1beta3 (you already flagged this as "fix 2: v1beta4 -> v1beta3" — v1beta4 only arrives with kubeadm 1.31, so 1.30.14 needs v1beta3)
controlPlaneEndpoint: VIP:16443
imageRepository pointing at the Aliyun mirror
apiServer.certSANs covering the VIP, both masters' IPs/hostnames, and localhost
in InitConfiguration:
advertiseAddress uses the node's own IP
bindPort uses 6443
nodeRegistration sets the CRI socket and node-ip
runs kubeadm init (with --upload-certs, ignoring the SystemVerification and Swap preflight checks)
copies admin.conf to /root/.kube/config so kubectl works out of the box
generates the join commands:
worker join command: kubeadm token create --print-join-command
the control-plane join additionally needs --control-plane --certificate-key <key>
the 64-char hex key is grepped out of the kubeadm init phase upload-certs --upload-certs output
both commands are saved as scripts: /root/join-worker.sh and /root/join-controlplane.sh
The most important step: the join commands are turned into "global facts" via delegate_to: localhost + delegate_facts: true, so all other nodes can reference them as:
hostvars['localhost'].global_join_worker
hostvars['localhost'].global_join_cp
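For reference, the captured strings have roughly this shape (token, hash, and key are placeholders, not real values):

global_join_worker:
kubeadm join 192.168.30.58:16443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

global_join_cp:
kubeadm join 192.168.30.58:16443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> --control-plane --certificate-key <64-hex-key>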
2.10 Remaining nodes join
first checks whether /etc/kubernetes/kubelet.conf exists (if so, the node has already joined)
the second master (is_lb_master and not is_init_master):
runs global_join_cp to join the control plane
workers:
run global_join_worker. A final verification is sketched below.
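Once the play finishes, check membership from the init master (node names depend on your inventory):

kubectl get nodes -o wide
kubectl get pods -n kube-system

Note that nodes will stay NotReady until a CNI plugin matching podSubnet 10.244.0.0/16 (e.g. flannel) is installed — this playbook does not install one.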
handlers: service restarts
the restarts for containerd, haproxy, keepalived, and ssh are all centralized in handlers
and triggered by notify from the tasks above — in line with Ansible best practice
---
###############################################################################
# Play 1: runs on the controller (localhost) only
# Goals:
#   1) generate/prepare the controller SSH key (passwordless login to all nodes)
#   2) read the controller public key and save it as a fact, so the later play
#      can write it into each node's authorized_keys
#   3) prepare the offline files directory and (optionally) the kubernetes apt keyring
###############################################################################
- name: Prepare controller SSH key (localhost)
hosts: localhost
gather_facts: false
tasks:
# Ensure the controller's ~/.ssh directory exists
- name: Ensure ~/.ssh exists on controller
ansible.builtin.file:
path: "{{ lookup('env','HOME') + '/.ssh' }}"
state: directory
mode: "0700"
# Generate controller_ssh_key if it does not exist (idempotent via creates)
- name: Generate SSH key on controller if missing
ansible.builtin.command: >
ssh-keygen -t rsa -b 4096 -N "" -f "{{ controller_ssh_key }}"
args:
creates: "{{ controller_ssh_key }}"
# Read the controller public key (slurp returns base64)
- name: Read controller public key
ansible.builtin.slurp:
src: "{{ controller_ssh_pub }}"
register: controller_pubkey_raw
# Decode the base64 into the plain-text public key, saved as controller_pubkey (referenced later via hostvars['localhost'])
- name: Set controller_pubkey fact
ansible.builtin.set_fact:
controller_pubkey: "{{ controller_pubkey_raw.content | b64decode }}"
# Optional: write the controller's ~/.ssh/config for convenient ssh to the 192.168.30.* range
# Note: StrictHostKeyChecking no lowers security but suits automated environments
- name: Ensure controller ssh config includes cluster rule (optional but recommended)
ansible.builtin.blockinfile:
path: "{{ lookup('env','HOME') + '/.ssh/config' }}"
create: true
mode: "0600"
marker: "# {mark} ANSIBLE K8S CLUSTER SSH"
block: |
Host 192.168.30.*
User root
IdentityFile {{ controller_ssh_key }}
IdentitiesOnly yes
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
# Ensure the project's offline files directory exists (tar.gz archives, keyring, etc. live here)
- name: Ensure files/offline exists on controller
ansible.builtin.file:
path: "{{ playbook_dir }}/../files/offline"
state: directory
mode: "0755"
# Try to download the kubernetes apt keyring online (best effort: failure is not fatal)
# No network in an offline environment is fine; an existing kubernetes-apt-keyring.gpg in the directory works just as well
- name: Try to generate kubernetes apt keyring on controller if missing (best effort, no-fail)
ansible.builtin.shell: |
set -e
curl -fsSL --connect-timeout 5 --max-time 20 https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key \
| gpg --dearmor -o "{{ playbook_dir }}/../files/offline/kubernetes-apt-keyring.gpg"
args:
creates: "{{ playbook_dir }}/../files/offline/kubernetes-apt-keyring.gpg"
changed_when: false
failed_when: false
# Check whether the keyring exists
- name: Check kubernetes apt keyring exists on controller
ansible.builtin.stat:
path: "{{ playbook_dir }}/../files/offline/kubernetes-apt-keyring.gpg"
register: controller_k8s_keyring_stat
# Set a boolean fact so the later play can decide whether to copy the keyring to the nodes
- name: Set controller_has_k8s_keyring fact
ansible.builtin.set_fact:
controller_has_k8s_keyring: "{{ controller_k8s_keyring_stat.stat.exists | default(false) }}"
###############################################################################
# Play 2: runs on all k8s nodes (hosts: k8s_cluster)
# Goals (all-in-one):
#   - system basics: hostname, /etc/hosts, swap off, kernel modules and sysctl
#   - SSH: controller -> node passwordless login; root trust between nodes (node<->node)
#   - offline install: unpack offline repos, write file: apt sources, apt-install tools / container runtime / k8s components
#   - master VIP: haproxy + keepalived provide the apiserver VIP entry point
#   - kubeadm: init the first master; join the remaining master / workers
###############################################################################
- name: Bootstrap all k8s nodes (hostname, /etc/hosts, SSH trust, offline tools, kernel modules, containerd, k8s pkgs, swapoff, apiserver VIP LB, kubeadm init/join)
hosts: k8s_cluster
become: true
gather_facts: true
vars:
# The current ansible connection user and their home (used when writing authorized_keys)
ssh_user: "{{ ansible_user }}"
ssh_user_home: "{{ '/root' if ssh_user == 'root' else '/home/' ~ ssh_user }}"
# apiserver VIP (external entry point), the VIP's exposed port, and the port the apiserver actually binds
apiserver_vip: "192.168.30.58"
apiserver_vip_port: 16443
apiserver_bind_port: 6443
# NIC used by keepalived (default NIC from facts, falling back to ens33)
apiserver_vip_iface: "{{ ansible_default_ipv4.interface | default('ens33') }}"
keepalived_virtual_router_id: 51
keepalived_auth_pass: "k8sVIP@2025"
# -------------------------
# Offline repo: system tools
# -------------------------
offline_tools_repo_tar: "{{ playbook_dir }}/../files/offline/os-tools-repo-ipvs.tar.gz"
offline_tools_repo_dir: "/opt/offline-repos/os-tools-ipvs"
offline_tools_repo_list: "/etc/apt/sources.list.d/offline-os-tools-ipvs.list"
offline_tools_packages:
- expect
- wget
- jq
- psmisc
- vim
- net-tools
- telnet
- lvm2
- git
- ntpdate
- chrony
- bind9-utils
- rsync
- unzip
- ipvsadm
- ipset
- sysstat
- conntrack
# -------------------------
# Offline repo: containerd
# -------------------------
offline_containerd_repo_tar: "{{ playbook_dir }}/../files/offline/containerd-repo.tar.gz"
offline_containerd_repo_dir: "/opt/offline-repos/containerd"
offline_containerd_repo_list: "/etc/apt/sources.list.d/offline-containerd.list"
offline_containerd_packages:
- containerd
- runc
# -------------------------
# Offline repo: haproxy/keepalived (masters only)
# -------------------------
offline_lb_repo_tar: "{{ playbook_dir }}/../files/offline/nginx-keepalived-repo.tar.gz"
offline_lb_repo_dir: "/opt/offline-repos/nginx-keepalived"
offline_lb_repo_list: "/etc/apt/sources.list.d/offline-nginx-keepalived.list"
# -------------------------
# Kubernetes version and image registry
# -------------------------
k8s_pkg_version: "1.30.14-1.1"
kubeadm_kubernetes_version: "v1.30.14"
kubeadm_image_repository: "registry.cn-hangzhou.aliyuncs.com/google_containers"
# containerd pause image (pod sandbox)
containerd_sandbox_image: "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9"
containerd_config: "/etc/containerd/config.toml"
# -------------------------
# Offline repo: Kubernetes apt repository
# -------------------------
offline_k8s_repo_tar: "{{ playbook_dir }}/../files/offline/k8s-repo-v1.30.14-1.1.tar.gz"
offline_k8s_repo_dir: "/opt/offline-repos/k8s-v1.30.14-1.1"
offline_k8s_repo_list: "/etc/apt/sources.list.d/offline-k8s-v1.30.14-1.1.list"
# Kubernetes keyring (copied to the nodes if present on the controller)
offline_k8s_keyring_src: "{{ playbook_dir }}/../files/offline/kubernetes-apt-keyring.gpg"
offline_k8s_keyring_dest: "/etc/apt/keyrings/kubernetes-apt-keyring.gpg"
# k8s components and dependencies (kubeadm/kubelet/kubectl pinned to a fixed version)
offline_k8s_packages:
- "kubeadm={{ k8s_pkg_version }}"
- "kubelet={{ k8s_pkg_version }}"
- "kubectl={{ k8s_pkg_version }}"
- kubernetes-cni
- cri-tools
- socat
- ebtables
- ethtool
- apt-transport-https
# ipvs and common k8s kernel modules
ipvs_modules:
- ip_vs
- ip_vs_rr
- ip_vs_wrr
- ip_vs_sh
- nf_conntrack
k8s_modules:
- overlay
- br_netfilter
# Cluster network parameters
pod_subnet: "10.244.0.0/16"
service_subnet: "10.96.0.0/12"
cluster_domain: "cluster.local"
cri_socket: "unix:///run/containerd/containerd.sock"
# Pick the nodes whose inventory name contains "master", sort, and take the first two as LB masters;
# the first of them doubles as the kubeadm init node
lb_masters: "{{ (groups['k8s_cluster'] | select('search','master') | list | sort)[:2] }}"
is_lb_master: "{{ inventory_hostname in lb_masters }}"
init_master: "{{ lb_masters[0] }}"
is_init_master: "{{ inventory_hostname == init_master }}"
tasks:
# -------------------------
# apt source cleanup: disable cdrom sources (a classic offline pitfall)
# -------------------------
- name: Disable CDROM apt source in /etc/apt/sources.list (comment deb cdrom:)
ansible.builtin.replace:
path: /etc/apt/sources.list
regexp: '^deb\s+cdrom:'
replace: '# deb cdrom:'
failed_when: false
- name: Find .list files under /etc/apt/sources.list.d
ansible.builtin.find:
paths: /etc/apt/sources.list.d
patterns: "*.list"
file_type: file
register: apt_list_files
failed_when: false
- name: Disable CDROM apt source in sources.list.d files (comment deb cdrom:)
ansible.builtin.replace:
path: "{{ item.path }}"
regexp: '^deb\s+cdrom:'
replace: '# deb cdrom:'
loop: "{{ apt_list_files.files | default([]) }}"
failed_when: false
# -------------------------
# Hostname and hosts resolution: make sure nodes can resolve each other by name
# -------------------------
- name: Set hostname
ansible.builtin.hostname:
name: "{{ inventory_hostname }}"
- name: Build hosts block for all cluster nodes
ansible.builtin.set_fact:
k8s_hosts_block: |
{% for h in groups['k8s_cluster'] | sort %}
{{ hostvars[h].ansible_default_ipv4.address }} {{ h }}
{% endfor %}
- name: Ensure /etc/hosts contains cluster nodes mapping
ansible.builtin.blockinfile:
path: /etc/hosts
marker: "# {mark} ANSIBLE K8S CLUSTER HOSTS"
block: "{{ k8s_hosts_block }}"
# -------------------------
# SSH passwordless login: controller -> node (ansible_user and root)
# -------------------------
- name: Ensure ansible user .ssh dir exists
ansible.builtin.file:
path: "{{ ssh_user_home }}/.ssh"
state: directory
mode: "0700"
owner: "{{ ssh_user }}"
group: "{{ ssh_user }}"
- name: Add controller pubkey to ansible user authorized_keys
ansible.builtin.lineinfile:
path: "{{ ssh_user_home }}/.ssh/authorized_keys"
create: true
mode: "0600"
owner: "{{ ssh_user }}"
group: "{{ ssh_user }}"
line: "{{ hostvars['localhost'].controller_pubkey | default('') }}"
when: (hostvars['localhost'].controller_pubkey | default('')) | length > 0
- name: Ensure root .ssh dir exists
ansible.builtin.file:
path: /root/.ssh
state: directory
mode: "0700"
- name: Add controller pubkey to root authorized_keys
ansible.builtin.lineinfile:
path: /root/.ssh/authorized_keys
create: true
mode: "0600"
line: "{{ hostvars['localhost'].controller_pubkey | default('') }}"
when: (hostvars['localhost'].controller_pubkey | default('')) | length > 0
# -------------------------
# SSHD policy: allow root public-key login without disabling password login
# -------------------------
- name: Ensure sshd drop-in dir exists
ansible.builtin.file:
path: /etc/ssh/sshd_config.d
state: directory
mode: "0755"
- name: Allow root login with publickey (drop-in) and keep password login enabled
ansible.builtin.copy:
dest: /etc/ssh/sshd_config.d/99-ansible-rootlogin.conf
mode: "0644"
content: |
PermitRootLogin prohibit-password
PubkeyAuthentication yes
PasswordAuthentication yes
notify: Restart ssh
# -------------------------
# Root trust between nodes: node <-> node
# Idea: each node generates its own /root/.ssh/id_rsa, then every node's public key is written into every node's authorized_keys
# -------------------------
- name: Generate node SSH key if missing
ansible.builtin.command: ssh-keygen -t rsa -b 4096 -N "" -f /root/.ssh/id_rsa
args:
creates: /root/.ssh/id_rsa
- name: Read node public key
ansible.builtin.slurp:
src: /root/.ssh/id_rsa.pub
register: node_pubkey_raw
- name: Set node_pubkey_text fact
ansible.builtin.set_fact:
node_pubkey_text: "{{ node_pubkey_raw.content | b64decode | trim }}"
- name: Add all nodes keys to every node authorized_keys (node <-> node)
ansible.builtin.lineinfile:
path: /root/.ssh/authorized_keys
create: true
mode: "0600"
line: "{{ hostvars[item].node_pubkey_text }}"
loop: "{{ groups['k8s_cluster'] | sort }}"
when: hostvars[item].node_pubkey_text is defined
# -------------------------
# swap: k8s requires swap to be off
# -------------------------
- name: Disable swap immediately
ansible.builtin.command: swapoff -a
changed_when: false
failed_when: false
- name: Comment swap in /etc/fstab
ansible.builtin.replace:
path: /etc/fstab
regexp: '^(\s*[^#\n]+\s+[^ \n]+\s+swap\s+[^ \n]+.*)$'
replace: '# \1'
failed_when: false
# -------------------------
# Kernel modules and sysctl: standard k8s + ipvs prerequisites
# -------------------------
- name: Ensure k8s modules-load file
ansible.builtin.copy:
dest: /etc/modules-load.d/k8s.conf
mode: "0644"
content: |
overlay
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
- name: Modprobe required modules
ansible.builtin.command: "modprobe {{ item }}"
loop: "{{ k8s_modules + ipvs_modules }}"
changed_when: false
failed_when: false
- name: Ensure sysctl for Kubernetes
ansible.builtin.copy:
dest: /etc/sysctl.d/99-kubernetes-cri.conf
mode: "0644"
content: |
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
- name: Apply sysctl
ansible.builtin.command: sysctl --system
changed_when: false
# -------------------------
# Offline repos: directory preparation
# -------------------------
- name: Ensure offline repos base dir exists
ansible.builtin.file:
path: /opt/offline-repos
state: directory
mode: "0755"
- name: Ensure offline repo dirs exist
ansible.builtin.file:
path: "{{ item }}"
state: directory
mode: "0755"
loop:
- "{{ offline_tools_repo_dir }}"
- "{{ offline_containerd_repo_dir }}"
- "{{ offline_k8s_repo_dir }}"
- "{{ offline_lb_repo_dir }}"
# -------------------------
# Offline repo: system tools (unpack -> auto-locate Packages.gz -> write file: source)
# -------------------------
- name: Unpack offline tools repo
ansible.builtin.unarchive:
src: "{{ offline_tools_repo_tar }}"
dest: "{{ offline_tools_repo_dir }}"
- name: Find Packages.gz for offline tools repo (auto-detect repo root)
ansible.builtin.find:
paths: "{{ offline_tools_repo_dir }}"
patterns: "Packages.gz"
recurse: true
register: tools_pkg_index
- name: Set offline tools repo root
ansible.builtin.set_fact:
offline_tools_repo_root: "{{ (tools_pkg_index.files | first).path | dirname }}"
when: (tools_pkg_index.matched | int) > 0
- name: Write offline tools apt source list
ansible.builtin.copy:
dest: "{{ offline_tools_repo_list }}"
mode: "0644"
content: |
deb [trusted=yes] file:{{ offline_tools_repo_root | default(offline_tools_repo_dir) }} ./
# -------------------------
# Offline repo: containerd (unpack -> auto-locate Packages.gz -> write file: source)
# -------------------------
- name: Unpack offline containerd repo
ansible.builtin.unarchive:
src: "{{ offline_containerd_repo_tar }}"
dest: "{{ offline_containerd_repo_dir }}"
- name: Find Packages.gz for offline containerd repo (auto-detect repo root)
ansible.builtin.find:
paths: "{{ offline_containerd_repo_dir }}"
patterns: "Packages.gz"
recurse: true
register: containerd_pkg_index
- name: Set offline containerd repo root
ansible.builtin.set_fact:
offline_containerd_repo_root: "{{ (containerd_pkg_index.files | first).path | dirname }}"
when: (containerd_pkg_index.matched | int) > 0
- name: Write offline containerd apt source list
ansible.builtin.copy:
dest: "{{ offline_containerd_repo_list }}"
mode: "0644"
content: |
deb [trusted=yes] file:{{ offline_containerd_repo_root | default(offline_containerd_repo_dir) }} ./
# -------------------------
# Offline repo: k8s (unpack -> auto-locate Packages.gz -> write file: source)
# -------------------------
- name: Unpack offline kubernetes repo
ansible.builtin.unarchive:
src: "{{ offline_k8s_repo_tar }}"
dest: "{{ offline_k8s_repo_dir }}"
- name: Find Packages.gz for offline kubernetes repo (auto-detect repo root)
ansible.builtin.find:
paths: "{{ offline_k8s_repo_dir }}"
patterns: "Packages.gz"
recurse: true
register: k8s_pkg_index
- name: Set offline kubernetes repo root
ansible.builtin.set_fact:
offline_k8s_repo_root: "{{ (k8s_pkg_index.files | first).path | dirname }}"
when: (k8s_pkg_index.matched | int) > 0
- name: Write offline kubernetes apt source list
ansible.builtin.copy:
dest: "{{ offline_k8s_repo_list }}"
mode: "0644"
content: |
deb [trusted=yes] file:{{ offline_k8s_repo_root | default(offline_k8s_repo_dir) }} ./
# -------------------------
# Offline repo: LB (masters only, best effort)
# -------------------------
- name: Unpack offline LB repo (masters only, best effort)
ansible.builtin.unarchive:
src: "{{ offline_lb_repo_tar }}"
dest: "{{ offline_lb_repo_dir }}"
when: is_lb_master
failed_when: false
- name: Find Packages.gz for offline LB repo (auto-detect repo root)
ansible.builtin.find:
paths: "{{ offline_lb_repo_dir }}"
patterns: "Packages.gz"
recurse: true
register: lb_pkg_index
when: is_lb_master
failed_when: false
- name: Set offline LB repo root
ansible.builtin.set_fact:
offline_lb_repo_root: "{{ (lb_pkg_index.files | first).path | dirname }}"
when:
- is_lb_master
- lb_pkg_index is defined
- (lb_pkg_index.matched | default(0) | int) > 0
- name: Write offline LB apt source list (masters only, best effort)
ansible.builtin.copy:
dest: "{{ offline_lb_repo_list }}"
mode: "0644"
content: |
deb [trusted=yes] file:{{ offline_lb_repo_root | default(offline_lb_repo_dir) }} ./
when: is_lb_master
failed_when: false
# Refresh the apt cache after the offline sources are configured
- name: Update apt cache after configuring offline repos
ansible.builtin.apt:
update_cache: true
cache_valid_time: 3600
# Install common tools (non-fatal on failure: some packages may be missing from the repo)
- name: Install common offline tools packages
ansible.builtin.apt:
name: "{{ offline_tools_packages }}"
state: present
update_cache: false
failed_when: false
# -------------------------
# containerd install and configuration
# -------------------------
- name: Ensure containerd is installed
ansible.builtin.apt:
name: "{{ offline_containerd_packages }}"
state: present
update_cache: false
# Write the containerd config (including SystemdCgroup=true etc.)
- name: Write containerd config.toml
ansible.builtin.copy:
dest: "{{ containerd_config }}"
mode: "0644"
content: |
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
[grpc]
address = "/run/containerd/containerd.sock"
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "{{ containerd_sandbox_image }}"
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs"
default_runtime_name = "runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://registry-1.docker.io"]
notify: Restart containerd
- name: Enable & start containerd
ansible.builtin.systemd:
name: containerd
enabled: true
state: started
# Point crictl at the containerd socket by default
- name: Configure crictl
ansible.builtin.copy:
dest: /etc/crictl.yaml
mode: "0644"
content: |
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
# -------------------------
# k8s keyring (optional) and k8s component install
# -------------------------
- name: Ensure /etc/apt/keyrings exists
ansible.builtin.file:
path: /etc/apt/keyrings
state: directory
mode: "0755"
- name: Copy kubernetes apt keyring if exists on controller
ansible.builtin.copy:
src: "{{ offline_k8s_keyring_src }}"
dest: "{{ offline_k8s_keyring_dest }}"
mode: "0644"
when: hostvars['localhost'].controller_has_k8s_keyring | default(false)
- name: Install kubeadm/kubelet/kubectl and deps
ansible.builtin.apt:
name: "{{ offline_k8s_packages }}"
state: present
update_cache: false
# Pin the versions so an apt upgrade cannot bump them unexpectedly
- name: Hold kubeadm/kubelet/kubectl
ansible.builtin.command: "apt-mark hold kubeadm kubelet kubectl"
changed_when: false
failed_when: false
- name: Enable kubelet
ansible.builtin.systemd:
name: kubelet
enabled: true
state: started
# -------------------------
# VIP LB: haproxy + keepalived (the two LB masters only)
# -------------------------
- name: Install haproxy and keepalived on masters
ansible.builtin.apt:
name:
- haproxy
- keepalived
state: present
update_cache: false
when: is_lb_master
# haproxy forwards VIP:16443 to both masters' 6443
- name: Write haproxy config for apiserver VIP
ansible.builtin.copy:
dest: /etc/haproxy/haproxy.cfg
mode: "0644"
content: |
global
log /dev/log local0
log /dev/log local1 notice
daemon
maxconn 20000
defaults
log global
mode tcp
option tcplog
timeout connect 5s
timeout client 1m
timeout server 1m
frontend kube-apiserver
bind *:{{ apiserver_vip_port }}
default_backend kube-apiserver
backend kube-apiserver
option tcp-check
balance roundrobin
server {{ lb_masters[0] }} {{ hostvars[lb_masters[0]].ansible_default_ipv4.address }}:{{ apiserver_bind_port }} check
server {{ lb_masters[1] }} {{ hostvars[lb_masters[1]].ansible_default_ipv4.address }}:{{ apiserver_bind_port }} check
when: is_lb_master
notify: Restart haproxy
# Fix: write the keepalived script on masters only, and make sure the directory exists
- name: Ensure /etc/keepalived exists (masters only)
ansible.builtin.file:
path: /etc/keepalived
state: directory
mode: "0755"
when: is_lb_master
# keepalived health check script: haproxy counts as healthy if its process exists
- name: Write keepalived health check script (masters only)
ansible.builtin.copy:
dest: /etc/keepalived/check_haproxy.sh
mode: "0755"
content: |
#!/usr/bin/env bash
pgrep haproxy >/dev/null 2>&1
when: is_lb_master
# keepalived VRRP: both masters are BACKUP; priority decides who holds the VIP
- name: Write keepalived config
ansible.builtin.copy:
dest: /etc/keepalived/keepalived.conf
mode: "0644"
content: |
global_defs {
router_id {{ inventory_hostname }}
}
vrrp_script chk_haproxy {
script "/etc/keepalived/check_haproxy.sh"
interval 2
fall 2
rise 2
}
vrrp_instance VI_1 {
state BACKUP
interface {{ apiserver_vip_iface }}
virtual_router_id {{ keepalived_virtual_router_id }}
priority {{ 150 if inventory_hostname == lb_masters[0] else 100 }}
advert_int 1
authentication {
auth_type PASS
auth_pass {{ keepalived_auth_pass }}
}
virtual_ipaddress {
{{ apiserver_vip }}/24
}
track_script {
chk_haproxy
}
}
when: is_lb_master
notify: Restart keepalived
- name: Enable & start haproxy/keepalived
ansible.builtin.systemd:
name: "{{ item }}"
enabled: true
state: started
loop:
- haproxy
- keepalived
when: is_lb_master
# Confirm haproxy is listening on the VIP port (locally on 127.0.0.1:16443)
- name: Wait haproxy port listening on masters
ansible.builtin.wait_for:
host: "127.0.0.1"
port: "{{ apiserver_vip_port }}"
timeout: 30
when: is_lb_master
# -------------------------
# kubeadm init (init master only)
# -------------------------
- name: Check if cluster already initialized
ansible.builtin.stat:
path: /etc/kubernetes/admin.conf
register: adminconf_stat
when: is_init_master
# Fix: apiVersion uses v1beta3 (matching kubeadm 1.30.x; v1beta4 starts with 1.31)
- name: Write kubeadm config
ansible.builtin.copy:
dest: /root/kubeadm.yaml
mode: "0644"
content: |
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: "{{ kubeadm_kubernetes_version }}"
imageRepository: "{{ kubeadm_image_repository }}"
controlPlaneEndpoint: "{{ apiserver_vip }}:{{ apiserver_vip_port }}"
networking:
podSubnet: "{{ pod_subnet }}"
serviceSubnet: "{{ service_subnet }}"
dnsDomain: "{{ cluster_domain }}"
apiServer:
certSANs:
- "{{ apiserver_vip }}"
- "{{ hostvars[lb_masters[0]].ansible_default_ipv4.address }}"
- "{{ hostvars[lb_masters[1]].ansible_default_ipv4.address }}"
- "{{ lb_masters[0] }}"
- "{{ lb_masters[1] }}"
- "localhost"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: "{{ ansible_default_ipv4.address }}"
bindPort: {{ apiserver_bind_port }}
nodeRegistration:
criSocket: "{{ cri_socket }}"
kubeletExtraArgs:
node-ip: "{{ ansible_default_ipv4.address }}"
when: is_init_master and not adminconf_stat.stat.exists
- name: Run kubeadm init
ansible.builtin.command:
argv:
- kubeadm
- init
- "--config=/root/kubeadm.yaml"
- "--upload-certs"
- "--ignore-preflight-errors=SystemVerification"
- "--ignore-preflight-errors=Swap"
register: kubeadm_init_out
when: is_init_master and not adminconf_stat.stat.exists
failed_when: kubeadm_init_out.rc != 0
- name: Re-check admin.conf after kubeadm init
ansible.builtin.stat:
path: /etc/kubernetes/admin.conf
register: adminconf_stat_after
when: is_init_master
- name: Ensure /root/.kube exists on init master
ansible.builtin.file:
path: /root/.kube
state: directory
mode: "0700"
when: is_init_master
# Let root on the init master use kubectl directly
- name: Copy admin.conf to /root/.kube/config on init master
ansible.builtin.copy:
remote_src: true
src: /etc/kubernetes/admin.conf
dest: /root/.kube/config
mode: "0600"
when: is_init_master and (adminconf_stat_after.stat.exists | default(false))
# Generate the worker join command
- name: Generate worker join command (init master)
ansible.builtin.command:
argv:
- kubeadm
- token
- create
- "--print-join-command"
register: join_worker_cmd_raw
when: is_init_master and (adminconf_stat_after.stat.exists | default(false))
# Obtain the certificate-key needed for control-plane join
- name: Upload-certs and get certificate key (init master)
ansible.builtin.command:
argv:
- kubeadm
- init
- phase
- upload-certs
- "--upload-certs"
register: upload_certs_out
when: is_init_master and (adminconf_stat_after.stat.exists | default(false))
- name: Extract certificate key
ansible.builtin.set_fact:
cert_key: "{{ (upload_certs_out.stdout_lines | select('match','^[0-9a-f]{64}$') | list | first) | default('') }}"
when: is_init_master and (adminconf_stat_after.stat.exists | default(false))
# Build the control-plane join command: the worker join command plus --control-plane and --certificate-key
- name: Build control-plane join command (init master)
ansible.builtin.set_fact:
join_cp_cmd: "{{ join_worker_cmd_raw.stdout | trim }} --control-plane --certificate-key {{ cert_key }}"
when: is_init_master and (adminconf_stat_after.stat.exists | default(false))
# Save the join commands as script files (handy for manual troubleshooting/reuse)
- name: Save join commands to files (init master)
ansible.builtin.copy:
dest: "{{ item.path }}"
mode: "0700"
content: |
#!/usr/bin/env bash
set -e
{{ item.cmd }}
loop:
- { path: "/root/join-worker.sh", cmd: "{{ join_worker_cmd_raw.stdout | trim }}" }
- { path: "/root/join-controlplane.sh", cmd: "{{ join_cp_cmd | trim }}" }
when: is_init_master and (adminconf_stat_after.stat.exists | default(false))
# Key step: store the join commands as delegate_facts on localhost so other nodes can read them via hostvars['localhost']
- name: Set join commands as global facts on localhost
ansible.builtin.set_fact:
global_join_worker: "{{ join_worker_cmd_raw.stdout | trim }}"
global_join_cp: "{{ join_cp_cmd | trim }}"
delegate_to: localhost
delegate_facts: true
run_once: true
when: is_init_master and (adminconf_stat_after.stat.exists | default(false))
# -------------------------
# join (remaining master / workers)
# -------------------------
- name: Check if node already joined
ansible.builtin.stat:
path: /etc/kubernetes/kubelet.conf
register: kubeletconf_stat
# The second master joins the control plane (LB master only, and not the init master)
- name: Join second master as control-plane
ansible.builtin.command: "{{ hostvars['localhost'].global_join_cp }}"
when:
- is_lb_master
- not is_init_master
- not kubeletconf_stat.stat.exists
- hostvars['localhost'].global_join_cp is defined
- (hostvars['localhost'].global_join_cp | length) > 0
# Workers join the cluster (any non-LB-master node is treated as a worker)
- name: Join workers
ansible.builtin.command: "{{ hostvars['localhost'].global_join_worker }}"
when:
- (not is_lb_master)
- not kubeletconf_stat.stat.exists
- hostvars['localhost'].global_join_worker is defined
- (hostvars['localhost'].global_join_worker | length) > 0
handlers:
# Restart containerd after config changes
- name: Restart containerd
ansible.builtin.systemd:
name: containerd
state: restarted
# Restart haproxy after config changes
- name: Restart haproxy
ansible.builtin.systemd:
name: haproxy
state: restarted
# Restart keepalived after config changes
- name: Restart keepalived
ansible.builtin.systemd:
name: keepalived
state: restarted
# Restart sshd after config changes
- name: Restart ssh
ansible.builtin.systemd:
name: ssh
state: restarted