2023-09-06
Alertmanager email alerting + DingTalk integration
I. Configure Alertmanager

Binary downloads: https://github.com/prometheus/alertmanager/releases/
Official docs: https://prometheus.io/docs/alerting/configuration/

An alert passes through three states during its lifecycle:

- `pending`: a monitored metric has triggered the alert expression, but has not yet held the condition longer than the `for` threshold.
- `firing`: the metric has triggered the condition and held it past the `for` duration; the alert changes from `pending` to `firing`. Prometheus sends firing alerts to Alertmanager, which applies its routing rules and sends notifications to the configured receivers, e.g. email.
- `inactive`: the metric no longer satisfies the alert condition, or the alert was never triggered.

State transitions, using a memory-usage rule with a 2-minute `for`:

- Initial state: memory usage is normal, the alert is `inactive`.
- The `expr` condition is first met: memory usage exceeds 20%, the alert becomes `pending`.
- Within the next 2 minutes:
  - if usage stays above 20% for 2 minutes or longer, the alert changes from `pending` to `firing`;
  - if usage recovers within 2 minutes, the alert goes back from `pending` to `inactive`.

Extract the package, then simply `kubectl apply -f` the config below:

```shell
[root@master01 ddd]# tar xf alertmanager-0.27.0.linux-amd64.tar.gz
[root@master01 ddd]# ls
alertmanager-0.27.0.linux-amd64  alertmanager-0.27.0.linux-amd64.tar.gz
[root@master01 ddd]# cd alertmanager-0.27.0.linux-amd64/
[root@master01 alertmanager-0.27.0.linux-amd64]# ls
alertmanager  alertmanager.yml  amtool  LICENSE  NOTICE
[root@master01 alertmanager-0.27.0.linux-amd64]# vi alertmanager
[root@master01 alertmanager-0.27.0.linux-amd64]# vi alertmanager.yml
```

The config file (`[root@master01 test]# cat altertmanager.yaml`):

```yaml
# alertmanager-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alert-config
  namespace: monitor
data:
  template_email.tmpl: |-
    {{ define "email.html" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    @报警<br>
    {{- range .Alerts }}
    <strong>实例:</strong> {{ .Labels.instance }}<br>
    <strong>概述:</strong> {{ .Annotations.summary }}<br>
    <strong>详情:</strong> {{ .Annotations.description }}<br>
    <strong>时间:</strong> {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
    {{- end -}}
    {{- end }}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    @恢复<br>
    {{- range .Alerts }}
    <strong>实例:</strong> {{ .Labels.instance }}<br>
    <strong>信息:</strong> {{ .Annotations.summary }}<br>
    <strong>恢复:</strong> {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
    {{- end -}}
    {{- end }}
    {{ end }}
  config.yml: |-
    templates:  # 1. register the template file
    - '/etc/alertmanager/template_email.tmpl'
    inhibit_rules:
    - source_match:
        # every alert produced by rule 1 in the prometheus config carries these two labels;
        # the first is added by prometheus automatically, the second we added ourselves
        alertname: NodeMemoryUsage
        severity: critical
      target_match:
        severity: normal  # every alert produced by rule 2 carries this label
      equal:
      - instance  # instance is a built-in label on every alert; its value is the node name
    # 1. Global configuration
    global:
      # (1) how long Alertmanager waits without receiving an alert before marking it resolved
      resolve_timeout: 5m
      # (2) the mailbox used to send mail
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: '15555519627@163.com'
      smtp_auth_username: '15555519627@163.com'
      smtp_auth_password: 'PZJWYQLDCKQGTTKZ'  # the authorization code obtained when enabling POP3
      smtp_hello: '163.com'
      smtp_require_tls: false
    # 2. Alert routing / dispatch policy
    route:
      # labels used for grouping: alerts sharing the same alertname and cluster labels
      # (e.g. cluster=XXX and alertname=YYY) are aggregated into one group
      group_by: ['alertname', 'cluster']
      # after a new alert group is created, wait at least group_wait before the first
      # notification, so multiple alerts for the same group can accumulate and fire together
      group_wait: 30s
      # short-term aggregation: within group_interval, new alerts for an existing group
      # are merged and sent together, avoiding overly frequent notifications
      group_interval: 30s
      # long-term reminder: repeat_interval re-sends long-unresolved alerts periodically
      # so they are not forgotten, until the alert resolves
      repeat_interval: 120s  # shortened for the lab environment to see results quickly
      # How the two parameters interact:
      # (1) a newly triggered alert is notified immediately;
      # (2) a group_interval window (e.g. 30s) then opens: new alerts in the same group
      #     are aggregated rather than sent at once;
      # (3) when the window closes:
      #     - if a repeat_interval boundary has been reached, the aggregated alerts are
      #       sent together with the still-unresolved ones;
      #     - otherwise unresolved alerts are not re-sent until the next repeat_interval boundary.
      # Together they prevent noisy duplicates in the short term while still providing
      # periodic reminders about long-unresolved alerts.
      # default receiver: alerts matching no route go here; the name matches one under receivers
      receiver: default
      routes:  # child routes inherit all parent attributes and may override with more specific matches
      - receiver: email   # alerts matching this child route go to the receiver of this name
        group_wait: 10s   # overrides the parent
        group_by: ['instance']  # group by instance
        match:            # only alerts matching these labels use this child route
          team: node      # only alerts carrying team=node are routed to the email receiver
        continue: true    # without this, only one route can match
      - receiver: mywebhook
        group_wait: 10s
        group_by: ['instance']
        match:
          team: node
    # 3. Receivers, referenced by the routes above
    receivers:
    - name: 'default'  # alerts matching no specific route are sent here
      email_configs:
      - to: '7902731@qq.com'
        send_resolved: true  # also notify when the alert resolves
    - name: 'email'  # matches the child route defined earlier
      email_configs:
      - to: '15555519627@163.com'
        send_resolved: true
        html: '{{ template "email.html" . }}'
    # this one forwards to the DingTalk webhook
    - name: 'mywebhook'
      webhook_configs:
      - url: 'http://promoter:8080/dingtalk/webhook1/send'
        send_resolved: true
---
# alertmanager-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitor
  labels:
    app: alertmanager
spec:
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      volumes:
      - name: alertcfg
        configMap:
          name: alert-config
      containers:
      - name: alertmanager
        # versions: https://github.com/prometheus/alertmanager/releases/
        # 1. upstream image; requires a registry mirror configured for containerd
        #image: prom/alertmanager:v0.27.0
        # 2. mirrored to a domestic registry
        image: registry.cn-hangzhou.aliyuncs.com/egon-k8s-test/alertmanager:v0.27.0
        imagePullPolicy: IfNotPresent
        args:
        - '--config.file=/etc/alertmanager/config.yml'
        ports:
        - containerPort: 9093
          name: http
        volumeMounts:
        - mountPath: '/etc/alertmanager'
          name: alertcfg
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 100m
            memory: 256Mi
---
# alertmanager-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: monitor
  labels:
    app: alertmanager
spec:
  selector:
    app: alertmanager
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: http
```

```shell
[root@master01 test]# kubectl -n monitor get svc
NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager   NodePort    10.99.103.160    <none>        9093:30610/TCP      107m
grafana        NodePort    10.99.18.224     <none>        3000:30484/TCP      28h
prometheus     NodePort    10.108.206.132   <none>        9090:31119/TCP      3d1h
promoter       ClusterIP   10.97.213.227    <none>        8080/TCP            18h
redis          ClusterIP   10.97.184.21     <none>        6379/TCP,9121/TCP   2d22h
[root@master01 test]# kubectl -n monitor get pods
NAME                            READY   STATUS    RESTARTS        AGE
alertmanager-56b46ff6b4-mvbb8   1/1     Running   0               125m
grafana-86cfcd87fb-59gtb        1/1     Running   1 (3h25m ago)   28h
node-exporter-6f4d4             1/1     Running   4 (3h25m ago)   2d21h
node-exporter-swr5j             1/1     Running   4 (3h25m ago)   2d21h
node-exporter-tf84v             1/1     Running   4 (3h25m ago)   2d21h
node-exporter-z9svr             1/1     Running   4 (3h25m ago)   2d21h
prometheus-7f8f87f55d-zbnsr     1/1     Running   1 (3h25m ago)   21h
promoter-6f68cff456-wqmg9       1/1     Running   1 (3h25m ago)   18h
redis-84bbc5df9b-rnm6q          2/2     Running   8 (3h25m ago)   2d22h
```

DingTalk alerting via webhook:

```
prometheus (alert rules) --> alertmanager ---------------------------------> email
prometheus (alert rules) --> alertmanager --> DingTalk webhook service ----> DingTalk
```

II. Configure DingTalk

1. Install DingTalk.
2. Create a group chat (at least 2 people are needed to create one).
3. Add a robot to the group to obtain the API endpoint and secret.

Test that the robot works (note the query string is `&timestamp=...`):

```python
# python 3.8
import time
import sys
import hmac
import hashlib
import base64
import urllib.parse
import requests

timestamp = str(round(time.time() * 1000))
secret = 'SEC45045323ac8b379b88e04750c7954645edc54c4ffdedd717b82804c8684c0706'
secret_enc = secret.encode('utf-8')
string_to_sign = '{}\n{}'.format(timestamp, secret)
string_to_sign_enc = string_to_sign.encode('utf-8')
hmac_code = hmac.new(secret_enc, string_to_sign_enc, digestmod=hashlib.sha256).digest()
sign = urllib.parse.quote_plus(base64.b64encode(hmac_code))
print(timestamp)
print(sign)

MESSAGE = sys.argv[1]
webhook_url = f'https://oapi.dingtalk.com/robot/send?access_token=13ddb964c0108de8b56eb944c5e407d448cb2db02e3885c45585f8eb06779def&timestamp={timestamp}&sign={sign}'
response = requests.post(webhook_url,
                         headers={'Content-Type': 'application/json'},
                         json={"msgtype": "text", "text": {"content": f"'{MESSAGE}'"}})
print(response.text)
print(response.status_code)
```

```shell
pip3 install requests -i https://mirrors.aliyun.com/pypi/simple/
python3 webhook_test.py 测试
```

Deploy the DingTalk webhook service:

```shell
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.1.0/prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz

# after extracting, edit the bundled config.yml
cat > /usr/local/prometheus-webhook-dingtalk/config.yml << "EOF"
templates:
  - /etc/template.tmpl

targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=3acdac2167b83e0b54f751c0cfcbb676b7828af183aca2e21428c489883ced8b
    # secret for signature
    secret: SEC67f8b6d15997deaf686ab0509b2dad943aca99d700131f88d010ef57e591aea0
    message:
      # add this snippet to whichever target should use the template;
      # default.tmpl is the template you will define shortly
      text: '{{ template "default.tmpl" . }}'
  # other targets can be added, mainly to reach robots in different groups
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=3acdac2167b83e0b54f751c0cfcbb676b7828af183aca2e21428c489883ced8b
    secret: SEC67f8b6d15997deaf686ab0509b2dad943aca99d700131f88d010ef57e591aea0
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=3acdac2167b83e0b54f751c0cfcbb676b7828af183aca2e21428c489883ced8b
    secret: SEC67f8b6d15997deaf686ab0509b2dad943aca99d700131f88d010ef57e591aea0
    mention:
      mobiles: ['18611453110']
EOF
```

It can be installed as a system service:

```shell
cat > /lib/systemd/system/dingtalk.service << 'EOF'
[Unit]
Description=dingtalk
Documentation=https://github.com/timonwong/prometheus-webhook-dingtalk/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/usr/local/prometheus-webhook-dingtalk
ExecStart=/usr/local/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --web.listen-address=0.0.0.0:8060 --config.file=/usr/local/prometheus-webhook-dingtalk/config.yml
User=nobody

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl restart dingtalk
systemctl status dingtalk
```

Point Alertmanager at the DingTalk webhook: the `altertmanager.yaml` is exactly the one shown at the top of this post; its `mywebhook` receiver posts alerts to `http://promoter:8080/dingtalk/webhook1/send`.

Supplementary alert screenshots:

https://egonimages.oss-cn-beijing.aliyuncs.com/gaojing1.jpg
https://egonimages.oss-cn-beijing.aliyuncs.com/gaojing2.jpg
https://egonimages.oss-cn-beijing.aliyuncs.com/gaojing3.png
https://egonimages.oss-cn-beijing.aliyuncs.com/gaojing4.jpg
https://egonimages.oss-cn-beijing.aliyuncs.com/gaojing5.jpg
https://egonimages.oss-cn-beijing.aliyuncs.com/gaojing6.png

Customizing the message content is omitted; explore it yourself: https://github.com/timonwong/prometheus-webhook-dingtalk/blob/main/template/default.tmpl
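The three-state alert lifecycle described above (inactive → pending → firing, governed by the `for` threshold) can be sketched as a small state machine. This is an illustrative model only, not Prometheus's actual implementation; the 20% threshold and 2-minute `for` window mirror the memory-usage example, and the `AlertRule`/`evaluate` names are made up for this sketch.

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    threshold: float  # expr condition: value > threshold
    for_seconds: int  # `for` duration the breach must hold before pending -> firing

def evaluate(rule: AlertRule, samples: list[float], interval: int = 60) -> list[str]:
    """samples: metric values, one per evaluation interval (seconds).
    Returns the alert state after each evaluation."""
    states: list[str] = []
    breach_duration = None  # seconds the condition has held; None = not breached
    for value in samples:
        if value > rule.threshold:
            breach_duration = 0 if breach_duration is None else breach_duration + interval
            states.append("firing" if breach_duration >= rule.for_seconds else "pending")
        else:
            breach_duration = None  # recovery resets straight back to inactive
            states.append("inactive")
    return states

# memory usage sampled every minute: normal, then >20% for 3 minutes, then recovered
print(evaluate(AlertRule(threshold=20, for_seconds=120), [10, 25, 25, 25, 10]))
# -> ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

Note how a recovery during the `pending` phase drops the alert straight back to `inactive` without ever firing, exactly as the 2-minute example in the post describes.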
2023-09-06
grafana
Grafana needs an NFS-backed volume first:

```shell
[root@master01 test]# kubectl get pv
NAME         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
grafana-pv   2Gi        RWO            Retain           Bound    monitor/grafana-pvc   nfs-client     <unset>                          27h
```

The manifest (`[root@master01 test]# cat grafana.yaml`):

```yaml
# grafana.yaml
# persistent storage for grafana (plugins etc.), mounted at /var/lib/grafana in the container
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitor
  labels:
    app: grafana
spec:
  storageClassName: nfs-client
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana-pvc
      securityContext:
        runAsUser: 0  # must run as root
      containers:
      - name: grafana
        image: grafana/grafana  # defaults to latest; a version can be pinned, e.g. grafana/grafana:10.4.4
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:  # grafana admin user and password
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin321
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 150m
            memory: 512Mi
          requests:
            cpu: 150m
            memory: 512Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: storage
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitor
spec:
  type: NodePort
  ports:
  - port: 3000
  selector:
    app: grafana
```

```shell
[root@master01 /]# kubectl -n monitor get svc
NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager   NodePort    10.99.103.160    <none>        9093:30610/TCP      94m
grafana        NodePort    10.99.18.224     <none>        3000:30484/TCP      28h
prometheus     NodePort    10.108.206.132   <none>        9090:31119/TCP      3d1h
promoter       ClusterIP   10.97.213.227    <none>        8080/TCP            18h
redis          ClusterIP   10.97.184.21     <none>        6379/TCP,9121/TCP   2d21h
```

Building dashboards with Grafana: browse to `<your node's IP>:30484` (the NodePort shown above), then:

(1) add a dashboard panel;
(2) select the monitoring data source to connect to;
(3) set the IP and port to connect to (leave everything else unchanged);
(4) import a dashboard template;
(5) review the finished configuration.
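The UI steps above (pointing Grafana at Prometheus's address) can also be done declaratively with Grafana's datasource provisioning. A hedged sketch, assuming the `prometheus` Service in the `monitor` namespace shown above; the file path is the conventional provisioning location, but verify it against your Grafana image:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml (illustrative path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # in-cluster Service DNS name for the prometheus svc created earlier
    url: http://prometheus.monitor.svc.cluster.local:9090
    isDefault: true
```

Mounting such a file (e.g. via a ConfigMap) makes the data source appear on startup, so steps (2) and (3) above never need to be clicked through.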
2023-09-06
Monitoring k8s
Scraping apiserver metrics:

```yaml
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    # keep only the targets that match
    action: keep
    regex: default;kubernetes;https
```

Scraping kube-controller-manager metrics:

```yaml
# 1. create a svc whose label selector matches the controller-manager
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: kube-controller-manager
    app.kubernetes.io/name: kube-controller-manager
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10257
    targetPort: 10257
    protocol: TCP
  selector:
    component: kube-controller-manager
```

```shell
# 2. the endpoints behind this svc use the node addresses, but every controller-manager
#    listens on 127.0.0.1 by default and is therefore unreachable; change its bind address
vi /etc/kubernetes/manifests/kube-controller-manager.yaml
# change --bind-address=127.0.0.1 on every master node
```

```yaml
# 3. add the scrape job
- job_name: 'kube-controller-manager'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: kube-system;kube-controller-manager;https-metrics  # https-metrics must match the port name in the svc
```

Scraping kube-scheduler metrics: same as above.

Scraping etcd metrics:

```yaml
# 1. edit the static pod yaml of etcd on every master node
- --listen-metrics-urls=http://127.0.0.1:2381
```

```yaml
# 2. etcd-service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: etcd
  labels:
    k8s-app: etcd
spec:
  selector:
    component: etcd
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http
    port: 2381
    targetPort: 2381
    protocol: TCP
```

```yaml
# 3. etcd scrape job
- job_name: 'etcd'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: http
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: kube-system;etcd;http
```

Auto-discovering business services:

```yaml
- job_name: 'kubernetes-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # 1. keep targets whose __meta_kubernetes_service_annotation_prometheus_io_scrape contains "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # 2. if ..._prometheus_io_scheme matches http or https, copy it to __scheme__
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  # 3. if ..._prometheus_io_path has at least one character, copy it to __metrics_path__
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # 4. join ip and port into "1.1.1.1:3333" and assign it to __address__
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)  # RE2: + is one or more, ? is zero or one, ?: makes the group non-capturing
    replacement: $1:$2
  # 5. map service labels onto the target
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  # 6. rename the namespace label to kubernetes_namespace
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  # 7. rename the service name to kubernetes_name
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
  # 8. rename the pod name to kubernetes_pod_name
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
```

Auto-discovery test:

```yaml
# prome-redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:4
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121
---
kind: Service
apiVersion: v1
metadata:
  name: redis
  namespace: monitor
  annotations:  # --------------------------------> added
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9121'
spec:
  selector:
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
  - name: prom
    port: 9121
    targetPort: 9121
```
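The address-rewriting relabel step above can be checked outside Prometheus: the RE2 pattern `([^:]+)(?::\d+)?;(\d+)` is also valid Python `re` syntax, so here is a quick sketch of what `replacement: $1:$2` produces (Python spells the groups `\1:\2`). `rewrite_address` is a helper name invented for this demo; Prometheus itself joins the source label values with `;` before matching, which the helper imitates:

```python
import re

# same regex as the relabel rule: host, an optional :port, then ';' and the annotation port
pattern = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

def rewrite_address(address: str, annotation_port: str) -> str:
    # Prometheus concatenates the source labels with ';' before applying the regex:
    # __address__ ';' __meta_kubernetes_service_annotation_prometheus_io_port
    joined = f"{address};{annotation_port}"
    return pattern.sub(r"\1:\2", joined)

# the annotation port replaces whatever port __address__ already carried
print(rewrite_address("1.1.1.1:3333", "9121"))  # -> 1.1.1.1:9121
# and is appended when __address__ had no port at all
print(rewrite_address("1.1.1.1", "9121"))       # -> 1.1.1.1:9121
```

One caveat: Prometheus anchors relabel regexes to the full string, whereas `re.sub` does not, so this sketch only illustrates the capture-group substitution.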
2023-09-03
Prometheus deployment
I. Installing prometheus server from the binary, directly on the host

Latest binaries: https://prometheus.io/download ; historical versions: https://github.com/prometheus/prometheus/tags ; LTS: the 2.53.x line.
Note: with the newest Prometheus versions, Grafana dashboard templates may show no data because of incompatible new rules.

```shell
# ========================================》 binary install of prometheus server
# 1. create a symlink first to make later upgrades easy
ln -s /monitor/prometheus-2.53.0.linux-amd64 /monitor/prometheus
mkdir /monitor/prometheus/data  # create the tsdb data directory

# 2. add a system service
cat > /usr/lib/systemd/system/prometheus.service << 'EOF'
[Unit]
Description=prometheus server daemon

[Service]
Restart=on-failure
ExecStart=/monitor/prometheus/prometheus --config.file=/monitor/prometheus/prometheus.yml --storage.tsdb.path=/monitor/prometheus/data --storage.tsdb.retention.time=30d --web.enable-lifecycle

[Install]
WantedBy=multi-user.target
EOF

# 3. start it
systemctl daemon-reload
systemctl enable prometheus.service
systemctl start prometheus.service
systemctl status prometheus
netstat -tunalp | grep 9090
```

Testing: download and build the sample program, which acts as a scrape target exposing a /metrics endpoint:

```shell
yum install golang -y
git clone https://github.com/prometheus/client_golang.git
cd client_golang/examples/random
export GO111MODULE=on
export GOPROXY=https://goproxy.cn
go build  # produces a binary named random

# then run 3 instances in 3 separate terminals:
./random -listen-address=:8080  # exposes http://localhost:8080/metrics
./random -listen-address=:8081  # exposes http://localhost:8081/metrics
./random -listen-address=:8082  # exposes http://localhost:8082/metrics
```

All three expose /metrics in the Prometheus data format, so we can add scrape targets for them in prometheus.yml. Suppose 8080 and 8081 are production instances and 8082 is a canary instance; we put them in different target lists and distinguish them with labels:

```yaml
scrape_configs:
  - job_name: 'example-random'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.110.101:8080', '192.168.110.101:8081']
        labels:
          group: 'production'
      - targets: ['192.168.110.101:8082']
        labels:
          group: 'canary'
```

Run `systemctl restart prometheus`, open http://192.168.110.101:9090/, click Status → Targets, and the new scrape job appears.

II. Installing prometheus server into k8s

```yaml
# prometheus-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s      # Prometheus scrapes all configured targets every 15s
      scrape_timeout: 15s       # a scrape not finished within 15s counts as a timeout and is excluded
      evaluation_interval: 15s  # evaluate alerting rules every 15s
    scrape_configs:
    - job_name: "prometheus"
      static_configs:
      - targets: ["localhost:9090"]
```

prometheus-pv-pvc.yaml:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-local
  labels:
    app: prometheus
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  storageClassName: local-storage
  local:
    path: /data/k8s/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - master01
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: prometheus
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: local-storage
```

prometheus-rbac.yaml:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ''
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - 'extensions'
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ''
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:  # permission for non-resource metrics endpoints
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole  # the resources we need may live in any namespace, hence a ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor
```

prometheus-deploy.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:  # make sure this block is indented with spaces
        runAsUser: 0
      containers:
      - image: registry.cn-guangzhou.aliyuncs.com/xingcangku/oooo:1.0
        name: prometheus
        args:
        - '--config.file=/etc/prometheus/prometheus.yml'
        - '--storage.tsdb.path=/prometheus'
        - '--storage.tsdb.retention.time=24h'
        - '--web.enable-admin-api'
        - '--web.enable-lifecycle'
        ports:
        - containerPort: 9090
          name: http
        volumeMounts:
        - mountPath: '/etc/prometheus'
          name: config-volume
        - mountPath: '/prometheus'
          name: data
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-data
      - name: config-volume
        configMap:
          name: prometheus-config
```

prometheus-svc.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: 9090
    #targetPort: http
```

```shell
kubectl create namespace monitor
kubectl apply -f cm.yaml
mkdir /data/k8s/prometheus  # on the node the pv is pinned to
kubectl apply -f pv-pvc.yaml
kubectl apply -f rbac.yaml
kubectl apply -f deploy.yaml
kubectl apply -f svc.yaml

# stop the binary deployment
systemctl stop prometheus
systemctl disable prometheus
```

Add the scrape job, then `apply -f` (`[root@master01 monitor]# cat prometheus-cm.yaml`):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s      # Prometheus scrapes all configured targets every 15s
      scrape_timeout: 15s       # a scrape not finished within 15s counts as a timeout and is excluded
      evaluation_interval: 15s  # evaluate alerting rules every 15s
    scrape_configs:
    - job_name: "prometheus"
      static_configs:
      - targets: ["localhost:9090"]
    - job_name: 'example-random'
      scrape_interval: 5s
      static_configs:
      - targets: ['192.168.110.101:8080', '192.168.110.101:8081']
        labels:
          group: 'production'
      - targets: ['192.168.110.101:8082']
        labels:
          group: 'canary'
```

Reload the service:

```shell
[root@master01 monitor]# kubectl -n monitor get pods -o wide
NAME                          READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
prometheus-7b644bfcfc-l5twf   1/1     Running   0          5h25m   10.244.0.18   master01   <none>           <none>
[root@master01 monitor]# curl -X POST "http://10.108.206.132:9090/-/reload"
```

III. Monitoring application software

(1) The service ships its own /metrics endpoint — monitor it directly. Add the target below to the config, then `apply -f`:

```yaml
- job_name: "coredns"
  static_configs:
  - targets: ["kube-dns.kube-system.svc.cluster.local:9153"]
```

After a moment, run `curl -X POST "http://10.108.206.132:9090/-/reload"`.

(2) The software has no built-in /metrics endpoint — install the matching exporter.
Exporter directory: https://prometheus.io/docs/instrumenting/exporters/

Install redis:

```shell
yum install redis -y
sed -ri 's/bind 127.0.0.1/bind 0.0.0.0/g' /etc/redis.conf
sed -ri 's/port 6379/port 16379/g' /etc/redis.conf
cat >> /etc/redis.conf << "EOF"
requirepass 123456
EOF
systemctl restart redis
systemctl status redis
```

Add redis_exporter to collect redis metrics:

```shell
# 1. download
wget https://github.com/oliver006/redis_exporter/releases/download/v1.61.0/redis_exporter-v1.61.0.linux-amd64.tar.gz
# 2. install
tar xf redis_exporter-v1.61.0.linux-amd64.tar.gz
mv redis_exporter-v1.61.0.linux-amd64/redis_exporter /usr/bin/
# 3. create a system service
cat > /usr/lib/systemd/system/redis_exporter.service << 'EOF'
[Unit]
Description=Redis Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/bin/redis_exporter --redis.addr=redis://127.0.0.1:16379 --redis.password=123456 --web.listen-address=0.0.0.0:9122 --exclude-latency-histogram-metrics

[Install]
WantedBy=multi-user.target
EOF
# 4. start it
systemctl daemon-reload
systemctl restart redis_exporter
systemctl status redis_exporter
```

5. Add the scrape job to the cm, then apply:

```yaml
- job_name: "redis-server"  # add this entry
  static_configs:
  - targets: ["192.168.71.101:9122"]
```

```shell
kubectl apply -f prometheus-cm.yaml
# 6. after a while, reload the prometheus server
curl -X POST "http://10.108.206.132:9090/-/reload"
```

7. Additional note: if redis-server itself runs inside k8s, we usually would not deploy a bare redis_exporter like the above; instead it runs as a `sidecar` in the same Pod as redis_server, like this:

```yaml
# prome-redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:4
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: monitor
spec:
  selector:
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
  - name: prom
    port: 9121
    targetPort: 9121
```

```shell
# the /metrics endpoint is now reachable through the svc's clusterip on port 9121:
# curl <svc clusterip>:9121/metrics
```

When adding the scrape target, the svc name can be used directly; update prometheus-cm.yaml with:

```yaml
- job_name: 'redis'
  static_configs:
  - targets: ['redis:9121']
```
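Every exporter above serves the same plain-text exposition format on /metrics, which is why Prometheus can scrape them all uniformly. A minimal parsing sketch; the sample payload and the `parse_metrics` helper are illustrative (real exposition lines may carry labels and timestamps that this sketch does not handle):

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition format into {metric: value}."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, # HELP, and # TYPE lines
            continue
        name, _, value = line.rpartition(" ")  # the value is the token after the last space
        samples[name] = float(value)
    return samples

# illustrative payload in the shape redis_exporter emits
sample = """\
# HELP redis_up Whether the redis server is up
# TYPE redis_up gauge
redis_up 1
redis_connected_clients 3
"""
print(parse_metrics(sample))  # -> {'redis_up': 1.0, 'redis_connected_clients': 3.0}
```

The same function works on any of the /metrics endpoints configured above (apiserver, etcd, the `random` examples), since they all follow the format.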
2023-09-01
Webhook development: sidecar container
一、 先准备好webhook环境之前博客有写二、 准备业务镜像1. 下载镜像 docker pull centos:7 2.配置yum源 rm -f /etc/yum.repos.d/CentOS-Base.repo cat > /etc/yum.repos.d/CentOS-Base.repo <<EOF [base] name=CentOS-\$releasever - Base - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/\$releasever/os/\$basearch/ http://mirrors.aliyuncs.com/centos/\$releasever/os/\$basearch/ http://mirrors.cloud.aliyuncs.com/centos/\$releasever/os/\$basearch/ gpgcheck=1 gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7 #released updates [updates] name=CentOS-\$releasever - Updates - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/\$releasever/updates/\$basearch/ http://mirrors.aliyuncs.com/centos/\$releasever/updates/\$basearch/ http://mirrors.cloud.aliyuncs.com/centos/\$releasever/updates/\$basearch/ gpgcheck=1 gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7 #additional packages that may be useful [extras] name=CentOS-\$releasever - Extras - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/\$releasever/extras/\$basearch/ http://mirrors.aliyuncs.com/centos/\$releasever/extras/\$basearch/ http://mirrors.cloud.aliyuncs.com/centos/\$releasever/extras/\$basearch/ gpgcheck=1 gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7 #additional packages that extend functionality of existing packages [centosplus] name=CentOS-\$releasever - Plus - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/\$releasever/centosplus/\$basearch/ http://mirrors.aliyuncs.com/centos/\$releasever/centosplus/\$basearch/ http://mirrors.cloud.aliyuncs.com/centos/\$releasever/centosplus/\$basearch/ gpgcheck=1 enabled=0 gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7 #contrib - packages by Centos Users [contrib] name=CentOS-\$releasever - Contrib - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/\$releasever/contrib/\$basearch/ 
        http://mirrors.aliyuncs.com/centos/\$releasever/contrib/\$basearch/
        http://mirrors.cloud.aliyuncs.com/centos/\$releasever/contrib/\$basearch/
gpgcheck=1
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
EOF

3. 安装需要的应用

yum install -y cronie openssh-server openssh-clients wget

4. 安装mysql5.7

wget https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
rpm -ivh mysql57-community-release-el7-11.noarch.rpm
yum repolist enabled | grep "mysql.*-community.*"
yum install mysql-community-server

5. 创建用户mysql并且给权限

chown -R mysql:mysql /var/lib/mysql
chmod -R 755 /var/lib/mysql

6. 初始化数据库

/usr/sbin/mysqld --initialize

7. 后台启动数据库

/usr/sbin/mysqld --user=mysql &

8. 查看初始化生成的临时密码并登录

grep 'temporary password' /var/log/mysqld.log
mysql -u root -p

9. 登录后修改初始密码

ALTER USER 'root'@'localhost' IDENTIFIED BY 'newpassword';

10. 测试创建一个账号、一个库和一张表

-- 创建用户 'axing' 并设置密码
CREATE USER 'axing'@'localhost' IDENTIFIED BY 'Egon@123';
-- 授予用户 'axing' 所有数据库的全部权限
GRANT ALL PRIVILEGES ON *.* TO 'axing'@'localhost' WITH GRANT OPTION;
-- 刷新权限表以使更改生效
FLUSH PRIVILEGES;
-- 创建库
CREATE DATABASE my_database;
USE my_database;
-- 创建表
CREATE TABLE my_table (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

三、准备侧车镜像
(1)准备数据库备份的脚本

cat jiaoben.sh

脚本里有些地方还可以优化,这里未再完善。按这个是可以跑起来的,注意修改好对应的账号、密码和IP地址。

#!/bin/bash
# 启动 MySQL 和 SSH 服务
/usr/sbin/mysqld --user=mysql &   # 启动 MySQL 数据库服务
/usr/sbin/sshd                    # 启动 SSH 服务

# 配置变量
MYSQL_HOST="127.0.0.1"            # MySQL 数据库主机地址
MYSQL_USER="root"                 # MySQL 数据库用户名
MYSQL_PASSWORD="Egon@123"         # MySQL 数据库密码
BACKUP_DIR="/backup"              # 本地备份文件存储目录
REMOTE_HOST="192.168.110.110"     # 远程主机地址
REMOTE_USER="root"                # 远程主机用户名
REMOTE_DIR="/remote/backup/dir"   # 远程备份文件存储目录
host_info="root:1"                # 存储用户名和密码的变量
target_ip="192.168.110.110"       # 目标主机 IP 地址

# 提取用户名和密码
user=$(echo $host_info | awk -F: '{print $1}')   # 从 host_info 中提取用户名
pass=$(echo $host_info | awk -F: '{print $2}')   # 从 host_info 中提取密码

# SSH 密钥路径
key_path="/root/.ssh/id_rsa"         # SSH 私钥路径
pub_key_path="/root/.ssh/id_rsa.pub" # SSH 公钥路径

# 检查并生成 SSH 密钥对
if [ ! -f "$pub_key_path" ]; then
    echo "SSH 公钥文件不存在,生成新的密钥对..."
    ssh-keygen -t rsa -b 4096 -f "$key_path" -N ""   # 生成新的 SSH 密钥对
else
    echo "SSH 公钥文件已存在。"
fi

# 使用 expect 自动化 ssh-copy-id 过程
expect << EOF
spawn ssh-copy-id -i $pub_key_path $user@$target_ip
expect {
    "yes/no" {send "yes\n"; exp_continue}
    "password:" {send "$pass\n"}
}
expect eof
EOF

# 检查 expect 命令的退出状态
if [ $? -eq 0 ]; then
    echo "公钥已成功复制到目标主机。"
else
    echo "公钥复制过程失败。"
fi

# 创建备份存储目录(如果不存在)
mkdir -p $BACKUP_DIR

# 获取当前时间,格式为 YYYY-MM-DD-HH-MM-SS
TIMESTAMP=$(date +"%F-%H-%M-%S")

# 使用 mysqldump 工具备份所有数据库到本地文件
mysqldump -h $MYSQL_HOST --all-databases > $BACKUP_DIR/db_backup_$TIMESTAMP.sql

# 在远程主机上创建备份存储目录(如果不存在)
ssh $REMOTE_USER@$REMOTE_HOST "mkdir -p $REMOTE_DIR"

# 使用 scp 工具将备份文件上传到远程主机
scp $BACKUP_DIR/db_backup_$TIMESTAMP.sql $REMOTE_USER@$REMOTE_HOST:$REMOTE_DIR

# 删除本地备份文件,以节省磁盘空间
rm $BACKUP_DIR/db_backup_$TIMESTAMP.sql

(2)准备定时任务的文件

cat crontab

0 2 * * * /usr/local/bin/jiaoben.sh

(3)构建镜像

cat dockerfile

基础镜像可以偷懒直接用上面的业务镜像。

# 使用上面的 CentOS 业务镜像作为基础镜像
FROM registry.cn-guangzhou.aliyuncs.com/xingcangku/axingcangku:latest

# 复制备份脚本
COPY jiaoben.sh /usr/local/bin/jiaoben.sh
RUN chmod +x /usr/local/bin/jiaoben.sh

# 复制 crontab 文件
COPY crontab /etc/cron.d/backup-cron
RUN chmod 0644 /etc/cron.d/backup-cron

# 确保 cron 服务在容器中运行
RUN crontab /etc/cron.d/backup-cron

# 创建 MySQL 配置文件 .my.cnf
RUN echo "[client]" > /root/.my.cnf && \
    echo "user=root" >> /root/.my.cnf && \
    echo "password=Egon@123" >> /root/.my.cnf && \
    chmod 600 /root/.my.cnf

# 启动 cron 服务,并保持容器运行
CMD ["crond", "-n"]

四、准备webhook.py文件
这个是webhook pod里面运行的Python文件,有三个功能:
1. 检测到带有 app: mysql 标签的 Pod 时,会自动注入一个侧车容器,定时备份数据到远程主机。
2. 如果创建的 Pod 没有带 environment 标签,会自动添加该标签。
3. 如果 labels 中存在 app: test 标签,则添加 annotation reloader.stakater.com/auto: "true"。

# -*- coding: utf-8 -*-
from flask import Flask, request, jsonify
import base64
import json
import ssl
import kubernetes.client
from kubernetes.client.rest import ApiException
from kubernetes import config
import os
import threading
import time

# 创建 Flask 应用实例
app = Flask(__name__)

# 加载 Kubernetes 集群内的配置
config.load_incluster_config()

@app.route('/mutate', methods=['POST'])
def mutate_pod():
    try:
        # 从 AdmissionReview 请求中解析 JSON 数据
        admission_review = request.get_json()
        pod = admission_review['request']['object']
        patch = []

        # 初始化 metadata、labels 和 annotations(如果它们不存在)
        # 检查 pod 是否有 metadata 部分,如果没有则添加
        if 'metadata' not in pod:
            patch.append({"op": "add", "path": "/metadata", "value": {}})
        # 检查 metadata 是否有 labels 部分,如果没有则添加
        if 'labels' not in pod.get('metadata', {}):
            patch.append({"op": "add", "path": "/metadata/labels", "value": {}})
        # 检查 metadata 是否有 annotations 部分,如果没有则添加
        if 'annotations' not in pod.get('metadata', {}):
            patch.append({"op": "add", "path": "/metadata/annotations", "value": {}})

        # 获取现有的 labels 和 annotations
        labels = pod.get('metadata', {}).get('labels', {})
        annotations = pod.get('metadata', {}).get('annotations', {})

        # 如果 labels 中不存在 'environment' 标签,则添加
        if 'environment' not in labels:
            patch.append({
                "op": "add",
                "path": "/metadata/labels/environment",
                "value": "production"
            })

        # 如果 labels 中存在 'app: test' 标签,则添加 annotation reloader.stakater.com/auto: "true"
        if labels.get('app') == 'test':
            if 'reloader.stakater.com/auto' not in annotations:
                patch.append({
                    "op": "add",
                    "path": "/metadata/annotations/reloader.stakater.com~1auto",
                    "value": "true"
                })

        # 如果 labels 中存在 'app: mysql' 标签,则在 Pod 中添加 sidecar 容器
        if labels.get('app') == 'mysql':
            container = {
                "name": "sidecar-container",
                "image": "registry.cn-guangzhou.aliyuncs.com/xingcangku/axingcangku:v1.1",
                "ports": [{"containerPort": 8080}]
            }
            patch.append({
                "op": "add",
                "path": "/spec/containers/-",
                "value": container
            })

        # 构造 AdmissionReview 响应
        admission_response = {
            "apiVersion": "admission.k8s.io/v1",
            "kind": "AdmissionReview",
            "response": {
                "uid": admission_review['request']['uid'],
                "allowed": True,
                "patchType": "JSONPatch",
                # 对 patch 进行 Base64 编码
                "patch": base64.b64encode(json.dumps(patch).encode()).decode()
            }
        }

        # 返回 AdmissionReview 响应
        return jsonify(admission_response)
    except Exception as e:
        # 如果发生异常,返回包含错误消息的响应
        return jsonify({
            "apiVersion": "admission.k8s.io/v1",
            "kind": "AdmissionReview",
            "response": {
                "uid": admission_review['request']['uid'],
                "allowed": False,
                "status": {
                    "message": str(e)  # 将异常消息作为响应的一部分
                }
            }
        }), 500

def keep_alive():
    while True:
        time.sleep(3600)  # 每小时休眠一次,保持进程活跃

if __name__ == '__main__':
    # 创建 SSL 上下文并加载证书
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.load_cert_chain('/certs/tls.crt', '/certs/tls.key')  # 加载 SSL 证书和私钥

    # 使用 SSL 上下文启动 Flask 应用,监听所有网络接口的 8080 端口
    app.run(host='0.0.0.0', port=8080, ssl_context=context)

五、总结
这个侧车容器主要实现了:在使用 MySQL 作为业务 Pod 时自动注入备份能力,定时把数据备份到远程主机。

六、更新
可以安装 ansible 后直接执行下面的 playbook 部署。但因为镜像里的 webhook.py 文件是定制的,如果需要自定义功能,需要按上面的步骤重新构建镜像,然后在下面的代码中更新镜像地址即可。

---
- name: Deploy Kubernetes Webhook
  hosts: localhost
  connection: local
  become: yes
  tasks:
    # 1. 生成 CA 私钥
    - name: Generate CA private key
      command: openssl genrsa -out ca.key 2048
      args:
        creates: ca.key

    # 2. 生成自签名 CA 证书
    - name: Generate self-signed CA certificate
      command: >
        openssl req -x509 -new -nodes -key ca.key
        -subj "/CN=webhook-service.default.svc"
        -days 36500 -out ca.crt
      args:
        creates: ca.crt

    # 3. 创建 OpenSSL 配置文件
    - name: Create OpenSSL configuration file
      copy:
        dest: webhook-openssl.cnf
        content: |
          [req]
          default_bits = 2048
          prompt = no
          default_md = sha256
          req_extensions = req_ext
          distinguished_name = dn
          [dn]
          C = CN
          ST = Shanghai
          L = Shanghai
          O = egonlin
          OU = egonlin
          CN = webhook-service.default.svc
          [req_ext]
          subjectAltName = @alt_names
          [alt_names]
          DNS.1 = webhook-service
          DNS.2 = webhook-service.default
          DNS.3 = webhook-service.default.svc
          DNS.4 = webhook-service.default.svc.cluster.local
          [req_distinguished_name]
          CN = webhook-service.default.svc
          [v3_req]
          keyUsage = critical, digitalSignature, keyEncipherment
          extendedKeyUsage = serverAuth
          subjectAltName = @alt_names
          [v3_ext]
          authorityKeyIdentifier=keyid,issuer:always
          basicConstraints=CA:FALSE
          keyUsage=keyEncipherment,dataEncipherment
          extendedKeyUsage=serverAuth,clientAuth
          subjectAltName=@alt_names

    # 4. 生成 Webhook 服务的私钥
    - name: Generate webhook service private key
      command: openssl genrsa -out webhook.key 2048
      args:
        creates: webhook.key

    # 5. 使用配置文件生成 CSR
    - name: Generate CSR
      command: openssl req -new -key webhook.key -out webhook.csr -config webhook-openssl.cnf
      args:
        creates: webhook.csr

    # 6. 生成 webhook 证书
    - name: Generate webhook certificate
      command: >
        openssl x509 -req -in webhook.csr -CA ca.crt -CAkey ca.key
        -CAcreateserial -out webhook.crt -days 36500
        -extensions v3_ext -extfile webhook-openssl.cnf
      args:
        creates: webhook.crt

    # 7. 删除旧的 Kubernetes Secret
    - name: Delete existing Kubernetes Secret
      command: kubectl delete secret webhook-certs --namespace=default
      ignore_errors: true

    # 8. 创建 Kubernetes Secret
    - name: Create Kubernetes Secret for webhook certificates
      command: >
        kubectl create secret tls webhook-certs
        --cert=webhook.crt --key=webhook.key
        --namespace=default --dry-run=client -o yaml
      register: secret_yaml

    - name: Apply Kubernetes Secret
      command: kubectl apply -f -
      args:
        stdin: "{{ secret_yaml.stdout }}"

    # 9. 创建 webhook Deployment 和 Service
    - name: Create webhook deployment YAML
      copy:
        dest: webhook-deployment.yaml
        content: |
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: webhook-deployment
            namespace: default
          spec:
            replicas: 1
            selector:
              matchLabels:
                app: webhook
            template:
              metadata:
                labels:
                  app: webhook
              spec:
                containers:
                - name: webhook
                  image: registry.cn-guangzhou.aliyuncs.com/xingcangku/webhook:v1.0
                  command: ["python", "/app/webhook.py"]
                  volumeMounts:
                  - name: webhook-certs
                    mountPath: /certs
                    readOnly: true
                volumes:
                - name: webhook-certs
                  secret:
                    secretName: webhook-certs

    - name: Apply webhook deployment
      command: kubectl apply -f webhook-deployment.yaml

    - name: Create webhook service YAML
      copy:
        dest: webhook-service.yaml
        content: |
          apiVersion: v1
          kind: Service
          metadata:
            name: webhook-service
            namespace: default
          spec:
            ports:
            - port: 443
              # webhook.py 监听的是 8080 端口,targetPort 需与之一致
              targetPort: 8080
            selector:
              app: webhook

    - name: Apply webhook service
      command: kubectl apply -f webhook-service.yaml

    # 10. 生成 base64 编码的 CA 证书
    - name: Create base64 encoded CA certificate
      shell: base64 -w 0 ca.crt > ca.crt.base64
      args:
        creates: ca.crt.base64

    # 11. 读取 base64 内容并生成 MutatingWebhookConfiguration YAML
    - name: Read CA base64 content
      slurp:
        src: ca.crt.base64
      register: ca_base64

    - name: Generate MutatingWebhookConfiguration YAML
      copy:
        dest: m-w-c.yaml
        content: |
          apiVersion: admissionregistration.k8s.io/v1
          kind: MutatingWebhookConfiguration
          metadata:
            name: example-mutating-webhook
          webhooks:
          - name: example.webhook.com
            clientConfig:
              service:
                name: webhook-service
                namespace: default
                path: "/mutate"
              caBundle: "{{ ca_base64.content | b64decode }}"
            rules:
            - operations: ["CREATE"]
              apiGroups: [""]
              apiVersions: ["v1"]
              resources: ["pods"]
            admissionReviewVersions: ["v1"]
            sideEffects: None

    - name: Apply MutatingWebhookConfiguration
      command: kubectl apply -f m-w-c.yaml
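部署完成后,可以先在本地用纯 Python 模拟一次 patch 的生效过程,确认带 app: mysql 标签的 Pod 会被追加侧车容器。下面的 build_patch 是为演示假设的辅助函数,复刻的是 webhook.py 中注入 sidecar 的那一段逻辑,不依赖集群环境:

```python
import json

def build_patch(pod):
    """示意函数:模拟 webhook.py 中对 app: mysql Pod 追加 sidecar 的逻辑"""
    patch = []
    labels = pod.get("metadata", {}).get("labels", {})
    if labels.get("app") == "mysql":
        patch.append({
            "op": "add",
            "path": "/spec/containers/-",  # JSONPatch 中 "-" 表示追加到数组末尾
            "value": {
                "name": "sidecar-container",
                "image": "registry.cn-guangzhou.aliyuncs.com/xingcangku/axingcangku:v1.1",
            },
        })
    return patch

# 构造一个带 app: mysql 标签的最小 Pod 对象
pod = {
    "metadata": {"labels": {"app": "mysql"}},
    "spec": {"containers": [{"name": "mysql", "image": "mysql:5.7"}]},
}

patch = build_patch(pod)
# 手动应用 "add /spec/containers/-" 操作:向容器数组末尾追加元素
for op in patch:
    if op["op"] == "add" and op["path"] == "/spec/containers/-":
        pod["spec"]["containers"].append(op["value"])

names = [c["name"] for c in pod["spec"]["containers"]]
print(names)  # ['mysql', 'sidecar-container']
```

在真实集群里,对应的验证方式就是创建一个带 app: mysql 标签的 Pod,然后 kubectl get pod 查看它是否从 1 个容器变成了 2 个。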
2023年09月01日