2025-06-18
# Deploying Prometheus Monitoring
## 1. Components

Note: if metrics-server is already installed, uninstall it first, otherwise it will conflict.

1. metrics-server: the aggregator of cluster resource usage; it collects data for in-cluster consumers such as kubectl, the HPA and the scheduler.
2. Prometheus Operator: a system monitoring and alerting toolbox; it manages the components that store the monitoring data.
3. node-exporter: exposes the key metrics of each node.
4. kube-state-metrics: collects data about the resource objects in the Kubernetes cluster, which the alerting rules are built on.
5. Prometheus: pulls metrics from the apiserver, scheduler, controller-manager and kubelet over HTTP.
6. Grafana: the platform for visualizing statistics and monitoring data.
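Because kube-prometheus ships prometheus-adapter as the provider of the resource-metrics API (you can see the `v1beta1.metrics.k8s.io` APIService being created in the apply output below), an existing metrics-server will clash with it. A quick way to check what currently backs that API before installing — a sketch, assuming a standalone metrics-server, if any, lives in kube-system:

```bash
# which component currently serves the resource-metrics API?
kubectl get apiservices v1beta1.metrics.k8s.io -o wide

# is a standalone metrics-server deployment present?
kubectl -n kube-system get deploy metrics-server
```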
## 2. Installation

Project address: https://github.com/prometheus-operator/kube-prometheus

## 3. Choosing a version

See the compatibility matrix in the official docs: https://github.com/prometheus-operator/kube-prometheus?tab=readme-ov-file#compatibility . For example, for Kubernetes 1.30 the recommended kube-prometheus version is release-0.14.

## 4. Clone the project

```bash
git clone -b release-0.13 https://github.com/prometheus-operator/kube-prometheus.git
```

## 5. Create the resources

```bash
# If you are in mainland China, change the image registry addresses first.
[root@master1 k8s-install]# kubectl create namespace monitoring
[root@master1 k8s-install]# cd kube-prometheus/
[root@master1 kube-prometheus]# kubectl apply --server-side -f manifests/setup
[root@master1 kube-prometheus]# kubectl wait \
    --for condition=Established \
    --all CustomResourceDefinition \
    --namespace=monitoring
[root@master1 kube-prometheus]# kubectl apply -f manifests/
```

```
root@k8s01:~/helm/prometheus/kube-prometheus# kubectl apply --server-side -f manifests/setup
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com serverside-applied
namespace/monitoring serverside-applied
root@k8s01:~/helm/prometheus/kube-prometheus# kubectl wait \
> --for condition=Established \
> --all CustomResourceDefinitions \
> --namespace=monitoring
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io condition met
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io condition met
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io condition met
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io condition met
customresourcedefinition.apiextensions.k8s.io/ingressroutes.traefik.io condition met
customresourcedefinition.apiextensions.k8s.io/ingressroutetcps.traefik.io condition met
customresourcedefinition.apiextensions.k8s.io/ingressrouteudps.traefik.io condition met
customresourcedefinition.apiextensions.k8s.io/instrumentations.opentelemetry.io condition met
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io condition met
customresourcedefinition.apiextensions.k8s.io/middlewares.traefik.io condition met
customresourcedefinition.apiextensions.k8s.io/middlewaretcps.traefik.io condition met
customresourcedefinition.apiextensions.k8s.io/opampbridges.opentelemetry.io condition met
customresourcedefinition.apiextensions.k8s.io/opentelemetrycollectors.opentelemetry.io condition met
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io condition met
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/policybindings.sts.min.io condition met
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/serverstransports.traefik.io condition met
customresourcedefinition.apiextensions.k8s.io/serverstransporttcps.traefik.io condition met
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/targetallocators.opentelemetry.io condition met
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/tlsoptions.traefik.io condition met
customresourcedefinition.apiextensions.k8s.io/tlsstores.traefik.io condition met
customresourcedefinition.apiextensions.k8s.io/traefikservices.traefik.io condition met
root@k8s01:~/helm/prometheus/kube-prometheus# kubectl apply -f manifests/
alertmanager.monitoring.coreos.com/main created
networkpolicy.networking.k8s.io/alertmanager-main created
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager-main created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
networkpolicy.networking.k8s.io/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-alertmanager-overview created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-grafana-overview created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-multicluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes-aix created
configmap/grafana-dashboard-nodes-darwin created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
networkpolicy.networking.k8s.io/grafana created
prometheusrule.monitoring.coreos.com/grafana-rules created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
networkpolicy.networking.k8s.io/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
networkpolicy.networking.k8s.io/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
networkpolicy.networking.k8s.io/prometheus-k8s created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
networkpolicy.networking.k8s.io/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
networkpolicy.networking.k8s.io/prometheus-operator created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
root@k8s01:~/helm/prometheus/kube-prometheus#
```
## 6. Verify

```
# check pod status
root@k8s03:~# kubectl get pod -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          50m
alertmanager-main-1                    2/2     Running   0          50m
alertmanager-main-2                    2/2     Running   0          50m
blackbox-exporter-57bb665766-d9kwj     3/3     Running   0          50m
grafana-fdf8c48f-f6cck                 1/1     Running   0          50m
kube-state-metrics-5ffdd9685c-hg5hc    3/3     Running   0          50m
node-exporter-8l29v                    2/2     Running   0          31m
node-exporter-gdclz                    2/2     Running   0          28m
node-exporter-j5r76                    2/2     Running   0          50m
prometheus-adapter-7945bdf5d7-dh75k    1/1     Running   0          50m
prometheus-adapter-7945bdf5d7-nbp94    1/1     Running   0          50m
prometheus-k8s-0                       2/2     Running   0          50m
prometheus-k8s-1                       2/2     Running   0          50m
prometheus-operator-85c5ffc677-jk8c9   2/2     Running   0          50m

# check top output
root@k8s03:~# kubectl top node
NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s01   3277m        40%    6500Mi          66%
k8s02   6872m        85%    4037Mi          36%
k8s03   362m         4%     6407Mi          65%
```
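Before any ingress exists (next step), the three UIs can also be reached temporarily with `kubectl port-forward` against the Services created above (service names and ports as in the kube-prometheus manifests); run each command in its own terminal:

```bash
# temporary local access, no ingress required
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
kubectl -n monitoring port-forward svc/grafana 3000:3000
kubectl -n monitoring port-forward svc/alertmanager-main 9093:9093
```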
## 7. Add ingress resources

Using ingress-nginx as an example:

```
[root@master1 manifests]# cat ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alertmanager
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: alertmanager.local.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: alertmanager-main
                port:
                  number: 9093
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: grafana.local.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: prometheus.local.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-k8s
                port:
                  number: 9090
```

Using traefik as an example:

```
[root@master1 manifests]# cat ingress.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`alertmanager.local.com`)
      kind: Rule
      services:
        - name: alertmanager-main
          port: 9093
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`grafana.local.com`)
      kind: Rule
      services:
        - name: grafana
          port: 3000
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: prometheus
  namespace: monitoring
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`prometheus.local.com`)
      kind: Rule
      services:
        - name: prometheus-k8s
          port: 9090
[root@master1 manifests]# kubectl apply -f ingress.yaml
ingressroute.traefik.containo.us/alertmanager created
ingressroute.traefik.containo.us/grafana created
ingressroute.traefik.containo.us/prometheus created
```
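After the apply, a quick sanity check that the routing objects exist — use whichever command matches the controller you chose (`ingressroute` becomes a valid resource type once the Traefik CRDs are installed, as they are in the cluster above):

```bash
# ingress-nginx variant
kubectl -n monitoring get ingress

# traefik variant
kubectl -n monitoring get ingressroute
```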
## 8. Verify web access

Add hosts entries; on Windows: `notepad $env:windir\System32\drivers\etc\hosts`:

```
192.168.3.200 alertmanager.local.com
192.168.3.200 prometheus.local.com
192.168.3.200 grafana.local.com
```

- Visit http://alertmanager.local.com:30080 to see the currently firing alerts.
- Visit http://prometheus.local.com:30080/targets and check that all targets are up.
- Visit http://grafana.local.com:30080/login ; the default username and password are admin/admin. Check the data sources: a Prometheus data source has already been configured for us automatically.

## 9. Fixing abnormal targets

On the targets page you will notice two scrape jobs with no instances behind them; this is related to their ServiceMonitor objects. Create prometheus-kubeControllerManagerService.yaml with the following content and apply it:

```yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  type: ClusterIP
  ports:
    - name: https-metrics
      port: 10257
      targetPort: 10257
      protocol: TCP
```
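The Service above fills in one of the two empty targets; the kube-scheduler target usually needs the same treatment. A minimal sketch, assuming the scheduler pods carry the `component=kube-scheduler` label and expose their metrics on port 10259 — verify both on your cluster before applying:

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    app.kubernetes.io/name: kube-scheduler   # must match the kube-scheduler ServiceMonitor selector
spec:
  selector:
    component: kube-scheduler                # assumed pod label; check with: kubectl -n kube-system get pod --show-labels
  type: ClusterIP
  ports:
    - name: https-metrics                    # port name expected by the ServiceMonitor
      port: 10259                            # assumed secure metrics port of kube-scheduler
      targetPort: 10259
      protocol: TCP
EOF
```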
2023-09-03
# Prometheus Deployment
## 1. Installing the Prometheus server from the binary (directly on a physical machine)

Download the latest binary release from https://prometheus.io/download ; historical releases are at https://github.com/prometheus/prometheus/tags . LTS line: 2.53.x.
Note: with the newest Prometheus releases, Grafana dashboards may show no data because of new, incompatible rules.

```bash
# 1. create a symlink to make future upgrades easier
ln -s /monitor/prometheus-2.53.0.linux-amd64 /monitor/prometheus
mkdir /monitor/prometheus/data    # create the TSDB data directory

# 2. add a systemd service
cat > /usr/lib/systemd/system/prometheus.service << 'EOF'
[Unit]
Description=prometheus server daemon

[Service]
Restart=on-failure
ExecStart=/monitor/prometheus/prometheus --config.file=/monitor/prometheus/prometheus.yml --storage.tsdb.path=/monitor/prometheus/data --storage.tsdb.retention.time=30d --web.enable-lifecycle

[Install]
WantedBy=multi-user.target
EOF

# 3. start it
systemctl daemon-reload
systemctl enable prometheus.service
systemctl start prometheus.service
systemctl status prometheus
netstat -tunalp | grep 9090
```

Testing: download and build the test program, which acts as a scrape target and exposes a /metrics endpoint.

```bash
yum install golang -y
git clone https://github.com/prometheus/client_golang.git
cd client_golang/examples/random
export GO111MODULE=on
export GOPROXY=https://goproxy.cn
go build    # produces a binary named random
```

Then run three instances in three separate terminals:

```bash
./random -listen-address=:8080    # exposes http://localhost:8080/metrics
./random -listen-address=:8081    # exposes http://localhost:8081/metrics
./random -listen-address=:8082    # exposes http://localhost:8082/metrics
```

Because all of them expose /metrics in the Prometheus format, we can add scrape targets for them in prometheus.yml. Suppose 8080 and 8081 are production instances and 8082 is a canary instance; we put them into separate target groups and distinguish them with labels:

```yaml
scrape_configs:
  - job_name: 'example-random'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.110.101:8080', '192.168.110.101:8081']
        labels:
          group: 'production'
      - targets: ['192.168.110.101:8082']
        labels:
          group: 'canary'
```

Run `systemctl restart prometheus`, open http://192.168.110.101:9090/ and go to Status -> Targets; the newly added scrape job should be listed.
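Before restarting, the edited configuration can be sanity-checked with promtool, which ships in the same release tarball as the prometheus binary (the path below assumes the /monitor symlink created above):

```bash
# validate prometheus.yml before restarting or reloading
/monitor/prometheus/promtool check config /monitor/prometheus/prometheus.yml
```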
## 2. Installing the Prometheus server into Kubernetes

prometheus-cm.yaml:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s      # Prometheus scrapes all configured targets every 15s
      scrape_timeout: 15s       # a scrape that does not finish within 15s is treated as timed out and excluded from the latest data
      evaluation_interval: 15s  # alerting rules are evaluated every 15s
    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
```

prometheus-pv-pvc.yaml:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-local
  labels:
    app: prometheus
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 20Gi
  storageClassName: local-storage
  local:
    path: /data/k8s/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - master01
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: prometheus
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: local-storage
```

prometheus-rbac.yaml:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ''
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - 'extensions'
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:   # permission declaration for non-resource metrics endpoints
      - /metrics
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole   # the resources we need may exist in every namespace, so a ClusterRole is used here
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor
```

prometheus-deploy.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:   # make sure the indentation here uses spaces
        runAsUser: 0
      containers:
        - image: registry.cn-guangzhou.aliyuncs.com/xingcangku/oooo:1.0
          name: prometheus
          args:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.path=/prometheus'
            - '--storage.tsdb.retention.time=24h'
            - '--web.enable-admin-api'
            - '--web.enable-lifecycle'
          ports:
            - containerPort: 9090
              name: http
          volumeMounts:
            - mountPath: '/etc/prometheus'
              name: config-volume
            - mountPath: '/prometheus'
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 100m
              memory: 512Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-data
        - name: config-volume
          configMap:
            name: prometheus-config
```

prometheus-svc.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      port: 9090
      targetPort: 9090
      #targetPort: http
```

Apply everything:

```bash
kubectl create namespace monitor
kubectl apply -f cm.yaml
mkdir /data/k8s/prometheus    # create this on the node the PV is pinned to
kubectl apply -f pv-pvc.yaml
kubectl apply -f rbac.yaml
kubectl apply -f deploy.yaml
kubectl apply -f svc.yaml

# stop the binary deployment
systemctl stop prometheus
systemctl disable prometheus
```

Add the scrape job from the binary setup to the ConfigMap, then apply it again:

```
[root@master01 monitor]# cat prometheus-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s      # Prometheus scrapes all configured targets every 15s
      scrape_timeout: 15s       # a scrape that does not finish within 15s is treated as timed out and excluded from the latest data
      evaluation_interval: 15s  # alerting rules are evaluated every 15s
    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
      - job_name: 'example-random'
        scrape_interval: 5s
        static_configs:
          - targets: ['192.168.110.101:8080', '192.168.110.101:8081']
            labels:
              group: 'production'
          - targets: ['192.168.110.101:8082']
            labels:
              group: 'canary'
```

Reload the service:

```
[root@master01 monitor]# kubectl -n monitor get pods -o wide
NAME                          READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
prometheus-7b644bfcfc-l5twf   1/1     Running   0          5h25m   10.244.0.18   master01   <none>           <none>
[root@master01 monitor]# curl -X POST "http://10.108.206.132:9090/-/reload"
```
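A ConfigMap update takes a little while to propagate into the pod's mounted volume, which is why there is a wait before the reload above. If the reload does not seem to pick up the new job, a quick check that the file inside the container has actually been updated — a sketch that relies on the deployment name and mount path defined above:

```bash
# confirm the updated prometheus.yml is already visible inside the pod
kubectl -n monitor exec deploy/prometheus -- cat /etc/prometheus/prometheus.yml | grep -A3 example-random
```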
## 3. Monitoring application software

(1) If the service exposes its own /metrics endpoint, monitor it directly: add the following target to the configuration, then apply it.

```yaml
    - job_name: "coredns"
      static_configs:
        - targets: ["kube-dns.kube-system.svc.cluster.local:9153"]
```

After a short wait:

```bash
curl -X POST "http://10.108.206.132:9090/-/reload"
```

(2) If the application does not come with a /metrics endpoint, install the corresponding exporter. Official exporter list: https://prometheus.io/docs/instrumenting/exporters/

Install redis:

```bash
yum install redis -y
sed -ri 's/bind 127.0.0.1/bind 0.0.0.0/g' /etc/redis.conf
sed -ri 's/port 6379/port 16379/g' /etc/redis.conf
cat >> /etc/redis.conf << "EOF"
requirepass 123456
EOF
systemctl restart redis
systemctl status redis
```

Add redis_exporter to collect redis monitoring data:

```bash
# 1. download
wget https://github.com/oliver006/redis_exporter/releases/download/v1.61.0/redis_exporter-v1.61.0.linux-amd64.tar.gz

# 2. install
tar xf redis_exporter-v1.61.0.linux-amd64.tar.gz
mv redis_exporter-v1.61.0.linux-amd64/redis_exporter /usr/bin/

# 3. create a systemd service
cat > /usr/lib/systemd/system/redis_exporter.service << 'EOF'
[Unit]
Description=Redis Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/bin/redis_exporter --redis.addr=redis://127.0.0.1:16379 --redis.password=123456 --web.listen-address=0.0.0.0:9122 --exclude-latency-histogram-metrics

[Install]
WantedBy=multi-user.target
EOF

# 4. start
systemctl daemon-reload
systemctl restart redis_exporter
systemctl status redis_exporter
```

5. Add the scrape job to the ConfigMap and apply it:

```yaml
    - job_name: "redis-server"      # add this job
      static_configs:
        - targets: ["192.168.71.101:9122"]
```

```bash
kubectl apply -f prometheus-cm.yaml
```

6. After a short wait, reload the Prometheus server:

```bash
curl -X POST "http://10.108.206.132:9090/-/reload"
```

7. Note: if your redis-server runs inside Kubernetes, you would normally not deploy redis_exporter on the host as above; instead, run it as a sidecar next to the main redis container in the same Pod, as shown below:

```yaml
# prome-redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:4
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - containerPort: 6379
        - name: redis-exporter
          image: oliver006/redis_exporter:latest
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - containerPort: 9121
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: monitor
spec:
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
    - name: prom
      port: 9121
      targetPort: 9121
```

You can then reach the /metrics endpoint through the Service's ClusterIP on port 9121 (`curl <svc-cluster-ip>:9121/metrics`). Add the scrape job using the Service name directly; update prometheus-cm.yaml with:

```yaml
    - job_name: 'redis'
      static_configs:
        - targets: ['redis:9121']
```
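To confirm the sidecar is reachable the same way Prometheus will scrape it, a throwaway curl pod can hit the Service on port 9121 (the pod name and curl image below are arbitrary choices, not part of the original setup):

```bash
# one-off pod that queries the redis_exporter sidecar through the Service
kubectl -n monitor run curl-test --rm -i --restart=Never \
  --image=curlimages/curl -- curl -s http://redis:9121/metrics
# the output should contain "redis_up 1" if the exporter can reach redis
```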