2025-06-20
Observability in Practice with OpenTelemetry + Grafana
1. Solution overview

OpenTelemetry + Prometheus + Loki + Tempo + Grafana form a modern, cloud-native observability stack. It covers the three core signals, Trace (distributed tracing), Log (logs) and Metrics (metrics), and provides a unified observability platform for applications in a microservice architecture.

2. Component overview

3. System architecture

4. Deploying the demo application

4.1 About the application

The OpenTelemetry project provides an official demo, opentelemetry-demo (https://opentelemetry.io/docs/demo/kubernetes-deployment/). It simulates a microservice implementation of an online shop and mainly contains the following services:

4.2 Deploying the application

4.2.1 Fetch the chart

```shell
# helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
# helm pull open-telemetry/opentelemetry-demo --untar
# cd opentelemetry-demo
# ls
Chart.lock  Chart.yaml  examples  grafana-dashboards  README.md  UPGRADING.md  values.yaml
charts      ci          flagd     products            templates  values.schema.json
```

4.2.2 Customize the chart

The default chart bundles the opentelemetry-collector, prometheus, grafana, opensearch and jaeger components. We disable them first, since this article runs its own collector and backends:

```yaml
# vim values.yaml
default:
  # list of environment variables applied to all components
  env:
    - name: OTEL_COLLECTOR_NAME
      value: center-collector.opentelemetry.svc
opentelemetry-collector:
  enabled: false
jaeger:
  enabled: false
prometheus:
  enabled: false
grafana:
  enabled: false
opensearch:
  enabled: false
```

4.2.3 Install the demo application

```shell
# helm install demo . -f values.yaml
...
All services are available via the frontend proxy: http://localhost:8080
by running these commands:
  kubectl --namespace default port-forward svc/frontend-proxy 8080:8080

Once the frontend-proxy service is exposed via port-forwarding, the following paths are available:

  Web store            http://localhost:8080/
  Jaeger UI            http://localhost:8080/jaeger/ui/
  Grafana              http://localhost:8080/grafana/
  Load Generator UI    http://localhost:8080/loadgen/
  Feature Flags UI     http://localhost:8080/feature/

# kubectl get pod
NAME                               READY   STATUS    RESTARTS   AGE
accounting-79cdcf89df-h8nnc        1/1     Running   0          2m15s
ad-dc6768b6-lvzcq                  1/1     Running   0          2m14s
cart-65c89fcdd7-8tcwp              1/1     Running   0          2m15s
checkout-7c45459f67-xvft2          1/1     Running   0          2m13s
currency-65dd8c8f6-pxxbb           1/1     Running   0          2m15s
email-5659b8d84f-9ljr9             1/1     Running   0          2m15s
flagd-57fdd95655-xrmsk             2/2     Running   0          2m14s
fraud-detection-7db9cbbd4d-znxq6   1/1     Running   0          2m15s
frontend-6bd764b6b9-gmstv          1/1     Running   0          2m15s
frontend-proxy-56977d5ddb-cl87k    1/1     Running   0          2m15s
image-provider-54b56c68b8-gdgnv    1/1     Running   0          2m15s
kafka-976bc899f-79vd7              1/1     Running   0          2m14s
load-generator-79dd9d8d58-hcw8c    1/1     Running   0          2m15s
payment-6d9748df64-46zwt           1/1     Running   0          2m15s
product-catalog-658d99b4d4-xpczv   1/1     Running   0          2m13s
quote-5dfbb544f5-6r8gr             1/1     Running   0          2m14s
recommendation-764b6c5cf8-lnkm6    1/1     Running   0          2m14s
shipping-5f65469746-zdr2g          1/1     Running   0          2m15s
valkey-cart-85ccb5db-kr74s         1/1     Running   0          2m15s

# kubectl get service
NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ad                ClusterIP   10.103.72.85     <none>        8080/TCP                     2m19s
cart              ClusterIP   10.106.118.178   <none>        8080/TCP                     2m19s
checkout          ClusterIP   10.109.56.238    <none>        8080/TCP                     2m19s
currency          ClusterIP   10.96.112.137    <none>        8080/TCP                     2m19s
email             ClusterIP   10.103.214.222   <none>        8080/TCP                     2m19s
flagd             ClusterIP   10.101.48.231    <none>        8013/TCP,8016/TCP,4000/TCP   2m19s
frontend          ClusterIP   10.103.70.199    <none>        8080/TCP                     2m19s
frontend-proxy    ClusterIP   10.106.13.80     <none>        8080/TCP                     2m19s
image-provider    ClusterIP   10.109.69.146    <none>        8081/TCP                     2m19s
kafka             ClusterIP   10.104.9.210     <none>        9092/TCP,9093/TCP            2m19s
kubernetes        ClusterIP   10.96.0.1        <none>        443/TCP                      176d
load-generator    ClusterIP   10.106.97.167    <none>        8089/TCP                     2m19s
payment           ClusterIP   10.102.143.196   <none>        8080/TCP                     2m19s
product-catalog   ClusterIP   10.109.219.138   <none>        8080/TCP                     2m19s
quote             ClusterIP   10.111.139.80    <none>        8080/TCP                     2m19s
recommendation    ClusterIP   10.97.118.12     <none>        8080/TCP                     2m19s
shipping          ClusterIP   10.107.102.160   <none>        8080/TCP                     2m19s
valkey-cart       ClusterIP   10.104.34.233    <none>        6379/TCP                     2m19s
```

4.2.4 Create the ingress resource

Next, create an ingress resource that exposes port 8080 of the frontend-proxy service:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: demo
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`demo.cuiliangblog.cn`)
      kind: Rule
      services:
        - name: frontend-proxy
          port: 8080
```

4.2.5 After the ingress resource is created, add a hosts entry for the domain and open it in a browser to verify.

4.3 Configuring Ingress telemetry export

Taking the ingress as an example: starting with Traefik v2.6, Traefik has initial support for exporting traces via the OpenTelemetry protocol, which lets you send Traefik's data to any OTel-compatible backend. For deploying Traefik see https://www.cuiliangblog.cn/detail/section/140101250, and for access-log configuration see https://doc.traefik.io/traefik/observability/access-logs/#opentelemetry

```yaml
# vim values.yaml
experimental:                    # experimental features
  otlpLogs: true                 # export logs in OTLP format
extraArguments:                  # custom startup arguments
  - "--experimental.otlpLogs=true"
  - "--accesslog.otlp=true"
  - "--accesslog.otlp.grpc=true"
  - "--accesslog.otlp.grpc.endpoint=center-collector.opentelemetry.svc:4317"
  - "--accesslog.otlp.grpc.insecure=true"
metrics:                         # metrics
  addInternals: true             # also record internal traffic
  otlp:
    enabled: true                # export in OTLP format
    grpc:                        # use the gRPC protocol
      endpoint: "center-collector.opentelemetry.svc:4317"  # OpenTelemetry Collector address
      insecure: true             # skip certificate verification
tracing:                         # distributed tracing
  addInternals: true             # also trace internal traffic (e.g. redirects)
  otlp:
    enabled: true                # export in OTLP format
    grpc:                        # use the gRPC protocol
      endpoint: "center-collector.opentelemetry.svc:4317"  # OpenTelemetry Collector address
      insecure: true             # skip certificate verification
```
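Workloads outside the demo chart can be wired to the same central collector through the standard OpenTelemetry SDK environment variables, so the export target never has to be hard-coded. The following is only a minimal sketch and not part of the original setup: the Deployment name, namespace and image are placeholders, while the endpoint reuses the center-collector.opentelemetry.svc address from the values above (4318 is the collector's OTLP/HTTP port, 4317 the gRPC one).

```yaml
# Hypothetical example: point an application's OTel SDK at the central collector.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # placeholder name
  namespace: default          # placeholder namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest                       # placeholder image
          env:
            - name: OTEL_SERVICE_NAME                # service name shown in traces
              value: "my-app"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT      # collector OTLP/HTTP endpoint
              value: "http://center-collector.opentelemetry.svc:4318"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "http/protobuf"
```

SDKs and auto-instrumentation agents read these variables at startup, so the same manifest pattern works regardless of the application language.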
5. Deploying MinIO

5.1 Configure MinIO object storage

5.1.1 Configure MinIO

```shell
[root@k8s-master minio]# cat > minio.yaml << EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: minio-pvc
  namespace: minio
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: minio
  name: minio
  namespace: minio
spec:
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: quay.io/minio/minio:latest
        command:
        - /bin/bash
        - -c
        args:
        - minio server /data --console-address :9090
        volumeMounts:
        - mountPath: /data
          name: data
        ports:
        - containerPort: 9090
          name: console
        - containerPort: 9000
          name: api
        env:
        - name: MINIO_ROOT_USER          # root user name
          value: "admin"
        - name: MINIO_ROOT_PASSWORD      # root password, at least 8 characters
          value: "minioadmin"
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: minio-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: minio-service
  namespace: minio
spec:
  type: NodePort
  selector:
    app: minio
  ports:
  - name: console
    port: 9090
    protocol: TCP
    targetPort: 9090
    nodePort: 30300
  - name: api
    port: 9000
    protocol: TCP
    targetPort: 9000
    nodePort: 30200
EOF
[root@k8s-master minio]# kubectl apply -f minio.yaml
deployment.apps/minio created
service/minio-service created
```

5.1.2 Access the web UI via NodePort

```shell
[root@k8s-master minio]# kubectl get pod -n minio
NAME                     READY   STATUS    RESTARTS   AGE
minio-86577f8755-l65mf   1/1     Running   0          11m
[root@k8s-master minio]# kubectl get svc -n minio
NAME            TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
minio-service   NodePort   10.102.223.132   <none>        9090:30300/TCP,9000:30200/TCP   10m
```

Open http://<any k8s node IP>:30300 and log in with the user name and password configured above (admin / minioadmin).

5.1.3 Access via Ingress

```shell
[root@k8s-master minio]# cat minio-ingress.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: minio-console
  namespace: minio
spec:
  entryPoints:
    - web
  routes:
  - match: Host(`minio.test.com`)       # domain
    kind: Rule
    services:
    - name: minio-service               # must match the Service name
      port: 9090                        # must match the Service port
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: minio-api
  namespace: minio
spec:
  entryPoints:
    - web
  routes:
  - match: Host(`minio-api.test.com`)   # domain
    kind: Rule
    services:
    - name: minio-service               # must match the Service name
      port: 9000                        # must match the Service port
[root@k8s-master minio]# kubectl apply -f minio-ingress.yaml
ingressroute.traefik.containo.us/minio-console created
ingressroute.traefik.containo.us/minio-api created
```

Add a hosts record (192.168.10.10 minio.test.com) and the console can then be reached through the domain.

5.2 Deploy a MinIO cluster with Helm

A MinIO cluster can be deployed either with the operator or with Helm. For a single Kubernetes cluster, deploying MinIO with Helm is recommended; the operator is better suited to scenarios that run multiple MinIO clusters. Helm-based MinIO deployment reference: https://artifacthub.io/packages/helm/bitnami/minio

5.2.1 Resource and role planning

When deploying a highly available MinIO in distributed mode, the total number of drives must be at least 4 so that erasure coding can work. Here MinIO data is stored under the data1 and data2 paths on k8s-work1 and k8s-work2 and persisted with local PVs (see the layout sketch after section 5.2.2).

```shell
# create the data directories
[root@k8s-work1 ~]# mkdir -p /data1/minio
[root@k8s-work1 ~]# mkdir -p /data2/minio
[root@k8s-work2 ~]# mkdir -p /data1/minio
[root@k8s-work2 ~]# mkdir -p /data2/minio
```

5.2.2 Download the Helm chart

```shell
[root@k8s-master ~]# helm repo add bitnami https://charts.bitnami.com/bitnami
[root@k8s-master ~]# helm search repo minio
NAME            CHART VERSION   APP VERSION   DESCRIPTION
bitnami/minio   14.1.4          2024.3.30     MinIO(R) is an object storage server, compatibl...
[root@k8s-master ~]# helm pull bitnami/minio --untar
[root@k8s-master ~]# cd minio
root@k8s01:~/helm/minio/minio-demo# ls
minio  minio-17.0.5.tgz
root@k8s01:~/helm/minio/minio-demo# cd minio/
root@k8s01:~/helm/minio/minio-demo/minio# ls
Chart.lock  Chart.yaml  ingress.yaml  pv.yaml    storageClass.yaml  values.yaml
charts      demo.yaml   pvc.yaml      README.md  templates          values.yaml.bak
```
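For reference, the 2 nodes x 2 data directories per node planned in 5.2.1 add up to the 4 drives MinIO needs for erasure coding, and they map onto MinIO's server-pool expansion notation. With the Bitnami chart and the release name minio-demo used below, the rendered StatefulSet (shown in full in 5.2.7) passes this layout to MinIO through the MINIO_DISTRIBUTED_NODES variable; the line below only illustrates that expansion and is not something that has to be configured by hand:

```shell
# 2 replicas x 2 data directories per pod = 4 drives in total, the minimum for erasure coding
MINIO_DISTRIBUTED_NODES="minio-demo-{0...1}.minio-demo-headless.minio.svc.cluster.local:9000/bitnami/minio/data-{0...1}"
```

MinIO expands each {0...1} range itself, yielding four endpoints spread across the two StatefulSet pods.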
5.2.3 Create the StorageClass

The provisioner field is set to no-provisioner because local volumes do not yet support dynamic provisioning, so the PVs have to be created manually in advance. volumeBindingMode is set to WaitForFirstConsumer, a very important feature of local persistent volumes: delayed binding. Delayed binding means that when the PVC is submitted, the StorageClass postpones binding the PV to the PVC until a Pod using the PVC is actually scheduled, so that the PV on the correct node can be chosen.

```shell
root@k8s01:~/helm/minio/minio-demo/minio# cat storageClass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```

5.2.4 Create the PVs

```shell
root@k8s01:~/helm/minio/minio-demo/minio# cat pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minio-pv1
  labels:
    app: minio-0
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage   # must match the StorageClass created above
  local:
    path: /data1/minio              # local storage path
  nodeAffinity:                     # schedule to the k8s01 node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s01
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minio-pv2
  labels:
    app: minio-1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /data2/minio
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s01
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minio-pv3
  labels:
    app: minio-2
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /data1/minio
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s02
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minio-pv4
  labels:
    app: minio-3
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /data2/minio
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s02
root@k8s01:~/helm/minio/minio-demo/minio# kubectl get pv | grep minio
minio-pv1   10Gi   RWO   Retain   Bound   minio/data-0-minio-demo-1   local-storage   10d
minio-pv2   10Gi   RWO   Retain   Bound   minio/data-1-minio-demo-1   local-storage   10d
minio-pv3   10Gi   RWO   Retain   Bound   minio/data-0-minio-demo-0   local-storage   10d
minio-pv4   10Gi   RWO   Retain   Bound   minio/data-1-minio-demo-0   local-storage   10d
```

5.2.5 Create the PVCs

Note how each PVC name is constructed: PVC name = volume_name-statefulset_name-ordinal. The selector labels are then used to force each PVC to bind to its intended PV.

root@k8s01:~/helm/minio/minio-demo/minio# cat pvc.yaml apiVersion: v1 kind: PersistentVolumeClaim metadata: name: data-minio-0 namespace: minio spec: accessModes: - ReadWriteOnce resources: requests: storage: 10Gi storageClassName: local-storage selector: matchLabels: app: minio-0 --- apiVersion: v1 kind: PersistentVolumeClaim metadata: name: data-minio-1 namespace: minio spec: accessModes: - ReadWriteOnce resources: requests: storage: 10Gi storageClassName: local-storage selector: matchLabels: app: minio-1 --- apiVersion: v1 kind: PersistentVolumeClaim metadata: name: data-minio-2 namespace: minio spec: accessModes: -
ReadWriteOnce resources: requests: storage: 10Gi storageClassName: local-storage selector: matchLabels: app: minio-2 --- apiVersion: v1 kind: PersistentVolumeClaim metadata: name: data-minio-3 namespace: minio spec: accessModes: - ReadWriteOnce resources: requests: storage: 10Gi storageClassName: local-storage selector: matchLabels: app: minio-3root@k8s01:~/helm/minio/minio-demo/minio# kubectl get pvc -n minio NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE data-0-minio-demo-0 Bound minio-pv3 10Gi RWO local-storage 10d data-0-minio-demo-1 Bound minio-pv1 10Gi RWO local-storage 10d data-1-minio-demo-0 Bound minio-pv4 10Gi RWO local-storage 10d data-1-minio-demo-1 Bound minio-pv2 10Gi RWO local-storage 10d data-minio-0 Pending local-storage 10d 5.2.6 修改配置68 image: 69 registry: docker.io 70 repository: bitnami/minio 71 tag: 2024.3.30-debian-12-r0 104 mode: distributed # 集群模式,单节点为standalone,分布式集群为distributed 197 statefulset: 215 replicaCount: 2 # 节点数 218 zones: 1 # 区域数,1个即可 221 drivesPerNode: 2 # 每个节点数据目录数.2节点×2目录组成4节点的mimio集群 558 #podAnnotations: {} # 导出Prometheus指标 559 podAnnotations: 560 prometheus.io/scrape: "true" 561 prometheus.io/path: "/minio/v2/metrics/cluster" 562 prometheus.io/port: "9000" 1049 persistence: 1052 enabled: true 1060 storageClass: "local-storage" 1063 mountPath: /bitnami/minio/data 1066 accessModes: 1067 - ReadWriteOnce 1070 size: 10Gi 1073 annotations: {} 1076 existingClaim: ""5.2.7 部署miniOkubectl create ns minioroot@k8s01:~/helm/minio/minio-demo/minio# cat demo.yaml --- # Source: minio/templates/console/networkpolicy.yaml kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: minio-demo-console namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2.0.1 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: console app.kubernetes.io/part-of: minio spec: podSelector: matchLabels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: console app.kubernetes.io/part-of: minio policyTypes: - Ingress - Egress egress: - {} ingress: # Allow inbound connections - ports: - port: 9090 --- # Source: minio/templates/networkpolicy.yaml kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: minio-demo namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2025.5.24 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio spec: podSelector: matchLabels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio policyTypes: - Ingress - Egress egress: - {} ingress: # Allow inbound connections - ports: - port: 9000 --- # Source: minio/templates/console/pdb.yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: minio-demo-console namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2.0.1 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: console app.kubernetes.io/part-of: minio spec: maxUnavailable: 1 selector: matchLabels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: console app.kubernetes.io/part-of: minio --- # Source: minio/templates/pdb.yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: 
minio-demo namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2025.5.24 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio spec: maxUnavailable: 1 selector: matchLabels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio --- # Source: minio/templates/serviceaccount.yaml apiVersion: v1 kind: ServiceAccount metadata: name: minio-demo namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2025.5.24 helm.sh/chart: minio-17.0.5 app.kubernetes.io/part-of: minio automountServiceAccountToken: false secrets: - name: minio-demo --- # Source: minio/templates/secrets.yaml apiVersion: v1 kind: Secret metadata: name: minio-demo namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2025.5.24 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio type: Opaque data: root-user: "YWRtaW4=" root-password: "OGZHWWlrY3lpNA==" --- # Source: minio/templates/console/service.yaml apiVersion: v1 kind: Service metadata: name: minio-demo-console namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2.0.1 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: console app.kubernetes.io/part-of: minio spec: type: ClusterIP ports: - name: http port: 9090 targetPort: http nodePort: null selector: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: console app.kubernetes.io/part-of: minio --- # Source: minio/templates/headless-svc.yaml apiVersion: v1 kind: Service metadata: name: minio-demo-headless namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2025.5.24 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio spec: type: ClusterIP clusterIP: None ports: - name: tcp-api port: 9000 targetPort: api publishNotReadyAddresses: true selector: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio --- # Source: minio/templates/service.yaml apiVersion: v1 kind: Service metadata: name: minio-demo namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2025.5.24 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio spec: type: ClusterIP ports: - name: tcp-api port: 9000 targetPort: api nodePort: null selector: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio --- # Source: minio/templates/console/deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: minio-demo-console namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2.0.1 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: console app.kubernetes.io/part-of: minio spec: 
replicas: 1 strategy: type: RollingUpdate selector: matchLabels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: console app.kubernetes.io/part-of: minio template: metadata: labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2025.5.24 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: console app.kubernetes.io/part-of: minio spec: serviceAccountName: minio-demo automountServiceAccountToken: false affinity: podAffinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - podAffinityTerm: labelSelector: matchLabels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: console topologyKey: kubernetes.io/hostname weight: 1 nodeAffinity: securityContext: fsGroup: 1001 fsGroupChangePolicy: Always supplementalGroups: [] sysctls: [] containers: - name: console image: registry.cn-guangzhou.aliyuncs.com/xingcangku/docker.io-bitnami-minio-object-browser:2.0.1-debian-12-r2 imagePullPolicy: IfNotPresent securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true runAsGroup: 1001 runAsNonRoot: true runAsUser: 1001 seLinuxOptions: {} seccompProfile: type: RuntimeDefault args: - server - --host - "0.0.0.0" - --port - "9090" env: - name: CONSOLE_MINIO_SERVER value: "http://minio-demo:9000" resources: limits: cpu: 150m ephemeral-storage: 2Gi memory: 192Mi requests: cpu: 100m ephemeral-storage: 50Mi memory: 128Mi ports: - name: http containerPort: 9090 livenessProbe: failureThreshold: 5 initialDelaySeconds: 5 periodSeconds: 5 successThreshold: 1 timeoutSeconds: 5 tcpSocket: port: http readinessProbe: failureThreshold: 5 initialDelaySeconds: 5 periodSeconds: 5 successThreshold: 1 timeoutSeconds: 5 httpGet: path: /minio port: http volumeMounts: - name: empty-dir mountPath: /tmp subPath: tmp-dir - name: empty-dir mountPath: /.console subPath: app-console-dir volumes: - name: empty-dir emptyDir: {} --- # Source: minio/templates/application.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: minio-demo namespace: "minio" labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2025.5.24 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio spec: selector: matchLabels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio podManagementPolicy: Parallel replicas: 2 serviceName: minio-demo-headless updateStrategy: type: RollingUpdate template: metadata: labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/managed-by: Helm app.kubernetes.io/name: minio app.kubernetes.io/version: 2025.5.24 helm.sh/chart: minio-17.0.5 app.kubernetes.io/component: minio app.kubernetes.io/part-of: minio annotations: checksum/credentials-secret: b06d639ea8d96eecf600100351306b11b3607d0ae288f01fe3489b67b6cc4873 prometheus.io/path: /minio/v2/metrics/cluster prometheus.io/port: "9000" prometheus.io/scrape: "true" spec: serviceAccountName: minio-demo affinity: podAffinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - podAffinityTerm: labelSelector: matchLabels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio app.kubernetes.io/component: minio topologyKey: kubernetes.io/hostname weight: 1 nodeAffinity: 
automountServiceAccountToken: false securityContext: fsGroup: 1001 fsGroupChangePolicy: OnRootMismatch supplementalGroups: [] sysctls: [] initContainers: containers: - name: minio image: registry.cn-guangzhou.aliyuncs.com/xingcangku/docker.io-bitnami-minio:2025.5.24-debian-12-r6 imagePullPolicy: "IfNotPresent" securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true runAsGroup: 1001 runAsNonRoot: true runAsUser: 1001 seLinuxOptions: {} seccompProfile: type: RuntimeDefault env: - name: BITNAMI_DEBUG value: "false" - name: MINIO_DISTRIBUTED_MODE_ENABLED value: "yes" - name: MINIO_DISTRIBUTED_NODES value: "minio-demo-{0...1}.minio-demo-headless.minio.svc.cluster.local:9000/bitnami/minio/data-{0...1}" - name: MINIO_SCHEME value: "http" - name: MINIO_FORCE_NEW_KEYS value: "no" - name: MINIO_ROOT_USER_FILE value: /opt/bitnami/minio/secrets/root-user - name: MINIO_ROOT_PASSWORD_FILE value: /opt/bitnami/minio/secrets/root-password - name: MINIO_SKIP_CLIENT value: "yes" - name: MINIO_API_PORT_NUMBER value: "9000" - name: MINIO_BROWSER value: "off" - name: MINIO_PROMETHEUS_AUTH_TYPE value: "public" - name: MINIO_DATA_DIR value: "/bitnami/minio/data-0" ports: - name: api containerPort: 9000 livenessProbe: httpGet: path: /minio/health/live port: api scheme: "HTTP" initialDelaySeconds: 5 periodSeconds: 5 timeoutSeconds: 5 successThreshold: 1 failureThreshold: 5 readinessProbe: tcpSocket: port: api initialDelaySeconds: 5 periodSeconds: 5 timeoutSeconds: 1 successThreshold: 1 failureThreshold: 5 resources: limits: cpu: 375m ephemeral-storage: 2Gi memory: 384Mi requests: cpu: 250m ephemeral-storage: 50Mi memory: 256Mi volumeMounts: - name: empty-dir mountPath: /tmp subPath: tmp-dir - name: empty-dir mountPath: /opt/bitnami/minio/tmp subPath: app-tmp-dir - name: empty-dir mountPath: /.mc subPath: app-mc-dir - name: minio-credentials mountPath: /opt/bitnami/minio/secrets/ - name: data-0 mountPath: /bitnami/minio/data-0 - name: data-1 mountPath: /bitnami/minio/data-1 volumes: - name: empty-dir emptyDir: {} - name: minio-credentials secret: secretName: minio-demo volumeClaimTemplates: - metadata: name: data-0 labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio spec: accessModes: - "ReadWriteOnce" resources: requests: storage: "10Gi" storageClassName: local-storage - metadata: name: data-1 labels: app.kubernetes.io/instance: minio-demo app.kubernetes.io/name: minio spec: accessModes: - "ReadWriteOnce" resources: requests: storage: "10Gi" storageClassName: local-storage 5.2.8查看资源信息root@k8s01:~/helm/minio/minio-demo/minio# kubectl get all -n minio NAME READY STATUS RESTARTS AGE pod/minio-demo-0 1/1 Running 10 (5h27m ago) 10d pod/minio-demo-1 1/1 Running 10 (5h27m ago) 27h pod/minio-demo-console-7b586c5f9c-l8hnc 1/1 Running 9 (5h27m ago) 10d NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/minio-demo ClusterIP 10.97.92.61 <none> 9000/TCP 10d service/minio-demo-console ClusterIP 10.101.127.112 <none> 9090/TCP 10d service/minio-demo-headless ClusterIP None <none> 9000/TCP 10d NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/minio-demo-console 1/1 1 1 10d NAME DESIRED CURRENT READY AGE replicaset.apps/minio-demo-console-7b586c5f9c 1 1 1 10d NAME READY AGE statefulset.apps/minio-demo 2/2 10d 5.2.9创建ingress资源#以ingrss-nginx为例: # cat > ingress.yaml << EOF apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: minio-ingreess namespace: minio annotations: nginx.ingress.kubernetes.io/rewrite-target: / spec: 
ingressClassName: nginx rules: - host: minio.local.com http: paths: - path: / pathType: Prefix backend: service: name: minio port: number: 9001 EOF#以traefik为例: root@k8s01:~/helm/minio/minio-demo/minio# cat ingress.yaml apiVersion: traefik.io/v1alpha1 kind: IngressRoute metadata: name: minio-console namespace: minio spec: entryPoints: - web routes: - match: Host(`minio.local.com`) kind: Rule services: - name: minio-demo-console # 修正为 Console Service 名称 port: 9090 # 修正为 Console 端口 --- apiVersion: traefik.io/v1alpha1 kind: IngressRoute metadata: name: minio-api namespace: minio spec: entryPoints: - web routes: - match: Host(`minio-api.local.com`) kind: Rule services: - name: minio-demo # 保持 API Service 名称 port: 9000 # 保持 API 端口5.2.10获取用户名密码# 获取用户名和密码 [root@k8s-master minio]# kubectl get secret --namespace minio minio -o jsonpath="{.data.root-user}" | base64 -d admin [root@k8s-master minio]# kubectl get secret --namespace minio minio -o jsonpath="{.data.root-password}" | base64 -d HWLLGMhgkp5.2.11访问web管理页5.3operator部署minIO企业版需要收费六、部署 Prometheus如果已安装metrics-server需要先卸载,否则冲突https://axzys.cn/index.php/archives/423/七、部署Thanos监控[可选]Thanos 很好的弥补了 Prometheus 在持久化存储和 多个 prometheus 集群之间跨集群查询方面的不足的问题。具体可参考文档https://thanos.io/, 部署参考文档:https://github.com/thanos-io/kube-thanos,本实例使用 receive 模式部署。 如果需要使用 sidecar 模式部署,可参考文档:https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/platform/thanos.mdhttps://www.cuiliangblog.cn/detail/section/215968508八、部署 Grafanahttps://axzys.cn/index.php/archives/423/九、部署 OpenTelemetryhttps://www.cuiliangblog.cn/detail/section/215947486root@k8s01:~/helm/opentelemetry/cert-manager# cat new-center-collector.yaml apiVersion: opentelemetry.io/v1beta1 kind: OpenTelemetryCollector # 元数据定义部分 metadata: name: center # Collector 的名称为 center namespace: opentelemetry # 具体的配置内容 spec: replicas: 1 # 设置副本数量为1 # image: otel/opentelemetry-collector-contrib:latest # 使用支持 elasticsearch 的镜像 image: registry.cn-guangzhou.aliyuncs.com/xingcangku/otel-opentelemetry-collector-contrib-latest:latest config: # 定义 Collector 配置 receivers: # 接收器,用于接收遥测数据(如 trace、metrics、logs) otlp: # 配置 OTLP(OpenTelemetry Protocol)接收器 protocols: # 启用哪些协议来接收数据 grpc: endpoint: 0.0.0.0:4317 # 启用 gRPC 协议 http: endpoint: 0.0.0.0:4318 # 启用 HTTP 协议 processors: # 处理器,用于处理收集到的数据 batch: {} # 批处理器,用于将数据分批发送,提高效率 exporters: # 导出器,用于将处理后的数据发送到后端系统 debug: {} # 使用 debug 导出器,将数据打印到终端(通常用于测试或调试) otlp: # 数据发送到tempo的grpc端口 endpoint: "tempo:4317" tls: # 跳过证书验证 insecure: true prometheus: endpoint: "0.0.0.0:9464" # prometheus指标暴露端口 loki: endpoint: http://loki-gateway.loki.svc/loki/api/v1/push headers: X-Scope-OrgID: "fake" # 与Grafana配置一致 labels: attributes: # 从日志属性提取 k8s.pod.name: "pod" k8s.container.name: "container" k8s.namespace.name: "namespace" app: "application" # 映射应用中设置的标签 resource: # 从SDK资源属性提取 service.name: "service" service: # 服务配置部分 telemetry: logs: level: "debug" # 设置 Collector 自身日志等级为 debug(方便观察日志) pipelines: # 定义处理管道 traces: # 定义 trace 类型的管道 receivers: [otlp] # 接收器为 OTLP processors: [batch] # 使用批处理器 exporters: [otlp] # 将数据导出到OTLP metrics: # 定义 metrics 类型的管道 receivers: [otlp] # 接收器为 OTLP processors: [batch] # 使用批处理器 exporters: [prometheus] # 将数据导出到prometheus logs: receivers: [otlp] processors: [batch] # 使用批处理器 exporters: [loki] 十、部署 Tempo 10.1Tempo 介绍Grafana Tempo是一个开源、易于使用的大规模分布式跟踪后端。Tempo具有成本效益,仅需要对象存储即可运行,并且与Grafana,Prometheus和Loki深度集成,Tempo可以与任何开源跟踪协议一起使用,包括Jaeger、Zipkin和OpenTelemetry。它仅支持键/值查找,并且旨在与用于发现的日志和度量标准(示例性)协同工作。https://axzys.cn/index.php/archives/418/十一、部署Loki日志收集 11.1 loki 介绍 
11.1.1组件功能Loki架构十分简单,由以下三个部分组成: Loki 是主服务器,负责存储日志和处理查询 。 promtail 是代理,负责收集日志并将其发送给 loki 。 Grafana 用于 UI 展示。 只要在应用程序服务器上安装promtail来收集日志然后发送给Loki存储,就可以在Grafana UI界面通过添加Loki为数据源进行日志查询11.1.2系统架构Distributor(接收日志入口):负责接收客户端发送的日志,进行标签解析、预处理、分片计算,转发给 Ingester。 Ingester(日志暂存处理):处理 Distributor 发送的日志,缓存到内存,定期刷写到对象存储或本地。支持查询时返回缓存数据。 Querier(日志查询器):负责处理来自 Grafana 或其他客户端的查询请求,并从 Ingester 和 Store 中读取数据。 Index:boltdb-shipper 模式的 Index 提供者 在分布式部署中,读取和缓存 index 数据,避免 S3 等远程存储频繁请求。 Chunks 是Loki 中一种核心的数据结构和存储形式,主要由 ingester 负责生成和管理。它不是像 distributor、querier 那样的可部署服务,但在 Loki 架构和存储中极其关键。11.1.3 部署 lokiloki 也分为整体式 、微服务式、可扩展式三种部署模式,具体可参考文档https://grafana.com/docs/loki/latest/setup/install/helm/concepts/,此处以可扩展式为例: loki 使用 minio 对象存储配置可参考文档:https://blog.min.io/how-to-grafana-loki-minio/# helm repo add grafana https://grafana.github.io/helm-charts "grafana" has been added to your repositories # helm pull grafana/loki --untar # ls charts Chart.yaml README.md requirements.lock requirements.yaml templates values.yaml--- # Source: loki/templates/backend/poddisruptionbudget-backend.yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: loki-backend namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: backend spec: selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: backend maxUnavailable: 1 --- # Source: loki/templates/chunks-cache/poddisruptionbudget-chunks-cache.yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: loki-memcached-chunks-cache namespace: loki labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: memcached-chunks-cache spec: selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: memcached-chunks-cache maxUnavailable: 1 --- # Source: loki/templates/read/poddisruptionbudget-read.yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: loki-read namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: read spec: selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: read maxUnavailable: 1 --- # Source: loki/templates/results-cache/poddisruptionbudget-results-cache.yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: loki-memcached-results-cache namespace: loki labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: memcached-results-cache spec: selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: memcached-results-cache maxUnavailable: 1 --- # Source: loki/templates/write/poddisruptionbudget-write.yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: loki-write namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: write spec: selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: write maxUnavailable: 1 --- # Source: loki/templates/loki-canary/serviceaccount.yaml apiVersion: v1 kind: ServiceAccount metadata: name: loki-canary namespace: loki labels: helm.sh/chart: loki-6.30.1 
app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: canary automountServiceAccountToken: true --- # Source: loki/templates/serviceaccount.yaml apiVersion: v1 kind: ServiceAccount metadata: name: loki namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" automountServiceAccountToken: true --- # Source: loki/templates/config.yaml apiVersion: v1 kind: ConfigMap metadata: name: loki namespace: loki data: config.yaml: | auth_enabled: true bloom_build: builder: planner_address: loki-backend-headless.loki.svc.cluster.local:9095 enabled: false bloom_gateway: client: addresses: dnssrvnoa+_grpc._tcp.loki-backend-headless.loki.svc.cluster.local enabled: false chunk_store_config: chunk_cache_config: background: writeback_buffer: 500000 writeback_goroutines: 1 writeback_size_limit: 500MB memcached: batch_size: 4 parallelism: 5 memcached_client: addresses: dnssrvnoa+_memcached-client._tcp.loki-chunks-cache.loki.svc consistent_hash: true max_idle_conns: 72 timeout: 2000ms common: compactor_address: 'http://loki-backend:3100' path_prefix: /var/loki replication_factor: 3 frontend: scheduler_address: "" tail_proxy_url: "" frontend_worker: scheduler_address: "" index_gateway: mode: simple limits_config: max_cache_freshness_per_query: 10m query_timeout: 300s reject_old_samples: true reject_old_samples_max_age: 168h split_queries_by_interval: 15m volume_enabled: true memberlist: join_members: - loki-memberlist pattern_ingester: enabled: false query_range: align_queries_with_step: true cache_results: true results_cache: cache: background: writeback_buffer: 500000 writeback_goroutines: 1 writeback_size_limit: 500MB memcached_client: addresses: dnssrvnoa+_memcached-client._tcp.loki-results-cache.loki.svc consistent_hash: true timeout: 500ms update_interval: 1m ruler: storage: s3: access_key_id: admin bucketnames: null endpoint: minio-demo.minio.svc:9000 insecure: true s3: s3://admin:8fGYikcyi4@minio-demo.minio.svc:9000/loki s3forcepathstyle: true secret_access_key: 8fGYikcyi4 type: s3 wal: dir: /var/loki/ruler-wal runtime_config: file: /etc/loki/runtime-config/runtime-config.yaml schema_config: configs: - from: "2024-04-01" index: period: 24h prefix: index_ object_store: s3 schema: v13 store: tsdb server: grpc_listen_port: 9095 http_listen_port: 3100 http_server_read_timeout: 600s http_server_write_timeout: 600s storage_config: aws: access_key_id: admin secret_access_key: 8fGYikcyi4 region: "" endpoint: minio-demo.minio.svc:9000 insecure: true s3forcepathstyle: true bucketnames: loki bloom_shipper: working_directory: /var/loki/data/bloomshipper boltdb_shipper: index_gateway_client: server_address: dns+loki-backend-headless.loki.svc.cluster.local:9095 hedging: at: 250ms max_per_second: 20 up_to: 3 tsdb_shipper: index_gateway_client: server_address: dns+loki-backend-headless.loki.svc.cluster.local:9095 tracing: enabled: false --- # Source: loki/templates/gateway/configmap-gateway.yaml apiVersion: v1 kind: ConfigMap metadata: name: loki-gateway namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: gateway data: nginx.conf: | worker_processes 5; ## loki: 1 error_log /dev/stderr; pid /tmp/nginx.pid; worker_rlimit_nofile 8192; events { worker_connections 4096; ## loki: 1024 } http { client_body_temp_path /tmp/client_temp; 
proxy_temp_path /tmp/proxy_temp_path; fastcgi_temp_path /tmp/fastcgi_temp; uwsgi_temp_path /tmp/uwsgi_temp; scgi_temp_path /tmp/scgi_temp; client_max_body_size 4M; proxy_read_timeout 600; ## 10 minutes proxy_send_timeout 600; proxy_connect_timeout 600; proxy_http_version 1.1; #loki_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] $status ' '"$request" $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log /dev/stderr main; sendfile on; tcp_nopush on; resolver kube-dns.kube-system.svc.cluster.local.; server { listen 8080; listen [::]:8080; location = / { return 200 'OK'; auth_basic off; } ######################################################## # Configure backend targets location ^~ /ui { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } # Distributor location = /api/prom/push { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } location = /loki/api/v1/push { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } location = /distributor/ring { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } location = /otlp/v1/logs { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } # Ingester location = /flush { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } location ^~ /ingester/ { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } location = /ingester { internal; # to suppress 301 } # Ring location = /ring { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } # MemberListKV location = /memberlist { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } # Ruler location = /ruler/ring { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } location = /api/prom/rules { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } location ^~ /api/prom/rules/ { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } location = /loki/api/v1/rules { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } location ^~ /loki/api/v1/rules/ { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } location = /prometheus/api/v1/alerts { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } location = /prometheus/api/v1/rules { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } # Compactor location = /compactor/ring { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } location = /loki/api/v1/delete { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } location = /loki/api/v1/cache/generation_numbers { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } # IndexGateway location = /indexgateway/ring { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } # QueryScheduler location = /scheduler/ring { proxy_pass http://loki-backend.loki.svc.cluster.local:3100$request_uri; } # Config location = /config { proxy_pass http://loki-write.loki.svc.cluster.local:3100$request_uri; } # QueryFrontend, Querier location = /api/prom/tail { proxy_pass http://loki-read.loki.svc.cluster.local:3100$request_uri; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade"; } location = /loki/api/v1/tail { proxy_pass http://loki-read.loki.svc.cluster.local:3100$request_uri; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection 
"upgrade"; } location ^~ /api/prom/ { proxy_pass http://loki-read.loki.svc.cluster.local:3100$request_uri; } location = /api/prom { internal; # to suppress 301 } # if the X-Query-Tags header is empty, set a noop= without a value as empty values are not logged set $query_tags $http_x_query_tags; if ($query_tags !~* '') { set $query_tags "noop="; } location ^~ /loki/api/v1/ { # pass custom headers set by Grafana as X-Query-Tags which are logged as key/value pairs in metrics.go log messages proxy_set_header X-Query-Tags "${query_tags},user=${http_x_grafana_user},dashboard_id=${http_x_dashboard_uid},dashboard_title=${http_x_dashboard_title},panel_id=${http_x_panel_id},panel_title=${http_x_panel_title},source_rule_uid=${http_x_rule_uid},rule_name=${http_x_rule_name},rule_folder=${http_x_rule_folder},rule_version=${http_x_rule_version},rule_source=${http_x_rule_source},rule_type=${http_x_rule_type}"; proxy_pass http://loki-read.loki.svc.cluster.local:3100$request_uri; } location = /loki/api/v1 { internal; # to suppress 301 } } } --- # Source: loki/templates/runtime-configmap.yaml apiVersion: v1 kind: ConfigMap metadata: name: loki-runtime namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" data: runtime-config.yaml: | {} --- # Source: loki/templates/backend/clusterrole.yaml kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" name: loki-clusterrole rules: - apiGroups: [""] # "" indicates the core API group resources: ["configmaps", "secrets"] verbs: ["get", "watch", "list"] --- # Source: loki/templates/backend/clusterrolebinding.yaml kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: loki-clusterrolebinding labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" subjects: - kind: ServiceAccount name: loki namespace: loki roleRef: kind: ClusterRole name: loki-clusterrole apiGroup: rbac.authorization.k8s.io --- # Source: loki/templates/backend/query-scheduler-discovery.yaml apiVersion: v1 kind: Service metadata: name: loki-query-scheduler-discovery namespace: loki labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: backend prometheus.io/service-monitor: "false" annotations: spec: type: ClusterIP clusterIP: None publishNotReadyAddresses: true ports: - name: http-metrics port: 3100 targetPort: http-metrics protocol: TCP - name: grpc port: 9095 targetPort: grpc protocol: TCP selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: backend --- # Source: loki/templates/backend/service-backend-headless.yaml apiVersion: v1 kind: Service metadata: name: loki-backend-headless namespace: loki labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: backend variant: headless prometheus.io/service-monitor: "false" annotations: spec: type: ClusterIP clusterIP: None ports: - name: http-metrics port: 3100 targetPort: http-metrics protocol: TCP - name: grpc port: 9095 targetPort: grpc protocol: TCP appProtocol: tcp selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: backend --- # Source: loki/templates/backend/service-backend.yaml apiVersion: v1 kind: Service metadata: name: loki-backend 
namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: backend annotations: spec: type: ClusterIP ports: - name: http-metrics port: 3100 targetPort: http-metrics protocol: TCP - name: grpc port: 9095 targetPort: grpc protocol: TCP selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: backend --- # Source: loki/templates/chunks-cache/service-chunks-cache-headless.yaml apiVersion: v1 kind: Service metadata: name: loki-chunks-cache labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: "memcached-chunks-cache" annotations: {} namespace: "loki" spec: type: ClusterIP clusterIP: None ports: - name: memcached-client port: 11211 targetPort: 11211 - name: http-metrics port: 9150 targetPort: 9150 selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: "memcached-chunks-cache" --- # Source: loki/templates/gateway/service-gateway.yaml apiVersion: v1 kind: Service metadata: name: loki-gateway namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: gateway prometheus.io/service-monitor: "false" annotations: spec: type: ClusterIP ports: - name: http-metrics port: 80 targetPort: http-metrics protocol: TCP selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: gateway --- # Source: loki/templates/loki-canary/service.yaml apiVersion: v1 kind: Service metadata: name: loki-canary namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: canary annotations: spec: type: ClusterIP ports: - name: http-metrics port: 3500 targetPort: http-metrics protocol: TCP selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: canary --- # Source: loki/templates/read/service-read-headless.yaml apiVersion: v1 kind: Service metadata: name: loki-read-headless namespace: loki labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: read variant: headless prometheus.io/service-monitor: "false" annotations: spec: type: ClusterIP clusterIP: None ports: - name: http-metrics port: 3100 targetPort: http-metrics protocol: TCP - name: grpc port: 9095 targetPort: grpc protocol: TCP appProtocol: tcp selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: read --- # Source: loki/templates/read/service-read.yaml apiVersion: v1 kind: Service metadata: name: loki-read namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: read annotations: spec: type: ClusterIP ports: - name: http-metrics port: 3100 targetPort: http-metrics protocol: TCP - name: grpc port: 9095 targetPort: grpc protocol: TCP selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: read --- # Source: loki/templates/results-cache/service-results-cache-headless.yaml apiVersion: v1 kind: Service metadata: name: loki-results-cache labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki 
app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: "memcached-results-cache" annotations: {} namespace: "loki" spec: type: ClusterIP clusterIP: None ports: - name: memcached-client port: 11211 targetPort: 11211 - name: http-metrics port: 9150 targetPort: 9150 selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: "memcached-results-cache" --- # Source: loki/templates/service-memberlist.yaml apiVersion: v1 kind: Service metadata: name: loki-memberlist namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" annotations: spec: type: ClusterIP clusterIP: None ports: - name: tcp port: 7946 targetPort: http-memberlist protocol: TCP selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/part-of: memberlist --- # Source: loki/templates/write/service-write-headless.yaml apiVersion: v1 kind: Service metadata: name: loki-write-headless namespace: loki labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: write variant: headless prometheus.io/service-monitor: "false" annotations: spec: type: ClusterIP clusterIP: None ports: - name: http-metrics port: 3100 targetPort: http-metrics protocol: TCP - name: grpc port: 9095 targetPort: grpc protocol: TCP appProtocol: tcp selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: write --- # Source: loki/templates/write/service-write.yaml apiVersion: v1 kind: Service metadata: name: loki-write namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: write annotations: spec: type: ClusterIP ports: - name: http-metrics port: 3100 targetPort: http-metrics protocol: TCP - name: grpc port: 9095 targetPort: grpc protocol: TCP selector: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: write --- # Source: loki/templates/loki-canary/daemonset.yaml apiVersion: apps/v1 kind: DaemonSet metadata: name: loki-canary namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: canary spec: selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: canary updateStrategy: rollingUpdate: maxUnavailable: 1 type: RollingUpdate template: metadata: labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: canary spec: serviceAccountName: loki-canary securityContext: fsGroup: 10001 runAsGroup: 10001 runAsNonRoot: true runAsUser: 10001 containers: - name: loki-canary image: registry.cn-guangzhou.aliyuncs.com/xingcangku/grafana-loki-canary-3.5.0:3.5.0 imagePullPolicy: IfNotPresent args: - -addr=loki-gateway.loki.svc.cluster.local.:80 - -labelname=pod - -labelvalue=$(POD_NAME) - -user=self-monitoring - -tenant-id=self-monitoring - -pass= - -push=true securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true volumeMounts: ports: - name: http-metrics containerPort: 3500 protocol: TCP env: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name readinessProbe: httpGet: path: /metrics port: http-metrics initialDelaySeconds: 15 timeoutSeconds: 1 volumes: --- # Source: 
loki/templates/gateway/deployment-gateway-nginx.yaml apiVersion: apps/v1 kind: Deployment metadata: name: loki-gateway namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: gateway spec: replicas: 1 strategy: type: RollingUpdate revisionHistoryLimit: 10 selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: gateway template: metadata: annotations: checksum/config: 440a9cd2e87de46e0aad42617818d58f1e2daacb1ae594bad1663931faa44ebc labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: gateway spec: serviceAccountName: loki enableServiceLinks: true securityContext: fsGroup: 101 runAsGroup: 101 runAsNonRoot: true runAsUser: 101 terminationGracePeriodSeconds: 30 containers: - name: nginx image: registry.cn-guangzhou.aliyuncs.com/xingcangku/docker.io-nginxinc-nginx-unprivileged-1.28-alpine:1.28-alpine imagePullPolicy: IfNotPresent ports: - name: http-metrics containerPort: 8080 protocol: TCP readinessProbe: httpGet: path: / port: http-metrics initialDelaySeconds: 15 timeoutSeconds: 1 securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true volumeMounts: - name: config mountPath: /etc/nginx - name: tmp mountPath: /tmp - name: docker-entrypoint-d-override mountPath: /docker-entrypoint.d resources: {} affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: app.kubernetes.io/component: gateway topologyKey: kubernetes.io/hostname volumes: - name: config configMap: name: loki-gateway - name: tmp emptyDir: {} - name: docker-entrypoint-d-override emptyDir: {} --- # Source: loki/templates/read/deployment-read.yaml apiVersion: apps/v1 kind: Deployment metadata: name: loki-read namespace: loki labels: app.kubernetes.io/part-of: memberlist helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: read spec: replicas: 3 strategy: rollingUpdate: maxSurge: 0 maxUnavailable: 1 revisionHistoryLimit: 10 selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: read template: metadata: annotations: checksum/config: 1616415aaf41d5dec62fea8a013eab1aa2a559579f5f72299f7041e5cd6ea4c7 labels: app.kubernetes.io/part-of: memberlist app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: read spec: serviceAccountName: loki automountServiceAccountToken: true securityContext: fsGroup: 10001 runAsGroup: 10001 runAsNonRoot: true runAsUser: 10001 terminationGracePeriodSeconds: 30 containers: - name: loki image: registry.cn-guangzhou.aliyuncs.com/xingcangku/docker.io-grafana-loki-3.5.0:3.5.0 imagePullPolicy: IfNotPresent args: - -config.file=/etc/loki/config/config.yaml - -target=read - -legacy-read-mode=false - -common.compactor-grpc-address=loki-backend.loki.svc.cluster.local:9095 ports: - name: http-metrics containerPort: 3100 protocol: TCP - name: grpc containerPort: 9095 protocol: TCP - name: http-memberlist containerPort: 7946 protocol: TCP securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true readinessProbe: httpGet: path: /ready port: http-metrics initialDelaySeconds: 30 timeoutSeconds: 1 volumeMounts: - name: config mountPath: /etc/loki/config - name: runtime-config 
mountPath: /etc/loki/runtime-config - name: tmp mountPath: /tmp - name: data mountPath: /var/loki resources: {} affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: app.kubernetes.io/component: read topologyKey: kubernetes.io/hostname volumes: - name: tmp emptyDir: {} - name: data emptyDir: {} - name: config configMap: name: loki items: - key: "config.yaml" path: "config.yaml" - name: runtime-config configMap: name: loki-runtime --- # Source: loki/templates/backend/statefulset-backend.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: loki-backend namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: backend app.kubernetes.io/part-of: memberlist spec: replicas: 3 podManagementPolicy: Parallel updateStrategy: rollingUpdate: partition: 0 serviceName: loki-backend-headless revisionHistoryLimit: 10 persistentVolumeClaimRetentionPolicy: whenDeleted: Delete whenScaled: Delete selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: backend template: metadata: annotations: checksum/config: 1616415aaf41d5dec62fea8a013eab1aa2a559579f5f72299f7041e5cd6ea4c7 labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: backend app.kubernetes.io/part-of: memberlist spec: serviceAccountName: loki automountServiceAccountToken: true securityContext: fsGroup: 10001 runAsGroup: 10001 runAsNonRoot: true runAsUser: 10001 terminationGracePeriodSeconds: 300 containers: - name: loki-sc-rules image: "registry.cn-guangzhou.aliyuncs.com/xingcangku/kiwigrid-k8s-sidecar-1.30.3:1.30.3" imagePullPolicy: IfNotPresent env: - name: METHOD value: WATCH - name: LABEL value: "loki_rule" - name: FOLDER value: "/rules" - name: RESOURCE value: "both" - name: WATCH_SERVER_TIMEOUT value: "60" - name: WATCH_CLIENT_TIMEOUT value: "60" - name: LOG_LEVEL value: "INFO" securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true volumeMounts: - name: sc-rules-volume mountPath: "/rules" - name: loki image: registry.cn-guangzhou.aliyuncs.com/xingcangku/docker.io-grafana-loki-3.5.0:3.5.0 imagePullPolicy: IfNotPresent args: - -config.file=/etc/loki/config/config.yaml - -target=backend - -legacy-read-mode=false ports: - name: http-metrics containerPort: 3100 protocol: TCP - name: grpc containerPort: 9095 protocol: TCP - name: http-memberlist containerPort: 7946 protocol: TCP securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true readinessProbe: httpGet: path: /ready port: http-metrics initialDelaySeconds: 30 timeoutSeconds: 1 volumeMounts: - name: config mountPath: /etc/loki/config - name: runtime-config mountPath: /etc/loki/runtime-config - name: tmp mountPath: /tmp - name: data mountPath: /var/loki - name: sc-rules-volume mountPath: "/rules" resources: {} affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: app.kubernetes.io/component: backend topologyKey: kubernetes.io/hostname volumes: - name: tmp emptyDir: {} - name: config configMap: name: loki items: - key: "config.yaml" path: "config.yaml" - name: runtime-config configMap: name: loki-runtime - name: sc-rules-volume emptyDir: {} volumeClaimTemplates: - metadata: name: data spec: storageClassName: 
"ceph-cephfs" # 显式指定存储类 accessModes: - ReadWriteOnce resources: requests: storage: 10Gi --- # Source: loki/templates/chunks-cache/statefulset-chunks-cache.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: loki-chunks-cache labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: "memcached-chunks-cache" name: "memcached-chunks-cache" annotations: {} namespace: "loki" spec: podManagementPolicy: Parallel replicas: 1 selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: "memcached-chunks-cache" name: "memcached-chunks-cache" updateStrategy: type: RollingUpdate serviceName: loki-chunks-cache template: metadata: labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: "memcached-chunks-cache" name: "memcached-chunks-cache" annotations: spec: serviceAccountName: loki securityContext: fsGroup: 11211 runAsGroup: 11211 runAsNonRoot: true runAsUser: 11211 initContainers: [] nodeSelector: {} affinity: {} topologySpreadConstraints: [] tolerations: [] terminationGracePeriodSeconds: 60 containers: - name: memcached image: registry.cn-guangzhou.aliyuncs.com/xingcangku/memcached-1.6.38-alpine:1.6.38-alpine imagePullPolicy: IfNotPresent resources: limits: memory: 4096Mi requests: cpu: 500m memory: 2048Mi ports: - containerPort: 11211 name: client args: - -m 4096 - --extended=modern,track_sizes - -I 5m - -c 16384 - -v - -u 11211 env: envFrom: securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true - name: exporter image: registry.cn-guangzhou.aliyuncs.com/xingcangku/prom-memcached-exporter-v0.15.2:v0.15.2 imagePullPolicy: IfNotPresent ports: - containerPort: 9150 name: http-metrics args: - "--memcached.address=localhost:11211" - "--web.listen-address=0.0.0.0:9150" resources: limits: {} requests: {} securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true --- # Source: loki/templates/results-cache/statefulset-results-cache.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: loki-results-cache labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: "memcached-results-cache" name: "memcached-results-cache" annotations: {} namespace: "loki" spec: podManagementPolicy: Parallel replicas: 1 selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: "memcached-results-cache" name: "memcached-results-cache" updateStrategy: type: RollingUpdate serviceName: loki-results-cache template: metadata: labels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: "memcached-results-cache" name: "memcached-results-cache" annotations: spec: serviceAccountName: loki securityContext: fsGroup: 11211 runAsGroup: 11211 runAsNonRoot: true runAsUser: 11211 initContainers: [] nodeSelector: {} affinity: {} topologySpreadConstraints: [] tolerations: [] terminationGracePeriodSeconds: 60 containers: - name: memcached image: registry.cn-guangzhou.aliyuncs.com/xingcangku/memcached-1.6.38-alpine:1.6.38-alpine imagePullPolicy: IfNotPresent resources: limits: memory: 1229Mi requests: cpu: 500m memory: 1229Mi ports: - containerPort: 11211 name: client args: - -m 1024 - --extended=modern,track_sizes - -I 5m - -c 16384 - -v - -u 11211 env: 
envFrom: securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true - name: exporter image: registry.cn-guangzhou.aliyuncs.com/xingcangku/prom-memcached-exporter-v0.15.2:v0.15.2 imagePullPolicy: IfNotPresent ports: - containerPort: 9150 name: http-metrics args: - "--memcached.address=localhost:11211" - "--web.listen-address=0.0.0.0:9150" resources: limits: {} requests: {} securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true --- # Source: loki/templates/write/statefulset-write.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: loki-write namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: write app.kubernetes.io/part-of: memberlist spec: replicas: 3 podManagementPolicy: Parallel updateStrategy: rollingUpdate: partition: 0 serviceName: loki-write-headless revisionHistoryLimit: 10 selector: matchLabels: app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/component: write template: metadata: annotations: checksum/config: 1616415aaf41d5dec62fea8a013eab1aa2a559579f5f72299f7041e5cd6ea4c7 labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: write app.kubernetes.io/part-of: memberlist spec: serviceAccountName: loki automountServiceAccountToken: true enableServiceLinks: true securityContext: fsGroup: 10001 runAsGroup: 10001 runAsNonRoot: true runAsUser: 10001 terminationGracePeriodSeconds: 300 containers: - name: loki image: registry.cn-guangzhou.aliyuncs.com/xingcangku/docker.io-grafana-loki-3.5.0:3.5.0 imagePullPolicy: IfNotPresent args: - -config.file=/etc/loki/config/config.yaml - -target=write ports: - name: http-metrics containerPort: 3100 protocol: TCP - name: grpc containerPort: 9095 protocol: TCP - name: http-memberlist containerPort: 7946 protocol: TCP securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL readOnlyRootFilesystem: true readinessProbe: httpGet: path: /ready port: http-metrics initialDelaySeconds: 30 timeoutSeconds: 1 volumeMounts: - name: config mountPath: /etc/loki/config - name: runtime-config mountPath: /etc/loki/runtime-config - name: data mountPath: /var/loki resources: {} affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: app.kubernetes.io/component: write topologyKey: kubernetes.io/hostname volumes: - name: config configMap: name: loki items: - key: "config.yaml" path: "config.yaml" - name: runtime-config configMap: name: loki-runtime volumeClaimTemplates: - apiVersion: v1 kind: PersistentVolumeClaim metadata: name: data spec: accessModes: - ReadWriteOnce resources: requests: storage: "10Gi" --- # Source: loki/templates/tests/test-canary.yaml apiVersion: v1 kind: Pod metadata: name: "loki-helm-test" namespace: loki labels: helm.sh/chart: loki-6.30.1 app.kubernetes.io/name: loki app.kubernetes.io/instance: loki app.kubernetes.io/version: "3.5.0" app.kubernetes.io/component: helm-test annotations: "helm.sh/hook": test spec: containers: - name: loki-helm-test image: registry.cn-guangzhou.aliyuncs.com/xingcangku/docker.io-grafana-loki-helm-test-ewelch-distributed-helm-chart-1:ewelch-distributed-helm-chart-17db5ee env: - name: CANARY_SERVICE_ADDRESS value: "http://loki-canary:3500/metrics" - name: CANARY_PROMETHEUS_ADDRESS value: "" - name: 
CANARY_TEST_TIMEOUT value: "1m" args: - -test.v restartPolicy: Never
root@k8s01:~/helm/loki/loki# kubectl get pod -n loki
NAME                            READY   STATUS    RESTARTS        AGE
loki-backend-0                  2/2     Running   2 (6h13m ago)   30h
loki-backend-1                  2/2     Running   2 (6h13m ago)   30h
loki-backend-2                  2/2     Running   2 (6h13m ago)   30h
loki-canary-62z48               1/1     Running   1 (6h13m ago)   30h
loki-canary-lg62j               1/1     Running   1 (6h13m ago)   30h
loki-canary-nrph4               1/1     Running   1 (6h13m ago)   30h
loki-chunks-cache-0             2/2     Running   0               6h12m
loki-gateway-75d8cf9754-nwpdw   1/1     Running   13 (6h12m ago)  30h
loki-read-dc7bdc98-8kzwk        1/1     Running   1 (6h13m ago)   30h
loki-read-dc7bdc98-lmzcd        1/1     Running   1 (6h13m ago)   30h
loki-read-dc7bdc98-nrz5h        1/1     Running   1 (6h13m ago)   30h
loki-results-cache-0            2/2     Running   2 (6h13m ago)   30h
loki-write-0                    1/1     Running   1 (6h13m ago)   30h
loki-write-1                    1/1     Running   1 (6h13m ago)   30h
loki-write-2                    1/1     Running   1 (6h13m ago)   30h
root@k8s01:~/helm/loki/loki# kubectl get svc -n loki
NAME                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)              AGE
loki-backend                     ClusterIP   10.101.131.151   <none>        3100/TCP,9095/TCP    30h
loki-backend-headless            ClusterIP   None             <none>        3100/TCP,9095/TCP    30h
loki-canary                      ClusterIP   10.109.131.175   <none>        3500/TCP             30h
loki-chunks-cache                ClusterIP   None             <none>        11211/TCP,9150/TCP   30h
loki-gateway                     ClusterIP   10.98.126.160    <none>        80/TCP               30h
loki-memberlist                  ClusterIP   None             <none>        7946/TCP             30h
loki-query-scheduler-discovery   ClusterIP   None             <none>        3100/TCP,9095/TCP    30h
loki-read                        ClusterIP   10.103.248.164   <none>        3100/TCP,9095/TCP    30h
loki-read-headless               ClusterIP   None             <none>        3100/TCP,9095/TCP    30h
loki-results-cache               ClusterIP   None             <none>        11211/TCP,9150/TCP   30h
loki-write                       ClusterIP   10.108.223.18    <none>        3100/TCP,9095/TCP    30h
loki-write-headless              ClusterIP   None             <none>        3100/TCP,9095/TCP    30h
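Loki 各组件就绪后,可以让中心 OpenTelemetry Collector 直接通过 OTLP 协议把日志写入 Loki(Loki 3.x 原生支持 OTLP 日志摄入)。下面是一个最小示意,假设 Collector 名称为 center、Loki 网关地址为 loki-gateway.loki.svc(OTLP 摄入路径为 /otlp),实际端点请按环境调整:
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: center
  namespace: opentelemetry
spec:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch: {}
    exporters:
      otlphttp/loki:                                    # 通过 OTLP HTTP 把日志写入 Loki
        endpoint: http://loki-gateway.loki.svc/otlp     # Loki 网关的 OTLP 摄入地址(假设值)
        tls:
          insecure: true
    service:
      pipelines:
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp/loki]
该示意只新增了 logs 管道,traces、metrics 管道可按前文方式继续保留。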
2025年06月20日
10 阅读
0 评论
0 点赞
2025-06-16
链路追踪数据收集与导出
链路追踪数据收集与导出一、链路数据收集方案在 Kubernetes 中部署应用进行链路追踪数据收集,常见有两种方案: 1、基于 Instrumentation Operator 的自动注入(自动埋点) 通过部署 OpenTelemetry Operator,并创建 Instrumentation 自定义资源(CRD),实现对应用容器的自动注入 SDK 或 Sidecar,从而无需修改应用代码即可采集追踪数据。适合需要快速接入、统一管理、降低改造成本的场景。 2、手动在应用中集成 OpenTelemetry SDK(手动埋点) 在应用程序代码中直接引入 OpenTelemetry SDK,手动埋点关键业务逻辑,控制 trace span 的粒度和内容,并将数据通过 OTLP(OpenTelemetry Protocol)协议导出到后端(如 OpenTelemetry Collector、Jaeger、Tempo 等)。适合需要精准控制追踪数据质量或已有自定义采集需求的场景。 接下来以Instrumentation Operator自动注入方式演示如何收集并处理数据。二、部署测试应用接下来我们部署一个HotROD 演示程序,它内置了OpenTelemetry SDK,我们只需要配置 opentelemetry 接收地址既可,具体可参考文档: https://github.com/jaegertracing/jaeger/tree/main/examples/hotrodapiVersion: apps/v1 kind: Deployment metadata: name: go-demo spec: selector: matchLabels: app: go-demo template: metadata: labels: app: go-demo spec: containers: - name: go-demo image: jaegertracing/example-hotrod:latest imagePullPolicy: IfNotPresent resources: limits: memory: "500Mi" cpu: "200m" ports: - containerPort: 8080 env: - name: OTEL_EXPORTER_OTLP_ENDPOINT # opentelemetry服务地址 value: http://center-collector.opentelemetry.svc:4318 --- apiVersion: v1 kind: Service metadata: name: go-demo spec: selector: app: go-demo ports: - port: 8080 targetPort: 8080 --- apiVersion: traefik.io/v1alpha1 kind: IngressRoute metadata: name: go-demo spec: entryPoints: - web routes: - match: Host(`go-demo.cuiliangblog.cn`) kind: Rule services: - name: go-demo port: 8080接下来浏览器添加 hosts 解析后访问测试三、Jaeger方案 3.1Jaeger介绍 Jaeger 是Uber公司研发,后来贡献给CNCF的一个分布式链路追踪软件,主要用于微服务链路追踪。它优点是性能高(能处理大量追踪数据)、部署灵活(支持单节点和分布式部署)、集成方便(兼容 OpenTelemetry),并且可视化能力强,可以快速定位性能瓶颈和故障。基于上述示意图,我们简要解析下 Jaeger 各个组件以及组件间的关系: Client libraries(客户端库) 功能:将追踪信息(trace/span)插入到应用程序中。 说明: 支持多种语言,如 Go、Java、Python、Node.js 等。 通常使用 OpenTelemetry SDK 或 Jaeger Tracer。 将生成的追踪数据发送到 Agent 或 Collector。 Agent(代理) 功能:接收客户端发来的追踪数据,批量转发给 Collector。 说明: 接收 UDP 数据包(更轻量) 向 Collector 使用 gRPC 发送数据 Collector(收集器) 功能: 接收 Agent 或直接从 SDK 发送的追踪数据。 处理(转码、校验等)后写入存储后端。 可横向扩展,提高吞吐能力。 Ingester(摄取器)(可选) 功能:在使用 Kafka 作为中间缓冲队列时,Ingester 从 Kafka 消费数据并写入存储。 用途:解耦收集与存储、提升稳定性。 Storage Backend(存储后端) 功能:保存追踪数据,供查询和分析使用。 支持: Elasticsearch Cassandra Kafka(用于异步摄取) Badger(仅用于开发) OpenSearch Query(查询服务) 功能:从存储中查询追踪数据,提供给前端 UI 使用。 提供 API 接口:供 UI 或其他系统(如 Grafana Tempo)调用。 UI(前端界面) 功能: 可视化展示 Trace、Span、服务依赖图。 支持搜索条件(服务名、时间范围、trace ID 等)。 常用用途: 查看慢请求 分析请求调用链 排查错误或瓶颈 在本示例中,指标数据采集与收集由 OpenTelemetry 实现,仅需要使用 jaeger-collector 组件接收输入,存入 elasticsearch,使用 jaeger-query 组件查询展示数据既可。3.2部署 Jaeger(all in one)apiVersion: apps/v1 kind: Deployment metadata: name: jaeger namespace: opentelemetry labels: app: jaeger spec: replicas: 1 selector: matchLabels: app: jaeger template: metadata: labels: app: jaeger spec: containers: - name: jaeger image: jaegertracing/all-in-one:latest args: - "--collector.otlp.enabled=true" # 启用 OTLP gRPC - "--collector.otlp.grpc.host-port=0.0.0.0:4317" resources: limits: memory: "2Gi" cpu: "1" ports: - containerPort: 6831 protocol: UDP - containerPort: 16686 protocol: TCP - containerPort: 4317 protocol: TCP --- apiVersion: v1 kind: Service metadata: name: jaeger namespace: opentelemetry labels: app: jaeger spec: selector: app: jaeger ports: - name: jaeger-udp port: 6831 targetPort: 6831 protocol: UDP - name: jaeger-ui port: 16686 targetPort: 16686 protocol: TCP - name: otlp-grpc port: 4317 targetPort: 4317 protocol: TCP --- apiVersion: traefik.io/v1alpha1 kind: IngressRoute metadata: name: jaeger namespace: opentelemetry spec: entryPoints: - web routes: - match: Host(`jaeger.cuiliangblog.cn`) kind: Rule services: - name: jaeger port: 166863.3部署 Jaeger(分布式)all in 
one 数据存放在内存中不具备高可用性,生产环境中建议使用Elasticsearch 或 OpenSearch 作为 Cassandra 的存储后端,以 ElasticSearch 为例,部署操作具体可参考文档:https://www.cuiliangblog.cn/detail/section/162609409导出 ca 证书# kubectl -n elasticsearch get secret elasticsearch-es-http-certs-public -o go-template='{{index .data "ca.crt" | base64decode }}' > ca.crt # kubectl create secret -n opentelemetry generic es-tls-secret --from-file=ca.crt=./ca.crt secret/es-tls-secret created获取 chart 包# helm repo add jaegertracing https://jaegertracing.github.io/helm-charts "jaegertracing" has been added to your repositories # helm search repo jaegertracing NAME CHART VERSION APP VERSION DESCRIPTION jaegertracing/jaeger 3.4.1 1.53.0 A Jaeger Helm chart for Kubernetes jaegertracing/jaeger-operator 2.57.0 1.61.0 jaeger-operator Helm chart for Kubernetes # helm pull jaegertracing/jaeger --untar # cd jaeger # ls Chart.lock charts Chart.yaml README.md templates values.yaml修改安装参数apiVersion: v1 kind: ServiceAccount metadata: name: jaeger-collector labels: helm.sh/chart: jaeger-3.4.1 app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/version: "1.53.0" app.kubernetes.io/managed-by: Helm app.kubernetes.io/component: collector automountServiceAccountToken: false --- # Source: jaeger/templates/query-sa.yaml apiVersion: v1 kind: ServiceAccount metadata: name: jaeger-query labels: helm.sh/chart: jaeger-3.4.1 app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/version: "1.53.0" app.kubernetes.io/managed-by: Helm app.kubernetes.io/component: query automountServiceAccountToken: false --- # Source: jaeger/templates/spark-sa.yaml apiVersion: v1 kind: ServiceAccount metadata: name: jaeger-spark labels: helm.sh/chart: jaeger-3.4.1 app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/version: "1.53.0" app.kubernetes.io/managed-by: Helm app.kubernetes.io/component: spark automountServiceAccountToken: false --- # Source: jaeger/templates/collector-svc.yaml apiVersion: v1 kind: Service metadata: name: jaeger-collector labels: helm.sh/chart: jaeger-3.4.1 app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/version: "1.53.0" app.kubernetes.io/managed-by: Helm app.kubernetes.io/component: collector spec: ports: - name: grpc port: 14250 protocol: TCP targetPort: grpc appProtocol: grpc - name: http port: 14268 protocol: TCP targetPort: http appProtocol: http - name: otlp-grpc port: 4317 protocol: TCP targetPort: otlp-grpc - name: otlp-http port: 4318 protocol: TCP targetPort: otlp-http - name: admin port: 14269 targetPort: admin selector: app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/component: collector type: ClusterIP --- # Source: jaeger/templates/query-svc.yaml apiVersion: v1 kind: Service metadata: name: jaeger-query labels: helm.sh/chart: jaeger-3.4.1 app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/version: "1.53.0" app.kubernetes.io/managed-by: Helm app.kubernetes.io/component: query spec: ports: - name: query port: 80 protocol: TCP targetPort: query - name: grpc port: 16685 protocol: TCP targetPort: grpc - name: admin port: 16687 protocol: TCP targetPort: admin selector: app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/component: query type: ClusterIP --- # Source: jaeger/templates/collector-deploy.yaml apiVersion: apps/v1 kind: Deployment metadata: name: jaeger-collector labels: helm.sh/chart: jaeger-3.4.1 app.kubernetes.io/name: jaeger 
app.kubernetes.io/instance: jaeger app.kubernetes.io/version: "1.53.0" app.kubernetes.io/managed-by: Helm app.kubernetes.io/component: collector spec: replicas: 1 selector: matchLabels: app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/component: collector template: metadata: annotations: checksum/config-env: 75a11da44c802486bc6f65640aa48a730f0f684c5c07a42ba3cd1735eb3fb070 labels: app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/component: collector spec: securityContext: {} serviceAccountName: jaeger-collector containers: - name: jaeger-collector securityContext: {} image: registry.cn-guangzhou.aliyuncs.com/xingcangku/jaeger-collector:1.53.0 imagePullPolicy: IfNotPresent args: env: - name: COLLECTOR_OTLP_ENABLED value: "true" - name: SPAN_STORAGE_TYPE value: elasticsearch - name: ES_SERVER_URLS value: https://elasticsearch-client.elasticsearch.svc:9200 - name: ES_TLS_SKIP_HOST_VERIFY # 添加临时跳过主机名验证 value: "true" - name: ES_USERNAME value: elastic - name: ES_PASSWORD valueFrom: secretKeyRef: name: jaeger-elasticsearch key: password - name: ES_TLS_ENABLED value: "true" - name: ES_TLS_CA value: /es-tls/ca.crt ports: - containerPort: 14250 name: grpc protocol: TCP - containerPort: 14268 name: http protocol: TCP - containerPort: 14269 name: admin protocol: TCP - containerPort: 4317 name: otlp-grpc protocol: TCP - containerPort: 4318 name: otlp-http protocol: TCP readinessProbe: httpGet: path: / port: admin livenessProbe: httpGet: path: / port: admin resources: {} volumeMounts: - name: es-tls-secret mountPath: /es-tls/ca.crt subPath: ca-cert.pem readOnly: true dnsPolicy: ClusterFirst restartPolicy: Always volumes: - name: es-tls-secret secret: secretName: es-tls-secret --- # Source: jaeger/templates/query-deploy.yaml apiVersion: apps/v1 kind: Deployment metadata: name: jaeger-query labels: helm.sh/chart: jaeger-3.4.1 app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/version: "1.53.0" app.kubernetes.io/managed-by: Helm app.kubernetes.io/component: query spec: replicas: 1 selector: matchLabels: app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/component: query template: metadata: labels: app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/component: query spec: securityContext: {} serviceAccountName: jaeger-query containers: - name: jaeger-query securityContext: {} image: registry.cn-guangzhou.aliyuncs.com/xingcangku/jaegertracing-jaeger-query:1.53.0 imagePullPolicy: IfNotPresent args: env: - name: SPAN_STORAGE_TYPE value: elasticsearch - name: ES_SERVER_URLS value: https://elasticsearch-client.elasticsearch.svc:9200 - name: ES_TLS_SKIP_HOST_VERIFY # 添加临时跳过主机名验证 value: "true" - name: ES_USERNAME value: elastic - name: ES_PASSWORD valueFrom: secretKeyRef: name: jaeger-elasticsearch key: password - name: ES_TLS_ENABLED value: "true" - name: ES_TLS_CA value: /es-tls/ca.crt - name: QUERY_BASE_PATH value: "/" - name: JAEGER_AGENT_PORT value: "6831" ports: - name: query containerPort: 16686 protocol: TCP - name: grpc containerPort: 16685 protocol: TCP - name: admin containerPort: 16687 protocol: TCP resources: {} volumeMounts: - name: es-tls-secret mountPath: /es-tls/ca.crt subPath: ca-cert.pem readOnly: true livenessProbe: httpGet: path: / port: admin readinessProbe: httpGet: path: / port: admin - name: jaeger-agent-sidecar securityContext: {} image: registry.cn-guangzhou.aliyuncs.com/xingcangku/jaegertracing-jaeger-agent:1.53.0 
imagePullPolicy: IfNotPresent args: env: - name: REPORTER_GRPC_HOST_PORT value: jaeger-collector:14250 ports: - name: admin containerPort: 14271 protocol: TCP resources: null volumeMounts: livenessProbe: httpGet: path: / port: admin readinessProbe: httpGet: path: / port: admin dnsPolicy: ClusterFirst restartPolicy: Always volumes: - name: es-tls-secret secret: secretName: es-tls-secret --- # Source: jaeger/templates/spark-cronjob.yaml apiVersion: batch/v1 kind: CronJob metadata: name: jaeger-spark labels: helm.sh/chart: jaeger-3.4.1 app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/version: "1.53.0" app.kubernetes.io/managed-by: Helm app.kubernetes.io/component: spark spec: schedule: "49 23 * * *" successfulJobsHistoryLimit: 5 failedJobsHistoryLimit: 5 concurrencyPolicy: Forbid jobTemplate: spec: template: metadata: labels: app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/component: spark spec: serviceAccountName: jaeger-spark securityContext: {} containers: - name: jaeger-spark image: registry.cn-guangzhou.aliyuncs.com/xingcangku/jaegertracing-spark-dependencies:latest imagePullPolicy: IfNotPresent args: env: - name: STORAGE value: elasticsearch - name: ES_SERVER_URLS value: https://elasticsearch-client.elasticsearch.svc:9200 - name: ES_USERNAME value: elastic - name: ES_PASSWORD valueFrom: secretKeyRef: name: jaeger-elasticsearch key: password - name: ES_TLS_ENABLED value: "true" - name: ES_TLS_CA value: /es-tls/ca.crt - name: ES_NODES value: https://elasticsearch-client.elasticsearch.svc:9200 - name: ES_NODES_WAN_ONLY value: "false" resources: {} volumeMounts: securityContext: {} restartPolicy: OnFailure volumes: --- # Source: jaeger/templates/elasticsearch-secret.yaml apiVersion: v1 kind: Secret metadata: name: jaeger-elasticsearch labels: helm.sh/chart: jaeger-3.4.1 app.kubernetes.io/name: jaeger app.kubernetes.io/instance: jaeger app.kubernetes.io/version: "1.53.0" app.kubernetes.io/managed-by: Helm annotations: "helm.sh/hook": pre-install,pre-upgrade "helm.sh/hook-weight": "-1" "helm.sh/hook-delete-policy": before-hook-creation "helm.sh/resource-policy": keep type: Opaque data: password: "ZWdvbjY2Ng=="安装 jaegerroot@k8s01:~/helm/jaeger/jaeger# kubectl delete -n opentelemetry -f test.yaml serviceaccount "jaeger-collector" deleted serviceaccount "jaeger-query" deleted serviceaccount "jaeger-spark" deleted service "jaeger-collector" deleted service "jaeger-query" deleted deployment.apps "jaeger-collector" deleted deployment.apps "jaeger-query" deleted cronjob.batch "jaeger-spark" deleted secret "jaeger-elasticsearch" deleted root@k8s01:~/helm/jaeger/jaeger# vi test.yaml root@k8s01:~/helm/jaeger/jaeger# kubectl apply -n opentelemetry -f test.yaml serviceaccount/jaeger-collector created serviceaccount/jaeger-query created serviceaccount/jaeger-spark created service/jaeger-collector created service/jaeger-query created deployment.apps/jaeger-collector created deployment.apps/jaeger-query created cronjob.batch/jaeger-spark created secret/jaeger-elasticsearch created root@k8s01:~/helm/jaeger/jaeger# kubectl get pods -n opentelemetry -w NAME READY STATUS RESTARTS AGE center-collector-78f7bbdf45-j798s 1/1 Running 2 (6h2m ago) 30h jaeger-7989549bb9-hn8jh 1/1 Running 2 (6h2m ago) 25h jaeger-collector-7f8fb4c946-nkg4m 1/1 Running 0 3s jaeger-query-5cdb7b68bd-xpftn 2/2 Running 0 3s ^Croot@k8s01:~/helm/jaeger/jaeger# kubectl get svc -n opentelemetry | grep jaeger jaeger ClusterIP 10.100.251.219 <none> 6831/UDP,16686/TCP,4317/TCP 
25h jaeger-collector ClusterIP 10.111.17.41 <none> 14250/TCP,14268/TCP,4317/TCP,4318/TCP,14269/TCP 51s jaeger-query ClusterIP 10.98.118.118 <none> 80/TCP,16685/TCP,16687/TCP 51s创建 ingress 资源root@k8s01:~/helm/jaeger/jaeger# cat jaeger.yaml apiVersion: traefik.io/v1alpha1 kind: IngressRoute metadata: name: jaeger namespace: opentelemetry spec: entryPoints: - web routes: - match: Host(`jaeger.axinga.cn`) kind: Rule services: - name: jaeger port: 16686接下来配置 hosts 解析后浏览器访问既可。配置 CollectorapiVersion: opentelemetry.io/v1beta1 kind: OpenTelemetryCollector # 元数据定义部分 metadata: name: center # Collector 的名称为 center namespace: opentelemetry # 具体的配置内容 spec: replicas: 1 # 设置副本数量为1 config: # 定义 Collector 配置 receivers: # 接收器,用于接收遥测数据(如 trace、metrics、logs) otlp: # 配置 OTLP(OpenTelemetry Protocol)接收器 protocols: # 启用哪些协议来接收数据 grpc: endpoint: 0.0.0.0:4317 # 启用 gRPC 协议 http: endpoint: 0.0.0.0:4318 # 启用 HTTP 协议 processors: # 处理器,用于处理收集到的数据 batch: {} # 批处理器,用于将数据分批发送,提高效率 exporters: # 导出器,用于将处理后的数据发送到后端系统 # debug: {} # 使用 debug 导出器,将数据打印到终端(通常用于测试或调试) otlp: # 数据发送到jaeger的grpc端口 endpoint: "jaeger-collector:4317" tls: # 跳过证书验证 insecure: true service: # 服务配置部分 pipelines: # 定义处理管道 traces: # 定义 trace 类型的管道 receivers: [otlp] # 接收器为 OTLP processors: [batch] # 使用批处理器 exporters: [otlp] # 将数据发送到otlp接下来我们随机访问 demo 应用,并在 jaeger 查看链路追踪数据。Jaeger 系统找到了一些 trace 并显示了一些关于该 trace 的元数据,包括参与该 trace 的不同服务的名称以及每个服务发送到 Jaeger 的 span 记录数。jaeger 使用具体可参考文章https://medium.com/jaegertracing/take-jaeger-for-a-hotrod-ride-233cf43e46c2四、Tempo 方案4.1Tempo 介绍Grafana Tempo是一个开源、易于使用的大规模分布式跟踪后端。Tempo具有成本效益,仅需要对象存储即可运行,并且与Grafana,Prometheus和Loki深度集成,Tempo可以与任何开源跟踪协议一起使用,包括Jaeger、Zipkin和OpenTelemetry。它仅支持键/值查找,并且旨在与用于发现的日志和度量标准(示例性)协同工作Distributors(分发器) 功能:接收客户端发送的追踪数据并进行初步验证 说明: 对 Trace 进行分片、标签处理。 将数据转发给合适的 Ingesters。 Ingesters(摄取器) 功能:处理和持久化 Trace 数据 说明: 接收来自 Distributor 的数据。 在内存中缓存直到追踪完成(完整的 Trace)。 再写入后端对象存储。 Storage(对象存储) 功能:持久化存储 Trace 数据 说明: 支持多种对象存储(S3、GCS、MinIO、Azure Blob 等)。 Tempo 存储的是压缩的完整 Trace 文件,使用 trace ID 进行索引。 Compactor(数据压缩) 功能:合并 trace 数据,压缩多个小 block 成一个大 block。 说明: 可以单独运行 compactor 容器或进程。 通常以 后台任务 的方式运行,不参与实时 ingest 或 query。 Tempo Query(查询前端) 功能:处理来自用户或 Grafana 的查询请求 说明: 接收查询请求。 提供缓存、合并和调度功能,优化查询性能。 将请求转发给 Querier。 Querier(查询器) 功能:从存储中检索 Trace 数据 说明: 根据 trace ID 从对象存储中检索完整 trace。 解压和返回结构化的 Span 数据。 返回结果供 Grafana 或其他前端展示。4.2部署 Tempo推荐用Helm 安装,官方提供了tempo-distributed Helm chart 和 tempo Helm chart 两种部署模式,一般来说本地测试使用 tempo Helm chart,而生产环境可以使用 Tempo 的微服务部署方式 tempo-distributed。接下来以整体模式为例,具体可参考文档https://github.com/grafana/helm-charts/tree/main/charts/tempo 创建 s3 的 bucket、ak、sk 资源,并配置权限。具体可参考上面minio4.2.1获取 chart 包# helm repo add grafana https://grafana.github.io/helm-charts # helm pull grafana/tempo --untar # cd tempo # ls Chart.yaml README.md README.md.gotmpl templates values.yaml4.2.2修改配置,prometheus 默认未启用远程写入,可参考文章开启远程写入https://www.cuiliangblog.cn/detail/section/15189202# vim values.yaml tempo: storage: trace: # 默认使用本地文件存储,改为使用s3对象存储 backend: s3 s3: bucket: tempo # store traces in this bucket endpoint: minio-service.minio.svc:9000 # api endpoint access_key: zbsIQQnsp871ZnZ2AuKr # optional. access key when using static credentials. secret_key: zxL5EeXwU781M8inSBPcgY49mEbBVoR1lvFCX4JU # optional. secret key when using static credentials. 
insecure: true # 跳过证书验证4.2.3创建 temporoot@k8s01:~/helm/opentelemetry/tempo# cat test.yaml --- # Source: tempo/templates/serviceaccount.yaml apiVersion: v1 kind: ServiceAccount metadata: name: tempo namespace: opentelemetry labels: helm.sh/chart: tempo-1.23.1 app.kubernetes.io/name: tempo app.kubernetes.io/instance: tempo app.kubernetes.io/version: "2.8.0" app.kubernetes.io/managed-by: Helm automountServiceAccountToken: true --- # Source: tempo/templates/configmap-tempo.yaml apiVersion: v1 kind: ConfigMap metadata: name: tempo namespace: opentelemetry labels: helm.sh/chart: tempo-1.23.1 app.kubernetes.io/name: tempo app.kubernetes.io/instance: tempo app.kubernetes.io/version: "2.8.0" app.kubernetes.io/managed-by: Helm data: overrides.yaml: | overrides: {} tempo.yaml: | memberlist: cluster_label: "tempo.opentelemetry" multitenancy_enabled: false usage_report: reporting_enabled: true compactor: compaction: block_retention: 24h distributor: receivers: jaeger: protocols: grpc: endpoint: 0.0.0.0:14250 thrift_binary: endpoint: 0.0.0.0:6832 thrift_compact: endpoint: 0.0.0.0:6831 thrift_http: endpoint: 0.0.0.0:14268 otlp: protocols: grpc: endpoint: 0.0.0.0:4317 http: endpoint: 0.0.0.0:4318 ingester: {} server: http_listen_port: 3200 storage: trace: backend: s3 s3: access_key: admin bucket: tempo endpoint: minio-demo.minio.svc:9000 secret_key: 8fGYikcyi4 insecure: true #tls: false wal: path: /var/tempo/wal querier: {} query_frontend: {} overrides: defaults: {} per_tenant_override_config: /conf/overrides.yaml --- # Source: tempo/templates/service.yaml apiVersion: v1 kind: Service metadata: name: tempo namespace: opentelemetry labels: helm.sh/chart: tempo-1.23.1 app.kubernetes.io/name: tempo app.kubernetes.io/instance: tempo app.kubernetes.io/version: "2.8.0" app.kubernetes.io/managed-by: Helm spec: type: ClusterIP ports: - name: tempo-jaeger-thrift-compact port: 6831 protocol: UDP targetPort: 6831 - name: tempo-jaeger-thrift-binary port: 6832 protocol: UDP targetPort: 6832 - name: tempo-prom-metrics port: 3200 protocol: TCP targetPort: 3200 - name: tempo-jaeger-thrift-http port: 14268 protocol: TCP targetPort: 14268 - name: grpc-tempo-jaeger port: 14250 protocol: TCP targetPort: 14250 - name: tempo-zipkin port: 9411 protocol: TCP targetPort: 9411 - name: tempo-otlp-legacy port: 55680 protocol: TCP targetPort: 55680 - name: tempo-otlp-http-legacy port: 55681 protocol: TCP targetPort: 55681 - name: grpc-tempo-otlp port: 4317 protocol: TCP targetPort: 4317 - name: tempo-otlp-http port: 4318 protocol: TCP targetPort: 4318 - name: tempo-opencensus port: 55678 protocol: TCP targetPort: 55678 selector: app.kubernetes.io/name: tempo app.kubernetes.io/instance: tempo --- # Source: tempo/templates/statefulset.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: tempo namespace: opentelemetry labels: helm.sh/chart: tempo-1.23.1 app.kubernetes.io/name: tempo app.kubernetes.io/instance: tempo app.kubernetes.io/version: "2.8.0" app.kubernetes.io/managed-by: Helm spec: replicas: 1 selector: matchLabels: app.kubernetes.io/name: tempo app.kubernetes.io/instance: tempo serviceName: tempo-headless template: metadata: labels: app.kubernetes.io/name: tempo app.kubernetes.io/instance: tempo annotations: checksum/config: 563d333fcd3b266c31add18d53e0fa1f5e6ed2e1588e6ed4c466a8227285129b spec: serviceAccountName: tempo automountServiceAccountToken: true containers: - args: - -config.file=/conf/tempo.yaml - -mem-ballast-size-mbs=1024 image: registry.cn-guangzhou.aliyuncs.com/xingcangku/grafana-tempo-2.8.0:2.8.0 
imagePullPolicy: IfNotPresent name: tempo ports: - containerPort: 3200 name: prom-metrics - containerPort: 6831 name: jaeger-thrift-c protocol: UDP - containerPort: 6832 name: jaeger-thrift-b protocol: UDP - containerPort: 14268 name: jaeger-thrift-h - containerPort: 14250 name: jaeger-grpc - containerPort: 9411 name: zipkin - containerPort: 55680 name: otlp-legacy - containerPort: 4317 name: otlp-grpc - containerPort: 55681 name: otlp-httplegacy - containerPort: 4318 name: otlp-http - containerPort: 55678 name: opencensus livenessProbe: failureThreshold: 3 httpGet: path: /ready port: 3200 initialDelaySeconds: 30 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5 readinessProbe: failureThreshold: 3 httpGet: path: /ready port: 3200 initialDelaySeconds: 20 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5 resources: {} env: volumeMounts: - mountPath: /conf name: tempo-conf securityContext: fsGroup: 10001 runAsGroup: 10001 runAsNonRoot: true runAsUser: 10001 volumes: - configMap: name: tempo name: tempo-conf updateStrategy: type: RollingUpdate root@k8s01:~/helm/opentelemetry/tempo# kubectl get pod -n opentelemetry NAME READY STATUS RESTARTS AGE center-collector-67dcddd7db-8hd98 1/1 Running 0 4h3m tempo-0 1/1 Running 35 (5h57m ago) 8d root@k8s01:~/helm/opentelemetry/tempo# kubectl get svc -n opentelemetry | grep tempo tempo ClusterIP 10.105.249.189 <none> 6831/UDP,6832/UDP,3200/TCP,14268/TCP,14250/TCP,9411/TCP,55680/TCP,55681/TCP,4317/TCP,4318/TCP,55678/TCP 8d root@k8s01:~/helm/opentelemetry/tempo# 4.2.4配置 Collector#按之前上面的完整配置 下面可以参考 tempo 服务的otlp 数据接收端口分别为4317(grpc)和4318(http),修改OpenTelemetryCollector 配置,将数据发送到 tempo 的 otlp 接收端口。 apiVersion: opentelemetry.io/v1beta1 kind: OpenTelemetryCollector # 元数据定义部分 metadata: name: center # Collector 的名称为 center namespace: opentelemetry # 具体的配置内容 spec: replicas: 1 # 设置副本数量为1 config: # 定义 Collector 配置 receivers: # 接收器,用于接收遥测数据(如 trace、metrics、logs) otlp: # 配置 OTLP(OpenTelemetry Protocol)接收器 protocols: # 启用哪些协议来接收数据 grpc: endpoint: 0.0.0.0:4317 # 启用 gRPC 协议 http: endpoint: 0.0.0.0:4318 # 启用 HTTP 协议 processors: # 处理器,用于处理收集到的数据 batch: {} # 批处理器,用于将数据分批发送,提高效率 exporters: # 导出器,用于将处理后的数据发送到后端系统 # debug: {} # 使用 debug 导出器,将数据打印到终端(通常用于测试或调试) otlp: # 数据发送到tempo的grpc端口 endpoint: "tempo:4317" tls: # 跳过证书验证 insecure: true service: # 服务配置部分 pipelines: # 定义处理管道 traces: # 定义 trace 类型的管道 receivers: [otlp] # 接收器为 OTLP processors: [batch] # 使用批处理器 exporters: [otlp] # 将数据打印到OTLP4.2.5访问验证4.2.6服务拓扑图配置Tempo Metrics Generator 是 Grafana Tempo 提供的一个 可选组件,用于将 Trace(链路追踪数据)转换为 Metrics(指标数据),从而实现 Trace-to-Metrics(T2M) 的能力,默认情况下 tempo 并未启用该功能。4.2.6.1prometheus 开启remote-write-receiver 功能,关键配置如下:# vim prometheus-prometheus.yaml spec: # enableFeatures: enableFeatures: # 开启远程写入 - remote-write-receiver externalLabels: web.enable-remote-write-receiver: "true" # kubectl apply -f prometheus-prometheus.yaml具体可参考文档:https://m.cuiliangblog.cn/detail/section/151892024.2.6.2tempo 开启metricsGenerator 功能,关键配置如下:# vim values.yaml global: per_tenant_override_config: /runtime-config/overrides.yaml metrics_generator_processors: - 'service-graphs' - 'span-metrics' tempo: metricsGenerator: enabled: true # 从 Trace 中自动生成 metrics(指标),用于服务调用关系图 remoteWriteUrl: "http://prometheus-k8s.monitoring.svc:9090/api/v1/write" # prometheus地址 overrides: # 多租户默认配置启用metrics defaults: metrics_generator: processors: - service-graphs - span-metrics4.2.6.3此时查询 prometheus 图表,可以获取traces 相关指标grafana 数据源启用节点图与服务图,配置如下查看服务图数据
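上面在 Grafana 界面中启用的节点图与服务图,也可以通过数据源 provisioning 文件声明式配置。以下为一个示意(假设 Tempo 服务地址为 tempo.opentelemetry.svc:3200、存放 service graph 指标的 Prometheus 数据源 uid 为 prometheus,实际取值以环境为准):
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo.opentelemetry.svc:3200
    jsonData:
      serviceMap:
        datasourceUid: prometheus   # 指向写入 service graph 指标的 Prometheus 数据源
      nodeGraph:
        enabled: true               # 在 Trace 视图中启用节点图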
2025年06月16日
10 阅读
1 评论
0 点赞
2025-06-15
OpenTelemetry数据收集
一、收集器配置详解OpenTelemetry 的 Collector 组件是实现观测数据(Trace、Metrics、Logs)收集、处理和导出的一站式服务。它的配置主要分为以下 四大核心模块: receivers(接收数据) processors(数据处理) exporters(导出数据) service(工作流程)1、配置格式#具体配置项可参考文档https://opentelemetry.io/docs/collector/configuration/ apiVersion: opentelemetry.io/v1beta1 kind: OpenTelemetryCollector # 定义资源类型为 OpenTelemetryCollector metadata: name: sidecar # Collector 的名称 spec: mode: sidecar # 以 sidecar 模式运行(与应用容器同 Pod) config: # Collector 配置部分(结构化 YAML) receivers: # 数据接收器(如 otlp、prometheus) processors: # 数据处理器(如 batch、resource、attributes) exporters: # 数据导出器(如 otlp、logging、jaeger、prometheus) service: # 服务配置(定义哪些 pipeline 生效) pipelines: traces: # trace 数据的处理流程 metrics: # metric 数据的处理流程 logs: # log 数据的处理流程2、Receivers(接收器)用于接收数据。支持的类型有很多, otlp:接收 otlp 协议的数据内容 receivers: otlp: protocols: grpc: # 高性能、推荐使用 endpoint: 0.0.0.0:4317 http: # 浏览器或无 gRPC 支持的环境 endpoint: 0.0.0.0:4318prometheus: 用于采集 /metrics 接口的数据。 receivers: prometheus: config: scrape_configs: - job_name: my-service static_configs: - targets: ['my-app:8080']filelog: 从文件读取日志 receivers: filelog: include: [ /var/log/myapp/*.log ] start_at: beginning operators: - type: json_parser parse_from: body timestamp: parse_from: attributes.time3、Processors(处理器)用于在导出前对数据进行修改、增强或过滤。常用的包括: batch : 将数据批处理后导出,提高吞吐量。 processors: batch: timeout: 10s send_batch_size: 1024resource : 为 trace/metric/log 添加统一标签。 processors: resource: attributes: - key: service.namespace value: demo action: insertattributes : 添加、修改或删除属性 processors: attributes: actions: - key: http.method value: GET action: insert处理器配置可参考文档:https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor4、Exporters(导出器)用于将数据导出到后端系统 otlp: 用于将数据发送到另一个 OTEL Collector、Jaeger、Tempo、Datadog 等。 exporters: otlp: endpoint: tempo-collector:4317 tls: insecure: truePrometheus: 用于暴露一个 /metrics HTTP 端口给 Prometheus 拉取。 exporters: prometheus: endpoint: "0.0.0.0:8889"logging : 调试用,打印数据到控制台。 exporters: debug: loglevel: debug5、Service(工作流程)service.pipelines 是一个“调度图”,告诉 OpenTelemetry Collector,对于某种类型的数据,比如 trace,请用哪个 receiver 来接收,用哪些 processor 来处理,最终送到哪些 exporter 去导出。service: pipelines: traces: receivers: [otlp] processors: [batch, resource] exporters: [otlp, logging] metrics: receivers: [prometheus] processors: [batch] exporters: [prometheus] logs: receivers: [filelog] processors: [batch] exporters: [otlp]二、Collector 发行版本区别opentelemetry-collector 和 opentelemetry-collector-contrib 是两个 OpenTelemetry Collector 的发行版本,它们的区别主要在于 内置组件的丰富程度 和 维护主体。
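把上述 receivers、processors、exporters、service 四个模块组合起来,一个最小可用的 Collector 配置大致如下(仅作示意,导出端点等参数需按实际后端调整):
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 10s
  resource:
    attributes:
      - key: service.namespace
        value: demo
        action: insert
exporters:
  debug: {}                      # 调试用,打印到控制台
  otlp:
    endpoint: tempo:4317         # 示例:发送到 Tempo 的 gRPC 端口
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [otlp, debug]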
2025年06月15日
4 阅读
0 评论
0 点赞
2025-06-14
OpenTelemetry 应用埋点
一、部署示例应用 1、部署java应用apiVersion: apps/v1 kind: Deployment metadata: name: java-demo spec: selector: matchLabels: app: java-demo template: metadata: labels: app: java-demo spec: containers: - name: java-demo image: registry.cn-guangzhou.aliyuncs.com/xingcangku/spring-petclinic:1.5.1 imagePullPolicy: IfNotPresent resources: limits: memory: "1Gi" # 增加内存 cpu: "500m" ports: - containerPort: 8080 --- apiVersion: v1 kind: Service metadata: name: java-demo spec: type: ClusterIP # 改为 ClusterIP,Traefik 使用服务发现 selector: app: java-demo ports: - port: 80 targetPort: 8080 --- apiVersion: traefik.io/v1alpha1 kind: IngressRoute metadata: name: java-demo spec: entryPoints: - web # 使用 WEB 入口点 (端口 8000) routes: - match: Host(`java-demo.local.cn`) # 可以修改为您需要的域名 kind: Rule services: - name: java-demo port: 80 2、部署python应用apiVersion: apps/v1 kind: Deployment metadata: name: python-demo spec: selector: matchLabels: app: python-demo template: metadata: labels: app: python-demo spec: containers: - name: python-demo image: registry.cn-guangzhou.aliyuncs.com/xingcangku/python-demoapp:latest imagePullPolicy: IfNotPresent resources: limits: memory: "500Mi" cpu: "200m" ports: - containerPort: 5000 --- apiVersion: v1 kind: Service metadata: name: python-demo spec: selector: app: python-demo ports: - port: 5000 targetPort: 5000 --- apiVersion: traefik.io/v1alpha1 kind: IngressRoute metadata: name: python-demo spec: entryPoints: - web routes: - match: Host(`python-demo.local.com`) kind: Rule services: - name: python-demo port: 5000二、应用埋点 1、java应用自动埋点apiVersion: opentelemetry.io/v1alpha1 kind: Instrumentation # 声明资源类型为 Instrumentation(用于语言自动注入) metadata: name: java-instrumentation # Instrumentation 资源的名称(可以被 Deployment 等引用) namespace: opentelemetry spec: propagators: # 指定用于 trace 上下文传播的方式,支持多种格式 - tracecontext # W3C Trace Context(最通用的跨服务追踪格式) - baggage # 传播用户定义的上下文键值对 - b3 # Zipkin 的 B3 header(用于兼容 Zipkin 环境) sampler: # 定义采样策略(决定是否收集 trace) type: always_on # 始终采样所有请求(适合测试或调试环境) java: # image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest # 使用的 Java 自动注入 agent 镜像地址 image: harbor.cuiliangblog.cn/otel/autoinstrumentation-java:latest env: - name: OTEL_EXPORTER_OTLP_ENDPOINT value: http://center-collector.opentelemetry.svc:4318#为了启用自动检测,我们需要更新部署文件并向其添加注解。这样我们可以告诉 OpenTelemetry Operator 将 sidecar 和 java-instrumentation 注入到我们的应用程序中。修改 Deployment 配置如下: apiVersion: apps/v1 kind: Deployment metadata: name: java-demo spec: selector: matchLabels: app: java-demo template: metadata: labels: app: java-demo annotations: instrumentation.opentelemetry.io/inject-java: "opentelemetry/java-instrumentation" # 填写 Instrumentation 资源的名称 sidecar.opentelemetry.io/inject: "opentelemetry/sidecar" # 注入一个 sidecar 模式的 OpenTelemetry Collector spec: containers: - name: java-demo image: registry.cn-guangzhou.aliyuncs.com/xingcangku/spring-petclinic:1.5.1 imagePullPolicy: IfNotPresent resources: limits: memory: "500Mi" cpu: "200m" ports: - containerPort: 8080#接下来更新 deployment,然后查看资源信息,java-demo 容器已经变为两个。 root@k8s01:~/helm/opentelemetry# kubectl get pods NAME READY STATUS RESTARTS AGE java-demo-5cdd74d47-vmqqx 0/2 Init:0/1 0 6s java-demo-5f4d989b88-xrzg7 1/1 Running 0 42m my-sonarqube-postgresql-0 1/1 Running 8 (2d21h ago) 9d my-sonarqube-sonarqube-0 0/1 Pending 0 6d6h python-demo-69c56c549c-jcgmj 1/1 Running 0 16m redis-5ff4857944-v2vz5 1/1 Running 5 (2d21h ago) 6d2h root@k8s01:~/helm/opentelemetry# kubectl get pods -w NAME READY STATUS RESTARTS AGE java-demo-5cdd74d47-vmqqx 0/2 PodInitializing 0 9s java-demo-5f4d989b88-xrzg7 
1/1 Running 0 42m my-sonarqube-postgresql-0 1/1 Running 8 (2d21h ago) 9d my-sonarqube-sonarqube-0 0/1 Pending 0 6d6h python-demo-69c56c549c-jcgmj 1/1 Running 0 17m redis-5ff4857944-v2vz5 1/1 Running 5 (2d21h ago) 6d2h java-demo-5cdd74d47-vmqqx 2/2 Running 0 23s java-demo-5f4d989b88-xrzg7 1/1 Terminating 0 43m java-demo-5f4d989b88-xrzg7 0/1 Terminating 0 43m java-demo-5f4d989b88-xrzg7 0/1 Terminating 0 43m java-demo-5f4d989b88-xrzg7 0/1 Terminating 0 43m java-demo-5f4d989b88-xrzg7 0/1 Terminating 0 43m root@k8s01:~/helm/opentelemetry# kubectl get pods -w NAME READY STATUS RESTARTS AGE java-demo-5cdd74d47-vmqqx 2/2 Running 0 28s my-sonarqube-postgresql-0 1/1 Running 8 (2d21h ago) 9d my-sonarqube-sonarqube-0 0/1 Pending 0 6d6h python-demo-69c56c549c-jcgmj 1/1 Running 0 17m redis-5ff4857944-v2vz5 1/1 Running 5 (2d21h ago) 6d2h ^Croot@k8s01:~/helm/opentelemetry# kubectl get opentelemetrycollectors -A NAMESPACE NAME MODE VERSION READY AGE IMAGE MANAGEMENT opentelemetry center deployment 0.127.0 1/1 3h22m registry.cn-guangzhou.aliyuncs.com/xingcangku/opentelemetry-collector-0.127.0:0.127.0 managed opentelemetry sidecar sidecar 0.127.0 3h19m managed root@k8s01:~/helm/opentelemetry# kubectl get instrumentations -A NAMESPACE NAME AGE ENDPOINT SAMPLER SAMPLER ARG opentelemetry java-instrumentation 2m26s always_on #查看 sidecar日志,已正常启动并发送 spans 数据 root@k8s01:~/helm/opentelemetry# kubectl logs java-demo-5cdd74d47-vmqqx -c otc-container 2025-06-14T15:31:35.013Z info service@v0.127.0/service.go:199 Setting up own telemetry... {"resource": {}} 2025-06-14T15:31:35.014Z debug builders/builders.go:24 Stable component. {"resource": {}, "otelcol.component.id": "otlp", "otelcol.component.kind": "exporter", "otelcol.signal": "traces"} 2025-06-14T15:31:35.014Z info builders/builders.go:26 Development component. May change in the future. {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "traces"} 2025-06-14T15:31:35.014Z debug builders/builders.go:24 Beta component. May change in the future. {"resource": {}, "otelcol.component.id": "batch", "otelcol.component.kind": "processor", "otelcol.pipeline.id": "traces", "otelcol.signal": "traces"} 2025-06-14T15:31:35.014Z debug builders/builders.go:24 Stable component. {"resource": {}, "otelcol.component.id": "otlp", "otelcol.component.kind": "receiver", "otelcol.signal": "traces"} 2025-06-14T15:31:35.014Z debug otlpreceiver@v0.127.0/otlp.go:58 created signal-agnostic logger {"resource": {}, "otelcol.component.id": "otlp", "otelcol.component.kind": "receiver"} 2025-06-14T15:31:35.021Z info service@v0.127.0/service.go:266 Starting otelcol... {"resource": {}, "Version": "0.127.0", "NumCPU": 8} 2025-06-14T15:31:35.021Z info extensions/extensions.go:41 Starting extensions... 
{"resource": {}} 2025-06-14T15:31:35.021Z info grpc@v1.72.1/clientconn.go:176 [core] original dial target is: "center-collector.opentelemetry.svc:4317" {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.021Z info grpc@v1.72.1/clientconn.go:459 [core] [Channel #1]Channel created {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.021Z info grpc@v1.72.1/clientconn.go:207 [core] [Channel #1]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/center-collector.opentelemetry.svc:4317", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}} {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.021Z info grpc@v1.72.1/clientconn.go:208 [core] [Channel #1]Channel authority set to "center-collector.opentelemetry.svc:4317" {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.022Z info grpc@v1.72.1/resolver_wrapper.go:210 [core] [Channel #1]Resolver state updated: { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": [ { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Attributes": null } ], "ServiceConfig": null, "Attributes": null } (resolver returned new addresses) {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.022Z info grpc@v1.72.1/balancer_wrapper.go:122 [core] [Channel #1]Channel switches to new LB policy "pick_first" {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.023Z info gracefulswitch/gracefulswitch.go:194 [pick-first-leaf-lb] [pick-first-leaf-lb 0xc000bc6090] Received new config { "shuffleAddressList": false }, resolver state { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": [ { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Attributes": null } ], "ServiceConfig": null, "Attributes": null } {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.023Z info grpc@v1.72.1/clientconn.go:563 [core] [Channel #1]Channel Connectivity change to CONNECTING{"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.023Z info grpc@v1.72.1/balancer_wrapper.go:195 [core] [Channel #1 SubChannel #2]Subchannel created {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.023Z info grpc@v1.72.1/clientconn.go:364 [core] [Channel #1]Channel exiting idle mode {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.023Z info grpc@v1.72.1/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to CONNECTING {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.024Z info grpc@v1.72.1/clientconn.go:1343 [core] [Channel #1 SubChannel #2]Subchannel picks a new address "center-collector.opentelemetry.svc:4317" to connect {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.024Z info grpc@v1.72.1/server.go:690 [core] [Server #3]Server created {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.024Z info otlpreceiver@v0.127.0/otlp.go:116 Starting GRPC server {"resource": {}, "otelcol.component.id": "otlp", "otelcol.component.kind": "receiver", "endpoint": "0.0.0.0:4317"} 2025-06-14T15:31:35.025Z info grpc@v1.72.1/server.go:886 [core] [Server #3 ListenSocket #4]ListenSocket created {"resource": {}, "grpc_log": true} 
2025-06-14T15:31:35.025Z info otlpreceiver@v0.127.0/otlp.go:173 Starting HTTP server {"resource": {}, "otelcol.component.id": "otlp", "otelcol.component.kind": "receiver", "endpoint": "0.0.0.0:4318"} 2025-06-14T15:31:35.026Z info service@v0.127.0/service.go:289 Everything is ready. Begin running and processing data. {"resource": {}} 2025-06-14T15:31:35.034Z info grpc@v1.72.1/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to READY {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.034Z info pickfirstleaf/pickfirstleaf.go:197 [pick-first-leaf-lb] [pick-first-leaf-lb 0xc000bc6090] SubConn 0xc0008e1db0 reported connectivity state READY and the health listener is disabled. Transitioning SubConn to READY. {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.034Z info grpc@v1.72.1/clientconn.go:563 [core] [Channel #1]Channel Connectivity change to READY {"resource": {}, "grpc_log": true} root@k8s01:~/helm/opentelemetry# kubectl logs java-demo-5cdd74d47-vmqqx -c otc-container 2025-06-14T15:31:35.013Z info service@v0.127.0/service.go:199 Setting up own telemetry... {"resource": {}} 2025-06-14T15:31:35.014Z debug builders/builders.go:24 Stable component. {"resource": {}, "otelcol.component.id": "otlp 2025-06-14T15:31:35.014Z info builders/builders.go:26 Development component. May change in the future. {"resource": {aces"} 2025-06-14T15:31:35.014Z debug builders/builders.go:24 Beta component. May change in the future. {"resource": {}, "oteles", "otelcol.signal": "traces"} 2025-06-14T15:31:35.014Z debug builders/builders.go:24 Stable component. {"resource": {}, "otelcol.component.id": "otlp 2025-06-14T15:31:35.014Z debug otlpreceiver@v0.127.0/otlp.go:58 created signal-agnostic logger {"resource": {}, "otel 2025-06-14T15:31:35.021Z info service@v0.127.0/service.go:266 Starting otelcol... {"resource": {}, "Version": "0.127.0", 2025-06-14T15:31:35.021Z info extensions/extensions.go:41 Starting extensions... {"resource": {}} 2025-06-14T15:31:35.021Z info grpc@v1.72.1/clientconn.go:176 [core] original dial target is: "center-collector.opentelemetr 2025-06-14T15:31:35.021Z info grpc@v1.72.1/clientconn.go:459 [core] [Channel #1]Channel created {"resource": {}, "grpc 2025-06-14T15:31:35.021Z info grpc@v1.72.1/clientconn.go:207 [core] [Channel #1]parsed dial target is: resolver.Target{URL:ector.opentelemetry.svc:4317", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}} {"resource": { 2025-06-14T15:31:35.021Z info grpc@v1.72.1/clientconn.go:208 [core] [Channel #1]Channel authority set to "center-collector. 
2025-06-14T15:31:35.022Z info grpc@v1.72.1/resolver_wrapper.go:210 [core] [Channel #1]Resolver state updated: { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": [ { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Attributes": null } ], "ServiceConfig": null, "Attributes": null } (resolver returned new addresses) {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.022Z info grpc@v1.72.1/balancer_wrapper.go:122 [core] [Channel #1]Channel switches to new LB policy " 2025-06-14T15:31:35.023Z info gracefulswitch/gracefulswitch.go:194 [pick-first-leaf-lb] [pick-first-leaf-lb 0xc000bc6090] "shuffleAddressList": false }, resolver state { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": [ { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Attributes": null } ], "ServiceConfig": null, "Attributes": null } {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.023Z info grpc@v1.72.1/clientconn.go:563 [core] [Channel #1]Channel Connectivity change to CONNECTING 2025-06-14T15:31:35.023Z info grpc@v1.72.1/balancer_wrapper.go:195 [core] [Channel #1 SubChannel #2]Subchannel created 2025-06-14T15:31:35.023Z info grpc@v1.72.1/clientconn.go:364 [core] [Channel #1]Channel exiting idle mode {"resource": { 2025-06-14T15:31:35.023Z info grpc@v1.72.1/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity chang 2025-06-14T15:31:35.024Z info grpc@v1.72.1/clientconn.go:1343 [core] [Channel #1 SubChannel #2]Subchannel picks a new addres 2025-06-14T15:31:35.024Z info grpc@v1.72.1/server.go:690 [core] [Server #3]Server created {"resource": {}, "grpc 2025-06-14T15:31:35.024Z info otlpreceiver@v0.127.0/otlp.go:116 Starting GRPC server {"resource": {}, "otelcol.comp 2025-06-14T15:31:35.025Z info grpc@v1.72.1/server.go:886 [core] [Server #3 ListenSocket #4]ListenSocket created {"reso 2025-06-14T15:31:35.025Z info otlpreceiver@v0.127.0/otlp.go:173 Starting HTTP server {"resource": {}, "otelcol.comp 2025-06-14T15:31:35.026Z info service@v0.127.0/service.go:289 Everything is ready. Begin running and processing data. {"reso 2025-06-14T15:31:35.034Z info grpc@v1.72.1/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity chang 2025-06-14T15:31:35.034Z info pickfirstleaf/pickfirstleaf.go:197 [pick-first-leaf-lb] [pick-first-leaf-lb 0xc000bc6090]ansitioning SubConn to READY. {"resource": {}, "grpc_log": true} 2025-06-14T15:31:35.034Z info grpc@v1.72.1/clientconn.go:563 [core] [Channel #1]Channel Connectivity change to READY {"reso #查看collector 日志,已经收到 traces 数据 root@k8s01:~/helm/opentelemetry# kubectl get pod -n opentelemetry NAME READY STATUS RESTARTS AGE center-collector-78f7bbdf45-j798s 1/1 Running 0 3h24m root@k8s01:~/helm/opentelemetry# kubectl get -n opentelemetry pods NAME READY STATUS RESTARTS AGE center-collector-78f7bbdf45-j798s 1/1 Running 0 3h25m root@k8s01:~/helm/opentelemetry# kubectl logs -n opentelemetry center-collector-78f7bbdf45-j798s 2025-06-14T12:09:21.290Z info service@v0.127.0/service.go:199 Setting up own telemetry... {"resource": {}} 2025-06-14T12:09:21.291Z info builders/builders.go:26 Development component. 
May change in the future. {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "traces"} 2025-06-14T12:09:21.294Z info service@v0.127.0/service.go:266 Starting otelcol... {"resource": {}, "Version": "0.127.0", "NumCPU": 8} 2025-06-14T12:09:21.294Z info extensions/extensions.go:41 Starting extensions... {"resource": {}} 2025-06-14T12:09:21.294Z info otlpreceiver@v0.127.0/otlp.go:116 Starting GRPC server {"resource": {}, "otelcol.component.id": "otlp", "otelcol.component.kind": "receiver", "endpoint": "0.0.0.0:4317"} 2025-06-14T12:09:21.295Z info otlpreceiver@v0.127.0/otlp.go:173 Starting HTTP server {"resource": {}, "otelcol.component.id": "otlp", "otelcol.component.kind": "receiver", "endpoint": "0.0.0.0:4318"} 2025-06-14T12:09:21.295Z info service@v0.127.0/service.go:289 Everything is ready. Begin running and processing data. {"resource": {}} root@k8s01:~/helm/opentelemetry# 2、python应用自动埋点与 java 应用类似,python 应用同样也支持自动埋点, OpenTelemetry 提供了 opentelemetry-instrument CLI 工具,在启动 Python 应用时通过 sitecustomize 或环境变量注入自动 instrumentation。 我们先创建一个java-instrumentation 资源apiVersion: opentelemetry.io/v1alpha1 kind: Instrumentation # 声明资源类型为 Instrumentation(用于语言自动注入) metadata: name: python-instrumentation # Instrumentation 资源的名称(可以被 Deployment 等引用) namespace: opentelemetry spec: propagators: # 指定用于 trace 上下文传播的方式,支持多种格式 - tracecontext # W3C Trace Context(最通用的跨服务追踪格式) - baggage # 传播用户定义的上下文键值对 - b3 # Zipkin 的 B3 header(用于兼容 Zipkin 环境) sampler: # 定义采样策略(决定是否收集 trace) type: always_on # 始终采样所有请求(适合测试或调试环境) python: image: registry.cn-guangzhou.aliyuncs.com/xingcangku/autoinstrumentation-python:latest env: - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED # 启用日志的自动检测 value: "true" - name: OTEL_PYTHON_LOG_CORRELATION # 在日志中启用跟踪上下文注入 value: "true" - name: OTEL_EXPORTER_OTLP_ENDPOINT value: http://center-collector.opentelemetry.svc:4318^Croot@k8s01:~/helm/opentelemetry# cat new-python-demo.yaml apiVersion: apps/v1 kind: Deployment metadata: name: python-demo spec: selector: matchLabels: app: python-demo template: metadata: labels: app: python-demo annotations: instrumentation.opentelemetry.io/inject-python: "opentelemetry/python-instrumentation" # 填写 Instrumentation 资源的名称 sidecar.opentelemetry.io/inject: "opentelemetry/sidecar" # 注入一个 sidecar 模式的 OpenTelemetry Collector spec: containers: - name: pyhton-demo image: registry.cn-guangzhou.aliyuncs.com/xingcangku/python-demoapp:latest imagePullPolicy: IfNotPresent resources: limits: memory: "500Mi" cpu: "200m" ports: - containerPort: 5000 oot@k8s03:~# kubectl get pods NAME READY STATUS RESTARTS AGE java-demo-5559f949b9-74p68 2/2 Running 0 2m14s java-demo-5559f949b9-kwgpc 0/2 Terminating 0 14m my-sonarqube-postgresql-0 1/1 Running 8 (2d22h ago) 9d my-sonarqube-sonarqube-0 0/1 Pending 0 6d7h python-demo-599fc7f8d6-lbhnr 2/2 Running 0 20m redis-5ff4857944-v2vz5 1/1 Running 5 (2d22h ago) 6d3h root@k8s03:~# kubectl logs python-demo-599fc7f8d6-lbhnr -c otc-container 2025-06-14T15:57:12.951Z info service@v0.127.0/service.go:199 Setting up own telemetry... {"resource": {}} 2025-06-14T15:57:12.952Z info builders/builders.go:26 Development component. May change in the future. {"resource{}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "traces"} 2025-06-14T15:57:12.952Z debug builders/builders.go:24 Stable component. 
{"resource": {}, "otelcol.component.id": "p", "otelcol.component.kind": "exporter", "otelcol.signal": "traces"} 2025-06-14T15:57:12.952Z debug builders/builders.go:24 Beta component. May change in the future. {"resource": {}, "lcol.component.id": "batch", "otelcol.component.kind": "processor", "otelcol.pipeline.id": "traces", "otelcol.signal": "traces"} 2025-06-14T15:57:12.952Z debug builders/builders.go:24 Stable component. {"resource": {}, "otelcol.component.id": "p", "otelcol.component.kind": "receiver", "otelcol.signal": "traces"} 2025-06-14T15:57:12.952Z debug otlpreceiver@v0.127.0/otlp.go:58 created signal-agnostic logger {"resource": {}, "lcol.component.id": "otlp", "otelcol.component.kind": "receiver"} 2025-06-14T15:57:12.953Z info service@v0.127.0/service.go:266 Starting otelcol... {"resource": {}, "Version": "0.127, "NumCPU": 8} 2025-06-14T15:57:12.953Z info extensions/extensions.go:41 Starting extensions... {"resource": {}} 2025-06-14T15:57:12.953Z info grpc@v1.72.1/clientconn.go:176 [core] original dial target is: "center-collector.opentelery.svc:4317" {"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/clientconn.go:459 [core] [Channel #1]Channel created {"resource": {}, "c_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/clientconn.go:207 [core] [Channel #1]parsed dial target is: resolver.Target{:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/center-collector.opentelemetry.svc:4317", Rawh:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}} {"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/clientconn.go:208 [core] [Channel #1]Channel authority set to "center-collec.opentelemetry.svc:4317" {"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/resolver_wrapper.go:210 [core] [Channel #1]Resolver state updated: { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": [ { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Attributes": null } ], "ServiceConfig": null, "Attributes": null } (resolver returned new addresses) {"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/balancer_wrapper.go:122 [core] [Channel #1]Channel switches to new LB poli"pick_first" {"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info gracefulswitch/gracefulswitch.go:194 [pick-first-leaf-lb] [pick-first-leaf-lb 0xc00046e] Received new config { "shuffleAddressList": false }, resolver state { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": [ { "Addresses": [ { "Addr": "center-collector.opentelemetry.svc:4317", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Attributes": null } ], "ServiceConfig": null, "Attributes": null } {"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/clientconn.go:563 [core] [Channel #1]Channel Connectivity change to CONNECTI"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/balancer_wrapper.go:195 [core] [Channel #1 SubChannel #2]Subchannel create"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/clientconn.go:364 [core] [Channel #1]Channel exiting 
idle mode {"resource{}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity cge to CONNECTING {"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/clientconn.go:1343 [core] [Channel #1 SubChannel #2]Subchannel picks a new adss "center-collector.opentelemetry.svc:4317" to connect {"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.954Z info grpc@v1.72.1/server.go:690 [core] [Server #3]Server created {"resource": {}, "c_log": true} 2025-06-14T15:57:12.954Z info otlpreceiver@v0.127.0/otlp.go:116 Starting GRPC server {"resource": {}, "otelcol.ponent.id": "otlp", "otelcol.component.kind": "receiver", "endpoint": "0.0.0.0:4317"} 2025-06-14T15:57:12.954Z info otlpreceiver@v0.127.0/otlp.go:173 Starting HTTP server {"resource": {}, "otelcol.ponent.id": "otlp", "otelcol.component.kind": "receiver", "endpoint": "0.0.0.0:4318"} 2025-06-14T15:57:12.954Z info service@v0.127.0/service.go:289 Everything is ready. Begin running and processing data. {"ource": {}} 2025-06-14T15:57:12.955Z info grpc@v1.72.1/server.go:886 [core] [Server #3 ListenSocket #4]ListenSocket created {"ource": {}, "grpc_log": true} 2025-06-14T15:57:12.962Z info grpc@v1.72.1/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity cge to READY {"resource": {}, "grpc_log": true} 2025-06-14T15:57:12.962Z info pickfirstleaf/pickfirstleaf.go:197 [pick-first-leaf-lb] [pick-first-leaf-lb 0xc00046e] SubConn 0xc0005fccd0 reported connectivity state READY and the health listener is disabled. Transitioning SubConn to READY. {"ource": {}, "grpc_log": true} 2025-06-14T15:57:12.962Z info grpc@v1.72.1/clientconn.go:563 [core] [Channel #1]Channel Connectivity change to READY {"ource": {}, "grpc_log": true} root@k8s03:~# root@k8s03:~# kubectl logs -n opentelemetry center-collector-78f7bbdf45-j798s 2025-06-14T12:09:21.290Z info service@v0.127.0/service.go:199 Setting up own telemetry... {"resource": {}} 2025-06-14T12:09:21.291Z info builders/builders.go:26 Development component. May change in the future. {"resourceaces"} 2025-06-14T12:09:21.294Z info service@v0.127.0/service.go:266 Starting otelcol... {"resource": {}, "Version": "0.127 2025-06-14T12:09:21.294Z info extensions/extensions.go:41 Starting extensions... {"resource": {}} 2025-06-14T12:09:21.294Z info otlpreceiver@v0.127.0/otlp.go:116 Starting GRPC server {"resource": {}, "otelcol. 2025-06-14T12:09:21.295Z info otlpreceiver@v0.127.0/otlp.go:173 Starting HTTP server {"resource": {}, "otelcol. 2025-06-14T12:09:21.295Z info service@v0.127.0/service.go:289 Everything is ready. Begin running and processing data. {" 2025-06-14T16:05:11.811Z info Traces {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "expor 2025-06-14T16:05:16.636Z info Traces {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "expor 2025-06-14T16:05:26.894Z info Traces {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "expor 2025-06-14T16:18:11.294Z info Traces {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "expor 2025-06-14T16:18:21.350Z info Traces {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "expor root@k8s03:~#
2025-06-14
OpenTelemetry Deployment
It is recommended to deploy with the OpenTelemetry Operator, since it makes it easy to deploy and manage OpenTelemetry Collectors and can also auto-instrument applications. See the documentation: https://opentelemetry.io/docs/platforms/kubernetes/operator/

1. Deploy cert-manager

The Operator uses admission webhooks, i.e. HTTP callbacks that validate or mutate resources. Kubernetes requires webhook services to serve TLS, so the Operator needs a certificate issued for its webhook server, which is why cert-manager has to be installed first.

# wget https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
# kubectl apply -f cert-manager.yaml
root@k8s01:~/helm/opentelemetry/cert-manager# kubectl get -n cert-manager pod
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-7bd494778-gs44k               1/1     Running   0          37s
cert-manager-cainjector-76474c8c48-w9r5p   1/1     Running   0          37s
cert-manager-webhook-6797c49f67-thvcz      1/1     Running   0          37s
root@k8s01:~/helm/opentelemetry/cert-manager#

2. Deploy the Operator

Using OpenTelemetry on Kubernetes mostly comes down to deploying OpenTelemetry Collectors.

# wget https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
# kubectl apply -f opentelemetry-operator.yaml
# kubectl get pod -n opentelemetry-operator-system
NAME                                                          READY   STATUS    RESTARTS   AGE
opentelemetry-operator-controller-manager-6d94c5db75-cz957    2/2     Running   0          74s
# kubectl get crd |grep opentelemetry
instrumentations.opentelemetry.io          2025-04-21T09:48:53Z
opampbridges.opentelemetry.io              2025-04-21T09:48:54Z
opentelemetrycollectors.opentelemetry.io   2025-04-21T09:48:54Z
targetallocators.opentelemetry.io          2025-04-21T09:48:54Z

root@k8s01:~/helm/opentelemetry/cert-manager# kubectl apply -f opentelemetry-operator.yaml
namespace/opentelemetry-operator-system created
customresourcedefinition.apiextensions.k8s.io/instrumentations.opentelemetry.io created
customresourcedefinition.apiextensions.k8s.io/opampbridges.opentelemetry.io created
customresourcedefinition.apiextensions.k8s.io/opentelemetrycollectors.opentelemetry.io created
customresourcedefinition.apiextensions.k8s.io/targetallocators.opentelemetry.io created
serviceaccount/opentelemetry-operator-controller-manager created
role.rbac.authorization.k8s.io/opentelemetry-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/opentelemetry-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/opentelemetry-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/opentelemetry-operator-proxy-rolebinding created
service/opentelemetry-operator-controller-manager-metrics-service created
service/opentelemetry-operator-webhook-service created
deployment.apps/opentelemetry-operator-controller-manager created
Warning: spec.privateKey.rotationPolicy: In cert-manager >= v1.18.0, the default value changed from `Never` to `Always`.
certificate.cert-manager.io/opentelemetry-operator-serving-cert created
issuer.cert-manager.io/opentelemetry-operator-selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/opentelemetry-operator-mutating-webhook-configuration created
validatingwebhookconfiguration.admissionregistration.k8s.io/opentelemetry-operator-validating-webhook-configuration created
root@k8s01:~/helm/opentelemetry/cert-manager# kubectl get pods -n opentelemetry-operator-system
NAME                                                         READY   STATUS    RESTARTS   AGE
opentelemetry-operator-controller-manager-f78fc55f7-xtjk2    2/2     Running   0          107s
root@k8s01:~/helm/opentelemetry/cert-manager# kubectl get crd |grep opentelemetry
instrumentations.opentelemetry.io          2025-06-14T11:30:01Z
opampbridges.opentelemetry.io              2025-06-14T11:30:01Z
opentelemetrycollectors.opentelemetry.io   2025-06-14T11:30:02Z
targetallocators.opentelemetry.io          2025-06-14T11:30:02Z
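Before creating any collectors, it can be worth confirming that cert-manager actually issued the webhook certificate the Operator needs (the Issuer and Certificate named in the apply output above). A quick check, assuming the default resource names from the operator manifest:

# the Certificate should report READY=True once cert-manager has signed it
kubectl -n opentelemetry-operator-system get issuer,certificate
kubectl -n opentelemetry-operator-system describe certificate opentelemetry-operator-serving-cert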
3. Deploy the Collector (center)

Next we deploy a minimal OpenTelemetry Collector that receives trace data in OTLP format over gRPC or HTTP, batches it, and prints it to its log for debugging.

root@k8s01:~/helm/opentelemetry/cert-manager# cat center-collector.yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: center                        # the Collector is named "center"
  namespace: opentelemetry
spec:
  image: registry.cn-guangzhou.aliyuncs.com/xingcangku/opentelemetry-collector-0.127.0:0.127.0
  replicas: 1                         # one replica
  config:                             # Collector configuration
    receivers:                        # receivers accept telemetry (traces, metrics, logs)
      otlp:                           # OTLP (OpenTelemetry Protocol) receiver
        protocols:                    # protocols to accept data on
          grpc:
            endpoint: 0.0.0.0:4317    # gRPC
          http:
            endpoint: 0.0.0.0:4318    # HTTP
    processors:                       # processors transform collected data
      batch: {}                       # batch processor, sends data in batches for efficiency
    exporters:                        # exporters ship processed data to backends
      debug: {}                       # debug exporter, prints data to the log (for testing/debugging)
    service:                          # service section
      pipelines:                      # processing pipelines
        traces:                       # pipeline for traces
          receivers: [otlp]           # receive via OTLP
          processors: [batch]         # batch the data
          exporters: [debug]          # print it to the log
root@k8s01:~/helm/opentelemetry/cert-manager# kubectl get pod -n opentelemetry
NAME                                READY   STATUS        RESTARTS   AGE
center-collector-78f7bbdf45-j798s   1/1     Running       0          43s
center-collector-7b7b8b9b97-qwhdr   0/1     Terminating   0          12m
root@k8s01:~/helm/opentelemetry/cert-manager# kubectl get svc -n opentelemetry
NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
center-collector              ClusterIP   10.105.241.233   <none>        4317/TCP,4318/TCP   49s
center-collector-headless     ClusterIP   None             <none>        4317/TCP,4318/TCP   49s
center-collector-monitoring   ClusterIP   10.96.61.65      <none>        8888/TCP            49s
root@k8s01:~/helm/opentelemetry/cert-manager#

4. Deploy the Collector (sidecar agent)

We deploy the OpenTelemetry agent in sidecar mode. The agent forwards application traces to the center OpenTelemetry Collector we just deployed.
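The sidecar-collector.yaml applied below is not reproduced in the captured output. A minimal sketch of what it might contain, assuming the sidecar only receives OTLP data from the application in the same pod and forwards it over plain gRPC to the center collector service (center-collector.opentelemetry.svc:4317):

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: sidecar
  namespace: opentelemetry
spec:
  mode: sidecar                      # injected into application pods instead of running as its own Deployment
  config:
    receivers:
      otlp:                          # accept OTLP from the application in the same pod
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch: {}                      # batch spans before forwarding
    exporters:
      otlp:                          # forward everything to the center collector
        endpoint: center-collector.opentelemetry.svc:4317
        tls:
          insecure: true             # assumption: the center collector serves plain gRPC, no TLS
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]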
root@k8s01:~/helm/opentelemetry/cert-manager# vi sidecar-collector.yaml
root@k8s01:~/helm/opentelemetry/cert-manager# kubectl apply -f sidecar-collector.yaml
opentelemetrycollector.opentelemetry.io/sidecar created
root@k8s01:~/helm/opentelemetry/cert-manager# kubectl get opentelemetrycollectors -n opentelemetry
NAME      MODE         VERSION   READY   AGE    IMAGE                                                                                    MANAGEMENT
center    deployment   0.127.0   1/1     3m3s   registry.cn-guangzhou.aliyuncs.com/xingcangku/opentelemetry-collector-0.127.0:0.127.0   managed
sidecar   sidecar      0.127.0           7s                                                                                              managed
root@k8s01:~/helm/opentelemetry/cert-manager# kubectl get opentelemetrycollectors -n opentelemetry
NAME      MODE         VERSION   READY   AGE    IMAGE
center    deployment   0.127.0   1/1     3m8s   registry.cn-guangzhou.aliyuncs.com/xingcangku/opentelemetry-collector-0.127.0:0.127.0
sidecar   sidecar      0.127.0           12s
root@k8s01:~/helm/opentelemetry/cert-manager# kubectl get pod -n opentelemetry
NAME                                READY   STATUS        RESTARTS   AGE
center-collector-78f7bbdf45-j798s   1/1     Running       0          3m31s
center-collector-7b7b8b9b97-qwhdr   0/1     Terminating   0          15m

The sidecar agent only starts together with an application, so nothing is injected right after it is created; we still need to deploy an application that opts into this sidecar-mode collector, as shown in the sketch below.
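A workload opts into injection through the sidecar.opentelemetry.io/inject annotation on its pod template. A minimal sketch with a hypothetical test Deployment (demo-app and the nginx image are placeholders; a real, instrumented application would send OTLP to localhost:4317/4318, which the injected sidecar listens on):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                     # hypothetical application, for illustration only
  namespace: opentelemetry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
      annotations:
        sidecar.opentelemetry.io/inject: "sidecar"   # inject the collector named "sidecar" from this namespace
    spec:
      containers:
      - name: app
        image: nginx:1.25            # placeholder image
        ports:
        - containerPort: 80

After applying it, the pod should show an extra container (the injected collector, typically named otc-container), and its traces should then show up in the center collector's debug log.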