Trace Data Collection and Export

axing
2025-06-16

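1. Trace Data Collection Approaches

There are two common ways to collect trace data from applications deployed in Kubernetes:

1) Automatic injection through the OpenTelemetry Operator (auto-instrumentation)
Deploy the OpenTelemetry Operator and create an Instrumentation custom resource (backed by a CRD). The operator then injects the SDK or a sidecar into application containers automatically, so trace data is collected without changing application code. This suits cases that call for fast onboarding, centralized management, and low retrofit cost.
2) Manual integration of the OpenTelemetry SDK in the application (manual instrumentation)
Import the OpenTelemetry SDK directly in application code, instrument key business logic by hand to control the granularity and content of trace spans, and export the data over OTLP (OpenTelemetry Protocol) to a backend such as OpenTelemetry Collector, Jaeger, or Tempo. This suits cases that need precise control over trace data quality or already have custom collection requirements.
Next, we use automatic injection through the Instrumentation Operator to demonstrate how to collect and process the data.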
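
As a minimal sketch of what the auto-injection approach involves, the fragment below defines an Instrumentation resource and the pod annotation that opts a workload in. The resource name `demo-instrumentation`, the choice of Java, and the sampler settings are illustrative assumptions and do not come from this article; the endpoint reuses the collector address that appears in the HotROD manifest in the next section.

```yaml
# Assumes the OpenTelemetry Operator and its CRDs are already installed.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: demo-instrumentation        # hypothetical name
  namespace: opentelemetry
spec:
  exporter:
    # 4318 is OTLP/HTTP and 4317 is OTLP/gRPC; which one applies depends on
    # the protocol the injected agent uses for the chosen language.
    endpoint: http://center-collector.opentelemetry.svc:4318
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"                   # sample everything (assumed, for a demo)
---
# Fragment of a workload's pod template: the annotation asks the operator
# to inject the Java agent using the Instrumentation resource above.
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "opentelemetry/demo-instrumentation"
```

The annotation value uses the `namespace/name` form so that pods outside the `opentelemetry` namespace can still reference the resource; a value of `"true"` would instead look for an Instrumentation resource in the pod's own namespace.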

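2. Deploying the Test Application

Next we deploy the HotROD demo application. It has the OpenTelemetry SDK built in, so we only need to configure the OpenTelemetry receiving address. For details, see the documentation: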

https://github.com/jaegertracing/jaeger/tree/main/examples/hotrod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo
spec:
  selector:
    matchLabels:
      app: go-demo
  template:
    metadata:
      labels:
        app: go-demo
    spec:
      containers:
      - name: go-demo
        image: jaegertracing/example-hotrod:latest
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            memory: "500Mi"
            cpu: "200m"
        ports:
        - containerPort: 8080
        env:
          - name: OTEL_EXPORTER_OTLP_ENDPOINT # OpenTelemetry Collector address
            value: http://center-collector.opentelemetry.svc:4318
---
apiVersion: v1
kind: Service
metadata:
  name: go-demo
spec:
  selector:
    app: go-demo
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: go-demo
spec:
  entryPoints:
  - web
  routes:
  - match: Host(`go-demo.cuiliangblog.cn`)
    kind: Rule
    services:
      - name: go-demo
        port: 8080

Next, add a hosts entry for the domain on your machine and open the site in a browser to test.
[Screenshot: mby13p1h.png]

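3. The Jaeger Approach
3.1 Introduction to Jaeger
Jaeger is distributed tracing software developed at Uber and later donated to the CNCF, used mainly for tracing microservices. Its strengths are high performance (it can handle large volumes of trace data), flexible deployment (single-node and distributed), easy integration (compatible with OpenTelemetry), and strong visualization, which helps locate performance bottlenecks and faults quickly.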
[Figure: Jaeger architecture diagram (mby14clw.png)]

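Based on the diagram above, here is a brief look at each Jaeger component and how the components relate:
Client libraries
Function: insert tracing information (traces/spans) into the application.
Notes:
  Support multiple languages, such as Go, Java, Python, and Node.js.
  Usually the OpenTelemetry SDK or a Jaeger tracer.
  Send the generated trace data to the Agent or the Collector.

Agent
Function: receives trace data from clients and forwards it to the Collector in batches.

Notes:
  Receives UDP packets (more lightweight)
  Sends data to the Collector over gRPC

Collector
Function:
  Receives trace data from the Agent or directly from the SDK.
  Processes it (transcoding, validation, and so on) and writes it to the storage backend.
Scales horizontally to increase throughput.

Ingester (optional)
Function: when Kafka is used as an intermediate buffer queue, the Ingester consumes data from Kafka and writes it to storage.
Purpose: decouples collection from storage and improves stability.

Storage Backend
Function: stores trace data for querying and analysis.

Supported:
  Elasticsearch
  Cassandra
  Kafka (for asynchronous ingestion)
  Badger (development only)
  OpenSearch

Query
Function: queries trace data from storage and serves it to the front-end UI.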
Exposes an API for the UI or other systems (such as Grafana) to call.

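UI
Function:
  Visualizes traces, spans, and the service dependency graph.
  Supports search filters (service name, time range, trace ID, and so on).
Common uses:
  Finding slow requests
  Analyzing request call chains
  Troubleshooting errors or bottlenecks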
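In this example, trace data is instrumented and collected by OpenTelemetry, so we only need the jaeger-collector component to receive the input and store it in Elasticsearch, and the jaeger-query component to query and display the data.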

3.2 Deploying Jaeger (all-in-one)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
  namespace: opentelemetry
  labels:
    app: jaeger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:latest
          args:
            - "--collector.otlp.enabled=true"  # enable the OTLP receiver
            - "--collector.otlp.grpc.host-port=0.0.0.0:4317"
          resources:
            limits:
              memory: "2Gi"
              cpu: "1"
          ports:
            - containerPort: 6831
              protocol: UDP
            - containerPort: 16686
              protocol: TCP
            - containerPort: 4317
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger
  namespace: opentelemetry
  labels:
    app: jaeger
spec:
  selector:
    app: jaeger
  ports:
    - name: jaeger-udp
      port: 6831
      targetPort: 6831
      protocol: UDP
    - name: jaeger-ui
      port: 16686
      targetPort: 16686
      protocol: TCP
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: jaeger
  namespace: opentelemetry
spec:
  entryPoints:
  - web
  routes:
  - match: Host(`jaeger.cuiliangblog.cn`)
    kind: Rule
    services:
      - name: jaeger
        port: 16686

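3.3 Deploying Jaeger (distributed)

The all-in-one deployment keeps data in memory and is not highly available. For production, use a persistent storage backend such as Elasticsearch or OpenSearch (Cassandra is also supported). This walkthrough uses Elasticsearch; for its deployment steps, see: https://www.cuiliangblog.cn/detail/section/162609409

Export the CA certificate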

# kubectl -n elasticsearch get secret elasticsearch-es-http-certs-public -o go-template='{{index .data "ca.crt" | base64decode }}' > ca.crt
# kubectl create secret -n opentelemetry generic es-tls-secret --from-file=ca.crt=./ca.crt
secret/es-tls-secret created

Fetch the chart package

# helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
"jaegertracing" has been added to your repositories
# helm search repo jaegertracing
NAME                            CHART VERSION   APP VERSION     DESCRIPTION                              
jaegertracing/jaeger            3.4.1           1.53.0          A Jaeger Helm chart for Kubernetes       
jaegertracing/jaeger-operator   2.57.0          1.61.0          jaeger-operator Helm chart for Kubernetes
# helm pull jaegertracing/jaeger --untar
# cd jaeger
# ls
Chart.lock  charts  Chart.yaml  README.md  templates  values.yaml

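Modify the installation parameters. The manifest below is the rendered chart output with the Elasticsearch settings adjusted; it is applied as test.yaml in the next step.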

apiVersion: v1
kind: ServiceAccount
metadata:
  name: jaeger-collector
  labels:
    helm.sh/chart: jaeger-3.4.1
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.53.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: collector
automountServiceAccountToken: false
---
# Source: jaeger/templates/query-sa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jaeger-query
  labels:
    helm.sh/chart: jaeger-3.4.1
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.53.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: query
automountServiceAccountToken: false
---
# Source: jaeger/templates/spark-sa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jaeger-spark
  labels:
    helm.sh/chart: jaeger-3.4.1
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.53.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: spark
automountServiceAccountToken: false
---
# Source: jaeger/templates/collector-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: jaeger-collector
  labels:
    helm.sh/chart: jaeger-3.4.1
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.53.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: collector
spec:
  ports:
  - name: grpc
    port: 14250
    protocol: TCP
    targetPort: grpc
    appProtocol: grpc
  - name: http
    port: 14268
    protocol: TCP
    targetPort: http
    appProtocol: http
  - name: otlp-grpc
    port: 4317
    protocol: TCP
    targetPort: otlp-grpc
  - name: otlp-http
    port: 4318
    protocol: TCP
    targetPort: otlp-http
  - name: admin
    port: 14269
    targetPort: admin
  selector:
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/component: collector
  type: ClusterIP
---
# Source: jaeger/templates/query-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: jaeger-query
  labels:
    helm.sh/chart: jaeger-3.4.1
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.53.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: query
spec:
  ports:
  - name: query
    port: 80
    protocol: TCP
    targetPort: query
  - name: grpc
    port: 16685
    protocol: TCP
    targetPort: grpc
  - name: admin
    port: 16687
    protocol: TCP
    targetPort: admin
  selector:
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/component: query
  type: ClusterIP
---
# Source: jaeger/templates/collector-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-collector
  labels:
    helm.sh/chart: jaeger-3.4.1
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.53.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: jaeger
      app.kubernetes.io/instance: jaeger
      app.kubernetes.io/component: collector
  template:
    metadata:
      annotations:
        checksum/config-env: 75a11da44c802486bc6f65640aa48a730f0f684c5c07a42ba3cd1735eb3fb070
      labels:
        app.kubernetes.io/name: jaeger
        app.kubernetes.io/instance: jaeger
        app.kubernetes.io/component: collector
    spec:
      securityContext:
        {}
      serviceAccountName: jaeger-collector
      
      containers:
      - name: jaeger-collector
        securityContext:
          {}
        image: registry.cn-guangzhou.aliyuncs.com/xingcangku/jaeger-collector:1.53.0
        imagePullPolicy: IfNotPresent
        args:
          
          
          
        env:
          - name: COLLECTOR_OTLP_ENABLED
            value: "true"
          - name: SPAN_STORAGE_TYPE
            value: elasticsearch
          - name: ES_SERVER_URLS
            value: https://elasticsearch-client.elasticsearch.svc:9200
          - name: ES_TLS_SKIP_HOST_VERIFY  # temporarily skip hostname verification
            value: "true"
          - name: ES_USERNAME
            value: elastic
          - name: ES_PASSWORD
            valueFrom:
              secretKeyRef:
                name: jaeger-elasticsearch
                key: password
          - name: ES_TLS_ENABLED
            value: "true"
          - name: ES_TLS_CA
            value: /es-tls/ca.crt
        ports:
        - containerPort: 14250
          name: grpc
          protocol: TCP
        - containerPort: 14268
          name: http
          protocol: TCP
        - containerPort: 14269
          name: admin
          protocol: TCP
        - containerPort: 4317
          name: otlp-grpc
          protocol: TCP
        - containerPort: 4318
          name: otlp-http
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /
            port: admin
        livenessProbe:
          httpGet:
            path: /
            port: admin
        resources:
          {}
        volumeMounts:
          - name: es-tls-secret
            mountPath: /es-tls/ca.crt
            subPath: ca.crt
            readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      volumes:
        - name: es-tls-secret
          secret:
            secretName: es-tls-secret
---
# Source: jaeger/templates/query-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-query
  labels:
    helm.sh/chart: jaeger-3.4.1
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.53.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: query
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: jaeger
      app.kubernetes.io/instance: jaeger
      app.kubernetes.io/component: query
  template:
    metadata:
      labels:
        app.kubernetes.io/name: jaeger
        app.kubernetes.io/instance: jaeger
        app.kubernetes.io/component: query
    spec:
      securityContext:
        {}
      serviceAccountName: jaeger-query
        
      containers:
      - name: jaeger-query
        securityContext:
          {}
        image: registry.cn-guangzhou.aliyuncs.com/xingcangku/jaegertracing-jaeger-query:1.53.0
        imagePullPolicy: IfNotPresent
        args:
          
          
          
        env:
          - name: SPAN_STORAGE_TYPE
            value: elasticsearch
          - name: ES_SERVER_URLS
            value: https://elasticsearch-client.elasticsearch.svc:9200
          - name: ES_TLS_SKIP_HOST_VERIFY  # temporarily skip hostname verification
            value: "true"
          - name: ES_USERNAME
            value: elastic
          - name: ES_PASSWORD
            valueFrom:
              secretKeyRef:
                name: jaeger-elasticsearch
                key: password
          - name: ES_TLS_ENABLED
            value: "true"
          - name: ES_TLS_CA
            value: /es-tls/ca.crt
          - name: QUERY_BASE_PATH
            value: "/"
          - name: JAEGER_AGENT_PORT
            value: "6831"
        ports:
        - name: query
          containerPort: 16686
          protocol: TCP
        - name: grpc
          containerPort: 16685
          protocol: TCP
        - name: admin
          containerPort: 16687
          protocol: TCP
        resources:
          {}
        volumeMounts:
          - name: es-tls-secret
            mountPath: /es-tls/ca.crt
            subPath: ca.crt
            readOnly: true
        livenessProbe:
          httpGet:
            path: /
            port: admin
        readinessProbe:
          httpGet:
            path: /
            port: admin
      - name: jaeger-agent-sidecar
        securityContext:
          {}
        image: registry.cn-guangzhou.aliyuncs.com/xingcangku/jaegertracing-jaeger-agent:1.53.0
        imagePullPolicy: IfNotPresent
        args:
        env:
        - name: REPORTER_GRPC_HOST_PORT
          value: jaeger-collector:14250
        ports:
        - name: admin
          containerPort: 14271
          protocol: TCP
        resources:
          null
        volumeMounts:
        livenessProbe:
          httpGet:
            path: /
            port: admin
        readinessProbe:
          httpGet:
            path: /
            port: admin
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      volumes:
        - name: es-tls-secret
          secret:
            secretName: es-tls-secret
---
# Source: jaeger/templates/spark-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: jaeger-spark
  labels:
    helm.sh/chart: jaeger-3.4.1
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.53.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: spark
spec:
  schedule: "49 23 * * *"
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app.kubernetes.io/name: jaeger
            app.kubernetes.io/instance: jaeger
            app.kubernetes.io/component: spark
        spec:
          serviceAccountName: jaeger-spark
          
          securityContext:
            {}
          containers:
          - name: jaeger-spark
            image: registry.cn-guangzhou.aliyuncs.com/xingcangku/jaegertracing-spark-dependencies:latest
            imagePullPolicy: IfNotPresent
            args:
              
              
            env:
              - name: STORAGE
                value: elasticsearch
              - name: ES_SERVER_URLS
                value: https://elasticsearch-client.elasticsearch.svc:9200
              - name: ES_USERNAME
                value: elastic
              - name: ES_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: jaeger-elasticsearch
                    key: password
              - name: ES_TLS_ENABLED
                value: "true"
              - name: ES_TLS_CA
                value: /es-tls/ca.crt
              - name: ES_NODES
                value: https://elasticsearch-client.elasticsearch.svc:9200
              - name: ES_NODES_WAN_ONLY
                value: "false"
            resources:
              {}
            volumeMounts:
            securityContext:
              {}
          restartPolicy: OnFailure
          volumes:
---
# Source: jaeger/templates/elasticsearch-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: jaeger-elasticsearch
  labels:
    helm.sh/chart: jaeger-3.4.1
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/version: "1.53.0"
    app.kubernetes.io/managed-by: Helm
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": before-hook-creation
    "helm.sh/resource-policy": keep
type: Opaque
data:
  password: "ZWdvbjY2Ng=="

Install Jaeger

root@k8s01:~/helm/jaeger/jaeger# kubectl delete  -n opentelemetry -f test.yaml 
serviceaccount "jaeger-collector" deleted
serviceaccount "jaeger-query" deleted
serviceaccount "jaeger-spark" deleted
service "jaeger-collector" deleted
service "jaeger-query" deleted
deployment.apps "jaeger-collector" deleted
deployment.apps "jaeger-query" deleted
cronjob.batch "jaeger-spark" deleted
secret "jaeger-elasticsearch" deleted
root@k8s01:~/helm/jaeger/jaeger# vi test.yaml 
root@k8s01:~/helm/jaeger/jaeger# kubectl apply   -n opentelemetry -f test.yaml 
serviceaccount/jaeger-collector created
serviceaccount/jaeger-query created
serviceaccount/jaeger-spark created
service/jaeger-collector created
service/jaeger-query created
deployment.apps/jaeger-collector created
deployment.apps/jaeger-query created
cronjob.batch/jaeger-spark created
secret/jaeger-elasticsearch created
root@k8s01:~/helm/jaeger/jaeger# kubectl get pods -n opentelemetry -w
NAME                                READY   STATUS    RESTARTS       AGE
center-collector-78f7bbdf45-j798s   1/1     Running   2 (6h2m ago)   30h
jaeger-7989549bb9-hn8jh             1/1     Running   2 (6h2m ago)   25h
jaeger-collector-7f8fb4c946-nkg4m   1/1     Running   0              3s
jaeger-query-5cdb7b68bd-xpftn       2/2     Running   0              3s
^C
root@k8s01:~/helm/jaeger/jaeger# kubectl get svc -n opentelemetry | grep jaeger
jaeger                         ClusterIP   10.100.251.219   <none>        6831/UDP,16686/TCP,4317/TCP                       25h
jaeger-collector               ClusterIP   10.111.17.41     <none>        14250/TCP,14268/TCP,4317/TCP,4318/TCP,14269/TCP   51s
jaeger-query                   ClusterIP   10.98.118.118    <none>        80/TCP,16685/TCP,16687/TCP                        51s
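
With jaeger-collector exposing OTLP on ports 4317 and 4318, the OpenTelemetry Collector can forward the traces it receives to Jaeger. The following is a minimal sketch of a Collector configuration for that. It assumes the Collector runs in the same cluster and that plain-text gRPC between the two services is acceptable; how this configuration is wired into the center-collector deployment is not shown in this article, so that part is left out.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318   # the port the HotROD demo sends to

processors:
  batch: {}                      # batch spans before export

exporters:
  otlp:
    # Service name and port taken from the `kubectl get svc` output above.
    # For the all-in-one deployment the target would instead be
    # jaeger.opentelemetry.svc:4317.
    endpoint: jaeger-collector.opentelemetry.svc:4317
    tls:
      insecure: true             # assumption: no TLS between in-cluster services

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Only a traces pipeline is defined, since Jaeger stores traces; metrics and logs would need their own pipelines and backends.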

Create the Ingress resource

code here...
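
The original manifest for this step is not available. As a hedged sketch only, an IngressRoute for the query service could follow the same pattern as the all-in-one deployment, pointing at the jaeger-query Service on port 80 as defined in the manifest above. The resource name and the host `jaeger-query.example.com` are hypothetical placeholders.

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: jaeger-query                 # hypothetical resource name
  namespace: opentelemetry
spec:
  entryPoints:
  - web
  routes:
  - match: Host(`jaeger-query.example.com`)   # placeholder host, replace with your own
    kind: Rule
    services:
      - name: jaeger-query           # Service created by the manifest above
        port: 80                     # Service port that targets the query UI
```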