OpenTelemetry Collector Configuration and Usage
Collector Configuration
The Collector processes the data enabled in the configuration through pipelines. A pipeline is made up of the components that handle telemetry data:

receivers
processors
exporters

In addition, extensions can add capabilities to the Collector. Extensions do not require direct access to telemetry data and are not part of a pipeline; they are likewise enabled in the configuration.
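As a minimal sketch of how these pieces fit together (assembled from the component examples shown later in this article; not a complete production configuration):

receivers:
  opencensus:
    address: "localhost:55678"

processors:
  batch:
    timeout: 5s

exporters:
  logging:
    loglevel: debug

extensions:
  health_check: {}

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [opencensus]
      processors: [batch]
      exporters: [logging]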
Receivers

A receiver defines how data gets into the OpenTelemetry Collector. One or more receivers must be configured; none are configured by default. Basic examples of the available receivers are given below; see the receivers' documentation for more configuration details.
receivers:
  opencensus:
    address: "localhost:55678"
  zipkin:
    address: "localhost:9411"
  jaeger:
    protocols:
      grpc:
      thrift_http:
      thrift_tchannel:
      thrift_compact:
      thrift_binary:
  prometheus:
    config:
      scrape_configs:
        - job_name: "caching_cluster"
          scrape_interval: 5s
          static_configs:
            - targets: ["localhost:88"]
Processors
Processors run on the data between reception and export. Processors are optional, though some are recommended. Basic examples of the available processors are given below; see the processors' documentation for more details.
processors:
  attributes/example:
    actions:
      - key: db.statement
        action: delete
  batch:
    timeout: 5s
    send_batch_size: 1024
  probabilistic_sampler:
    disabled: true
  span:
    name:
      from_attributes: ["db.svc"]
      separator: "::"
  queued_retry: {}
  tail_sampling:
    policies:
      - name: policy1
        type: rate_limiting
        rate_limiting:
          spans_per_second: 100
Exporters
An exporter specifies how data is sent to one or more backends/destinations. One or more exporters must be configured; none are configured by default. Basic examples of the available exporters are given below; see the exporters' documentation for more details.
exporters:
  opencensus:
    headers: {"X-test-header": "test-header"}
    compression: "gzip"
    cert_pem_file: "server-ca-public.pem"  # optional to enable TLS
    endpoint: "localhost:55678"
    reconnection_delay: 2s
  logging:
    loglevel: debug
  jaeger_grpc:
    endpoint: "http://localhost:14250"
  jaeger_thrift_http:
    headers: {"X-test-header": "test-header"}
    timeout: 5
    endpoint: "http://localhost:14268/api/traces"
  zipkin:
    endpoint: "http://localhost:9411/api/v2/spans"
  prometheus:
    endpoint: "localhost:88"
    namespace: "default"
Service
The service section configures which features the OpenTelemetry Collector enables based on the receivers, processors, exporters, and extensions sections. The service section consists of two parts:

extensions
pipelines
extensions lists the enabled extensions, for example:
service:
  extensions: [health_check, pprof, zpages]
There are two types of pipelines:

metrics: collects and processes metrics data
traces: collects and processes trace data
A pipeline is a set of receivers, processors, and exporters. The configuration of each receiver/processor/exporter must be defined outside of the service section and is then referenced from a pipeline.
Note: each receiver/processor/exporter can be used in multiple pipelines. When multiple pipelines reference a processor, each pipeline gets its own instance of that processor. This is unlike receivers/exporters referenced from multiple pipelines, where all pipelines share a single instance of the receiver/exporter (see the sketch after the example below). An example pipeline configuration follows; see the documentation for more.
service:
  pipelines:
    metrics:
      receivers: [opencensus, prometheus]
      exporters: [opencensus, prometheus]
    traces:
      receivers: [opencensus, jaeger]
      processors: [batch, queued_retry]
      exporters: [opencensus, zipkin]
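To make the instancing note above concrete, here is a sketch (the traces/2 pipeline name and component mix are illustrative, not taken from this article's demo): the batch processor is listed in both pipelines, so each pipeline gets its own batch instance, while the opencensus receiver exists exactly once and feeds both pipelines:

service:
  pipelines:
    traces:
      receivers: [opencensus]   # shared: a single opencensus receiver instance
      processors: [batch]       # per-pipeline: this pipeline's own batch instance
      exporters: [zipkin]
    traces/2:
      receivers: [opencensus]   # the same receiver instance as above
      processors: [batch]       # a separate batch instance for this pipeline
      exporters: [logging]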
Extensions
Extensions can be used to monitor the health of the OpenTelemetry Collector. Extensions are optional; none are configured by default. Basic examples of the available extensions are given below; see the extensions' documentation for more details.
extensions:
  health_check: {}
  pprof: {}
  zpages: {}
Using environment variables

Environment variables can be used in the Collector configuration, for example:
processors:
  attributes/example:
    actions:
      - key: "$DB_KEY"
        action: "$OPERATION"
Using the Collector

The following uses the official demo to try out the Collector's functionality.
This example shows how to export trace and metric data from the OpenTelemetry-Go SDK to the OpenTelemetry Collector, which then forwards the trace data to Jaeger and the metric data to Prometheus. The complete flow is:
                                           |-----> Jaeger (trace)
App + SDK ---> OpenTelemetry Collector ----|
                                           |-----> Prometheus (metrics)
Deploying to Kubernetes

The demo's deployment directory contains all the deployment files this demo needs. For convenience, the upstream project wraps the deployment steps in a single Makefile; the commands in the Makefile can also be run manually when necessary.
Deploying the Prometheus Operator
git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus
kubectl create -f manifests/setup
# wait for namespaces and CRDs to become available, then
kubectl create -f manifests/
The environment can be cleaned up as follows:
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
Wait for all Prometheus components to reach the Running state:
# kubectl get pod -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          16m
alertmanager-main-1                    2/2     Running   0          16m
alertmanager-main-2                    2/2     Running   0          16m
grafana-7f567cccfc-4pmhq               1/1     Running   0          16m
kube-state-metrics-85cb9cfd7c-x6kq6    3/3     Running   0          16m
node-exporter-c4svg                    2/2     Running   0          16m
node-exporter-n6tnv                    2/2     Running   0          16m
prometheus-adapter-5578f58c-vmzr8      1/1     Running   0          16m
prometheus-k8s-0                       3/3     Running   0          16m
prometheus-k8s-1                       3/3     Running   1          16m
prometheus-operator-5b469f4f66-qx2jc   2/2     Running   0          16m
Using the Makefile

Next, use the Makefile to deploy Jaeger, the Prometheus monitor, and the Collector by running the following commands in order:
# Create the namespace
make namespace-k8s

# Deploy Jaeger operator
make jaeger-operator-k8s

# After the operator is deployed, create the Jaeger instance
make jaeger-k8s

# Then the Prometheus instance. Ensure you have enabled a Prometheus operator
# before executing (see above).
make prometheus-k8s

# Finally, deploy the OpenTelemetry Collector
make otel-collector-k8s
Wait for the Jaeger and Collector Pods in the observability namespace to reach the Running state:
# kubectl get pod -n observability
NAME                              READY   STATUS    RESTARTS   AGE
jaeger-7b868df4d6-w4tk8           1/1     Running   0          97s
jaeger-operator-9b4b7bb48-q6k59   1/1     Running   0          110s
otel-collector-7cfdcb7658-ttc8j   1/1     Running   0          14s
The environment can be cleaned up with the make clean-k8s command, but it does not remove the namespace; the namespace must be deleted manually:
kubectl delete namespaces observability
Configuring the OpenTelemetry Collector

After completing the steps above, all the required resources are deployed. Now let's look at the Collector's configuration:
For the application to send data to the OpenTelemetry Collector, an otlp receiver must first be configured; it communicates over gRPC:
...
otel-collector-config: |
  receivers:
    # Make sure to add the otlp receiver.
    # This will open up the receiver on port 55680.
    otlp:
      endpoint: 0.0.0.0:55680
  processors:
...
This creates the receiver on the Collector side and opens port 55680 for receiving traces. The rest of the configuration is fairly standard; the only other notable part is the creation of the Jaeger and Prometheus exporters:
...
exporters:
  jaeger_grpc:
    endpoint: "jaeger-collector.observability.svc.cluster.local:14250"
  prometheus:
    endpoint: 0.0.0.0:88
    namespace: "testapp"
...
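The "fairly standard" remainder wires these components together in the service section. A plausible sketch of those pipelines for this demo (the actual manifest may also add processors such as batch):

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger_grpc]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]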
OpenTelemetry Collector service
Another notable part of the manifest is the NodePort Service used to access the OpenTelemetry Collector:
apiVersion: v1
kind: Service
metadata:
  ...
spec:
  ports:
    - name: otlp  # Default endpoint for otlp receiver.
      port: 55680
      protocol: TCP
      targetPort: 55680
      nodePort: 30080
    - name: metrics  # Endpoint for metrics from our app.
      port: 88
      protocol: TCP
      targetPort: 88
  selector:
    component: otel-collector
  type: NodePort
This Service maps port 30080 on the cluster nodes to the otlp receiver's port 55680, so the Collector can be reached at the static address <node address>:30080.

Running the code

The complete example code can be found in main.go. Running it requires Go >= 1.13.
# go run main.go
2020/10/20 09:19:17 Waiting for connection...
2020/10/20 09:19:17 Doing really hard work (1 / 10)
2020/10/20 09:19:18 Doing really hard work (2 / 10)
2020/10/20 09:19:19 Doing really hard work (3 / 10)
2020/10/20 09:19:20 Doing really hard work (4 / 10)
2020/10/20 09:19:21 Doing really hard work (5 / 10)
2020/10/20 09:19:22 Doing really hard work (6 / 10)
2020/10/20 09:19:23 Doing really hard work (7 / 10)
2020/10/20 09:19:24 Doing really hard work (8 / 10)
2020/10/20 09:19:25 Doing really hard work (9 / 10)
2020/10/20 09:19:26 Doing really hard work (10 / 10)
2020/10/20 09:19:27 Done!
2020/10/20 09:19:27 exporter stopped
This example simulates a running application that computes for 10 seconds and then exits.

Viewing the collected data
The data flow when running go run main.go is as follows:
Jaeger UI
Querying the trace in Jaeger shows the following:
Prometheus
After main.go finishes, the metric can be viewed in Prometheus; the corresponding Prometheus target is observability/otel-collector/0. Querying the metric in Prometheus shows the following:
FAQ

If Prometheus cannot scrape the otel-collector target in the observability namespace, check that the prometheus-k8s ClusterRole grants the permissions needed for cross-namespace service discovery, for example:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus-k8s
  namespace: monitoring
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
When running "go run main.go" you may hit an error like rpc error: code = Internal desc = grpc: error unmarshalling request: unexpected EOF. This is usually caused by the client and server using mismatched proto definitions. The client (i.e. main.go) uses the proto files under go.opentelemetry.io/otel/exporters/otlp/internal/opentelemetry-proto-gen, while the Collector uses the proto files under go.opentelemetry.io/collector/internal/data/opentelemetry-proto-gen; compare whether the files in these two directories match. If they do not, regenerate the client's proto files to match the Collector's version (or simply switch the Collector image, paying attention to the image version in use). The Collector repository contains comments indicating the proto version in use, as shown below.
The proto git repository used by the Collector is opentelemetry-proto. Clone that repository, check out the corresponding version, and run make gen-go to generate the corresponding files.
Binary Protobuf Encoding

Component                  Maturity
collector/metrics/*        Alpha
collector/trace/*          Stable
common/*                   Stable
metrics/*                  Alpha
resource/*                 Stable
trace/trace.proto          Stable
trace/trace_config.proto   Alpha

JSON encoding

Component      Maturity
All messages   Alpha