Monitoring and Alerting with Prometheus
1. Configuring Traefik Monitoring
Prometheus Operator provides the ServiceMonitor CRD for configuring how metrics are scraped. Here we define the following object:
008-traefik-service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  namespace: default
  labels:
    app: traefik
    release: prometheus-stack
spec:
  jobLabel: traefik-metrics
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik-dashboard
      app: traefik
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: admin
    path: /metrics
# Note: the traefik-dashboard Service was created in the kube-system namespace,
# while this ServiceMonitor is deployed in the default namespace, so namespaceSelector is used to match the Service's namespace.
1. Once the resource is created, Prometheus scrapes the /metrics endpoint of the traefik-dashboard Service.
kubectl apply -f 008-traefik-service-monitor.yaml
Verify that Prometheus has started scraping Traefik's metrics
2. Configuring Traefik Alerts
Next, add an alert rule that fires when its condition is met. Prometheus Operator likewise provides a CRD, PrometheusRule, for configuring alert rules:
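Besides the Status > Targets page of the Prometheus web UI, this can be checked through the Prometheus HTTP API: the built-in up series is 1 for every target whose last scrape succeeded. The sketch below assumes Prometheus is reachable at http://localhost:9090 (for example via kubectl port-forward -n monitoring svc/prometheus-operated 9090, prometheus-operated being the Service Prometheus Operator creates) and that the job label is traefik-dashboard, the value the alert rule below also uses. PROM_URL, traefik_up_query and prom_query are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers for checking Traefik's scrape status via the Prometheus HTTP API.
# Assumption: Prometheus is reachable at PROM_URL (e.g. through kubectl port-forward).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build the PromQL selector for the Traefik targets. Prometheus Operator falls back
# to the Service name (traefik-dashboard) as the job label when the Service has no
# label named by jobLabel (traefik-metrics).
traefik_up_query() {
  printf 'up{job="%s"}' "${1:-traefik-dashboard}"
}

# Run an instant query; -G sends the URL-encoded data as a GET query string.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

Usage: prom_query "$(traefik_up_query)" should return one result per Traefik target, each with a value of "1" if the last scrape succeeded; an empty result list suggests the ServiceMonitor has not been picked up.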
009-traefik-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    meta.helm.sh/release-name: kps
    meta.helm.sh/release-namespace: monitoring
  labels:
    app: kube-prometheus-stack
    release: kps
  name: traefik-alert-rules
  namespace: monitoring
spec:
  groups:
  - name: Traefik
    rules:
    - alert: TooManyRequest
      expr: avg(traefik_entrypoint_open_connections{job="traefik-dashboard",namespace="kube-system"}) > 5
      for: 1m
      labels:
        severity: critical
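1. This defines a TooManyRequest alert: it fires when the average of traefik_entrypoint_open_connections across the matched series stays above 5 continuously for 1 minute.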
kubectl apply -f 009-traefik-rules.yaml
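Because of for: 1m, the alert first becomes pending when the expression turns true and only switches to firing if it is still true a minute later. Prometheus exposes this state as the built-in ALERTS series (labels alertname and alertstate), so it can be watched with an instant query. A sketch, assuming Prometheus is reachable at http://localhost:9090 (for example through a port-forward); alert_query and prom_query are hypothetical helper names.

```shell
#!/bin/sh
# Build a PromQL selector over the built-in ALERTS series for one alert,
# optionally restricted to a state (pending or firing).
alert_query() {
  if [ -n "${2:-}" ]; then
    printf 'ALERTS{alertname="%s",alertstate="%s"}' "$1" "$2"
  else
    printf 'ALERTS{alertname="%s"}' "$1"
  fi
}

# Run an instant query against the assumed Prometheus address.
prom_query() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" --data-urlencode "query=$1"
}
```

Usage: prom_query "$(alert_query TooManyRequest firing)" returns an empty result list until the alert is actually firing.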
Note: the annotations and labels of the PrometheusRule can be modeled on those of a rule that is already running, for example:
kubectl get PrometheusRule kps-kube-prometheus-stack-prometheus-operator -n monitoring -oyaml |head
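Copying the labels works because Prometheus Operator only loads PrometheusRule objects whose labels match spec.ruleSelector of the Prometheus resource, and likewise only ServiceMonitors matching spec.serviceMonitorSelector (each also subject to its namespace selector). Recent kubectl versions print these selectors as compact JSON with jsonpath output, so they can be compared against the labels you set. selector_has_label and show_selectors are hypothetical helpers, and the match is a plain substring check rather than real JSON parsing. If the ServiceMonitor from step 1 (labelled release: prometheus-stack) never shows up as a target, this comparison is a reasonable first check.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a compact JSON label selector contains "key":"value".
# Naive substring match, only meant for simple matchLabels selectors.
selector_has_label() {
  local key="$1" value="$2" selector="$3"
  case "$selector" in
    *"\"$key\":\"$value\""*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the rule and ServiceMonitor selectors of the first Prometheus in "monitoring".
show_selectors() {
  kubectl get prometheus -n monitoring \
    -o jsonpath='{.items[0].spec.ruleSelector}{"\n"}{.items[0].spec.serviceMonitorSelector}{"\n"}'
}
```

Usage: run show_selectors, then for example selector_has_label release kps "$(kubectl get prometheus -n monitoring -o jsonpath='{.items[0].spec.ruleSelector}')" && echo match.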
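Check whether the rule was loaded: Prometheus Operator renders matching PrometheusRule objects into rule files mounted in the Prometheus pod, so the new rule should appear as a file in this directory: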
kubectl exec -n monitoring prometheus-kps-kube-prometheus-stack-prometheus-0 -- ls /etc/prometheus/rules/prometheus-kps-kube-prometheus-stack-prometheus-rulefiles-0/
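Once it is created, the alert rule should appear on the Status > Rules page of the Prometheus web UI.
3. Grafana Configuration
4. Testing
Traefik is now up and running, and its metrics are being collected by Prometheus and displayed in Grafana. Next we need an application to test with. Here we deploy the HTTPBin service, which provides many endpoints that can be used to simulate different kinds of user traffic. The manifest is as follows: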
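The same check can be scripted against the /api/v1/rules endpoint of Prometheus, which returns all loaded rule groups as JSON with each rule's name and state. A sketch, assuming Prometheus is reachable at http://localhost:9090 (for example via kubectl port-forward -n monitoring svc/prometheus-operated 9090); rule_loaded and fetch_rules are hypothetical helpers, and rule_loaded greps the compact JSON for the rule name instead of parsing it.

```shell
#!/bin/sh
# Hypothetical helper: read a /api/v1/rules JSON response on stdin and succeed
# if a rule with the given name is present (plain text match, not a JSON parser).
rule_loaded() {
  grep -q "\"name\":\"$1\""
}

# Fetch the loaded rules; PROM_URL is an assumed address (e.g. a port-forward).
fetch_rules() {
  curl -s "${PROM_URL:-http://localhost:9090}/api/v1/rules"
}
```

Usage: fetch_rules | rule_loaded TooManyRequest && echo "rule loaded".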
010-traefik-httpbin.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
  labels:
    app: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - image: kennethreitz/httpbin
        name: httpbin
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: httpbin
spec:
  entryPoints:
  - web
  routes:
  - match: Host(`httpbin.local`)
    kind: Rule
    services:
    - name: httpbin
      port: 8000
1. The httpbin route matches the host name httpbin.local and forwards the requests to the httpbin Service. Create the resources:
kubectl apply -f 010-traefik-httpbin.yaml
2. Look up Traefik's address:
kubectl get svc traefik -n kube-system
3. Send a request to that address with the Host header set to httpbin.local; the httpbin route matches it and forwards it to the httpbin Service:
curl -I http://10.96.190.2 -H "host:httpbin.local"
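4. Use ab to send simulated traffic to the HTTPBin service; these requests generate the corresponding metrics. Run the following script:
# ClusterIP of the traefik Service from step 2
host=10.96.190.2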
ab -c 5 -n 10000 -m PATCH -H "host:httpbin.local" -H "accept: application/json" http://${host}/patch
ab -c 5 -n 10000 -m GET -H "host:httpbin.local" -H "accept: application/json" http://${host}/get
ab -c 5 -n 10000 -m POST -H "host:httpbin.local" -H "accept: application/json" http://${host}/post
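The three ab runs above can also be expressed as one loop, with the Traefik address looked up via kubectl's jsonpath output instead of being copied by hand. This is a sketch rather than part of the original setup: traefik_ip, httpbin_path, run, generate_traffic and the DRY_RUN switch are hypothetical names, ab (Apache Bench) must be installed, and the ClusterIP is only reachable from inside the cluster network, as with the original commands.

```shell
#!/bin/sh
# Hypothetical wrapper around the three ab runs above.
# Set DRY_RUN=1 to print the commands instead of executing them.

# Print the ClusterIP of the traefik Service (the same Service as in step 2).
traefik_ip() {
  kubectl get svc traefik -n kube-system -o jsonpath='{.spec.clusterIP}'
}

# Map an HTTP method to its httpbin endpoint, e.g. PATCH -> /patch.
httpbin_path() {
  printf '/%s' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Execute the given command, or only echo it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# For each method, send 10000 requests with concurrency 5, as in the script above.
# The host defaults to the traefik ClusterIP looked up via kubectl.
generate_traffic() {
  host="${1:-$(traefik_ip)}"
  for method in PATCH GET POST; do
    run ab -c 5 -n 10000 -m "$method" \
      -H "host:httpbin.local" -H "accept: application/json" \
      "http://${host}$(httpbin_path "$method")"
  done
}
```

Usage: DRY_RUN=1 generate_traffic 10.96.190.2 previews the three commands; generate_traffic with no argument looks the address up with kubectl and runs ab for real.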