Monitoring ingress-nginx with kube-prometheus

1. Configure the ingress Service to expose the metrics port

    kubectl get svc -n ingress-controller
    NAME                       TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
    ingress-nginx-controller   NodePort   10.244.100.195   <none>        80:30080/TCP,443:30443/TCP   562d

By default, ingress-nginx serves its metrics on port 10254 at the path /metrics. Adjust the ingress-nginx configuration so that port 10254 is open on both the pod and a Service.
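Before creating any Kubernetes objects, it can help to confirm that the controller really serves metrics on that port. The Python sketch below is one way to do so. It assumes the port has been forwarded to your workstation (for example with `kubectl port-forward -n ingress-controller deploy/ingress-nginx-controller 10254:10254`, where the Deployment name is an assumption based on the Service name above). The helper names `fetch_metrics` and `filter_metrics` and the sample text are illustrative and not part of ingress-nginx.

```python
from typing import List
from urllib.request import urlopen


def fetch_metrics(url: str = "http://127.0.0.1:10254/metrics", timeout: float = 5.0) -> str:
    """Download the raw Prometheus exposition text (assumes a local port-forward)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def filter_metrics(exposition_text: str, prefix: str = "nginx_ingress_controller_") -> List[str]:
    """Keep only sample lines whose metric name starts with `prefix`.

    Comment lines (# HELP / # TYPE) start with '#', so they never match.
    """
    return [line for line in exposition_text.splitlines() if line.startswith(prefix)]


# Made-up sample in the exposition format, used here instead of a live endpoint.
SAMPLE = """\
# TYPE nginx_ingress_controller_config_last_reload_successful gauge
nginx_ingress_controller_config_last_reload_successful 1
go_goroutines 97
"""

print(filter_metrics(SAMPLE))
# Against a live controller: print("\n".join(filter_metrics(fetch_metrics())))
```

If the filtered list comes back empty against a live controller, the metrics port is probably not enabled or not reachable, and the later steps will not produce a healthy target.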

Create service.yaml:

    apiVersion: v1
    kind: Service
    metadata:
      name: ingress-nginx-controller-metrics
      namespace: ingress-controller
      labels:
        # k8s-app: ingress-nginx  # note: keep consistent with the pod's labels
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/name: ingress-nginx
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
    spec:
      type: ClusterIP
      ports:
        - name: metrics
          port: 9913
          # Forward to the controller's metrics port. Without targetPort the
          # Service would send traffic to 9913 on the pod instead of 10254.
          targetPort: 10254
          protocol: TCP
      selector:
        # k8s-app: ingress-nginx  # note: keep consistent with the pod's labels
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/name: ingress-nginx

Create the resource:

    kubectl apply -f service.yaml

Check the Services again:

    kubectl get svc -n ingress-controller
    NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
    ingress-nginx-controller           NodePort    10.244.100.195   <none>        80:30080/TCP,443:30443/TCP   562d
    ingress-nginx-controller-metrics   ClusterIP   10.244.66.153    <none>        9913/TCP                     562d

2. Configure a ServiceMonitor to scrape the metrics

servicemonitor.yaml:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: ingress-nginx-controller-metrics
      labels:
        k8s-app: ingress-nginx
      namespace: monitoring
    spec:
      endpoints:
        - interval: 30s
          port: metrics  # name of the Service port defined in service.yaml
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
          - ingress-controller
      selector:
        matchLabels:
          # k8s-app: ingress-nginx  # note: labels must match the Service above
          app.kubernetes.io/component: controller
          app.kubernetes.io/instance: ingress-nginx
          app.kubernetes.io/name: ingress-nginx

Create the resource:

    kubectl apply -f servicemonitor.yaml

3. Check the ingress-nginx target in kube-prometheus

[Screenshot: the ingress-nginx target in Prometheus]
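The same check can be scripted against the Prometheus HTTP API instead of the web UI. As a rough sketch, the Python below filters the `activeTargets` list returned by `/api/v1/targets` for targets whose `service` label matches the metrics Service. It assumes Prometheus is reachable locally (for example via `kubectl port-forward -n monitoring svc/prometheus-k8s 9090`; `prometheus-k8s` is the usual kube-prometheus Service name and may differ in your cluster) and that targets carry the `service` label that prometheus-operator normally attaches. The function `targets_for_service` and the sample payload are hypothetical.

```python
import json
from typing import Dict, List


def targets_for_service(targets_json: str, service: str) -> List[Dict[str, str]]:
    """Return instance, health and scrape URL for every active target of `service`."""
    payload = json.loads(targets_json)
    matches = []
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        if labels.get("service") == service:
            matches.append({
                "instance": labels.get("instance", ""),
                "health": target.get("health", "unknown"),
                "scrapeUrl": target.get("scrapeUrl", ""),
            })
    return matches


# Trimmed, made-up response in the shape of GET /api/v1/targets.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"service": "ingress-nginx-controller-metrics",
                           "instance": "10.0.0.1:10254"},
                "scrapeUrl": "http://10.0.0.1:10254/metrics",
                "health": "up",
            },
            {
                "labels": {"service": "kubelet", "instance": "10.0.0.2:10250"},
                "scrapeUrl": "https://10.0.0.2:10250/metrics",
                "health": "up",
            },
        ]
    },
})

for row in targets_for_service(SAMPLE_RESPONSE, "ingress-nginx-controller-metrics"):
    print(row["instance"], row["health"], row["scrapeUrl"])
```

Against a live server, the JSON string would come from `http://127.0.0.1:9090/api/v1/targets`. An empty result usually means the ServiceMonitor's selector or port name does not match the Service.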

4. Import the Grafana dashboards

ingress-nginx dashboard ID: 9614

[Screenshot: Grafana dashboard 9614]

ingress-nginx dashboard ID: 14314

[Screenshot: Grafana dashboard 14314]

5. Configure alerting rules

rules.yaml:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: nginx-ingress-rules
      namespace: monitoring
    spec:
      groups:
        - name: nginx-ingress-rules
          rules:
            - alert: NginxFailedtoLoadConfiguration
              expr: nginx_ingress_controller_config_last_reload_successful == 0
              for: 1m
              labels:
                severity: critical
              annotations:
                summary: "Nginx Ingress Controller failed to load its configuration"
                description: "The Nginx Ingress Controller failed to load its configuration; check that the configuration is valid."
            - alert: NginxHighHttp4xxErrorRate
              # Share of 4xx responses among all requests, as a percentage.
              expr: |
                sum by (exported_namespace, host) (rate(nginx_ingress_controller_requests{status=~"4.."}[5m]))
                  / sum by (exported_namespace, host) (rate(nginx_ingress_controller_requests[5m])) * 100 > 1
              for: 1m
              labels:
                severity: warning
              annotations:
                description: "Nginx high HTTP 4xx error rate (namespace {{ $labels.exported_namespace }}, host {{ $labels.host }})"
                summary: "Too many HTTP requests with status 4xx (> 1%)"
            - alert: NginxHighHttp5xxErrorRate
              # Share of 5xx responses among all requests, as a percentage.
              expr: |
                sum by (exported_namespace, host) (rate(nginx_ingress_controller_requests{status=~"5.."}[5m]))
                  / sum by (exported_namespace, host) (rate(nginx_ingress_controller_requests[5m])) * 100 > 1
              for: 1m
              labels:
                severity: warning
              annotations:
                description: "Nginx high HTTP 5xx error rate (namespace {{ $labels.exported_namespace }}, host {{ $labels.host }})"
                summary: "Too many HTTP requests with status 5xx (> 1%)"
            - alert: NginxLatencyHigh
              # histogram_quantile needs the `le` label, so it must survive the aggregation.
              expr: histogram_quantile(0.99, sum by (le, exported_namespace, host) (rate(nginx_ingress_controller_request_duration_seconds_bucket[2m]))) > 3
              for: 2m
              labels:
                severity: warning
              annotations:
                description: "Nginx latency high (namespace {{ $labels.exported_namespace }}, host {{ $labels.host }})"
                summary: "Nginx p99 latency is higher than 3 seconds"
            - alert: NginxHighRequestRate
              # rate() already yields requests per second, so compare it with 1000 directly.
              expr: rate(nginx_ingress_controller_nginx_process_requests_total[5m]) > 1000
              for: 1m
              labels:
                severity: warning
              annotations:
                description: "Nginx ingress controller high request rate (instance {{ $labels.instance }}, namespace {{ $labels.namespace }}, pod {{ $labels.pod }})"
                summary: "Nginx ingress controller high request rate (> 1000 requests per second)"
            - alert: SSLCertificateExpiration15day
              # The metric is the expiry time as a Unix timestamp; subtract time()
              # to get the remaining lifetime. 1296000 s = 15 days.
              expr: nginx_ingress_controller_ssl_expire_time_seconds - time() < 1296000
              for: 30m
              labels:
                severity: warning
              annotations:
                summary: "SSL/TLS certificate for {{ $labels.host }} (secret {{ $labels.secret_name }}) is about to expire"
                description: "The SSL/TLS certificate for {{ $labels.host }} (secret {{ $labels.secret_name }}) will expire in less than 15 days."
            - alert: SSLCertificateExpiration7day
              # 604800 s = 7 days.
              expr: nginx_ingress_controller_ssl_expire_time_seconds - time() < 604800
              for: 30m
              labels:
                severity: critical
              annotations:
                summary: "SSL/TLS certificate for {{ $labels.host }} (secret {{ $labels.secret_name }}) is about to expire"
                description: "The SSL/TLS certificate for {{ $labels.host }} (secret {{ $labels.secret_name }}) will expire in less than 7 days."
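
The two certificate thresholds are day counts expressed in seconds. `nginx_ingress_controller_ssl_expire_time_seconds` reports the expiry moment as a Unix timestamp, so the remaining lifetime is that value minus the current time. The short Python sketch below works through the arithmetic. The function names are illustrative, and the severity mapping simply mirrors the two certificate rules above.

```python
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_seconds(days: int) -> int:
    """Convert a number of days into seconds."""
    return days * SECONDS_PER_DAY


def expiry_severity(expire_timestamp: float, now: float) -> Optional[str]:
    """Return the most severe alert level for a certificate, or None if it is healthy.

    `expire_timestamp` and `now` are both Unix timestamps in seconds.
    """
    remaining = expire_timestamp - now
    if remaining < days_to_seconds(7):
        return "critical"
    if remaining < days_to_seconds(15):
        return "warning"
    return None


print(days_to_seconds(15))  # 1296000, the 15-day threshold
print(days_to_seconds(7))   # 604800, the 7-day threshold
```

Note that in Prometheus both alerts fire once fewer than 7 days remain, whereas this helper reports only the more severe one.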