
[EagleEye][Environment Setup] Fixing K8s Prometheus metrics connection problems (connection refused, connection reset by peer)

pythaac 2022. 5. 17. 17:55

  • [Error] serviceMonitor/prometheus : Get - dial tcp - connect: connection refused
    - Prometheus fails to scrape metrics from several targets

 

kube-proxy

  • Change metricsBindAddress
    >> kubectl -n kube-system edit cm kube-proxy
    - metricsBindAddress: "" -> metricsBindAddress: 0.0.0.0:10249
    (not certain that this change is what actually fixed it)
  • reboot (control plane, worker nodes)
    >> sudo reboot
    (some targets seem to recover after a while even without a reboot)
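
The same kube-proxy change could also be scripted instead of edited by hand. This is only a minimal sketch, not what was run for this post: `patch_metrics_bind_address` and `apply_kube_proxy_fix` are hypothetical helper names, it assumes `kubectl get cm -o yaml` prints the embedded config as plain indented lines, and it swaps the full node reboot for a restart of the kube-proxy DaemonSet.

```shell
#!/bin/sh
# Hypothetical helper: reads YAML on stdin and sets every metricsBindAddress
# line to 0.0.0.0:10249, keeping the original indentation.
patch_metrics_bind_address() {
  sed -E 's/^([[:space:]]*metricsBindAddress:).*/\1 0.0.0.0:10249/'
}

# Hypothetical wrapper (defined only, not called here): patch the live
# ConfigMap, then restart kube-proxy so its pods reread the config.
apply_kube_proxy_fix() {
  kubectl -n kube-system get cm kube-proxy -o yaml \
    | patch_metrics_bind_address \
    | kubectl replace -f -
  kubectl -n kube-system rollout restart daemonset kube-proxy
}
```

If the change took effect, a request to port 10249 on a node (for example with `curl`) should be answered instead of refused.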

 

  • State after reboot

 

Other kube-system components

  • Targets still failing
    • kube-controller-manager
    • kube-etcd
    • kube-scheduler

https://groups.google.com/g/prometheus-users/c/_aI-HySJ-xM

 

Alerts firing right after setting up kube-prometheus-stack


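  • According to the post above, check the bind address of each kube-system component and change 127.0.0.1 to 0.0.0.0
    (🙆‍♂️🙇‍♂️ Truly grateful for this post. It explains in detail how the problem was identified and how it was solved)
  • The locations of the files to edit come from the link below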

https://stackoverflow.com/questions/60767427/kubernetes-kube-controller-manager-how-can-i-apply-a-flag

 

Kubernetes kube-controller-manager. How can I apply a flag?


 

  • 1. Check kube-controller-manager
    >> kubectl -n kube-system describe pod kube-controller-manager

  • 2. Check etcd
    >> kubectl -n kube-system describe pod etcd

  • 3. Check kube-scheduler
    >> kubectl -n kube-system describe pod kube-scheduler
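
The `describe` output is long, so it can help to filter it down to the flags that decide where metrics are served. The sketch below is an illustration only: `show_bind_flags` and `check_control_plane_binds` are hypothetical names, and the flag list covers just the two flags discussed in this post.

```shell
#!/bin/sh
# Hypothetical helper: keep only the flags that decide where metrics are served.
show_bind_flags() {
  grep -E -- '--(bind-address|listen-metrics-urls)='
}

# Hypothetical wrapper (defined only, not called here): run the three checks
# from the steps above and print the relevant flags for each component.
check_control_plane_binds() {
  for name in kube-controller-manager etcd kube-scheduler; do
    echo "== $name"
    kubectl -n kube-system describe pod "$name" | show_bind_flags
  done
}
```

A value of 127.0.0.1 means the endpoint is reachable only from the node itself, which is consistent with the connection refused error Prometheus reports.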

 

  • Change the bind address
    • kube-controller-manager
      >> sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
      - command: --bind-address=0.0.0.0
    • kube-scheduler
      >> sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
      - command: --bind-address=0.0.0.0
    • etcd
      >> sudo vi /etc/kubernetes/manifests/etcd.yaml
      - command: --listen-metrics-urls=http://127.0.0.1:2381,http://192.168.1.7:2381
    • The pods appear to restart automatically once a file is saved (these are static pod manifests, which the kubelet watches)
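
For the two components that use `--bind-address`, the edit could be scripted rather than made in `vi`. This is a hedged sketch: `set_bind_address` is a hypothetical helper, and it assumes the manifest currently contains the literal flag `--bind-address=127.0.0.1`. The etcd manifest uses `--listen-metrics-urls` with a node-specific IP, so it is left as a manual edit.

```shell
#!/bin/sh
# Hypothetical helper: replace a loopback --bind-address flag in one manifest.
# Usage: set_bind_address FILE
set_bind_address() {
  file=$1
  # Build the new content outside the manifests directory, then write it back
  # in place so no stray temp file appears next to the manifests.
  tmp=$(mktemp) || return 1
  sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

It would need to run as root against `/etc/kubernetes/manifests/kube-controller-manager.yaml` and `/etc/kubernetes/manifests/kube-scheduler.yaml`. Binding to 0.0.0.0 exposes the endpoint on every interface of the node, so this fits a lab cluster better than an exposed one.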

 

https://github.com/prometheus-community/helm-charts/issues/1005

 

serviceMonitor/default/kube-prometheus-stack-kube-etcd detect Incorrect metric port and connection reset by peer · Issue #1005


  • Additional etcd change
    >> vi ~/prometheus-stack/values.yaml
    - kubeEtcd: service: enabled: true, port: 2381, targetPort: 2381 (changed from 2379 to 2381)
  • Apply the changes
    >> helm upgrade --install prometheus -f values.yaml ./ -n prometheus
  • Verify 👀
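
The etcd port override can also be kept in a small separate values file instead of being edited into the main one. The following is a sketch under assumptions: `write_etcd_values`, `apply_etcd_values`, and the file name `etcd-values.yaml` are hypothetical, while the release name, namespace, and chart path are taken from the command above.

```shell
#!/bin/sh
# Hypothetical helper: write the kubeEtcd override from the step above to FILE.
write_etcd_values() {
  cat > "$1" <<'EOF'
kubeEtcd:
  service:
    enabled: true
    port: 2381
    targetPort: 2381
EOF
}

# Hypothetical wrapper (defined only, not called here): layer the override on
# top of the existing values.yaml; later -f files take precedence in helm.
apply_etcd_values() {
  write_etcd_values etcd-values.yaml
  helm upgrade --install prometheus -f values.yaml -f etcd-values.yaml ./ -n prometheus
}
```

To verify, a request such as `curl http://192.168.1.7:2381/metrics` (the address set in `--listen-metrics-urls` above) should return metrics, and the kube-etcd target in Prometheus should no longer show connection reset by peer.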