
Prometheus: Introduction and Installation

Introduction to Prometheus

Prometheus official website: https://prometheus.io

Prometheus is an open-source monitoring system and time-series database, originally developed at SoundCloud and released in 2012. It is widely used in containerized environments, cloud-native architectures, and microservice systems.

Here are some of Prometheus's key features and concepts:

  • Multi-dimensional data model: Prometheus uses a multi-dimensional data model to describe monitoring data. Each time series is uniquely identified by its metric name and a set of key-value labels, which enables flexible querying and aggregation.

  • Flexible query language: Prometheus provides the PromQL query language for rich querying and analysis. Users can combine, aggregate, and filter monitoring data as needed (see the example after this list).

  • Real-time monitoring and alerting: Prometheus can evaluate configured rules in real time and generate alerts. Users can define custom alerting rules and receive notifications through channels such as email or Slack.

  • Visualization and dashboards: Prometheus offers basic built-in data visualization and integrates with external dashboard tools such as Grafana for more powerful visualization and monitoring dashboards.

  • Scalability and high reliability: Prometheus is designed for scalability and high reliability. It supports distributed architectures and provides service discovery and automatic configuration. Horizontal scaling and high availability can be achieved by adding Prometheus instances and choosing an appropriate storage solution.

  • Open ecosystem: Prometheus has an active open-source community that provides a rich set of plugins and integrations to extend and customize its functionality. It integrates with a wide variety of applications and systems, including container orchestration platforms (such as Kubernetes), cloud providers, and assorted monitoring and alerting tools.
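
As a quick taste of PromQL, the query below computes the per-second rate of HTTP requests served by Prometheus itself over the last 5 minutes, issued through the HTTP API. This is a minimal sketch: it assumes a Prometheus server is reachable at localhost:9090 (for example via kubectl port-forward once the installation below is complete).

# Query the per-second HTTP request rate over the last 5 minutes.
# prometheus_http_requests_total is a metric Prometheus exposes about itself.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(prometheus_http_requests_total[5m])'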

tip

Overall, Prometheus is a powerful, easy-to-use, and highly scalable monitoring system suited to monitoring and analyzing all kinds of systems, services, and applications. It provides real-time monitoring, alerting, visualization, and data querying, helping users understand the performance and health of their systems.

Prometheus components:

  • prometheus server: the main service; accepts external HTTP requests and collects, stores, and queries data
  • prometheus targets: statically configured targets whose data is scraped
  • service discovery: dynamic discovery of targets
  • prometheus alerting: alert notification
  • push gateway: a data-collection proxy (similar to a Zabbix proxy)
  • data visualization and export: data visualization and export (for client access)

Prometheus architecture diagram:

(figure: Prometheus architecture diagram)

Installing Prometheus on Kubernetes

Installation

Since we are going to run inside a Kubernetes cluster, we simply run Prometheus from its Docker image. The lab environment used here is based on Kubernetes v1.24.3:

root@master01:~# kubectl get nodes
NAME                 STATUS   ROLES           AGE   VERSION
master01.k8s.local   Ready    control-plane   65d   v1.24.3
node01.k8s.local     Ready    <none>          65d   v1.24.3
node02.k8s.local     Ready    <none>          65d   v1.24.3
node03.k8s.local     Ready    <none>          65d   v1.24.3
node04.k8s.local     Ready    <none>          65d   v1.24.3

For easier management, we install all monitoring-related resource objects in the monitoring namespace; create it first if it does not exist yet:

# create the namespace
root@master01:~# kubectl create ns monitoring
namespace/monitoring created

To manage the configuration file conveniently, we store prometheus.yml in a ConfigMap:

# prometheus-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
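
Before applying the ConfigMap, the embedded configuration can be sanity-checked with promtool, which ships inside the prom/prometheus image. A sketch, assuming the prometheus.yml content above has been saved to a local file and Docker is available on the workstation:

# Validate the scrape configuration with the same image version we deploy below.
docker run --rm -v $PWD:/cfg --entrypoint promtool \
  prom/prometheus:v2.31.1 check config /cfg/prometheus.yml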

For now this only configures Prometheus to monitor itself; apply the resource object directly:

root@master01:~/monitoring# kubectl apply -f prometheus-cm.yaml 
configmap/prometheus-config created

With the configuration file in place, whenever new resources need to be monitored we only have to update the ConfigMap above. Now let's create the Pod resource for Prometheus:

# prometheus-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - image: prom/prometheus:v2.31.1
          name: prometheus
          args:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.path=/prometheus' # TSDB data path
            - '--storage.tsdb.retention.time=24h'
            - '--web.enable-admin-api' # enables the admin HTTP API, which includes features such as deleting time series
            - '--web.enable-lifecycle' # enables hot reload: a request to localhost:9090/-/reload takes effect immediately
          ports:
            - containerPort: 9090
              name: http
          volumeMounts:
            - mountPath: '/etc/prometheus'
              name: config-volume
            - mountPath: '/prometheus'
              name: data
          resources:
            requests:
              cpu: 200m
              memory: 1024Mi
            limits:
              cpu: 200m
              memory: 1024Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-data
        - name: config-volume
          configMap:
            name: prometheus-config
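
Because --web.enable-lifecycle is set, configuration changes can later be applied without restarting the Pod: update the ConfigMap, wait for the kubelet to sync the mounted file (this can take up to a minute or so), and POST to the reload endpoint. A minimal sketch using a port-forward:

# Forward the web port and trigger a hot reload of prometheus.yml.
kubectl -n monitoring port-forward deploy/prometheus 9090:9090 &
curl -X POST http://localhost:9090/-/reload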

Persisting the data

In addition, for performance and data persistence we back Prometheus with a local PV. Note that NFS must never be used to persist Prometheus data (the local TSDB requires a POSIX-compliant filesystem, and NFS is explicitly unsupported upstream). The data directory is set via --storage.tsdb.path=/prometheus. Create the PVC resource object shown below; note that it is a local PV with node affinity for node02:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-local
  labels:
    app: prometheus
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 20Gi
  storageClassName: local-storage
  local:
    path: /data/k8s/prometheus
  persistentVolumeReclaimPolicy: Retain
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node02.k8s.local
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: prometheus
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: local-storage
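
Since this is a local PV pinned to node02, the backing directory must exist on that node before the Pod can start; create it there first:

# Run on node02: create the directory backing the local PV.
root@node02:~# mkdir -p /data/k8s/prometheus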

Because Prometheus needs access to certain Kubernetes resource objects, the corresponding RBAC must be configured; here we use a ServiceAccount named prometheus:

# prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ''
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - 'networking.k8s.io' # ingresses are served only from networking.k8s.io in v1.24; the old 'extensions' group no longer works
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring

Since the resources we want to read can exist in any namespace, we use a ClusterRole here. Note the nonResourceURLs attribute in the permission rules: it grants access to non-resource endpoints (the /metrics URL), something we have rarely needed before. Now apply the resources above:

root@master01:~/monitoring# kubectl apply -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
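
As an optional spot-check, kubectl auth can-i can impersonate the ServiceAccount to confirm that a representative permission from the ClusterRole took effect (the expected answer is yes):

root@master01:~/monitoring# kubectl auth can-i list nodes --as=system:serviceaccount:monitoring:prometheus
yes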

Now we can create the Prometheus resource objects:

root@master01:~/monitoring# kubectl apply -f prometheus-deploy.yaml 
deployment.apps/prometheus created

# check the Pod status
root@master01:~/monitoring# kubectl -n monitoring get pods
NAME                          READY   STATUS   RESTARTS      AGE
prometheus-6d7f58745c-g46ch   0/1     Error    3 (28s ago)   79s

# check the error log
root@master01:~/monitoring# kubectl -n monitoring logs -f prometheus-6d7f58745c-g46ch
ts=2023-07-14T02:55:54.404Z caller=main.go:444 level=info msg="Starting Prometheus" version="(version=2.31.1, branch=HEAD, revision=411021ada9ab41095923b8d2df9365b632fd40c3)"
ts=2023-07-14T02:55:54.404Z caller=main.go:449 level=info build_context="(go=go1.17.3, user=root@9419c9c2d4e0, date=20211105-20:35:02)"
ts=2023-07-14T02:55:54.404Z caller=main.go:450 level=info host_details="(Linux 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 prometheus-6d7f58745c-g46ch (none))"
ts=2023-07-14T02:55:54.404Z caller=main.go:451 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2023-07-14T02:55:54.404Z caller=main.go:452 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2023-07-14T02:55:54.405Z caller=query_logger.go:87 level=error component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker({0x7fff17f57e00, 0xb}, 0x14, {0x34442c0, 0xc0004bb9f0})
/app/promql/query_logger.go:117 +0x3d7
main.main()
/app/cmd/prometheus/main.go:491 +0x6bbf

Permissions

After creating the Pod we can see that it failed to run, with the error open /prometheus/queries.active: permission denied. This happens because the prometheus image runs as the nobody user, while the host directory mounted through the local PV is owned by root:

root@node02:~# ls -l /data/k8s/
total 4
drwxr-xr-x 2 root root 4096 Jul 14 10:50 prometheus

So a permission error is to be expected. We can fix it by setting the volume permissions for the Pod via a securityContext, either by setting runAsUser: 0 so the process runs as root, or by adding an initContainer that changes the ownership of the data directory:

initContainers:
  - name: fix-permissions
    image: busybox
    command: ['chown', '-R', 'nobody:nobody', '/prometheus']
    volumeMounts:
      - name: data
        mountPath: /prometheus
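
For comparison, the securityContext variant mentioned above would look roughly like this; it is simpler but runs the whole Pod as root, which is more permissive than fixing ownership once in an init container:

# Alternative sketch: run the Pod as root instead of using an init container.
spec:
  securityContext:
    runAsUser: 0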

Update prometheus-deploy.yaml accordingly and apply it again:

root@master01:~/monitoring# kubectl apply -f prometheus-deploy.yaml 
deployment.apps/prometheus configured

# check the Pod status
root@master01:~/monitoring# kubectl -n monitoring get pods
NAME                         READY   STATUS    RESTARTS   AGE
prometheus-5c879447f-9qv6q   1/1     Running   0          46s

# check the container logs
root@master01:~/monitoring# kubectl -n monitoring logs -f prometheus-5c879447f-9qv6q
Defaulted container "prometheus" out of: prometheus, fix-permissions (init)
ts=2023-07-14T03:10:13.118Z caller=main.go:444 level=info msg="Starting Prometheus" version="(version=2.31.1, branch=HEAD, revision=411021ada9ab41095923b8d2df9365b632fd40c3)"
ts=2023-07-14T03:10:13.118Z caller=main.go:449 level=info build_context="(go=go1.17.3, user=root@9419c9c2d4e0, date=20211105-20:35:02)"
ts=2023-07-14T03:10:13.118Z caller=main.go:450 level=info host_details="(Linux 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 prometheus-5c879447f-9qv6q (none))"
ts=2023-07-14T03:10:13.118Z caller=main.go:451 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2023-07-14T03:10:13.118Z caller=main.go:452 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2023-07-14T03:10:13.205Z caller=web.go:542 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2023-07-14T03:10:13.206Z caller=main.go:839 level=info msg="Starting TSDB ..."
ts=2023-07-14T03:10:13.208Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2023-07-14T03:10:13.209Z caller=head.go:479 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2023-07-14T03:10:13.209Z caller=head.go:513 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=2.474µs
ts=2023-07-14T03:10:13.209Z caller=head.go:519 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2023-07-14T03:10:13.210Z caller=head.go:590 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2023-07-14T03:10:13.210Z caller=head.go:596 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=27.869µs wal_replay_duration=257.679µs total_replay_duration=307.596µs
ts=2023-07-14T03:10:13.211Z caller=main.go:866 level=info fs_type=EXT4_SUPER_MAGIC
ts=2023-07-14T03:10:13.211Z caller=main.go:869 level=info msg="TSDB started"
ts=2023-07-14T03:10:13.211Z caller=main.go:996 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2023-07-14T03:10:13.212Z caller=main.go:1033 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=1.55303ms db_storage=933ns remote_storage=1.444µs web_handler=486ns query_engine=1.306µs scrape=1.296118ms scrape_sd=29.621µs notify=930ns notify_sd=4.408µs rules=1.359µs
ts=2023-07-14T03:10:13.212Z caller=main.go:811 level=info msg="Server is ready to receive web requests."

With the Pod running, we still need a NodePort-type Service object so the Prometheus web UI can be reached from outside the cluster:

# prometheus-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      port: 9090
      targetPort: http

Alternatively, we can create an Ingress object and access it via a domain name:

# prometheus-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
    - host: prometheus.k8s.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus
                port:
                  number: 9090

# create the Service defined above
root@master01:~/monitoring# kubectl apply -f prometheus-svc.yaml
service/prometheus created

# check the exposed NodePort
root@master01:~/monitoring# kubectl -n monitoring get svc
NAME         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus   NodePort   10.109.15.202   <none>        9090:31846/TCP   5s

Or:

# create the Ingress defined above
root@master01:~/monitoring# kubectl apply -f prometheus-ingress.yaml
ingress.networking.k8s.io/prometheus created

# check the created Ingress resource
root@master01:~/monitoring# kubectl -n monitoring get ingress
NAME         CLASS   HOSTS                  ADDRESS   PORTS   AGE
prometheus   nginx   prometheus.k8s.local             80      3m57s

As noted above, the Prometheus web UI can now be reached at http://<any-node-IP>:31846.
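
To reach the Ingress route instead, prometheus.k8s.local must resolve to an address served by the ingress controller. A sketch, assuming a hypothetical node IP of 192.168.1.101 (substitute a real address from kubectl get nodes -o wide):

# Hypothetical IP; point the hostname at an ingress-reachable node.
echo '192.168.1.101 prometheus.k8s.local' >> /etc/hosts
curl -I http://prometheus.k8s.local/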

(screenshot: Prometheus web UI)