It is recommended to run Prometheus as an ordinary, unprivileged user.
```shell
useradd -m -s /bin/bash prometheus
su - prometheus  # switch to the prometheus user
```
Open https://prometheus.io/download/, find the prometheus section, and download the build that matches your platform. Then extract it into the /home/prometheus/prometheus directory.
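As a concrete sketch (the version number below is an assumption; check the download page for the current release):

```shell
# run as the prometheus user; 2.45.0 is a placeholder version
PROM_VERSION="2.45.0"
wget "https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
tar -zxvf "prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
mv "prometheus-${PROM_VERSION}.linux-amd64" /home/prometheus/prometheus
```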
As root, create the file /etc/systemd/system/prometheus.service:
```ini
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=prometheus
Restart=on-failure

# Change these lines if you installed
# Prometheus under a different path or user
ExecStart=/home/prometheus/prometheus/prometheus \
  --config.file=/home/prometheus/prometheus/prometheus.yml \
  --storage.tsdb.path=/home/prometheus/prometheus/data

[Install]
WantedBy=multi-user.target
```
Create the configuration file /home/prometheus/prometheus/prometheus.yml.
The following example is the default configuration file taken from the official Docker image.
```yaml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']
```
As root, manage the service with systemctl:
```shell
# enable start on boot
systemctl enable prometheus
# start it now
systemctl start prometheus
# check its status
systemctl status prometheus
```
Once it is up, the dashboard is available at http://localhost:9090, and http://localhost:9090/metrics exposes Prometheus's own metrics.
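A quick way to verify from the shell that the server is up (assuming curl is available):

```shell
# the built-in health endpoint should answer HTTP 200 once the server is running
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9090/-/healthy
```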
Node_exporter
node_exporter is the Prometheus component that exposes host (machine) metrics; it is one of the most commonly used exporters.
Open https://prometheus.io/download/, find the node_exporter section, and download the build that matches your platform. Then extract it into the /home/prometheus/node_exporter directory.
As root, create the file /etc/systemd/system/node_exporter.service:
```ini
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/home/prometheus/node_exporter/node_exporter

[Install]
WantedBy=default.target
```
As root, manage the service with systemctl:
```shell
# enable start on boot
systemctl enable node_exporter
# start it now
systemctl start node_exporter
# check its status
systemctl status node_exporter
```
Once it is running, you can view and scrape its metrics at http://localhost:9100/metrics.
Edit the configuration file from section 2.3 above and add a new job under scrape_configs:
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  # the job below is the newly added configuration
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
```
After the change, restart Prometheus:

```shell
systemctl restart prometheus
```
Adding further nodes works the same way.
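After a restart, whether the new target is being scraped can be checked through Prometheus's HTTP API (a sketch assuming jq is installed; the raw JSON works without it):

```shell
# list each active scrape target's job name and health
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'
```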
For details, see: https://prometheus.io/docs/guides/basic-auth/
This requires nginx as a reverse proxy.
1) Install htpasswd
```shell
apt install apache2-utils
```
2) Create a password file
```shell
# -c specifies the output file path; admin is the username;
# you will be prompted for the password.
htpasswd -c /home/prometheus/node_htpasswd admin
```
3) Configure nginx
```nginx
server {
    location / {
        auth_basic           "Prometheus";
        auth_basic_user_file /home/prometheus/node_htpasswd;
        proxy_pass           http://localhost:9100/;
    }
}
```
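With this server block in place (it listens on the default port 80), basic auth can be checked from the shell; `your_password` stands for whatever was entered at the htpasswd prompt:

```shell
# check the config syntax, then reload nginx
nginx -t && systemctl reload nginx

# without credentials the proxy should answer 401
curl -s -o /dev/null -w '%{http_code}\n' http://localhost/

# with valid credentials it should answer 200
curl -s -o /dev/null -w '%{http_code}\n' -u admin:your_password http://localhost/
```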
A more complete configuration:
```nginx
server {
    listen 443 ssl http2;
    server_name example.com;

    ssl on;
    ssl_certificate     /etc/nginx/ssl/example.com.cer;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
    ssl_session_cache   shared:SSL:5m;
    ssl_session_timeout 20m;
    ssl_protocols       TLSv1.2;
    ssl_ciphers         'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
    #ssl_prefer_server_ciphers on;

    gzip_vary on;
    gzip_comp_level 1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml font/ttf font/opentype;

    location / {
        auth_basic           "Prometheus";
        auth_basic_user_file /home/prometheus/node_htpasswd;
        proxy_pass           http://localhost:9100/;
    }
}
```
4) Add basic_auth to the prometheus configuration
```yaml
- job_name: 'node'
  static_configs:
    - targets: ['localhost:9100']
  # scheme: "https"
  basic_auth:
    username: admin
    password: your_password
```
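Before restarting, the edited file can be validated with promtool, which ships in the same tarball as the prometheus binary:

```shell
# exits with a non-zero status and reports the offending line on errors
/home/prometheus/prometheus/promtool check config /home/prometheus/prometheus/prometheus.yml
```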
Docker version
```shell
docker run -d --name=grafana -p 3000:3000 grafana/grafana
```
The default username/password is admin/admin.
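A quick unauthenticated check that the container is serving (Grafana's /api/health endpoint requires no login):

```shell
# prints a small JSON document including the database status
curl -s http://localhost:3000/api/health
```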
Plain Linux version (https://grafana.com/grafana/download)
```shell
wget https://dl.grafana.com/oss/release/grafana-6.6.1.linux-amd64.tar.gz
tar -zxvf grafana-6.6.1.linux-amd64.tar.gz
mv grafana-6.6.1 grafana
cd grafana
./bin/grafana-server
```
Set up a systemd service:

```shell
vi /etc/systemd/system/grafana.service
```
```ini
[Unit]
Description=Grafana Server
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
WorkingDirectory=/home/prometheus/grafana
ExecStart=/home/prometheus/grafana/bin/grafana-server

[Install]
WantedBy=multi-user.target
```
```yaml
# my global config
global:
  scrape_interval: 1m
  evaluation_interval: 1m
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'example1'
    static_configs:
      - targets: ['example1.com']
    scheme: "https"
    basic_auth:
      username: user
      password: password

  - job_name: 'example2'
    static_configs:
      - targets: ['example2.com:9100']
    scheme: "https"
    basic_auth:
      username: user
      password: password
```
```yaml
version: "3.7"
services:
  prometheus:
    image: prom/prometheus
    restart: always
    volumes:
      - prom_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    # ports:
    #   - "9090:9090"
  grafana:
    # default admin user is admin/admin
    image: grafana/grafana
    restart: always
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
volumes:
  prom_data:
  grafana_data:
```
A fairly comprehensive Chinese-language Grafana dashboard has the ID 8919.
```yaml
version: '3.8'
services:
  node_exporter:
    image: prom/node-exporter
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
```