广告位

美国服务器如何使用Prometheus + Grafana 搭建炫酷的服务器监控面板?

频道: 日期: 浏览:82

美国服务器本文将详细介绍如何搭建功能强大、视觉炫酷的服务器监控系统,使用Prometheus进行指标收集,Grafana进行数据可视化。

一、架构设计与环境准备

监控系统架构

text

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐

│   监控目标       │    │   Prometheus     │    │     Grafana     │

│  - Node Exporter│ ──>│   数据收集与存储  │ ──>│   数据可视化    │

│  - 应用指标      │    │                  │    │                 │

│  - 自定义指标    │    │                  │    │                 │

└─────────────────┘    └──────────────────┘    └─────────────────┘

环境要求与检查

bash

#!/bin/bash# environment-check.shecho "=== 监控系统环境检查 ==="echo "检查时间: $(date)"echo ""# 系统信息echo "1. 系统信息:"echo "  操作系统: $(lsb_release -d | cut -f2)"echo "  内核版本: $(uname -r)"echo "  CPU架构: $(uname -m)"echo ""# 资源检查echo "2. 资源检查:"echo "  CPU核心: $(nproc)"echo "  内存: $(free -h | awk '/Mem:/ {print $2}')"echo "  磁盘空间:"df -h / | tail -1echo ""# 端口检查echo "3. 端口可用性检查:"for port in 9090 3000 9100; do

    if ss -tulpn | grep ":$port " > /dev/null; then

        echo "  端口 $port: 已被占用"

    else

        echo "  端口 $port: 可用"

    fidoneecho ""# 依赖检查echo "4. 依赖检查:"for cmd in wget curl tar systemctl; do

    if command -v $cmd &> /dev/null; then

        echo "  $cmd: 已安装"

    else

        echo "  $cmd: 未安装"

    fidoneecho ""echo "环境检查完成"

二、Prometheus 安装与配置

2.1 Prometheus 安装

bash

#!/bin/bash# install-prometheus.shset -eecho "开始安装 Prometheus..."echo ""# 创建用户和目录sudo useradd --no-create-home --shell /bin/false prometheussudo mkdir /etc/prometheus /var/lib/prometheussudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus# 下载 Prometheuscd /tmpPROMETHEUS_VERSION="2.45.0"wget https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz# 解压安装tar xvf prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gzcd prometheus-${PROMETHEUS_VERSION}.linux-amd64sudo cp prometheus promtool /usr/local/bin/sudo cp -r consoles console_libraries /etc/prometheus/sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtoolsudo chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries# 创建配置文件sudo tee /etc/prometheus/prometheus.yml > /dev/null <<'EOF'

global:

  scrape_interval: 15s

  evaluation_interval: 15s



# 告警规则文件

rule_files:

  - "alert_rules.yml"



# 告警管理器配置

alerting:

  alertmanagers:

    - static_configs:

        - targets:



# 抓取配置

scrape_configs:

  # Prometheus 自身监控

  - job_name: 'prometheus'

    static_configs:

      - targets: ['localhost:9090']

    scrape_interval: 30s

    metrics_path: /metrics



  # Node Exporter 监控

  - job_name: 'node_exporter'

    static_configs:

      - targets: ['localhost:9100']

    scrape_interval: 15s



  # 其他服务器监控(示例)

  - job_name: 'web_servers'

    static_configs:

      - targets: ['192.168.1.101:9100', '192.168.1.102:9100']

    scrape_interval: 15s



  # 自定义应用监控

  - job_name: 'custom_apps'

    static_configs:

      - targets: ['localhost:8080']

    scrape_interval: 30s

    metrics_path: /metrics

EOF# 创建告警规则sudo tee /etc/prometheus/alert_rules.yml > /dev/null <<'EOF'

groups:

- name: node_alerts

  rules:

  - alert: NodeDown

    expr: up{job="node_exporter"} == 0

    for: 1m

    labels:

      severity: critical

    annotations:

      summary: "节点 {{ $labels.instance }} 宕机"

      description: "节点 {{ $labels.instance }} 已经宕机超过 1 分钟"



  - alert: HighCPUUsage

    expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80

    for: 2m

    labels:

      severity: warning

    annotations:

      summary: "CPU 使用率过高 - {{ $labels.instance }}"

      description: "CPU 使用率超过 80% 持续 2 分钟"



  - alert: HighMemoryUsage

    expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90

    for: 2m

    labels:

      severity: warning

    annotations:

      summary: "内存使用率过高 - {{ $labels.instance }}"

      description: "内存使用率超过 90% 持续 2 分钟"



  - alert: DiskSpaceLow

    expr: (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 85

    for: 2m

    labels:

      severity: warning

    annotations:

      summary: "磁盘空间不足 - {{ $labels.instance }}"

      description: "磁盘使用率超过 85%"

EOFsudo chown prometheus:prometheus /etc/prometheus/*.yml# 创建 systemd 服务sudo tee /etc/systemd/system/prometheus.service > /dev/null <<'EOF'

[Unit]

Description=Prometheus Time Series Collection and Processing Server

Wants=network-online.target

After=network-online.target



[Service]

User=prometheus

Group=prometheus

Type=simple

ExecStart=/usr/local/bin/prometheus 

    --config.file /etc/prometheus/prometheus.yml 

    --storage.tsdb.path /var/lib/prometheus/ 

    --web.console.templates=/etc/prometheus/consoles 

    --web.console.libraries=/etc/prometheus/console_libraries 

    --web.listen-address=0.0.0.0:9090 

    --web.external-url=



Restart=always



[Install]

WantedBy=multi-user.target

EOF# 启动服务sudo systemctl daemon-reloadsudo systemctl enable prometheussudo systemctl start prometheusecho ""echo "Prometheus 安装完成!"echo "访问地址: http://$(curl -s ifconfig.me):9090"echo "检查状态: sudo systemctl status prometheus"

2.2 Node Exporter 安装

bash

#!/bin/bash# install-node-exporter.shset -eecho "开始安装 Node Exporter..."echo ""# 创建用户sudo useradd --no-create-home --shell /bin/false node_exporter# 下载 Node Exportercd /tmpNODE_EXPORTER_VERSION="1.6.1"wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz# 解压安装tar xvf node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gzcd node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64sudo cp node_exporter /usr/local/bin/sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter# 创建 systemd 服务sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<'EOF'

[Unit]

Description=Node Exporter

Wants=network-online.target

After=network-online.target



[Service]

User=node_exporter

Group=node_exporter

Type=simple

ExecStart=/usr/local/bin/node_exporter 

    --web.listen-address=:9100 

    --collector.systemd 

    --collector.systemd.unit-whitelist=(sshd|nginx|mysql|docker).service 

    --collector.processes 

    --collector.tcpstat



Restart=always



[Install]

WantedBy=multi-user.target

EOF# 启动服务sudo systemctl daemon-reloadsudo systemctl enable node_exportersudo systemctl start node_exporterecho ""echo "Node Exporter 安装完成!"echo "访问地址: http://$(curl -s ifconfig.me):9100"echo "检查状态: sudo systemctl status node_exporter"echo "指标端点: http://$(curl -s ifconfig.me):9100/metrics"

2.3 高级 Prometheus 配置

yaml

# /etc/prometheus/advanced_config.ymlglobal:

  scrape_interval: 15s  external_labels:

    cluster: 'production'

    environment: 'prod'# 远程写入配置(可选,用于长期存储)remote_write:

  - url: "http://thanos-receive:10908/api/v1/receive"

    queue_config:

      capacity: 2500

      max_shards: 200

      min_shards: 1# 远程读取配置(可选)remote_read:

  - url: "http://thanos-query:10902/api/v1/read"# 服务发现配置示例scrape_configs:

  - job_name: 'node_exporter'

    scrape_interval: 15s    consul_sd_configs:

      - server: 'localhost:8500'

        services: ['node_exporter']

    relabel_configs:

      - source_labels: [__meta_consul_node]

        target_label: instance  # 文件服务发现

  - job_name: 'file_sd_nodes'

    file_sd_configs:

      - files:

          - /etc/prometheus/targets/nodes.yml        refresh_interval: 5m  # Blackbox Exporter(网络探测)

  - job_name: 'blackbox_http'

    metrics_path: /probe    params:

      module: [http_2xx]

    static_configs:

      - targets:

          - 'https://example.com'

          - 'https://google.com'

    relabel_configs:

      - source_labels: [__address__]

        target_label: __param_target      - source_labels: [__param_target]

        target_label: instance      - target_label: __address__        replacement: localhost:9115

三、Grafana 安装与配置

3.1 Grafana 安装

bash

#!/bin/bash# install-grafana.shset -eecho "开始安装 Grafana..."echo ""# 添加 Grafana 仓库sudo apt-get install -y software-properties-common wgetwget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list# 安装 Grafanasudo apt-get updatesudo apt-get install -y grafana# 配置 Grafanasudo tee /etc/grafana/grafana.ini > /dev/null <<'EOF'

[server]

protocol = http

http_addr = 0.0.0.0

http_port = 3000

domain = localhost

root_url = %(protocol)s://%(domain)s:%(http_port)s/

enable_gzip = true



[database]

type = sqlite3

path = grafana.db



[security]

admin_user = admin

admin_password = admin123

secret_key = SW2YcwTIb9zpOOhoPsMm



[users]

allow_sign_up = false

auto_assign_org = true

auto_assign_org_role = Viewer



[auth]

disable_login_form = false

disable_signout_menu = false



[analytics]

reporting_enabled = false

check_for_updates = false



[log]

mode = console file

level = info



[grafana_net]

url = https://grafana.net

EOF# 启动服务sudo systemctl daemon-reloadsudo systemctl enable grafana-serversudo systemctl start grafana-serverecho ""echo "Grafana 安装完成!"echo "访问地址: http://$(curl -s ifconfig.me):3000"echo "用户名: admin"echo "密码: admin123"echo "检查状态: sudo systemctl status grafana-server"

3.2 数据源配置脚本

bash

#!/bin/bash# configure-grafana-datasource.shset -eecho "配置 Grafana 数据源..."echo ""GRAFANA_URL="http://localhost:3000"USERNAME="admin"PASSWORD="admin123"# 等待 Grafana 启动echo "等待 Grafana 启动..."until curl -s $GRAFANA_URL/api/health; do

    sleep 2done# 配置 Prometheus 数据源curl -X POST 

  -H "Content-Type: application/json" 

  -d '{

    "name": "Prometheus",

    "type": "prometheus",

    "url": "http://localhost:9090",

    "access": "proxy",

    "isDefault": true,

    "jsonData": {

      "timeInterval": "15s",

      "queryTimeout": "60s",

      "httpMethod": "POST",

      "manageAlerts": true,

      "prometheusType": "Prometheus",

      "prometheusVersion": "2.45.0",

      "cacheLevel": "High",

      "disableMetricsLookup": false

    }

  }' 

  http://$USERNAME:$PASSWORD@localhost:3000/api/datasourcesecho ""echo "Prometheus 数据源配置完成!"# 配置 Loki 数据源(日志)curl -X POST 

  -H "Content-Type: application/json" 

  -d '{

    "name": "Loki",

    "type": "loki",

    "url": "http://localhost:3100",

    "access": "proxy"

  }' 

  http://$USERNAME:$PASSWORD@localhost:3000/api/datasourcesecho ""echo "Loki 数据源配置完成!"

四、炫酷监控面板配置

4.1 Node Exporter 全面监控面板

创建节点监控仪表板:

bash

#!/bin/bash# create-node-dashboard.shset -eecho "创建 Node Exporter 监控仪表板..."echo ""GRAFANA_URL="http://localhost:3000"USERNAME="admin"PASSWORD="admin123"DASHBOARD_JSON=$(cat << 'EOF'{

  "dashboard": {

    "id": null,    "title": "Node Exporter 全面监控",    "tags": ["node", "prometheus", "system"],    "timezone": "browser",    "panels": [

      {

        "id": 1,        "title": "CPU 使用率",        "type": "stat",        "targets": [

          {

            "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)",

            "legendFormat": "{{instance}}",

            "refId": "A"

          }

        ],

        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},

        "fieldConfig": {

          "defaults": {

            "unit": "percent",

            "thresholds": {

              "steps": [

                {"color": "green", "value": null},

                {"color": "red", "value": 80}

              ]

            }

          }

        }

      },

      {

        "id": 2,

        "title": "内存使用率",

        "type": "stat",

        "targets": [

          {

            "expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",

            "legendFormat": "{{instance}}",

            "refId": "A"

          }

        ],

        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},

        "fieldConfig": {

          "defaults": {

            "unit": "percent",

            "thresholds": {

              "steps": [

                {"color": "green", "value": null},

                {"color": "orange", "value": 70},

                {"color": "red", "value": 90}

              ]

            }

          }

        }

      },

      {

        "id": 3,

        "title": "CPU 使用率趋势",

        "type": "timeseries",

        "targets": [

          {

            "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)",

            "legendFormat": "{{instance}}",

            "refId": "A"

          }

        ],

        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8},

        "fieldConfig": {

          "defaults": {

            "unit": "percent",

            "color": {"mode": "palette-classic"}

          }

        }

      },

      {

        "id": 4,

        "title": "内存使用趋势",

        "type": "timeseries",

        "targets": [

          {

            "expr": "node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes",

            "legendFormat": "已用内存",

            "refId": "A"

          },

          {

            "expr": "node_memory_MemAvailable_bytes",

            "legendFormat": "可用内存",

            "refId": "B"

          }

        ],

        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 16},

        "fieldConfig": {

          "defaults": {

            "unit": "bytes",

            "color": {"mode": "palette-classic"}

          }

        }

      },

      {

        "id": 5,

        "title": "磁盘 I/O",

        "type": "timeseries",

        "targets": [

          {

            "expr": "rate(node_disk_read_bytes_total[1m])",

            "legendFormat": "{{device}} 读取",

            "refId": "A"

          },

          {

            "expr": "rate(node_disk_written_bytes_total[1m])",

            "legendFormat": "{{device}} 写入",

            "refId": "B"

          }

        ],

        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 24},

        "fieldConfig": {

          "defaults": {

            "unit": "Bps",

            "color": {"mode": "palette-classic"}

          }

        }

      },

      {

        "id": 6,

        "title": "网络流量",

        "type": "timeseries",

        "targets": [

          {

            "expr": "rate(node_network_receive_bytes_total[1m])",

            "legendFormat": "{{device}} 接收",

            "refId": "A"

          },

          {

            "expr": "rate(node_network_transmit_bytes_total[1m])",

            "legendFormat": "{{device}} 发送",

            "refId": "B"

          }

        ],

        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 24},

        "fieldConfig": {

          "defaults": {

            "unit": "Bps",

            "color": {"mode": "palette-classic"}

          }

        }

      },

      {

        "id": 7,

        "title": "系统负载",

        "type": "timeseries",

        "targets": [

          {

            "expr": "node_load1",

            "legendFormat": "1分钟负载",

            "refId": "A"

          },

          {

            "expr": "node_load5",

            "legendFormat": "5分钟负载",

            "refId": "B"

          },

          {

            "expr": "node_load15",

            "legendFormat": "15分钟负载",

            "refId": "C"

          }

        ],

        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 32},

        "fieldConfig": {

          "defaults": {

            "color": {"mode": "palette-classic"}

          }

        }

      },

      {

        "id": 8,

        "title": "磁盘使用率",

        "type": "bargauge",

        "targets": [

          {

            "expr": "(1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100",

            "legendFormat": "{{mountpoint}}",

            "refId": "A"

          }

        ],

        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 32},

        "fieldConfig": {

          "defaults": {

            "unit": "percent",

            "thresholds": {

              "steps": [

                {"color": "green", "value": null},

                {"color": "orange", "value": 70},

                {"color": "red", "value": 90}

              ]

            }

          }

        }

      }

    ],

    "time": {

      "from": "now-6h",

      "to": "now"

    },

    "refresh": "10s"

  },

  "folderId": 0,

  "overwrite": true

}

EOF

)



# 创建仪表板

curl -X POST 

  -H "Content-Type: application/json" 

  -d "$DASHBOARD_JSON" 

  http://$USERNAME:$PASSWORD@localhost:3000/api/dashboards/db



echo ""echo "Node Exporter 监控仪表板创建完成!"

4.2 高级监控面板配置

创建企业级监控仪表板:

json

{

  "dashboard": {

    "title": "企业级服务器监控",

    "tags": ["enterprise", "servers", "monitoring"],

    "style": "dark",

    "timezone": "browser",

    "panels": [

      {

        "id": 1,

        "type": "stat",

        "title": "服务器状态",

        "targets": [{

          "expr": "up",

          "legendFormat": "{{instance}}"

        }],

        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},

        "fieldConfig": {

          "defaults": {

            "color": {"mode": "thresholds"},

            "thresholds": {

              "steps": [

                {"color": "red", "value": 0},

                {"color": "green", "value": 1}

              ]

            },

            "mappings": [

              {

                "type": "value",

                "options": {

                  "0": {"text": "离线"},

                  "1": {"text": "在线"}

                }

              }

            ]

          }

        }

      },

      {

        "id": 2,

        "type": "heatmap",

        "title": "请求延迟分布",

        "targets": [{

          "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",

          "legendFormat": "P95 延迟"

        }],

        "gridPos": {"h": 8, "w": 12, "x": 6, "y": 0}

      }

    ],

    "templating": {

      "list": [

        {

          "name": "instance",

          "type": "query",

          "query": "label_values(up, instance)",

          "refresh": 1,

          "includeAll": true,

          "multi": true

        },

        {

          "name": "job",

          "type": "query",

          "query": "label_values(up, job)",

          "refresh": 1,

          "includeAll": true,

          "multi": false

        }

      ]

    },

    "annotations": {

      "list": [{

        "name": "告警事件",

        "datasource": "Prometheus",

        "enable": true,

        "expr": "ALERTS",

        "iconColor": "red"

      }]

    }

  }}

五、高级功能与集成

5.1 告警配置

bash

#!/bin/bash# setup-alerting.shecho "配置 Grafana 告警..."echo ""# 创建告警通道(邮件)curl -X POST 

  -H "Content-Type: application/json" 

  -d '{

    "name": "Email Alerts",

    "type": "email",

    "settings": {

      "addresses": "admin@example.com",

      "singleEmail": true

    }

  }' 

  http://admin:admin123@localhost:3000/api/alert-notificationsecho ""echo "邮件告警通道配置完成!"# 创建告警通道(Slack)curl -X POST 

  -H "Content-Type: application/json" 

  -d '{

    "name": "Slack Alerts",

    "type": "slack",

    "settings": {

      "url": "https://hooks.slack.com/services/XXX/XXX/XXX",

      "recipient": "#monitoring",

      "username": "Grafana"

    }

  }' 

  http://admin:admin123@localhost:3000/api/alert-notificationsecho ""echo "Slack 告警通道配置完成!"

5.2 自动发现配置

yaml

# /etc/prometheus/targets/nodes.yml- targets:

    - '192.168.1.101:9100'

    - '192.168.1.102:9100'

    - '192.168.1.103:9100'

  labels:

    environment: 'production'

    role: 'web-server'- targets:

    - '192.168.1.201:9100'

    - '192.168.1.202:9100'

  labels:

    environment: 'production'

    role: 'database'

六、维护与优化

6.1 备份脚本

bash

#!/bin/bash# backup-monitoring.shBACKUP_DIR="/backup/monitoring"DATE=$(date +%Y%m%d_%H%M%S)echo "开始备份监控系统..."echo ""# 创建备份目录mkdir -p $BACKUP_DIR/$DATE# 备份 Prometheus 数据echo "备份 Prometheus 数据..."sudo systemctl stop prometheussudo tar -czf $BACKUP_DIR/$DATE/prometheus_data.tar.gz /var/lib/prometheus/sudo systemctl start prometheus# 备份 Grafana 配置echo "备份 Grafana 配置..."sudo tar -czf $BACKUP_DIR/$DATE/grafana_config.tar.gz /etc/grafana/# 备份仪表板echo "备份 Grafana 仪表板..."curl -s http://admin:admin123@localhost:3000/api/search | jq -r '.[] | select(.type == "dash-db") | .uid' | while read uid; do

    curl -s http://admin:admin123@localhost:3000/api/dashboards/uid/$uid > $BACKUP_DIR/$DATE/dashboard_${uid}.jsondone# 清理旧备份(保留30天)find $BACKUP_DIR -type d -mtime +30 -exec rm -rf {} ;echo ""echo "备份完成: $BACKUP_DIR/$DATE"

6.2 性能优化配置

ini

# /etc/prometheus/prometheus.yml - 优化版本global:

  scrape_interval: 30s

  scrape_timeout: 10s

  evaluation_interval: 30s# 优化存储配置storage:

  tsdb:

    retention: 15d

    out_of_order_time_window: 2h# 优化资源限制query:

  lookback-delta: 5m

  max-concurrency: 20

  timeout: 2m# WAL 配置wal:

  segment_size: 128MB

  truncate_frequency: 2h

七、监控系统验证

7.1 系统健康检查

bash

#!/bin/bash# monitoring-health-check.shecho "=== 监控系统健康检查 ==="echo "检查时间: $(date)"echo ""# 检查服务状态echo "1. 服务状态检查:"for service in prometheus node_exporter grafana-server; do

    if systemctl is-active --quiet $service; then

        echo "  ✅ $service: 运行中"

    else

        echo "  ❌ $service: 未运行"

    fidoneecho ""# 检查端口监听echo "2. 端口监听检查:"for port in 9090 9100 3000; do

    if ss -tulpn | grep ":$port " > /dev/null; then

        echo "  ✅ 端口 $port: 监听中"

    else

        echo "  ❌ 端口 $port: 未监听"

    fidoneecho ""# 检查数据收集echo "3. 数据收集检查:"PROMETHEUS_URL="http://localhost:9090"if curl -s "$PROMETHEUS_URL/api/v1/query?query=up" | grep -q "result"; then

    echo "  ✅ Prometheus 数据收集正常"else

    echo "  ❌ Prometheus 数据收集异常"fiecho ""# 检查指标端点echo "4. 指标端点检查:"if curl -s http://localhost:9100/metrics | grep -q "node_"; then

    echo "  ✅ Node Exporter 指标正常"else

    echo "  ❌ Node Exporter 指标异常"fiecho ""# 检查 Grafana 访问echo "5. Grafana 访问检查:"if curl -s http://localhost:3000/api/health | grep -q "Database"; then

    echo "  ✅ Grafana 访问正常"else

    echo "  ❌ Grafana 访问异常"fiecho ""echo "健康检查完成"

总结

通过本文的详细指南,你已经成功搭建了一个功能强大、视觉炫酷的服务器监控系统:

核心组件:

  1. Prometheus - 高性能指标收集和存储

  2. Node Exporter - 系统级指标暴露

  3. Grafana - 强大的数据可视化

关键特性:

  • 实时监控:15秒级别的数据收集和展示

  • 多维度监控:CPU、内存、磁盘、网络等全面覆盖

  • 智能告警:基于阈值的自动告警机制

  • 炫酷可视化:现代化的仪表板和图表

  • 易于扩展:支持多节点和服务发现

最佳实践建议:

  1. 定期备份:配置和数据的定期备份

  2. 容量规划:根据数据量规划存储空间

  3. 安全加固:配置适当的访问控制

  4. 性能监控:监控监控系统本身的性能

  5. 文档维护:记录配置变更和故障处理

后续扩展方向:

  • 集成应用级监控(Java, Python, Go应用)

  • 添加日志监控(Loki + Promtail)

  • 配置分布式存储(Thanos/Cortex)

  • 实现多区域监控

  • 添加业务指标监控

这个监控系统将为你的服务器基础设施提供全面的可视性和可靠性保障

生成文章图片 (60).jpg

关键词: