GPU monitoring documentation (#73)

UUBulb 2024-06-25 22:06:14 +08:00 committed by GitHub
parent 152f482594
commit b71ec7eaea
5 changed files with 162 additions and 2 deletions


@ -176,6 +176,7 @@ function getGuideSidebarZhCN() {
{ text: '设置每月重置流量统计', link: '/guide/q6.html' },
{ text: '自定义 Agent 监控项目', link: '/guide/q7.html' },
{ text: '使用 Cloudflare Access 作为 OAuth2 提供方', link: '/guide/q8' },
{ text: '启用 GPU 监控', link: '/guide/q9' },
]
},
{
@ -246,6 +247,7 @@ function getGuideSidebarEnUS() {
{ text: 'Reset Traffic Statistics Monthly', link: '/en_US/guide/q6.html' },
{ text: 'Custom Agent Monitoring Projects', link: '/en_US/guide/q7.html' },
{ text: 'Use Cloudflare Access As OAuth2 Provider', link: '/en_US/guide/q8' },
{ text: 'Enable GPU monitoring', link: '/en_US/guide/q9' },
]
},
{


@ -24,4 +24,5 @@ If you installed the Agent using the one-click script, you can edit `/etc/system
- `--disable-auto-update`: Disables **automatic updates** for the Agent (security feature).
- `--disable-force-update`: Disables **forced updates** for the Agent (security feature).
- `--disable-command-execute`: Disables the execution of scheduled tasks and the opening of the online terminal on the Agent (security feature).
- `--tls`: Enables SSL/TLS encryption (required if you use nginx to reverse proxy the Agent's gRPC connection and nginx has SSL/TLS enabled).
- `--gpu`: Enables GPU monitoring (monitoring GPU utilization may require extra dependencies; see FAQ - Enable GPU monitoring for details).

docs/en_US/guide/q9.md Normal file

@ -0,0 +1,78 @@
# Enable GPU monitoring
GPU monitoring is a new feature introduced in Nezha Monitoring v0.17.x. Before using it, please check that your Dashboard version is v0.17.2 or later and your Agent version is v0.17.0 or later.
## Enable
### From Command-Line Flag
Append the `--gpu` flag to the Agent's arguments. For example:
```bash
/opt/nezha/agent/nezha-agent -s example.com:5555 -p example --gpu
```
### From configuration file
Run the following command to edit the Agent configuration and enable GPU monitoring:
```bash
/opt/nezha/agent/nezha-agent edit
```
In the interactive menu that appears, choose to enable GPU monitoring.
## Enable GPU utilization monitoring
GPU model and GPU utilization are two separate monitoring items, and their values are obtained in different ways.
Windows and macOS can report GPU utilization without extra dependencies and support multiple graphics card brands.
On Linux, only NVIDIA and AMD cards are supported, and extra dependencies must be installed.
The instructions below describe how to enable GPU utilization monitoring on Linux for NVIDIA and AMD graphics cards.
### NVIDIA
NVIDIA cards need the `nvidia-smi` utility to get GPU utilization. This utility is included in the official driver by default.
If you use unofficial drivers like `nouveau`, then it's not possible to get GPU utilization.
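
To confirm that `nvidia-smi` is installed and can report utilization, a check along the following lines may help. This is only a sketch: `check_nvidia_smi` is a hypothetical helper name, and the query shown is a standard `nvidia-smi` invocation, not necessarily the one the Agent uses internally.

```bash
# Hypothetical helper (not part of the Agent): prints the name and current
# utilization of each GPU, or fails if nvidia-smi is not on PATH.
check_nvidia_smi() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the official NVIDIA driver installed?" >&2
    return 1
  fi
  # One CSV line per GPU: "<name>, <utilization> %"
  nvidia-smi --query-gpu=name,utilization.gpu --format=csv,noheader
}
```

Calling `check_nvidia_smi` in a shell where the function is defined prints one line per detected card. If it fails, GPU utilization is unlikely to be available to the Agent either.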
### AMD
AMD cards require the official `amdgpu` driver and the `rocm-smi` utility.
Mainstream distros already package `rocm-smi`. The commands below install it on some of them:
```bash
# Arch Linux
pacman -Sy rocm-smi-lib
# Debian / Ubuntu
apt install rocm-smi
# Fedora / RHEL 8+
dnf install rocm-smi
```
If your distro doesn't have the package, then you will need to compile `rocm_smi_lib` manually.
Required dependencies: `git`, `cmake`, `gcc`
First, clone the git repository of `rocm_smi_lib`:
```bash
git clone https://github.com/ROCm/rocm_smi_lib
```
Then compile the libraries and install them on your system.
```bash
cd rocm_smi_lib
mkdir -p build
cd build
cmake ..
make -j $(nproc)
# Install library file and header; default location is /opt/rocm
make install
```
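
After installing `rocm-smi`, whether from a package or from source, it may be worth checking that the utility runs. The sketch below uses a hypothetical helper, `check_rocm_smi`. It assumes that a manual build places the executable under `bin/` inside the install prefix (`/opt/rocm` by default), which may differ on your system.

```bash
# Hypothetical helper (not part of the Agent): locates rocm-smi and asks it
# for current GPU utilization. The optional argument is the install prefix.
check_rocm_smi() {
  local prefix="${1:-/opt/rocm}"
  local tool
  if tool=$(command -v rocm-smi 2>/dev/null); then
    : # found on PATH (typical for distro packages)
  elif [ -x "$prefix/bin/rocm-smi" ]; then
    tool="$prefix/bin/rocm-smi" # assumed location for a manual build
  else
    echo "rocm-smi not found on PATH or under $prefix/bin" >&2
    return 1
  fi
  # --showuse reports the current GPU usage percentage
  "$tool" --showuse
}
```

Running `check_rocm_smi` with no argument checks `PATH` first and then `/opt/rocm/bin`. Pass a different prefix if you installed elsewhere.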


@ -24,4 +24,5 @@
- `--disable-auto-update`:禁止 **自动更新** Agent(安全特性)
- `--disable-force-update`:禁止 **强制更新** Agent(安全特性)
- `--disable-command-execute`:禁止在 Agent 上执行定时任务、打开在线终端(安全特性)。
- `--tls`:启用 SSL/TLS 加密(使用 nginx 反向代理 Agent 的 grpc 连接,并且 nginx 开启 SSL/TLS 时,需要启用该项配置)。
- `--gpu`: 启用 GPU 监控(其中 GPU 使用率监控可能需要安装额外依赖。相关问题请参见常见问题 - 启用 GPU 监控。)

docs/guide/q9.md Normal file

@ -0,0 +1,78 @@
# 启用 GPU 监控
GPU 监控是哪吒监控 v0.17.x 引入的新功能,使用前请检查您的 Dashboard 版本是否为 v0.17.2+ / Agent 版本是否为 v0.17.0+。
## 启用
### 通过启动参数
在 Agent 运行参数后添加 `--gpu` 即可。例如:
```bash
/opt/nezha/agent/nezha-agent -s example.com:5555 -p example --gpu
```
### 通过配置文件
执行以下命令修改 Agent 配置文件以启用 GPU 监控:
```bash
/opt/nezha/agent/nezha-agent edit
```
在返回的互动菜单中选择启用 GPU 功能即可。
## 打开 GPU 占用率监控支持
GPU 型号与 GPU 使用率为两个不同的监控项目,使用了不同实现获取。
其中 Windows 和 macOS 支持无依赖获取 GPU 使用率,并支持多个品牌显卡;
Linux 平台则仅支持 NVIDIA / AMD 显卡,且需要安装额外依赖。
以下将介绍如何在 Linux 上为 NVIDIA / AMD 显卡启用 GPU 使用率监控。
### NVIDIA
NVIDIA 显卡获取 GPU 使用率需要用到 `nvidia-smi` 工具,一般为官方驱动自带。
如果您使用的是非官方驱动,例如 `nouveau`,那么将无法获取 GPU 使用率。
### AMD
AMD 显卡获取 GPU 使用率需要安装官方 `amdgpu` 驱动和 `rocm-smi` 工具。
主流系统均已打包 `rocm-smi` ,以下是部分系统的安装命令:
```bash
# Arch Linux
pacman -Sy rocm-smi-lib
# Debian / Ubuntu
apt install rocm-smi
# Fedora / RHEL 8+
dnf install rocm-smi
```
如果您的系统并没有相应包,那么则需要手动编译安装 `rocm_smi_lib`
您的系统需要安装这些依赖:`git` `cmake` `gcc`
首先 Clone `rocm_smi_lib` 的 git 仓库:
```bash
git clone https://github.com/ROCm/rocm_smi_lib
```
然后进行编译并安装即可。
```bash
cd rocm_smi_lib
mkdir -p build
cd build
cmake ..
make -j $(nproc)
# Install library file and header; default location is /opt/rocm
make install
```