GPU monitoring documentation (#73)
parent 152f482594
commit b71ec7eaea
@@ -176,6 +176,7 @@ function getGuideSidebarZhCN() {
{ text: 'Reset Traffic Statistics Monthly', link: '/guide/q6.html' },
{ text: 'Custom Agent Monitoring Items', link: '/guide/q7.html' },
{ text: 'Use Cloudflare Access as the OAuth2 Provider', link: '/guide/q8' },
{ text: 'Enable GPU Monitoring', link: '/guide/q9' },
]
},
{
@@ -246,6 +247,7 @@ function getGuideSidebarEnUS() {
{ text: 'Reset Traffic Statistics Monthly', link: '/en_US/guide/q6.html' },
{ text: 'Custom Agent Monitoring Projects', link: '/en_US/guide/q7.html' },
{ text: 'Use Cloudflare Access As OAuth2 Provider', link: '/en_US/guide/q8' },
{ text: 'Enable GPU monitoring', link: '/en_US/guide/q9' },
]
},
{
@@ -24,4 +24,5 @@ If you installed the Agent using the one-click script, you can edit `/etc/system
- `--disable-auto-update`: Disables **automatic updates** for the Agent (security feature).
- `--disable-force-update`: Disables **forced updates** for the Agent (security feature).
- `--disable-command-execute`: Disables the execution of scheduled tasks and the opening of the online terminal on the Agent (security feature).
- `--tls`: Enables SSL/TLS encryption (required if you use nginx to reverse proxy the Agent's gRPC connection and nginx has SSL/TLS enabled).
- `--gpu`: Enables GPU monitoring (monitoring GPU utilization may require extra dependencies; see FAQ - Enable GPU monitoring).
78
docs/en_US/guide/q9.md
Normal file
@@ -0,0 +1,78 @@
# Enable GPU monitoring

GPU monitoring is a new feature introduced in Nezha Monitoring v0.17.x. Before using it, make sure your Dashboard is v0.17.2 or later and your Agent is v0.17.0 or later.
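
As a rough illustration of the version requirement, the sketch below compares version strings with `sort -V` (version sort, assumed to be available from GNU coreutils). The helper name `version_ge` and the two example version values are assumptions made for illustration; they are not part of Nezha, so substitute the versions your own Dashboard and Agent report.

```bash
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; a leading "v" is stripped before comparing.
version_ge() {
    local have="${1#v}" want="${2#v}"
    # The smaller version sorts first, so A >= B exactly when B comes out on top.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example values only; replace them with the versions you actually run.
dashboard_version="v0.17.2"
agent_version="v0.17.0"

if version_ge "$dashboard_version" "v0.17.2" && version_ge "$agent_version" "v0.17.0"; then
    echo "GPU monitoring is supported"
else
    echo "Upgrade the Dashboard and/or Agent first"
fi
```

Version sort is used instead of a plain string comparison so that a release such as `0.17.10` is treated as newer than `0.17.2`.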

## Enable

### From Command-Line Flag

Append the `--gpu` flag to the Agent's arguments. For example:

```bash
/opt/nezha/agent/nezha-agent -s example.com:5555 -p example --gpu
```

### From Configuration File

Run the following command to edit the Agent configuration and enable GPU monitoring:

```bash
/opt/nezha/agent/nezha-agent edit
```

In the interactive menu that appears, choose to enable GPU monitoring.

## Enable GPU utilization monitoring

GPU model and GPU utilization are two separate monitoring items, and the Agent obtains them in different ways.

On Windows and macOS, GPU utilization is available without extra dependencies and for multiple graphics card brands.

On Linux, only NVIDIA and AMD cards are supported, and extra dependencies must be installed.

The following sections explain how to enable GPU utilization monitoring on Linux for NVIDIA and AMD graphics cards.

### NVIDIA

NVIDIA cards need the `nvidia-smi` utility to report GPU utilization. This utility is included in the official driver by default.

If you use an unofficial driver such as `nouveau`, GPU utilization cannot be obtained.
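
To check whether the NVIDIA prerequisite is met before enabling the flag, something like the following may help. It is a minimal sketch and not the Agent's own implementation: `check_nvidia` is a hypothetical helper name, and the query relies on the `--query-gpu` and `--format` options of `nvidia-smi`.

```bash
#!/usr/bin/env bash
# Print the model and current utilization of each NVIDIA GPU,
# or explain why that is not possible. Hypothetical helper for illustration.
check_nvidia() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: install the official NVIDIA driver"
        return 1
    fi
    # One CSV line per GPU: model name, utilization percentage.
    nvidia-smi --query-gpu=name,utilization.gpu --format=csv,noheader
}

# Do not abort the surrounding script when the tool is missing.
check_nvidia || true
```

If the command prints one line per card, the utility the Agent needs is present; if it reports that `nvidia-smi` is missing, the system is probably running without the official driver.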

### AMD

AMD cards require the official `amdgpu` driver and the `rocm-smi` utility.

Mainstream distros already package `rocm-smi`. The commands below install it on some of them:

```bash
# Arch Linux
pacman -Sy rocm-smi-lib

# Debian / Ubuntu
apt install rocm-smi

# Fedora / RHEL 8+
dnf install rocm-smi
```

If your distro does not provide the package, you need to compile `rocm_smi_lib` manually.

Required dependencies: `git`, `cmake`, `gcc`

First, clone the `rocm_smi_lib` git repository:

```bash
git clone https://github.com/ROCm/rocm_smi_lib
```

Then compile the library and install it on your system:

```bash
cd rocm_smi_lib
mkdir -p build
cd build
cmake ..
make -j "$(nproc)"
# Install the library files and headers; the default location is /opt/rocm
make install
```
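
After installing from a package or from source, a quick check along these lines can confirm that the utility is reachable. This is a sketch under stated assumptions: `find_rocm_smi` is a hypothetical helper name, and `/opt/rocm/bin/rocm-smi` is only an assumed location derived from the default `/opt/rocm` prefix mentioned above; your build may place the executable elsewhere.

```bash
#!/usr/bin/env bash
# Locate rocm-smi on the PATH or under the default manual-install prefix.
# Hypothetical helper for illustration.
find_rocm_smi() {
    if command -v rocm-smi >/dev/null 2>&1; then
        command -v rocm-smi
    elif [ -x /opt/rocm/bin/rocm-smi ]; then
        # Assumed location for a manual install with the default prefix.
        echo /opt/rocm/bin/rocm-smi
    else
        return 1
    fi
}

if rocm_smi="$(find_rocm_smi)"; then
    # --showuse asks rocm-smi for the current GPU utilization.
    "$rocm_smi" --showuse || echo "rocm-smi failed; check that the amdgpu driver is loaded"
else
    echo "rocm-smi not found"
fi
```

If the tool is found only under `/opt/rocm/bin`, that directory may need to be added to the `PATH` of the environment the Agent runs in.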
@@ -24,4 +24,5 @@
- `--disable-auto-update`: Disables **automatic updates** of the Agent (security feature).
- `--disable-force-update`: Disables **forced updates** of the Agent (security feature).
- `--disable-command-execute`: Prevents scheduled tasks from running and the online terminal from being opened on the Agent (security feature).
- `--tls`: Enables SSL/TLS encryption (this option is required when nginx reverse proxies the Agent's gRPC connection and nginx has SSL/TLS enabled).
- `--gpu`: Enables GPU monitoring (GPU utilization monitoring may require extra dependencies; for related questions, see FAQ - Enable GPU Monitoring).
78
docs/guide/q9.md
Normal file
@@ -0,0 +1,78 @@
# Enable GPU Monitoring

GPU monitoring is a new feature introduced in Nezha Monitoring v0.17.x. Before using it, check that your Dashboard is v0.17.2+ and your Agent is v0.17.0+.

## Enabling

### Via Startup Arguments

Add `--gpu` after the Agent's run arguments. For example:

```bash
/opt/nezha/agent/nezha-agent -s example.com:5555 -p example --gpu
```

### Via the Configuration File

Run the following command to modify the Agent configuration file and enable GPU monitoring:

```bash
/opt/nezha/agent/nezha-agent edit
```

In the interactive menu that appears, choose to enable the GPU feature.

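## Turning On GPU Utilization Monitoring

GPU model and GPU utilization are two different monitoring items, and they are obtained through different implementations.

Windows and macOS can report GPU utilization without any dependencies and support multiple graphics card brands.

The Linux platform supports only NVIDIA / AMD graphics cards and requires extra dependencies to be installed.

The following describes how to enable GPU utilization monitoring on Linux for NVIDIA / AMD graphics cards.

### NVIDIA

Obtaining GPU utilization for NVIDIA graphics cards requires the `nvidia-smi` tool, which usually ships with the official driver.

If you are using an unofficial driver, for example `nouveau`, GPU utilization cannot be obtained.
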
### AMD

Obtaining GPU utilization for AMD graphics cards requires the official `amdgpu` driver and the `rocm-smi` tool.

Mainstream systems already package `rocm-smi`. The install commands for some of them are:

```bash
# Arch Linux
pacman -Sy rocm-smi-lib

# Debian / Ubuntu
apt install rocm-smi

# Fedora / RHEL 8+
dnf install rocm-smi
```

If your system has no such package, you need to compile and install `rocm_smi_lib` manually.

Your system needs these dependencies installed: `git`, `cmake`, `gcc`

First, clone the `rocm_smi_lib` git repository:

```bash
git clone https://github.com/ROCm/rocm_smi_lib
```

Then compile and install it:

```bash
cd rocm_smi_lib
mkdir -p build
cd build
cmake ..
make -j "$(nproc)"
# Install the library files and headers; the default location is /opt/rocm
make install
```