To set up the GPU metrics reporting script:
On each of your VM instances, check that you meet the following requirements:
- Each VM instance must have GPUs attached.
- Each VM instance must have a GPU driver installed.
- Each VM instance must have the pip utility installed.

On each of your VM instances, install the GPU metrics agent.
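Before installing the agent, the requirements listed above can be spot-checked from a shell. The following is a minimal sketch, not part of the original instructions: the helper names `have_cmd` and `check_prereqs` are hypothetical, and it assumes that a working NVIDIA driver puts `nvidia-smi` on the PATH. It also checks for `git`, which the installation steps use.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is on the PATH.
have_cmd() {
  command -v "$1" > /dev/null 2>&1
}

# Hypothetical helper: print one status line per command and
# return non-zero if any of them is missing.
check_prereqs() {
  prereq_missing=0
  for prereq_cmd in "$@"; do
    if have_cmd "$prereq_cmd"; then
      echo "ok: $prereq_cmd"
    else
      echo "missing: $prereq_cmd"
      prereq_missing=1
    fi
  done
  return "$prereq_missing"
}

# Assumption: nvidia-smi is installed with the NVIDIA driver, so its
# presence is used here as a proxy for "a GPU driver is installed".
if check_prereqs nvidia-smi pip git; then
  # nvidia-smi -L lists the GPUs that the driver can see.
  nvidia-smi -L || echo "nvidia-smi is installed but could not list any GPUs"
else
  echo "Install the missing prerequisites before continuing."
fi
```

If any line starts with `missing:`, resolve that requirement on the VM instance before continuing.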
To install the metrics agent, complete the following steps:
git clone https://github.com/GoogleCloudPlatform/tensorflow-inference-tensorrt5-t4-gpu.git
cd tensorflow-inference-tensorrt5-t4-gpu/metrics_reporting
pip install -r ./requirements.txt
sudo cp report_gpu_metrics.py /root/
sudo tee /lib/systemd/system/gpu_utilization_agent.service > /dev/null <<-EOH
[Unit]
Description=GPU Utilization Metric Agent
[Service]
Type=simple
PIDFile=/run/gpu_agent.pid
ExecStart=/bin/bash --login -c '/usr/bin/python /root/report_gpu_metrics.py'
User=root
Group=root
WorkingDirectory=/
Restart=always
[Install]
WantedBy=multi-user.target
EOH
sudo systemctl daemon-reload
sudo systemctl --no-reload --now enable /lib/systemd/system/gpu_utilization_agent.service
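Once the unit is enabled, it can be useful to confirm that the agent is actually running. The sketch below is an illustration rather than part of the original steps: the function names `agent_state` and `describe_agent` are hypothetical, and it relies only on `systemctl is-active` and `journalctl -u`, which are standard systemd commands.

```shell
#!/bin/sh
UNIT="gpu_utilization_agent.service"

# Hypothetical helper: print the unit's state as reported by
# `systemctl is-active` (for example active, inactive, or failed),
# or "unknown" if systemctl is unavailable or prints nothing.
agent_state() {
  state="$(systemctl is-active "$1" 2> /dev/null)" || true
  echo "${state:-unknown}"
}

# Hypothetical helper: turn the state into a short, readable message.
describe_agent() {
  case "$(agent_state "$1")" in
    active) echo "$1 is running" ;;
    failed) echo "$1 failed; inspect its logs with: journalctl -u $1" ;;
    *)      echo "$1 is not running" ;;
  esac
}

describe_agent "$UNIT"
```

If the unit is reported as failed or not running, `journalctl -u gpu_utilization_agent.service` shows the output of the reporting script and is a reasonable first place to look.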