Problem:
When you submit batch analysis jobs that include Phenologic.AI workloads to an external Slurm cluster, the following error appears on the Image Artist job status page:
Cannot start Phenologic.AI server. No installation at /home/user/Phenologic.AI
In this message, /home/user/Phenologic.AI is the directory on the Deep Learning server that contains the Phenologic.AI configuration files. The path shown in your message may differ, depending on where Phenologic.AI is installed in your environment and how it is configured.
Cause:
This error occurs when one or more Slurm nodes are missing the required configuration parameter:
userconfig AcapellaDeepLearning.DNN.HCSDLServerConnect
This parameter should be defined in the /etc/acapella/config.init file. Without it, the Slurm nodes do not know where to route the Phenologic.AI jobs.
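To find out which nodes are affected, you can look for the parameter in each node's config.init. The following is a minimal Python sketch, not a supported tool. It assumes the setting sits on a single line in exactly the form used in the Solution below (`userconfig AcapellaDeepLearning.DNN.HCSDLServerConnect="http://<host>:8003"`). Acapella's own parser may accept other spellings. The function names `find_dl_server_url` and `check_node_config` are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: one setting per line, written exactly as in this article, e.g.
#   userconfig AcapellaDeepLearning.DNN.HCSDLServerConnect="http://10.0.0.5:8003"
_SETTING_RE = re.compile(
    r'^\s*userconfig\s+AcapellaDeepLearning\.DNN\.HCSDLServerConnect="([^"]*)"\s*$'
)


def find_dl_server_url(config_text: str) -> Optional[str]:
    """Return the configured Deep Learning server URL, or None if the setting is missing."""
    for line in config_text.splitlines():
        match = _SETTING_RE.match(line)
        if match:
            return match.group(1)
    return None


def check_node_config(path: str = "/etc/acapella/config.init") -> Optional[str]:
    """Read config.init on the current node; a missing file counts as a missing setting."""
    config_file = Path(path)
    if not config_file.is_file():
        return None
    return find_dl_server_url(config_file.read_text())
```

Run `check_node_config()` on each Slurm node. A result of `None` means the node matches the cause described above.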
Solution:
Make sure that every Slurm node has the /etc/acapella/config.init file. If the file doesn’t already exist, you’ll need to create it. Then, add the following line to the file:
userconfig AcapellaDeepLearning.DNN.HCSDLServerConnect="http://<IP-of-DL-server>:8003"
Replace <IP-of-DL-server> with the IP address or domain name of your Deep Learning service. This could be:
- A dedicated Deep Learning host, or
- The Deep Learning server running as part of the Docker swarm on the Image Artist server itself.
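You can script the file edit if you manage many nodes. The Python sketch below creates config.init if needed and makes sure it contains exactly one HCSDLServerConnect line, replacing any earlier one. Other lines are left unchanged. The port 8003 comes from the line above. The function names are hypothetical. The sketch assumes the file is plain text with one setting per line. Writing under /etc normally requires root privileges.

```python
from pathlib import Path

# Key taken from this article; each line that starts with KEY + "=" is treated as the setting.
SETTING_KEY = "userconfig AcapellaDeepLearning.DNN.HCSDLServerConnect"


def build_setting_line(dl_host: str, port: int = 8003) -> str:
    """Build the config.init line for a Deep Learning server IP address or domain name."""
    return f'{SETTING_KEY}="http://{dl_host}:{port}"'


def ensure_setting(path: str, dl_host: str, port: int = 8003) -> bool:
    """Create config.init if needed and make it hold exactly one HCSDLServerConnect line.

    Existing HCSDLServerConnect lines are dropped and the new line is appended.
    Returns True if the file was written, False if it was already correct.
    """
    config_file = Path(path)
    new_line = build_setting_line(dl_host, port)
    lines = config_file.read_text().splitlines() if config_file.exists() else []
    kept = [ln for ln in lines if not ln.strip().startswith(SETTING_KEY + "=")]
    updated = kept + [new_line]
    if updated == lines:
        return False  # nothing to change; avoid rewriting the file
    config_file.parent.mkdir(parents=True, exist_ok=True)
    config_file.write_text("\n".join(updated) + "\n")
    return True
```

For example, to point a node at its DL server, run `ensure_setting("/etc/acapella/config.init", "<IP-of-DL-server>")` as root on that node, with the placeholder replaced. Running the call a second time leaves the file untouched.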
If the external cluster includes multiple worker nodes, this configuration must be applied to each one individually. Additionally, different worker nodes can be configured to use different Deep Learning servers to help distribute the processing load.
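One simple way to spread worker nodes across several Deep Learning servers is a static round-robin assignment. The Python sketch below computes the config.init line each node should get. The node names and server addresses are hypothetical, and `assign_dl_servers` is not part of any Image Artist or Slurm tooling. Round-robin balances the number of nodes per server, not the actual processing load. If your servers differ in capacity, you may want to weight the assignment instead.

```python
from itertools import cycle
from typing import Dict, List


def assign_dl_servers(nodes: List[str], dl_servers: List[str], port: int = 8003) -> Dict[str, str]:
    """Map each worker node to the HCSDLServerConnect line it should carry.

    Servers are handed out in round-robin order: with 3 nodes and 2 servers,
    the first and third nodes share the first server.
    """
    if not dl_servers:
        raise ValueError("at least one Deep Learning server is required")
    servers = cycle(dl_servers)
    return {
        node: f'userconfig AcapellaDeepLearning.DNN.HCSDLServerConnect="http://{next(servers)}:{port}"'
        for node in nodes
    }
```

For example, `assign_dl_servers(["node01", "node02", "node03"], ["10.0.0.5", "10.0.0.6"])` sends node01 and node03 to 10.0.0.5 and node02 to 10.0.0.6. Each resulting line is then written to that node's /etc/acapella/config.init.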