Product: Columbus
Columbus Plus Job stuck 'Submitting'
If you have started a job that does not advance beyond 'Submitting', it could be due to one of the following causes:
Celery is not running
Access the server's command line interface as the root user and issue the following command:
$ /etc/init.d/columbus status celery
Celery should return:
Celery: up [ OK ]
If it does not, try a restart with:
$ /etc/init.d/columbus restart celery
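The check and restart above can be combined into one step. The following is a minimal sketch, assuming the status output contains the text "Celery: up" as shown above; the function names celery_is_up and ensure_celery are hypothetical and not part of Columbus.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the status text on stdin reports Celery as up.
# Assumes the status line looks like "Celery: up [ OK ]" (spacing may vary).
celery_is_up() {
    grep -Eq 'Celery:[[:space:]]*up'
}

# Hypothetical helper: restart Celery only when it is not reported as up.
# Must be run as the root user on the Columbus server.
ensure_celery() {
    if /etc/init.d/columbus status celery 2>&1 | celery_is_up; then
        echo "Celery is up"
    else
        echo "Celery is not reported as up, restarting"
        /etc/init.d/columbus restart celery
    fi
}
```

After sourcing the file as root, call ensure_celery to run the check.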
The next step depends on whether you are using an on-premises cluster install or leveraging Amazon EMR.
On Prem
Keyless authentication is not working
The Columbus server needs to be able to connect to the submit host (AFA Master Node) via ssh in order to run create_cluster_job.py
Check that you can ssh from the Columbus server to the submit host as the columbus user. On the Columbus server, log in as, or switch to, the columbus user:
$ su - columbus
Check you can ssh to the submit host. Typically the command would take the form:
$ ssh columbus@SUBMIT_HOST
where SUBMIT_HOST is the hostname of the submit host.
If you are asked for a password, therein lies the problem: keyless authentication is not working for the columbus user.
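The same check can be run non-interactively, which is useful in a script. This is a minimal sketch, assuming OpenSSH is the ssh client: the BatchMode=yes option makes ssh fail instead of prompting for a password. The function name check_keyless_ssh is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: test passwordless ssh to the submit host as the columbus user.
# BatchMode=yes disables password prompts, so a missing key shows up as a failure.
# Note: a failure can also mean the host is unreachable or its host key is not trusted.
check_keyless_ssh() {
    host="$1"
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "columbus@${host}" true 2>/dev/null; then
        echo "OK: passwordless login to ${host} works"
    else
        echo "FAIL: passwordless login to ${host} did not succeed"
        return 1
    fi
}
```

Run it as the columbus user on the Columbus server, for example: check_keyless_ssh SUBMIT_HOST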
AWS
This can be caused by an incorrect setting in the /etc/columbus/***.config file,
specifically when an AWS availability zone is used in error, as opposed to a region name.
An example AWS region is us-east-1; one of its availability zones is us-east-1e.
If us-east-1e is used as the region_name in the config file, batch analysis jobs remain stuck in 'Submitting'.
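A quick way to tell the two apart is that a standard availability zone name is the region name with one trailing letter. The sketch below checks a value on that assumption; the function names are hypothetical, and the pattern does not cover non-standard zone names.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the value looks like an availability zone,
# i.e. a region name such as us-east-1 followed by a single letter (us-east-1e).
looks_like_availability_zone() {
    printf '%s\n' "$1" | grep -Eq '^[a-z]+(-[a-z]+)+-[0-9]+[a-z]$'
}

# Hypothetical helper: print the region name, dropping a trailing zone letter if present.
region_from() {
    if looks_like_availability_zone "$1"; then
        printf '%s\n' "${1%?}"
    else
        printf '%s\n' "$1"
    fi
}
```

For example, region_from us-east-1e prints us-east-1, which is the value region_name should hold.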
A typical error found in /var/log/columbus/web/long_tasks.log is:
EndpointConnectionError: Could not connect to the endpoint URL: "https://elasticmapreduce.us-east-2c.amazonaws.com/"
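To look for this error without reading the whole log, something like the following sketch can be used. The function name find_endpoint_errors is hypothetical; the default log path is the one given above.

```shell
#!/bin/sh
# Hypothetical helper: show the last few EndpointConnectionError lines, with line numbers.
# The log path defaults to the long_tasks.log location named in this article.
find_endpoint_errors() {
    log_file="${1:-/var/log/columbus/web/long_tasks.log}"
    grep -n 'EndpointConnectionError' "$log_file" | tail -n 5
}
```

If the endpoint URL in a matching line contains an availability zone rather than a region name, the region_name setting is the likely cause.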
To check the URL from the command line, use a command such as:
$ curl https://elasticmapreduce.us-east-2c.amazonaws.com && echo
This returns Couldn't resolve host... when the URL contains an availability zone (as in the us-east-1e case above), but returns a reasonable <MissingAuthenticationTokenException> response when it contains a valid region name such as us-east-1.
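The curl check can be wrapped so that it takes the region_name value directly. This is a sketch only, with hypothetical function names; it relies on curl exiting with code 6 when it cannot resolve the host.

```shell
#!/bin/sh
# Hypothetical helper: build the EMR endpoint URL for a given region_name value.
emr_endpoint_url() {
    printf 'https://elasticmapreduce.%s.amazonaws.com/\n' "$1"
}

# Hypothetical helper: report whether the endpoint host name resolves.
# curl exit code 6 means "could not resolve host".
check_emr_endpoint() {
    url=$(emr_endpoint_url "$1")
    rc=0
    curl -s -o /dev/null --max-time 10 "$url" || rc=$?
    if [ "$rc" -eq 6 ]; then
        echo "$1: could not resolve host, check region_name"
        return 1
    fi
    echo "$1: endpoint host resolved (curl exit code $rc)"
}
```

For example, check_emr_endpoint us-east-1e should report an unresolved host, while check_emr_endpoint us-east-1 should not.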
If none of the above helps, gather all Columbus logs and provide them to Technical Support.
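One way to gather the logs is to archive the log directory. The sketch below assumes that all Columbus logs live under /var/log/columbus, which is inferred from the long_tasks.log path above and may not hold for every install; the function name collect_columbus_logs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pack a log directory into a timestamped .tar.gz archive
# and print the archive name.
# ASSUMPTION: Columbus logs are under /var/log/columbus; pass another path if not.
collect_columbus_logs() {
    log_dir="${1:-/var/log/columbus}"
    archive="${2:-columbus-logs-$(date +%Y%m%d-%H%M%S).tar.gz}"
    tar -czf "$archive" -C "$(dirname "$log_dir")" "$(basename "$log_dir")" \
        && echo "$archive"
}
```

Run collect_columbus_logs as root so that every log file is readable, then attach the resulting archive to the support request.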