Why do I see a "Stuck thread detected" error when running asbackup?

Problem Description

I see the following error when issuing a backup job:

2019-06-30 15:13:02 GMT [ERR] [ 9129] Stuck thread detected
2019-06-30 15:13:02 GMT [ERR] [ 9129] Error while joining backup thread (error 110: Connection timed out)

Explanation

A backup is, in effect, a scan job: scanning is Aerospike's ability to read and enumerate all data in a namespace. For background on how scans work in general, refer to the Aerospike documentation on Scans. If a backup reports the "Stuck thread detected" error shown above, check the following situations.

Solution

The error is expected in two situations:

  1. The backup job was interrupted manually (Ctrl+C), which causes the connection to time out. The active scan job on the node(s) eventually times out on its own. If it has to stop sooner, the job may need to be killed explicitly. For the asinfo command that kills scan jobs, refer to the Manage Scans documentation page.

  2. If the error appears on a fresh run of asbackup, all of the current scan-threads may be busy processing ongoing and/or higher-priority scan jobs. In that case, a possible workaround is to run asbackup at a higher priority (using the '-f' or '--priority' option) or to increase the number of scan-threads. Refer to the Manage Scans page to understand scan priority.
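
The remedies for both situations can be sketched as command lines. The Python snippet below only builds and prints the commands, so they can be reviewed before anything is run against a cluster. It is a minimal sketch under several assumptions: the `jobs:module=scan` and `set-config:context=service;scan-threads=...` info commands reflect pre-4.9 tooling and should be checked against the Manage Scans page for the server version in use, the priority value 3 is assumed to mean "high", and the host, namespace, directory, trid, and thread-count values are placeholders.

```python
import shlex

# Placeholder seed node; replace with a real cluster address.
DEFAULT_HOST = "127.0.0.1"


def list_scan_jobs_cmd(host=DEFAULT_HOST):
    """Command to list scan jobs on a node (assumed pre-4.9 info command)."""
    return ["asinfo", "-h", host, "-v", "jobs:module=scan"]


def kill_scan_job_cmd(trid, host=DEFAULT_HOST):
    """Command to kill one scan job by its transaction id (trid)."""
    info = f"jobs:module=scan;cmd=kill-job;trid={trid}"
    return ["asinfo", "-h", host, "-v", info]


def set_scan_threads_cmd(threads, host=DEFAULT_HOST):
    """Command to change scan-threads dynamically (assumed service-context setting)."""
    info = f"set-config:context=service;scan-threads={threads}"
    return ["asinfo", "-h", host, "-v", info]


def backup_cmd(namespace, directory, priority=3, host=DEFAULT_HOST):
    """asbackup invocation with an explicit scan priority (3 assumed to be 'high')."""
    return [
        "asbackup",
        "--host", host,
        "--namespace", namespace,
        "--directory", directory,
        "--priority", str(priority),
    ]


if __name__ == "__main__":
    # Situation 1: find the leftover scan job, then kill it by trid.
    print(shlex.join(list_scan_jobs_cmd()))
    print(shlex.join(kill_scan_job_cmd(1234567890)))  # hypothetical trid
    # Situation 2: add scan threads and/or rerun the backup at higher priority.
    print(shlex.join(set_scan_threads_cmd(8)))  # hypothetical thread count
    print(shlex.join(backup_cmd("test", "/tmp/backup")))
```

The trid passed to the kill command would come from the output of the listing command. Each command has to be run against every node that still shows the leftover scan job, because info commands are answered per node.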

Notes

  • Aerospike Documentation on Backup
  • Aerospike Documentation on Restore

Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version