Why Do I See Stalled Migrations And "record too small" Errors In The Log?

Problem Description

While a cluster is migrating, the migrations never complete, and nodes with inbound migrations report the following warnings in their logs:

Mar 05 2020 01:41:07 GMT: WARNING (flat): (flat.c:135) record too small 0
Mar 05 2020 01:41:07 GMT: WARNING (migrate): (migrate.c:1398) handle insert: got bad record
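
To confirm how often a node is logging these warnings, the log can be scanned for the two messages shown above. The following is a minimal sketch, not an official tool: the regular expression is derived only from the two sample lines, the function name count_bad_record_warnings is hypothetical, and other server versions may format these messages differently.

```python
import re
from collections import Counter

# Pattern inferred from the two sample log lines only (an assumption);
# adjust it if your server version formats these messages differently.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}) GMT: "
    r"WARNING \((?P<context>\w+)\): \((?P<source>[\w.]+:\d+)\) "
    r"(?P<message>record too small \d+|handle insert: got bad record)$"
)


def count_bad_record_warnings(lines):
    """Count matching warnings per logging context (e.g. flat, migrate)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            counts[match.group("context")] += 1
    return counts


# The two sample lines from this article, used as illustrative input.
sample_lines = [
    "Mar 05 2020 01:41:07 GMT: WARNING (flat): (flat.c:135) record too small 0",
    "Mar 05 2020 01:41:07 GMT: WARNING (migrate): (migrate.c:1398) handle insert: got bad record",
]

warning_counts = count_bad_record_warnings(sample_lines)
for context, count in sorted(warning_counts.items()):
    print(f"{context}: {count}")
```

In practice the lines would come from the node's log file (for example, by passing an open file object to the function). A steadily growing count on a receiving node suggests that its inbound migrations are still being fed bad records.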

Explanation

This error occurs when a node in the cluster has a bad disk. The node knows that it needs to send a record, but because of the disk error the record it reads has a size of 0. The warning is logged on the node receiving the migration, because that node cannot write an inbound record of size 0. Checking the status of migrations will likely show that a single node is the source of all the problematic partitions.

Solution

Because the issue is caused by a hardware problem on one node, the quickest way to allow migrations to complete is to shut down the problematic source node. That node will almost certainly show disk errors in dmesg, which can be run manually or collected as part of the asadm -e collectinfo command. The dmesg output usually looks similar to the following:

[11055874.801271] Buffer I/O error on dev nvme0n2, logical block 38509628, async page read
[11055888.682385] print_req_error: critical medium error, dev nvme0n2, sector 308077024
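
When dmesg output is long, it can help to summarize which devices are reporting errors. The following is a minimal sketch that assumes the kernel messages match the two formats shown above; the function name devices_with_errors is hypothetical, and other kernels or drivers may word these errors differently.

```python
import re
from collections import Counter

# Matches only the two error formats shown in the sample dmesg output
# (an assumption); extend the alternation for other kernel messages.
DEVICE_PATTERN = re.compile(
    r"(?:Buffer I/O error on dev|critical medium error, dev) (?P<device>\w+)"
)


def devices_with_errors(dmesg_text):
    """Return a Counter mapping device name to number of error lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = DEVICE_PATTERN.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts


# The sample dmesg lines from this article, used as illustrative input.
sample_dmesg = """\
[11055874.801271] Buffer I/O error on dev nvme0n2, logical block 38509628, async page read
[11055888.682385] print_req_error: critical medium error, dev nvme0n2, sector 308077024
"""

error_counts = devices_with_errors(sample_dmesg)
for device, count in error_counts.most_common():
    print(f"{device}: {count} error line(s)")
```

The text passed in could be the saved output of dmesg from the suspect node. A device that appears here and also backs the affected namespace is a strong indication that this node is the one to shut down.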

Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version