How to check AWS instance network throttling metrics?

Context

AWS instances have limits on network bandwidth and on the number of network packets they can process. Exceeding these limits can lead to packet drops and connection churn.

We can use a command line tool such as ethtool to check the AWS instance limiter metrics. This requires version 2.2.10 or later of the ENA driver. See this reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html
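
As a rough way to confirm the driver requirement, the version reported by `ethtool -i` can be compared against 2.2.10. The sketch below is an illustration only: `driver_version` and `version_ge` are hypothetical helper names, and it assumes GNU `sort -V` is available. Some in-kernel builds of the driver may report a kernel release string in the version field instead, in which case this comparison does not apply.

```shell
# Hypothetical helper: print the "version:" field from `ethtool -i` output read on stdin.
driver_version() {
    awk -F': *' '$1 == "version" { print $2; exit }'
}

# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), which is an assumption about the local toolset.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge "$(ethtool -i eth0 | driver_version)" 2.2.10 && echo "ENA driver is recent enough"`, where `eth0` stands in for your interface name.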


Method

You can use the ethtool command line with the -S option to capture the relevant metrics.

sudo ethtool -S <Network_interface_Name>

Pay attention to the following metrics:

  • bw_in_allowance_exceeded : The number of packets queued and/or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.

  • bw_out_allowance_exceeded : The number of packets queued and/or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.

  • pps_allowance_exceeded : The number of packets queued and/or dropped because the bidirectional PPS exceeded the maximum for the instance.

  • conntrack_allowance_exceeded : The number of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance.

  • linklocal_allowance_exceeded : The number of packets dropped because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service.
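
Because the full output is long, it can help to filter it down to the allowance counters listed above. The following is a minimal sketch, not an official tool: `nonzero_allowance_counters` is a hypothetical helper name, and it assumes the usual `name: value` layout of `ethtool -S` output.

```shell
# Hypothetical helper: read `ethtool -S` output on stdin and print only the
# *_allowance_exceeded counters whose value is greater than zero, as name=value.
nonzero_allowance_counters() {
    awk -F': *' '
        /_allowance_exceeded/ && $2 + 0 > 0 {
            name = $1
            sub(/^[ \t]+/, "", name)   # strip the leading indentation
            print name "=" $2
        }
    '
}
```

Usage, with `eth0` as an example interface name: `sudo ethtool -S eth0 | nonzero_allowance_counters`. No output means none of the five counters has recorded an exceeded allowance.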

More details can be obtained from AWS Support on an instance that has breached these limits.

Example Output:

# ethtool -S eth0
NIC statistics:
     tx_timeout: 0
     suspend: 0
     resume: 0
     wd_expired: 0
     interface_up: 1
     interface_down: 0
     admin_q_pause: 0
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 0
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0
     queue_0_tx_cnt: 10898
     queue_0_tx_bytes: 680131
     queue_0_tx_queue_stop: 0
     queue_0_tx_queue_wakeup: 0
     queue_0_tx_dma_mapping_err: 0
     queue_0_tx_linearize: 0
     queue_0_tx_linearize_failed: 0
     queue_0_tx_napi_comp: 36751
     queue_0_tx_tx_poll: 36751
     queue_0_tx_doorbells: 10897
     queue_0_tx_prepare_ctx_err: 0
     queue_0_tx_bad_req_id: 0
     queue_0_tx_llq_buffer_copy: 34
     queue_0_tx_missed_tx: 0
     queue_0_tx_unmask_interrupt: 36751
     queue_0_rx_cnt: 47795
     queue_0_rx_bytes: 56902785
     queue_0_rx_rx_copybreak_pkt: 7978
     queue_0_rx_csum_good: 39907
     queue_0_rx_refil_partial: 0
     queue_0_rx_bad_csum: 0
     queue_0_rx_page_alloc_fail: 0
     queue_0_rx_skb_alloc_fail: 0
     queue_0_rx_dma_mapping_err: 0
     queue_0_rx_bad_desc_num: 0
     queue_0_rx_bad_req_id: 0
     queue_0_rx_empty_rx_ring: 0
     queue_0_rx_csum_unchecked: 5152
     queue_0_rx_lpc_warm_up: 0
     queue_0_rx_lpc_full: 0
     queue_0_rx_lpc_wrong_numa: 0
     queue_1_tx_cnt: 3622
     queue_1_tx_bytes: 367816
     queue_1_tx_queue_stop: 0
     queue_1_tx_queue_wakeup: 0
     queue_1_tx_dma_mapping_err: 0
     queue_1_tx_linearize: 0
     queue_1_tx_linearize_failed: 0
     queue_1_tx_napi_comp: 3900
     queue_1_tx_tx_poll: 3900
     queue_1_tx_doorbells: 3622
     queue_1_tx_prepare_ctx_err: 0
     queue_1_tx_bad_req_id: 0
     queue_1_tx_llq_buffer_copy: 15
     queue_1_tx_missed_tx: 0
     queue_1_tx_unmask_interrupt: 3900
     queue_1_rx_cnt: 360
     queue_1_rx_bytes: 89804
     queue_1_rx_rx_copybreak_pkt: 307
     queue_1_rx_csum_good: 360
     queue_1_rx_refil_partial: 0
     queue_1_rx_bad_csum: 0
     queue_1_rx_page_alloc_fail: 0
     queue_1_rx_skb_alloc_fail: 0
     queue_1_rx_dma_mapping_err: 0
     queue_1_rx_bad_desc_num: 0
     queue_1_rx_bad_req_id: 0
     queue_1_rx_empty_rx_ring: 0
     queue_1_rx_csum_unchecked: 0
     queue_1_rx_lpc_warm_up: 0
     queue_1_rx_lpc_full: 0
     queue_1_rx_lpc_wrong_numa: 0
     ena_admin_q_aborted_cmd: 0
     ena_admin_q_submitted_cmd: 25
     ena_admin_q_completed_cmd: 25
     ena_admin_q_out_of_space: 0
     ena_admin_q_no_completion: 0

Notes

Non-zero values for the above metrics can indicate that the instance is under heavy traffic and is not able to handle the incoming traffic. Further capacity planning may be needed.

More details on using CloudWatch to monitor these metrics can be found in the AWS CloudWatch documentation.
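
To judge whether throttling is happening now rather than at some point in the past, it can be useful to compare two samples taken some time apart. The sketch below assumes the counters are cumulative totals, as ethtool statistics typically are; `allowance_deltas` is a hypothetical helper name and the file paths in the usage note are examples.

```shell
# Hypothetical helper: given two saved `ethtool -S` outputs (file $1 taken earlier,
# file $2 taken later), print each *_allowance_exceeded counter that increased,
# together with the size of the increase.
allowance_deltas() {
    awk -F': *' '
        /_allowance_exceeded/ {
            name = $1
            sub(/^[ \t]+/, "", name)                     # strip the leading indentation
            if (FNR == NR) { before[name] = $2; next }   # first file: remember the value
            delta = $2 - before[name]                    # second file: compute the change
            if (delta > 0) print name " +" delta
        }
    ' "$1" "$2"
}
```

Usage: run `sudo ethtool -S eth0 > /tmp/ena_before.txt`, wait for a representative interval, run `sudo ethtool -S eth0 > /tmp/ena_after.txt`, then `allowance_deltas /tmp/ena_before.txt /tmp/ena_after.txt`.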


Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version