Context
AWS instances have limits on network bandwidth and on the number of network packets they can process. Exceeding these limits can lead to packet drops and connection churn.
You can use a command-line tool such as ethtool to check the AWS instance limiter metrics. This requires ENA driver version 2.2.10 or later. See this reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html
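As a rough way to confirm the driver requirement, the sketch below compares a reported driver version against 2.2.10. It assumes `ethtool -i <interface>` prints a `version:` line and that GNU `sort -V` is available; the function names are hypothetical helpers, not part of ethtool. On some kernels the in-tree ENA driver may report the kernel release in that field instead of a driver version, in which case this check is not meaningful.

```shell
# Succeeds (exit 0) when version $1 is at least version $2.
# Assumes GNU coreutils `sort -V` (version sort).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reads `ethtool -i` output on stdin and prints the leading dotted-numeric
# part of the "version:" line (for example "2.2.10" from "2.2.10g").
ena_version_from_ethtool_i() {
    awk -F': *' '$1 == "version" { print $2; exit }' | grep -oE '^[0-9]+(\.[0-9]+)*'
}

# Usage on an instance (eth0 is an assumed interface name):
#   v="$(ethtool -i eth0 | ena_version_from_ethtool_i)"
#   version_at_least "$v" 2.2.10 && echo "ENA driver $v supports the limiter metrics"
```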
Method
Run ethtool with the -S option to capture the relevant metrics:
sudo ethtool -S <Network_interface_Name>
Pay attention to the following metrics:
- bw_in_allowance_exceeded: The number of packets queued and/or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.
- bw_out_allowance_exceeded: The number of packets queued and/or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.
- pps_allowance_exceeded: The number of packets queued and/or dropped because the bidirectional PPS exceeded the maximum for the instance.
- conntrack_allowance_exceeded: The number of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance.
- linklocal_allowance_exceeded: The number of packets dropped because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service.
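To avoid scanning the full statistics list by eye, the metrics above can be filtered out of the ethtool output. The following is a minimal sketch using awk; the function names are hypothetical, and it assumes each statistic is printed as `name: value` on its own line, as in the example output in this article.

```shell
# Reads `ethtool -S` output on stdin and prints "name value" for every
# *_allowance_exceeded counter.
allowance_counters() {
    awk -F': *' '/_allowance_exceeded:/ { sub(/^[ \t]+/, "", $1); print $1, $2 }'
}

# Same as above, but keeps only counters with a non-zero value.
exceeded_counters() {
    allowance_counters | awk '$2 > 0'
}

# Usage on an instance (eth0 is an assumed interface name):
#   sudo ethtool -S eth0 | allowance_counters
#   sudo ethtool -S eth0 | exceeded_counters
```

An empty result from exceeded_counters suggests none of these limits has been hit since the counters were last reset.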
For an instance that has breached these limits, more details can be obtained from AWS Support.
Example Output:
# ethtool -S eth0
NIC statistics:
tx_timeout: 0
suspend: 0
resume: 0
wd_expired: 0
interface_up: 1
interface_down: 0
admin_q_pause: 0
bw_in_allowance_exceeded: 0
bw_out_allowance_exceeded: 0
pps_allowance_exceeded: 0
conntrack_allowance_exceeded: 0
linklocal_allowance_exceeded: 0
queue_0_tx_cnt: 10898
queue_0_tx_bytes: 680131
queue_0_tx_queue_stop: 0
queue_0_tx_queue_wakeup: 0
queue_0_tx_dma_mapping_err: 0
queue_0_tx_linearize: 0
queue_0_tx_linearize_failed: 0
queue_0_tx_napi_comp: 36751
queue_0_tx_tx_poll: 36751
queue_0_tx_doorbells: 10897
queue_0_tx_prepare_ctx_err: 0
queue_0_tx_bad_req_id: 0
queue_0_tx_llq_buffer_copy: 34
queue_0_tx_missed_tx: 0
queue_0_tx_unmask_interrupt: 36751
queue_0_rx_cnt: 47795
queue_0_rx_bytes: 56902785
queue_0_rx_rx_copybreak_pkt: 7978
queue_0_rx_csum_good: 39907
queue_0_rx_refil_partial: 0
queue_0_rx_bad_csum: 0
queue_0_rx_page_alloc_fail: 0
queue_0_rx_skb_alloc_fail: 0
queue_0_rx_dma_mapping_err: 0
queue_0_rx_bad_desc_num: 0
queue_0_rx_bad_req_id: 0
queue_0_rx_empty_rx_ring: 0
queue_0_rx_csum_unchecked: 5152
queue_0_rx_lpc_warm_up: 0
queue_0_rx_lpc_full: 0
queue_0_rx_lpc_wrong_numa: 0
queue_1_tx_cnt: 3622
queue_1_tx_bytes: 367816
queue_1_tx_queue_stop: 0
queue_1_tx_queue_wakeup: 0
queue_1_tx_dma_mapping_err: 0
queue_1_tx_linearize: 0
queue_1_tx_linearize_failed: 0
queue_1_tx_napi_comp: 3900
queue_1_tx_tx_poll: 3900
queue_1_tx_doorbells: 3622
queue_1_tx_prepare_ctx_err: 0
queue_1_tx_bad_req_id: 0
queue_1_tx_llq_buffer_copy: 15
queue_1_tx_missed_tx: 0
queue_1_tx_unmask_interrupt: 3900
queue_1_rx_cnt: 360
queue_1_rx_bytes: 89804
queue_1_rx_rx_copybreak_pkt: 307
queue_1_rx_csum_good: 360
queue_1_rx_refil_partial: 0
queue_1_rx_bad_csum: 0
queue_1_rx_page_alloc_fail: 0
queue_1_rx_skb_alloc_fail: 0
queue_1_rx_dma_mapping_err: 0
queue_1_rx_bad_desc_num: 0
queue_1_rx_bad_req_id: 0
queue_1_rx_empty_rx_ring: 0
queue_1_rx_csum_unchecked: 0
queue_1_rx_lpc_warm_up: 0
queue_1_rx_lpc_full: 0
queue_1_rx_lpc_wrong_numa: 0
ena_admin_q_aborted_cmd: 0
ena_admin_q_submitted_cmd: 25
ena_admin_q_completed_cmd: 25
ena_admin_q_out_of_space: 0
ena_admin_q_no_completion: 0
Notes
Non-zero values in the above metrics can indicate that the instance is under heavy traffic and cannot handle the incoming load. Further capacity planning may be needed.
More details on using CloudWatch to monitor these metrics can be found in the AWS CloudWatch documentation.
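A single reading shows only a running total, so it can help to compare two samples taken some time apart. The sketch below assumes the counters are cumulative (as is typical for ethtool statistics, though they may reset when the driver is reloaded or the instance restarts) and reports which ones increased between two saved outputs. The function name and the file paths in the usage comments are hypothetical.

```shell
# Prints "name delta" for every *_allowance_exceeded counter that increased
# between two saved `ethtool -S` outputs ($1 = earlier file, $2 = later file).
allowance_deltas() {
    awk -F': *' '
        /_allowance_exceeded:/ {
            sub(/^[ \t]+/, "", $1)
            if (FNR == NR) { before[$1] = $2; next }   # still reading the first file
            if ($2 - before[$1] > 0) print $1, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Usage on an instance (eth0 and the 60-second interval are assumptions):
#   sudo ethtool -S eth0 > /tmp/ena_before.txt
#   sleep 60
#   sudo ethtool -S eth0 > /tmp/ena_after.txt
#   allowance_deltas /tmp/ena_before.txt /tmp/ena_after.txt
```

Counters that appear in the output grew during the interval, which suggests the corresponding limit is being hit under the current load rather than only at some point in the past.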