Connectivity issues to Hub pages - Follow-Up resolved, 05/07/2020 12:58PM UTC Hub Pages


This is a Follow-Up to Connectivity issues to Hub pages

Notification history

05/07/2020 12:58PM UTC
Closing issue and adding RFO update below.


What happened

Our monitoring systems detected connectivity issues on 2020-04-30 09:13 UTC.

The cause was a sudden spike of incidents created on one hub. After that spike, a steady inflow of additional incidents and incident updates kept changing the hub page. Combined with high traffic to that hub, this led to a much lower hit rate in the cache cluster, which in turn put a much higher load on the main database cluster.
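
The effect of a falling cache hit rate on database load can be illustrated with simple arithmetic. The sketch below is only an illustration: the function name and the traffic and hit-rate figures are hypothetical and are not measurements from this incident. It assumes that every request that misses the cache results in one database read.

```python
def database_reads_per_second(requests_per_second: float, cache_hit_rate: float) -> float:
    """Return the reads that miss the cache and fall through to the database.

    Assumes (for illustration only) one database read per cache miss.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return requests_per_second * (1.0 - cache_hit_rate)


# Hypothetical figures, chosen only to show the effect.
normal = database_reads_per_second(2000, 0.99)
degraded = database_reads_per_second(2000, 0.90)
print(f"normal: {normal:.0f} reads/s, degraded: {degraded:.0f} reads/s")
```

With these assumed numbers, a hit rate that drops from 99% to 90% at unchanged traffic multiplies database reads roughly tenfold, from about 20 to about 200 per second.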

We adjusted the infrastructure and, in cooperation with the customer, resolved this issue on 2020-04-29 10:16 UTC.

However, at 2020-04-29 11:17 UTC connectivity issues were detected again. The full investigation is not yet closed, but preliminary analysis suggests that this second issue was triggered by adjustments made while scaling the infrastructure to solve the initial problem. These changes, combined with suboptimal settings on the web servers, threw platform performance off balance.
After we adjusted the configuration settings, the issue was fully resolved on 2020-04-29 11:58 UTC.

Prevention and follow-up

We have adjusted the caching mechanisms so that they perform better in situations like this.
We are conducting more load tests and checks that simulate the situation, to better understand the cause of the second part of the issue and to prevent such situations in the future.

04/30/2020 12:11PM UTC
Unfortunately, the issue has come back.
We have made adjustments that improved the situation again, but we are not closing the ticket yet. We will continue to monitor the situation closely.