Resolved: Issues accessing the Practice Labs platform
Users attempting to access the Practice Labs platform were unable to log in.
What went wrong
Our database cluster had a spike in memory usage, which caused a failover of the primary node to one of the secondary nodes. During this period the cluster became unresponsive.
Who was impacted
All users attempting to log in to the Practice Labs platform.
Why it went wrong
Memory exhaustion in our database cluster.
How did we fix it
We have upgraded the memory in all 4 nodes in our database cluster. One node is running with less RAM than we have allocated due to a minor hardware fault, which is being addressed by our maintenance provider.
Our database cluster is now operating with reduced processing times, in some cases up to 60% faster with the additional RAM. We have monitored this closely for 7 days and are now comfortable that we can come out of monitoring.
We have completed our emergency maintenance to upgrade our CDC database servers as part of the remediation plan from yesterday's outage. We will continue to closely monitor the platform.
We are performing emergency maintenance at 9am UTC to upgrade our CDC database servers as part of the remediation plan from yesterday's outage. This should not impact users, but we are monitoring closely.
Our database cluster's primary node automatically failed over to one of its secondary nodes, which restored user access to the platform.
We are investigating further corrective actions and will continue to monitor. We appreciate your understanding and patience during this incident.
We are currently investigating an issue where users are unable to access the platform either by logging in or launching a lab from another platform.
We apologise for any inconvenience this has caused.
We are investigating an issue where users are unable to access the platform either by logging in or launching a lab from another platform.