Resolved
Issues accessing the Practice Labs platform

Started
September 19, 2023 at 12:15 PM
Status
Resolved after 8 days

Impact

Operational
Affected
Websites
www
api
  • Resolved

    Symptoms

    Users attempting to access the Practice Labs platform were unable to log in.

    What went wrong

    Our database cluster had a spike in memory usage, which caused a failover of the primary node to one of the secondary nodes. During this period the cluster became unresponsive.

    Who was impacted

    All users attempting to log in to the Practice Labs platform.

    Why it went wrong

    Memory exhaustion in our database cluster.

    How did we fix it

    We have upgraded the memory in all 4 nodes in our database cluster. One node is running with less RAM than we have allocated because of a minor hardware fault, which is being addressed by our maintenance provider.

    Our database cluster is now operating with reduced processing times, in some cases up to 60% faster with the additional RAM. We have monitored this closely for 7 days and are now comfortable that we can come out of monitoring.

  • Monitoring

    We have completed our emergency maintenance to upgrade our CDC database servers as part of the remediation plan from yesterday's outage. We will continue to closely monitor the platform.

  • Monitoring

    We are performing emergency maintenance at 9am UTC to upgrade our CDC database servers as part of the remediation plan from yesterday's outage. This should not impact users, but we are monitoring closely.

  • Monitoring

    Our database cluster's primary node automatically failed over to one of its secondary nodes, which restored user access to the platform.

    We are investigating further corrective actions and will continue to monitor. We appreciate your understanding and patience during this incident.

  • Identified

    We are currently investigating an issue where users are unable to access the platform either by logging in or launching a lab from another platform.

    We apologise for any inconvenience this has caused.

  • Investigating

    We are investigating an issue where users are unable to access the platform either by logging in or launching a lab from another platform.