On Tuesday, November 25th, some campaigns became stuck in the “syncing” or “scheduling” stages. This was a downstream effect of the incident we shared on Friday, November 21st (posted here).
We resolved the incident by correcting the resource allocation policies so that each worker has sufficient resources to operate effectively. We expect workers to continue operating smoothly going forward.
The Engineering team is auditing the resource allocation policies for all other internal applications to ensure that they will not run into similar issues when scaling.