In late July 2025, a demand spike in Azure's East US region exhausted available compute. Customers trying to create or update VMs hit AllocationFailed errors for over a week. Microsoft confirmed that General Purpose VM pools became "highly constrained," pushing hardware beyond safe operating thresholds. The same thing happened in UK South by early 2026, where Microsoft temporarily stopped accepting new VM deployments for GPU and AMD SKUs. Microsoft's own CFO flagged capacity constraints as a revenue limiter on multiple earnings calls. An internal forecast, later reported by Bloomberg, showed demand outstripping supply in key US regions until at least mid-2026.
Most strategists respond to this by telling you to "use multiple regions" and "consider capacity reservations." That advice isn't wrong. It's just incomplete. The decisions that matter are harder than that.
We work with IT decision-makers across hundreds of Microsoft environments every month. Here is what we're actually seeing, and what we'd tell your team.
The two errors and what they actually mean
When Azure can't provide a VM, it returns one of two errors. Most people treat them the same. They aren't.
AllocationFailed means Azure's orchestration layer tried to find available hardware in the region and came up empty. The specific cluster your subscription is associated with in that region has no free compute matching your request.
ZonalAllocationFailed means the specific availability zone you targeted is constrained. This one has a useful workaround: if your workload can tolerate zone-flexible deployment, removing the zone pin from your ARM template or Bicep file lets Azure's scheduler search the entire region rather than just one zone. We've resolved genuine-looking capacity blocks by making this one change.
The distinction matters because the right response is different. AllocationFailed in a genuinely saturated region often can't be resolved without moving to a different region or VM size. ZonalAllocationFailed frequently can be resolved without moving anywhere.
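As a rough illustration of the zone-pin change described above, the sketch below drops the zones property from virtual machine resources in an ARM template that has already been loaded as a Python dictionary. It assumes the template's resources section is a plain list (not the newer symbolic-name form), and the function name, resource names, and sample template are hypothetical. Disks or public IPs that are themselves zonal may need the same treatment, which this sketch does not attempt.

```python
import copy

VM_TYPE = "microsoft.compute/virtualmachines"  # compared case-insensitively


def remove_zone_pins(template: dict) -> tuple[dict, list[str]]:
    """Return a copy of the template with `zones` dropped from VM resources,
    plus the names of the resources that were changed."""
    unpinned = copy.deepcopy(template)  # leave the caller's template untouched
    changed = []
    # Assumption: "resources" is a list of resource dicts.
    for resource in unpinned.get("resources", []):
        is_vm = str(resource.get("type", "")).lower() == VM_TYPE
        if is_vm and "zones" in resource:
            del resource["zones"]
            changed.append(resource.get("name", "<unnamed>"))
    return unpinned, changed


# Hypothetical template fragment, for illustration only.
sample_template = {
    "resources": [
        {"type": "Microsoft.Compute/virtualMachines", "name": "app-vm-01",
         "location": "eastus", "zones": ["1"], "properties": {}},
        {"type": "Microsoft.Network/networkInterfaces", "name": "app-nic-01",
         "location": "eastus", "properties": {}},
    ]
}

unpinned_template, changed_names = remove_zone_pins(sample_template)
print("Zone pin removed from:", changed_names)
```

Only do this for workloads that can genuinely tolerate zone-flexible placement; a VM that must sit in the same zone as its data tier should stay pinned.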

Quota vs. capacity: the confusion that wastes weeks
This is the single biggest source of support escalations we see in constrained environments. Organizations hit a provisioning error, read it as a quota problem, open a ticket requesting a quota increase, Microsoft approves it, and the error persists. Then the support ticket goes in circles for two weeks.
Quota and capacity are two separate systems.
Quota is a policy ceiling: the maximum number of vCPUs, in total or for a specific VM family, that your subscription is allowed to run in a region. Microsoft manages it at the subscription policy layer. You can request increases through the Azure portal, and approvals usually come within a day or two.
Capacity is a physical reality: whether there is actual hardware available to provision against your quota. If a region is constrained, raising your quota does nothing. You now have permission to provision a VM that doesn't exist yet.
When you hit an AllocationFailed error, read the error details carefully. If the message says "We do not have sufficient capacity for the requested VM size in this region," that's a capacity problem, not a quota problem. A quota increase ticket will not help. The right moves are: try a different VM size, try a different region, or use a capacity reservation if you need that specific SKU locked in.
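The triage described above can be sketched as a small classifier that looks at the error code and message before anyone opens a ticket. This is a minimal sketch, not an exhaustive mapping: the two capacity codes come from this article, the quota check is a simple assumption that quota errors mention the word "quota" in their message, and the function name and sample messages are hypothetical.

```python
CAPACITY_CODES = {"AllocationFailed", "ZonalAllocationFailed"}

# Next steps taken from the guidance in this article.
NEXT_STEPS = {
    "capacity": [
        "try a different VM size",
        "try a different region",
        "use a capacity reservation for that SKU",
    ],
    "quota": ["request a quota increase through the Azure portal"],
    "unknown": ["read the full error details before opening a ticket"],
}


def classify_provisioning_error(code: str, message: str) -> str:
    """Classify a failed deployment as 'capacity', 'quota', or 'unknown'."""
    text = message.lower()
    # Capacity is checked first: a quota increase cannot fix missing hardware.
    if code in CAPACITY_CODES or "sufficient capacity" in text:
        return "capacity"
    # Assumption: quota-related errors mention "quota" in the message.
    if "quota" in text:
        return "quota"
    return "unknown"


kind = classify_provisioning_error(
    "AllocationFailed",
    "We do not have sufficient capacity for the requested VM size in this region.",
)
print(kind, "->", NEXT_STEPS[kind])
```

Wiring a check like this into deployment tooling keeps a capacity failure from being routed to the quota-increase queue by default.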
The deallocate-and-retry trap
One of the more dangerous bits of folk wisdom on Azure support forums is "just deallocate your VM and try to reprovision it." The theory is that deallocation releases your current compute slot, giving the scheduler a chance to find a new one. Sometimes this works. When a cluster is lightly constrained and just needs a scheduler retry, deallocation can land you on a less-loaded compute node.
When a region is genuinely capacity-constrained, this can leave you in a much worse position. You've successfully deallocated a running VM, but you can't provision a replacement. Your workload is down and you're no longer holding the hardware you had.
We saw this exact scenario during the East US crunch. A client deallocated a VM running a dev SQL environment to "free up a slot" during a planned resize. It took four days to get a replacement provisioned in an adjacent region. The lesson: don't deallocate anything in a constrained region without first provisioning an alternative or establishing a capacity reservation.
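One way to make that lesson hard to skip is a pre-flight guard in whatever runbook or script performs the deallocation. The sketch below encodes only the rule stated above; the class, field, and function names are hypothetical, and how you decide that a region counts as constrained is left to your own signals.

```python
from dataclasses import dataclass


@dataclass
class DeallocationContext:
    """Facts gathered before deallocating a VM (all fields are hypothetical)."""
    region_constrained: bool
    replacement_provisioned: bool
    capacity_reservation_in_place: bool


def safe_to_deallocate(ctx: DeallocationContext) -> bool:
    """Allow deallocation in a constrained region only with a fallback ready."""
    if not ctx.region_constrained:
        return True
    return ctx.replacement_provisioned or ctx.capacity_reservation_in_place


ctx = DeallocationContext(
    region_constrained=True,
    replacement_provisioned=False,
    capacity_reservation_in_place=False,
)
if not safe_to_deallocate(ctx):
    print("Blocked: provision a replacement or reserve capacity first.")
```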
Why this is happening: the less-discussed factors
The surface-level explanation is straightforward: AI workloads exploded and Azure didn't build fast enough. True. But there are a few factors that don't get much coverage.
Microsoft's own AI services compete with your workloads. Azure Copilot, OpenAI's API infrastructure, and Microsoft's internal AI tooling run in the same physical data centers as your VMs. When East US filled up in July 2025, part of that was Microsoft's own AI capacity expansion. Every GPU-equipped server earmarked for Azure OpenAI is unavailable for your NC-series VM. Microsoft doesn't advertise this dynamic, but the physical infrastructure doesn't distinguish between a customer VM and a first-party AI service.
Azure's regional architecture creates uneven pressure. Azure operates over 70 global regions, more geographic coverage than any other cloud provider. The trade-off is that many individual regions are relatively small. When demand spikes in a popular geography like Northern Virginia, UK South, or West Europe, those regions can saturate faster than a comparably sized AWS zone because there's less physical capacity per location. The wide regional footprint that makes Azure attractive for data sovereignty is also why certain regions hit the ceiling faster.
Enterprise Agreement customers and large-spend subscriptions get different treatment during allocation crunches than pay-as-you-go accounts. Microsoft doesn't explicitly publish this, but the behavior is well documented in support forums and acknowledged in service communications. If your organization is on a PAYG subscription or a small CSP arrangement, you're lower priority when hardware is rationed. One practical implication: consolidating onto an EA or working through a direct-bill CSP with larger committed spend can materially change your experience during a crunch. We've seen this firsthand with clients who escalated through our partner channel and got faster resolution than going direct on PAYG.

The migration timing problem
Here's a scenario playing out right now across hundreds of organizations. Windows Server 2016 hits end of support in January 2027. SQL Server 2016 hits end of support in July 2026. Many teams are planning to migrate those workloads to Azure as the deadline-driven exit. That plan makes sense.
The problem is they're migrating to Azure during a period of constrained capacity, using some of the most in-demand VM SKUs in some of the most congested regions. If you're planning a lift-and-shift of on-prem workloads to Azure VMs to take advantage of free Extended Security Updates, you need to account for the real possibility that the region and VM SKU you want may not be available on your timeline. Our SQL Server resource hub covers the EOL planning side of this in more detail, but the practical point is simple: provision your Azure VMs before you finalize your migration timeline, not after. Treat Azure provisioning as the long lead-time item, not the migration itself.
For organizations also managing Windows Server EOL alongside SQL Server, our comparison of Windows Server 2019, 2022, and 2025 covers the OS decision. Both problems converge on the same Azure capacity pool.
What to actually do: a practical hierarchy
1. Identify your real exposure first. Not every Azure deployment carries the same risk. Capacity constraints are most acute for GPU and HPC VM families (NC, ND, NV series), certain newer general-purpose families in high-demand regions, and any deployment pinned to a single availability zone in a congested area. If your workloads use common Dv3, Ev3, or Bv2 series VMs in less-congested regions, your exposure is low. Start with an audit of which VMs fall into constrained SKU families and constrained regions. Azure Resource Health and the Service Issues blade in the portal surface known regional constraints without requiring a support ticket.
2. Use az vm list-skus before you need it. Most teams only discover a SKU is unavailable when they hit the error. The Azure CLI command az vm list-skus --location eastus --resource-type virtualMachines --output table lists each SKU's zones and any restrictions on your subscription before you attempt deployment. It reports restrictions rather than live capacity, so a clean result is not a guarantee, but it catches SKUs that are already blocked for you. Build this check into your deployment pipelines. You want to fail fast and intelligently during planning, not fail mid-deployment at 11pm on a migration night.
3. Rethink capacity reservations for constrained SKUs. The standard recommendation is to avoid capacity reservations because they bill at on-demand rates whether you use the compute or not. That logic holds when Azure supply is unconstrained. In a constrained region, a capacity reservation becomes cheap insurance against a provisioning failure at the worst possible moment. Run the numbers for your critical VM SKUs. If you genuinely can't afford a provisioning failure, a capacity reservation at on-demand pricing with guaranteed availability may be the correct trade-off. The calculus changes when the alternative is a failed migration window, not just a slightly higher monthly bill.
4. Build SKU substitution maps before incidents, not during them. When a desired VM size is unavailable, most people wait or escalate a support ticket. The better approach is to map equivalent SKUs in advance and test them before you need them. Standard_D8s_v5 unavailable in East US? Standard_D8s_v4 or Standard_D8as_v5 (AMD-based) typically have more headroom and comparable performance for general-purpose workloads. We build these substitution maps for clients before deployments. Having this list means a constrained region is a 30-minute reroute, not a multi-day incident.
5. Consider IaaS-to-PaaS migration for exposed workloads. Raw VM deployments are most exposed to capacity constraints because you're requesting specific hardware SKUs in specific locations. PaaS services like Azure Kubernetes Service, Azure App Service, and Azure Batch abstract the underlying infrastructure and let Microsoft's scheduler spread workloads more flexibly. Batch processing jobs, stateless application tiers, and containerized workloads are often good candidates. The capacity constraint problem largely disappears when you stop pinning to a specific VM SKU. Our cloud modernization guide covers when IaaS-to-PaaS moves make sense beyond just the capacity angle.
6. Build real multi-region redundancy, not just documented failover. Most DR plans list a secondary region. Fewer have actually tested provisioning in that region recently. During the UK South crunch, customers who failed over to UK West found it was also constrained. Microsoft now recommends pairing UK South with Sweden Central or Norway East rather than another UK region. Your failover region needs to be tested, not assumed. Our multi-cloud strategy guide covers architectural decisions that reduce single-region dependency.

Where things stand
Azure is still the right foundation for most Microsoft-centric organizations. The capacity issues are real but geographically specific and being actively addressed. Microsoft is spending over $80 billion in 2025 on data center expansion, struck a five-year deal with Nebius for capacity at its New Jersey facility, and has an East US 3 region in the pipeline for 2027.
But "being fixed" is not the same as "not your problem now." If you're planning a deployment, migration, or scale-out in the next 12 to 18 months, capacity belongs in your risk register in a way it probably didn't two years ago. The teams that handle this well aren't the ones with the cleanest architecture diagrams. They're the ones who ran their provisioning scripts before the migration date and didn't learn what ZonalAllocationFailed means from a production error at 2am.
If your Azure spend feels disconnected from what you're actually getting, the Azure cost optimization guide covers the governance framework we use with clients. Capacity planning and cost governance are the same conversation once you dig into them.
For the broader context of managing Microsoft product lifecycles, including what happens when on-prem EOL deadlines push workloads toward a constrained cloud, our guide to software end of life and support covers the playbook. And if you want a direct conversation about your specific environment and exposure, our cloud architects handle this every day.



