Neoclouds and the Future of GPU-as-a-Service

The emergence of the neocloud ecosystem represents the most significant structural shift in digital infrastructure since the commercialization of public cloud computing in the mid-2000s. As technology shifts from general-purpose central processing unit (CPU) computing to specialized, accelerated graphics processing unit (GPU) computing, the traditional “Big Three” hyperscalers (Amazon Web Services, Microsoft Azure, and Google Cloud) face challenges they were not originally built to handle.

Neoclouds, a new generation of cloud service providers (CSPs) designed from the physical layer up for artificial intelligence (AI) and high-performance computing (HPC), have emerged to fill a critical gap in global compute supply.

Strategic Importance

Understanding this sector requires knowledge of the current “compute-industrial” complex. GPU-as-a-Service (GPUaaS) has grown from an experimental niche for data scientists into a key utility for modern enterprises. ABI Research forecasts that neocloud providers will generate over $65 billion in GPUaaS revenue by 2030, nearly three times the $24 billion estimated for 2024.

Broader market assessments, which consider integrated software platforms and sovereign deployments, suggest that the total neocloud opportunity could reach $250 billion annually by the end of the decade.

Executive Summary

Key Drivers of Neocloud Adoption

The transition to neocloud infrastructure is driven by three main factors:

  1. The technical obsolescence of legacy data centers for AI workloads.
  2. A global supply-demand imbalance for high-end accelerators.
  3. The rising importance of digital sovereignty.

Traditional cloud architectures, designed for multi-tenancy and heterogeneous web traffic, often create bottlenecks such as virtualization overhead and non-optimized networking. These factors degrade the performance of large-scale foundation model training.

In contrast, neoclouds use “greenfield” environments with dense rack designs, direct-to-chip liquid cooling, and non-blocking InfiniBand or AI-optimized Ethernet fabrics. These features allow near-linear scaling of GPU clusters.
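
To make “near-linear scaling” concrete, here is a toy per-step model in Python: efficiency is compute time divided by compute plus communication time, so shrinking the share of each step spent on gradient exchange is what keeps a large job close to ideal throughput. The timings below are illustrative assumptions, not measured figures.

```python
# Toy model: what fraction of ideal cluster throughput a training job
# achieves per step. All timings are illustrative assumptions.

def scaling_efficiency(compute_s: float, comm_s: float) -> float:
    """Share of ideal throughput achieved when each step spends
    compute_s on math and comm_s on gradient exchange."""
    return compute_s / (compute_s + comm_s)

COMPUTE = 1.00              # seconds of pure GPU work per step (assumed)
NONBLOCKING_COMM = 0.05     # all-reduce on a non-blocking fabric (assumed)
CONGESTED_COMM = 0.40       # same all-reduce on an oversubscribed fabric

print(f"non-blocking fabric:   {scaling_efficiency(COMPUTE, NONBLOCKING_COMM):.0%}")  # ~95%
print(f"oversubscribed fabric: {scaling_efficiency(COMPUTE, CONGESTED_COMM):.0%}")    # ~71%
```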

Market Momentum

Market momentum is currently strong. Synergy Research Group reports that neocloud revenues surpassed $5 billion in Q2 2025, growing 205 percent from the previous year. Growth rests on multi-billion-dollar “anchor tenant” agreements with hyperscalers and AI labs.

For example:

  • Nebius signed a contract with Microsoft valued between $17.4 billion and $19.4 billion.
  • CoreWeave formed high-value partnerships with Meta, OpenAI, and IBM.

These deals show a paradoxical “coopetition,” where hyperscalers depend on neoclouds to handle the capital-intensive and physically demanding build-out of AI-ready data centers.

Economic Normalization

The sector is entering a phase of economic normalization. The scarcity of 2023–2024, when H100 rental rates reached $8.00 per hour, has given way to a supply-rich environment in 2025. Current rates range between $2.85 and $3.50 per hour, forcing providers to compete on software rather than hardware access.

Additionally, structural constraints have shifted from chip availability to power availability. Grid connection delays of up to 84 months (seven years) in key markets like Germany and the United States now represent the main hurdle to expansion.

Future Outlook

By 2030, the neocloud market will likely split into full-stack AI platforms and regional sovereign compute hubs. The industry is moving from “hardware reselling” to providing integrated AI factories, including proprietary orchestration, carbon-aware scheduling, and vertical-specific MLOps.

For enterprises and investors, the neocloud is more than a backup option for GPU capacity; it is the blueprint for AI-native infrastructure in the 21st century.

Historical Context: From Hyperscale Clouds to Neoclouds

Cloud computing history can be divided into two eras:

  • Elastic CPU Era (2006-2022)
  • Accelerated GPU Era (2023-present)

Understanding the rise of neoclouds requires analyzing the technical and economic limits of the first era.

The Legacy of General-Purpose Cloud

AWS EC2 launched in 2006, introducing a cloud model built on horizontal scalability for general-purpose compute. Data centers housed standard 19-inch racks with power densities of 5 kW to 15 kW per rack. They prioritized uptime for heterogeneous applications such as web servers, databases, and enterprise software. Virtualization technologies like Hyper-V and KVM divided physical CPUs into multiple virtual machines (VMs), enabling providers to maximize hardware utilization across thousands of tenants.

While this model supported the SaaS revolution, it could not meet the synchronized, high-bandwidth needs of deep learning. Training a large language model (LLM) requires thousands of GPUs working as a single computer. This demands single-tenant, bare-metal access and extremely low-latency interconnects such as Remote Direct Memory Access (RDMA), which virtualization layers typically disrupt.

The 2023 Scarcity Crisis and the “Crypto Pivot”

The launch of ChatGPT in late 2022 acted as a black swan event for the cloud industry. Suddenly, enterprises and AI labs needed thousands of NVIDIA GPUs at the same time. Hyperscalers could not upgrade their legacy power and cooling infrastructure quickly enough, creating extreme scarcity. CIOs were quoted lead times of 12 months or more for GPU instances.

During this vacuum, the first neoclouds emerged from two main sources.

  • Former crypto miners like CoreWeave already had high-density power infrastructure.
  • Specialist workstation providers like Lambda Labs scaled their developer-focused hardware approach into a cloud model.

These providers were “AI-native” because they were free from legacy CPU-centric services and could dedicate all engineering resources to optimizing the GPU stack.

The Evolution of Cloud Computing Eras

  • Cloud 1.0 (2006-2020): Focus on Web, SaaS, and ERP. CPU-centric, air-cooled hardware. Key providers were the “Big Three” hyperscalers.
  • Cloud 2.0 (2021-2023): Focus on early machine learning and data lakes. Hardware was a hybrid of CPU and GPU. Providers included hyperscalers and specialized cloud providers.
  • Neocloud Era (2024-2030): Focus on foundation models and generative AI. GPU-first, liquid-cooled hardware. Providers are neoclouds and “AI factories.”

What Is a Neocloud? Core Characteristics and Service Models

A neocloud is more than a GPU rental provider. It is a specialized utility designed to maximize FLOPS-per-watt and FLOPS-per-dollar for AI training and inference. Unlike traditional cloud providers, which aim to serve all users, neoclouds differentiate themselves through architectural specialization.
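
As a rough illustration of the FLOPS-per-dollar framing, the snippet below combines this article’s mid-range rental rates with NVIDIA’s published dense BF16 Tensor Core peaks. Delivered throughput varies by workload, so treat this as a yardstick, not a benchmark.

```python
# FLOPS-per-dollar yardstick: published dense BF16 Tensor Core peaks
# divided by assumed mid-range rental rates from this article.

GPUS = {
    # name: (peak dense BF16 TFLOPS, assumed mid-range rental in USD/hr)
    "A100": (312, 1.20),
    "H100": (989, 3.00),
}

for name, (tflops, rate) in GPUS.items():
    print(f"{name}: {tflops / rate:,.0f} peak TFLOPS per dollar-hour")
# A100 ≈ 260, H100 ≈ 330 — newer silicon can win on FLOPS-per-dollar
# even at a higher sticker price.
```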

Physical Specialization: The Thermodynamics of AI

Neocloud data centers are built to handle heat loads that would overwhelm traditional facilities. Modern AI racks housing NVIDIA Blackwell chips consume 100 kW to 120 kW of power, producing thermal density that air cooling cannot manage effectively. This difference highlights a simple truth: water cools far more efficiently than air.

Heat Capacity Comparison: Water carries roughly 3,300 times more heat per unit volume than air. To cool a 100 kW rack with air, a facility would require 10,000 cubic feet per minute (CFM) of airflow, a setup that generates 95 decibels of noise and consumes 25 kW just to power the fans.
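
A quick back-of-envelope check of that airflow figure, assuming standard properties for dry air and a typical 17 K (about 30 °F) air-cooling differential:

```python
# Worked check of the airflow claim, assuming dry air at ~1.2 kg/m^3
# and 1005 J/(kg*K), with a ~17 K (about 30 °F) cooling differential.

POWER_W = 100_000        # 100 kW rack
RHO_AIR = 1.2            # kg/m^3
CP_AIR = 1005            # J/(kg*K)
DELTA_T = 17             # K
M3S_TO_CFM = 2118.88     # cubic meters per second -> cubic feet per minute

flow = POWER_W / (RHO_AIR * CP_AIR * DELTA_T)              # m^3/s of air
print(f"required airflow ≈ {flow * M3S_TO_CFM:,.0f} CFM")  # ≈ 10,300 CFM

# Water (~998 kg/m^3, 4186 J/(kg*K)) moving the same 100 kW at an 8 K
# differential needs only about 3 liters per second: the ~3,300x
# volumetric heat-capacity gap in action.
```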

Liquid Cooling Efficiency: Direct-to-chip liquid cooling or immersion cooling operates with 10 to 15 °F differentials, compared to 30 to 40 °F for air. Reducing operating temperatures by 10 °C can double component lifespan.

Infrastructure Native Design: Neoclouds like CoreWeave and Crusoe build “liquid-cooling native” campuses from the start. This avoids the $2–3 million per megawatt cost of retrofitting air-cooled halls.

Networking Fabrics: InfiniBand vs. AI-Optimized Ethernet

Model training is network-bound. Neoclouds typically avoid the TCP/IP stacks used by hyperscalers, which consume CPU cycles and create jitter. Instead, they rely on high-performance networking:

InfiniBand: This switched-fabric design provides predictable point-to-point connections. It uses RDMA to let servers read and write each other’s memory directly, bypassing the OS and CPU and cutting response times significantly.

Lossless Transmission: Unlike standard Ethernet, which retransmits dropped packets, InfiniBand uses credit-based flow control to prevent congestion before it occurs. This ensures reliable data delivery for trillion-parameter training jobs.

RDMA over Converged Ethernet (RoCE v2): Some neoclouds use AI-optimized Ethernet fabrics, such as those built on DriveNets, as an alternative to InfiniBand. These networks offer up to 800 Gbps of bandwidth while maintaining open-standard flexibility.
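
For a sense of what this looks like in practice, here is a minimal sketch of a cross-node gradient all-reduce using PyTorch’s NCCL backend, which rides RDMA on both InfiniBand and RoCE v2 fabrics. The NCCL_* variables noted in the comments are real tuning knobs, but the interface name and launch topology are deployment-specific assumptions.

```python
# Minimal multi-node all-reduce over an RDMA fabric via NCCL.
# Typical environment settings (deployment-specific assumptions):
#   NCCL_IB_HCA=mlx5        # steer NCCL onto the InfiniBand/RoCE NICs
#   NCCL_IB_GID_INDEX=3     # commonly required for RoCE v2
# Launch: torchrun --nnodes=2 --nproc-per-node=8 allreduce_demo.py

import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")       # RDMA-capable backend
local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun
torch.cuda.set_device(local_rank)

# Each rank contributes a gradient-like tensor; NCCL sums it across all
# GPUs, moving data over RDMA and bypassing host CPUs on the data path.
grad = torch.full((1 << 20,), float(dist.get_rank()), device="cuda")
dist.all_reduce(grad, op=dist.ReduceOp.SUM)

if dist.get_rank() == 0:
    print(f"all-reduce done across {dist.get_world_size()} GPUs, "
          f"sum of ranks = {grad[0].item():.0f}")
dist.destroy_process_group()
```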

The Four Archetypes of Neocloud Providers

ABI Research identifies four functional models within the neocloud ecosystem:

  1. Full-Stack AI-First Platforms: Examples include CoreWeave and Nebius. They operate massive footprints, often exceeding 500 MW. They manage their own software orchestration (e.g., CoreWeave Mission Control) and sign multi-billion-dollar contracts with leading AI labs.
  2. GPUaaS Opportunists: Providers like RunPod and Vast.ai act as marketplaces. They aggregate capacity from multiple sources and offer the lowest possible prices for non-critical fine-tuning or development tasks.
  3. Domain-Specific Infrastructure: These neoclouds optimize hardware for particular verticals. For example, healthcare-focused providers offer HIPAA-compliant H100 clusters integrated with medical imaging data lakes to accelerate drug discovery.
  4. Decentralized Compute Marketplaces: These platforms, using software layers like Lightning AI, unify access across multiple clouds. Developers can move jobs from a hyperscaler to a neocloud with one click, avoiding vendor lock-in.

The Economics of GPUaaS: Pricing, Supply, and Demand

The economic landscape of GPUaaS in 2025 is defined by a rapid transition from hyper-inflation to a supply-rich environment where operational efficiency determines survival.   

The 2025 Market Correction and Unit Economics

The H100 rental market experienced a 64 percent decline from its late 2024 peaks. While prices were once as high as $9.00 per hour, they have settled into a “normalized” range. This correction has introduced a strict “profitability threshold” for infrastructure investors.

  • Break-even Crisis: Analysis suggests that if H100 rental prices fall below $1.65 per hour, revenues no longer recoup the initial capital investment over a standard five-year depreciation cycle (see the sketch after this list).
  • Utilization Sensitivity: To outperform alternative investments (like the stock market), prices must exceed $2.85 per hour. If utilization slips below 80 percent, most neocloud gross profit margins (which typically range from 14 to 16 percent after labor and depreciation) will flatline.
  • Secondary Market Value: As newer architectures (like the Blackwell B200) offer 4 to 5 times faster inference than the H100, older GPUs are being redeployed for smaller fine-tuning tasks or academic research where cost matters more than absolute speed.
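
A minimal sketch of the break-even arithmetic, using assumed all-in costs rather than audited figures; with these inputs, roughly 90 percent utilization lands near the $1.65-per-hour floor cited above.

```python
# Break-even rental rate for one GPU over a five-year depreciation
# cycle. Capex and opex inputs are assumptions for illustration.

CAPEX_PER_GPU = 40_000   # USD, assumed all-in cost per deployed H100
OPEX_PER_HOUR = 0.55     # USD/hr, assumed power + colo + labor share
HOURS = 5 * 8_760        # five-year depreciation cycle

def break_even_rate(utilization: float) -> float:
    """Hourly price at which lifetime revenue equals lifetime cost."""
    lifetime_cost = CAPEX_PER_GPU + OPEX_PER_HOUR * HOURS
    return lifetime_cost / (HOURS * utilization)

for u in (1.00, 0.90, 0.80):
    print(f"utilization {u:.0%}: break-even ≈ ${break_even_rate(u):.2f}/hr")
# At ~90% utilization the floor lands near the ~$1.65/hr cited above.
```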

Total Cost of Ownership (TCO) Scenarios

The “Rent vs. Own” framework has become the primary decision-making tool for AI organizations. For a mid-sized workload (e.g., 200 TB of storage and 200 vCPUs), the five-year TCO comparison reveals the economic advantage of specialized infrastructure.   

  • On-Premises Breakeven: For high-throughput AI workloads, the breakeven point (where the cumulative cost of cloud matches the on-prem investment) often occurs within 5 to 9 months for on-demand instances (see the sketch after this list).
  • Hyperscaler Premium: Renting an eight-GPU NVIDIA H100 instance via a major public cloud can cost approximately $98 per hour, while specialized neoclouds offer equivalent capacity for around $34 per hour (a 66 percent saving).
  • Hidden Costs: Cloud storage and data egress (moving data out of the provider network) are often underestimated. At 20 TB per month, egress fees can add roughly $19,600 to an annual bill, a cost neoclouds often minimize or eliminate to attract developers.
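
A simplified rent-versus-own sketch under stated assumptions: the on-demand rate quoted above for an eight-GPU node, an assumed $300,000 server purchase, and $3,000 per month in power and support. At realistic duty cycles the breakeven lands inside the 5-to-9-month window cited above.

```python
# Rent-vs-own breakeven for an 8x H100 node, using the on-demand rate
# quoted above and assumed ownership costs (illustrative figures).

ON_DEMAND_RATE = 98.0     # USD/hr for an eight-GPU instance (cited above)
HOURS_PER_MONTH = 730
ON_PREM_CAPEX = 300_000   # USD, assumed server + networking purchase
ON_PREM_OPEX = 3_000      # USD/month, assumed power, space, and support

def breakeven_months(duty_cycle: float) -> float:
    """Months until cumulative cloud spend matches on-prem investment."""
    monthly_cloud = ON_DEMAND_RATE * HOURS_PER_MONTH * duty_cycle
    return ON_PREM_CAPEX / (monthly_cloud - ON_PREM_OPEX)

for d in (1.00, 0.75, 0.60):
    print(f"{d:.0%} busy: breakeven ≈ {breakeven_months(d):.1f} months")
# 100% ≈ 4.4, 75% ≈ 5.9, 60% ≈ 7.5 months — spanning the 5-9 month range.
```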

GPUaaS Pricing and Profitability Benchmarks (December 2025)

  • NVIDIA H100 Rental: Prices peaked at $8.00 to $9.50 per hour during the scarcity period, falling to $2.85 to $3.50 per hour in the normalized market. The break-even profit floor is approximately $1.65 per hour.   
  • NVIDIA A100 Rental: Prices fell from $4.00 to $5.50 per hour during scarcity to between $0.78 and $1.80 per hour. The break-even profit floor is roughly $0.60 per hour.   
  • NVIDIA B200 Rental: Anticipated normalized market rates are $5.50 to $7.00 per hour, with an anticipated break-even profit floor of $3.20 per hour.   
  • Average Utilization Requirement: While 50 to 60 percent utilization was sufficient during the scarcity period, the current market requires 85 to 90 percent utilization for financial viability, as the sketch below illustrates.
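
The utilization requirement follows from simple arithmetic: fixed costs accrue every wall-clock hour, while revenue accrues only on billable hours. A sketch, assuming an all-in fixed cost of $2.50 per GPU-hour (an illustrative figure, not a benchmark):

```python
# Minimum utilization for viability, assuming fixed costs accrue every
# wall-clock hour while revenue accrues only on billable hours.

FIXED_COST_PER_HOUR = 2.50   # USD/hr: amortized capex + power + colo (assumed)

def min_utilization(market_price: float) -> float:
    """Fraction of hours that must be billed to cover fixed costs."""
    return FIXED_COST_PER_HOUR / market_price

for price in (2.85, 3.50, 8.00):
    print(f"${price:.2f}/hr -> minimum utilization {min_utilization(price):.0%}")
# $2.85 -> 88%, $3.50 -> 71%, $8.00 -> 31%: scarcity-era pricing
# tolerated idle GPUs; normalized pricing does not.
```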

Real-World Deals and Market Momentum

The scale of neocloud infrastructure agreements now rivals major industrial projects, reflecting a shift where compute is treated as a strategic national asset.

The $19 Billion Nebius-Microsoft Partnership

In September 2025, Nebius announced a five-year supply agreement with Microsoft valued at up to $19.4 billion. Microsoft secured dedicated access to Nebius’s GPU clusters based in Vineland, New Jersey. This deal is back-end loaded, with the majority of capacity and revenue set to materialize in 2026. For Microsoft, the deal allows it to expand AI capacity faster than its own construction teams can build, effectively “outsourcing” the physical infrastructure layer to a specialized partner.

CoreWeave and IBM: The Granite Supercomputer

CoreWeave partnered with IBM to deliver one of the first NVIDIA GB200 Grace Blackwell supercomputers at scale. This system, interconnected with Quantum-2 InfiniBand, is used to train IBM’s Granite models. Interestingly, the deal is a “bi-directional” partnership: IBM provides its Storage Scale System to CoreWeave, while CoreWeave provides the raw GPU compute. This collaboration allows IBM to achieve 80 percent higher speed for enterprise tasks (like Retrieval-Augmented Generation) compared to traditional environments.   

Major Infrastructure and Capacity Agreements (2025 to 2026)

  • Nebius and Microsoft: A $17.4 billion to $19.4 billion agreement for Blackwell B200 and GB200 infrastructure in the USA and Europe.
  • CoreWeave and Meta: An approximately $14 billion contract for GB300 superclusters, providing Meta with massive inference capacity.   
  • Nebius and Meta: A $3 billion contract for H200 clusters across European data centers, intended to handle live traffic for the Llama 3 models.
  • Nscale and OpenAI (Stargate Norway): A $1 billion joint venture aiming to deploy 100,000 GPUs by 2026, powered entirely by renewable energy in Northern Norway.
  • Boost Run: A public company (via de-SPAC) projecting 250 percent revenue growth, focused on bare metal and government sectors in the United States.

Competitive Dynamics: How Neoclouds Fit With and Against Hyperscalers

The competitive landscape is no longer a simple rivalry; it has become a symbiotic ecosystem of “high-performance niches” and “general-purpose platforms”.   

The Complementary Roles: “Coopetition”

Enterprises are increasingly adopting multi-cloud strategies where hyperscalers handle general workloads (ERP, CRM, databases) while neoclouds handle the AI “heavy lifting”. Even hyperscalers occasionally rent capacity from neoclouds to manage their own supply constraints, as seen in the Microsoft-Nebius and Google-Cipher agreements. This allows the Big Three to offload the capital risk of building specialized AI factories while still profiting from the downstream software services.   

The Neocloud Edge: Agility and Specialization

Neoclouds maintain a competitive advantage in “speed to market”. A specialized provider can bring new hardware online three to six months faster than a hyperscaler, which must vet every component for global consistency across hundreds of legacy sites. Furthermore, neoclouds can offer “bare-metal” access without the 5 to 10 percent virtualization performance tax common in general-purpose clouds.   

Multi-Cloud GPU Marketplaces

A new category of software orchestration is resolving the friction of moving between providers. Platforms like Lightning AI offer a unified interface built around three capabilities (a simplified sketch follows this list):

  • Unified Access: Launch jobs on any cloud (AWS, Lambda, Nebius) without rewriting code or re-orchestrating the DevOps stack.
  • Cloud-Agnostic Storage: Use “Lightning Storage” to store data that moves across clouds without incurring the massive ingress and egress fees that usually lock customers into one provider.
  • Dynamic Optimization: Automatically match a specific workload (e.g., fine-tuning vs. inference) to the cheapest or most available GPU in the market.
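
Conceptually, the dynamic-optimization step reduces to picking the cheapest provider with enough free capacity. The sketch below is hypothetical; the class and function names are invented for illustration and are not Lightning AI’s actual API.

```python
# Hypothetical core of a cloud-agnostic GPU launcher. Names and prices
# are invented for illustration; this is NOT Lightning AI's real API.

from dataclasses import dataclass

@dataclass
class GpuOffer:
    provider: str            # e.g., "aws", "nebius", "lambda"
    gpu_type: str            # e.g., "H100"
    price_per_hour: float    # USD per GPU-hour
    available: int           # free GPUs right now

def cheapest_offer(offers: list[GpuOffer], gpu_type: str, count: int) -> GpuOffer:
    """Pick the lowest-priced provider with enough free capacity."""
    candidates = [o for o in offers
                  if o.gpu_type == gpu_type and o.available >= count]
    return min(candidates, key=lambda o: o.price_per_hour)

offers = [
    GpuOffer("aws", "H100", 12.30, 16),
    GpuOffer("nebius", "H100", 3.10, 64),
    GpuOffer("lambda", "H100", 2.99, 8),
]
best = cheapest_offer(offers, "H100", 16)
print(f"launch on {best.provider} at ${best.price_per_hour}/hr")  # nebius
```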

Structural Hurdles and Operational Challenges

Despite massive revenue growth, the neocloud sector faces physical and regulatory constraints that threaten the pace of expansion.   

The Power Grid Bottleneck and “Speed to Power”

The primary constraint on AI expansion has shifted from chip availability to grid availability. AI data centers require 5 to 10 times more power per square foot than traditional facilities. In high-demand regions, utility companies are unable to provide the necessary gigawatts on a timeline that matches AI demand.   

  • Grid Delays: In Germany and parts of the Eastern United States, grid connection delays for new data centers have reached up to 84 months (seven years).   
  • Decoupling from the Grid: To bypass these delays, neoclouds like Crusoe Cloud use “stranded power” strategies, siting data centers directly at renewable energy sources (hydropower, wind) or flared natural gas sites.   
  • The 2026 Energy Crisis: By 2026, electricity availability is expected to be the single biggest limiter on new internet infrastructure, with data center energy demand in Europe projected to triple by 2030. The sketch below shows how quickly GPU counts translate into grid-scale demand.
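
A back-of-envelope sketch of why utilities are struggling, assuming roughly 1 kW of IT load per deployed accelerator (including its host share) and a power usage effectiveness (PUE) of 1.3:

```python
# Back-of-envelope campus power demand. Per-GPU load and PUE are
# assumptions for illustration.

GPUS = 100_000        # the Stargate Norway target cited in this article
KW_PER_GPU = 1.0      # assumed IT load per accelerator incl. host share
PUE = 1.3             # power usage effectiveness (cooling + losses)

campus_mw = GPUS * KW_PER_GPU * PUE / 1_000
print(f"campus demand ≈ {campus_mw:,.0f} MW")   # ≈ 130 MW
# A single campus asking the grid for a mid-sized city's worth of power.
```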

Capital Intensity and the Talent Gap

Standing up a large GPU fleet requires massive upfront investment without the diversified revenue streams (like adtech or SaaS) that hyperscalers enjoy. Neoclouds are often highly leveraged, using their GPU inventory as collateral for billions in debt. Furthermore, there is a chronic shortage of skilled professionals who can manage the extreme complexity of liquid-cooled, InfiniBand-connected clusters. Providers that cannot hire these experts risk high “job failure rates” and customer churn.   

Geographical and Regulatory Constraints

Government regulations regarding data residency and national security are creating “legal borders” for the internet. In Europe, the EU AI Act and GDPR are forcing providers to offer “air-gapped” clouds that are immune to extraterritorial legislation. Neoclouds are better positioned than US-based hyperscalers to meet these “sovereign” requirements because they can build localized, independent infrastructure from scratch.

The Future of GPUaaS

The neocloud industry in 2030 will transition from a hardware-centric market into a mature utility and software layer.   

The Inference Shift and Edge Neoclouds

While the first half of the decade focused on training massive models, inference (running the models in production) will account for 80 percent of neocloud revenue by 2030. Inference workloads are more price-sensitive and require geographic distribution to reduce latency. This will drive the rise of “Edge Neoclouds”—smaller, high-performance nodes located within 20 kilometers of end-users in cities like Milan, Warsaw, and Berlin.   
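
The physics behind the 20-kilometer figure: light travels through optical fiber at roughly two-thirds of c, adding about 5 microseconds of propagation delay per kilometer each way. A quick sketch:

```python
# Propagation delay in optical fiber: ~5 microseconds per km each way.

FIBER_US_PER_KM = 5.0

def rtt_ms(distance_km: float) -> float:
    """Round-trip propagation time only (ignores queuing/processing)."""
    return 2 * distance_km * FIBER_US_PER_KM / 1_000

print(f"edge node at 20 km:     {rtt_ms(20):.2f} ms")    # 0.20 ms
print(f"remote region, 1000 km: {rtt_ms(1000):.1f} ms")  # 10.0 ms
```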

Hardware Diversification: Beyond NVIDIA

To mitigate the risks of overreliance on a single vendor, neoclouds are beginning to adopt “challenger silicon”.   

  • AMD Instinct: Adoption of the MI300 and MI350 series is growing, with firms like Meta and Microsoft deploying them for high-volume inference due to their total cost of ownership (TCO) advantages.   
  • Custom Silicon: By 2030, custom processors designed by cloud providers themselves (like Amazon’s Trainium) are expected to capture 15 percent of the AI semiconductor market, forcing neoclouds to decide whether to host third-party chips or develop their own.   

Regional Sovereign AI Hubs

Governments are increasingly funding national AI “gigafactories” to ensure domestic technological control.

  • Stargate Norway: A joint venture between Nscale and OpenAI to deploy 100,000 GPUs by 2026, leveraging low-cost hydropower to create the most sustainable AI factory in the world.
  • Sovereign Compliance: Regional providers like Scaleway (France) and Core42 (UAE) are winning public sector contracts by guaranteeing that data is never exposed to foreign jurisdictions, a requirement that is moving from a “marketing slogan” to a strict “contractual term”.

Conclusion

The rise of neoclouds represents a permanent re-architecting of the global compute landscape. They have solved the physical and economic constraints that traditional clouds were not designed to handle: extreme power density, ultra-low latency requirements, and the need for localized sovereign control. By 2030, the neocloud ecosystem will be a systemically critical industry, generating hundreds of billions in revenue and serving as the primary infrastructure for the world’s AI initiatives.

Ultimately, the long-term success of these providers depends on their ability to move “up the stack” into higher-margin software and orchestration services. Those who remain mere hardware resellers risk being squeezed by falling GPU rental prices and rising power costs. However, for the enterprises and developers who consume these services, the neocloud offers a future of precision, proximity, and performance that the general-purpose cloud can no longer match.
