AI-scale datasets introduce constraints that traditional compute-centric models cannot absorb efficiently. Large training corpora, continuously updated embeddings, and high-frequency logging systems now dominate storage footprints, creating persistent data gravity that resists movement across regions. Cloud providers once optimized for elastic compute expansion, yet they now routinely encounter scenarios where transfer costs and latency make relocating data less efficient than provisioning compute closer to the datasets that already exist. Egress pricing models, originally designed for moderate transfer volumes, become a significant cost factor at scale, particularly in AI workflows that repeatedly move large volumes of data. This imbalance forces enterprises to reconsider placement strategies, prioritizing proximity to data over access to generalized compute capacity. As a result, storage locality has become a decisive factor in determining where data-intensive workloads execute.
The repricing of data gravity also reflects a deeper structural imbalance between storage persistence and compute elasticity. AI models require repeated access to large datasets during training, fine-tuning, and inference, which amplifies the cost of even marginal data movement. Cloud vendors have begun adjusting pricing tiers, although pricing structures and optimization strategies continue to evolve alongside emerging AI workload requirements. Enterprises therefore internalize the cost burden, adopting hybrid strategies that anchor datasets in specific regions while distributing compute selectively. This shift creates localized “cost basins” where data remains economically viable in place, discouraging relocation even as operational requirements change. The financial implications also extend beyond direct transfer fees to latency penalties and reduced throughput efficiency. The outcome is a recalibration of cloud economics in which storage weighs as heavily as compute and networking in infrastructure decisions.
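The economics behind these basins are easy to sketch. The snippet below compares the recurring egress bill of reading an anchored corpus from a remote region against the one-time cost of relocating it; every price and volume is a hypothetical assumption for illustration, not any provider's published rate.

```python
# Back-of-envelope sketch of a "cost basin": once a corpus is anchored in a
# region, repeated remote access makes relocating compute, not data, the
# cheaper option. All sizes and prices below are illustrative assumptions.

DATASET_TB = 500          # anchored training corpus, in terabytes
EGRESS_PER_GB = 0.08      # assumed cross-region transfer price, USD/GB
REREADS_PER_MONTH = 8     # training/fine-tuning passes that re-read the corpus

def monthly_cross_region_bill() -> float:
    """Egress cost if compute stays remote and pulls the corpus each pass."""
    return DATASET_TB * 1024 * REREADS_PER_MONTH * EGRESS_PER_GB

def one_time_relocation_bill() -> float:
    """Egress cost of moving the corpus across regions exactly once."""
    return DATASET_TB * 1024 * EGRESS_PER_GB

print(f"remote reads: ${monthly_cross_region_bill():,.0f}/month")  # $327,680
print(f"one-time move: ${one_time_relocation_bill():,.0f}")        # $40,960
# Under these assumptions the monthly remote-read bill is 8x the one-time
# move, and the move itself is ~$41k, so neither repeated pulls nor
# relocation beats provisioning compute in the data's home region.
```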
The Rise of Data-Anchored Cloud Zones
Cloud regions have traditionally emerged around network access, population density, and enterprise demand; data gravity now joins these as a force shaping infrastructure placement. Data-anchored cloud zones form around concentrated datasets such as genomics repositories, financial transaction archives, and proprietary AI training libraries. These zones attract compute infrastructure not because of user proximity but because high-value data assets are effectively immobile. Hyperscalers increasingly design regions with pre-integrated storage clusters that host critical datasets, allowing compute resources to operate within minimal latency boundaries. This approach reduces transfer overhead while enabling faster iteration cycles for AI model development. Consequently, cloud expansion strategies now trace data concentration patterns rather than purely market-driven considerations.
The emergence of data-centric infrastructure strategies introduces a new competitive dynamic between hyperscalers and neo-cloud providers. Specialized cloud operators increasingly design infrastructure tailored to specific industry requirements, including environments optimized for large and sensitive datasets. This specialization allows them to compete without replicating the full scale of global hyperscale infrastructure. Hyperscalers, in response, pursue partnerships and acquisitions to secure access to proprietary datasets, reinforcing their regional ecosystems. Additionally, enterprises increasingly co-locate private data lakes within these zones to leverage shared infrastructure advantages. The result is a more diverse and specialized cloud landscape where different regions and providers optimize for specific workload and data requirements. However, this fragmentation also complicates workload portability, requiring new orchestration strategies that account for data immobility.
The Cost of Moving Intelligence
The cost of moving intelligence across cloud environments extends beyond bandwidth pricing into systemic performance constraints. AI workflows depend on continuous exchange between storage systems and compute clusters, making network efficiency a critical determinant of overall performance. Cross-region transfer introduces latency that directly affects training times, model convergence, and inference responsiveness. Cloud providers have historically scaled backbone networks for general-purpose workloads, yet AI-specific demands expose limitations in both throughput and cost efficiency. Interconnect pricing structures, which often combine tiered billing with variable per-gigabyte rates, add further complexity to managing large-scale data movement. As a result, organizations increasingly treat network capacity, latency, and cost as first-order factors in designing AI infrastructure.
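As a rough illustration of how tiered billing compounds at AI scale, the sketch below walks a transfer volume through hypothetical price tiers; the boundaries and rates are invented for the example, not drawn from any provider's price sheet.

```python
# Sketch of a tiered egress bill, a common structure in interconnect pricing.
# Tier boundaries and rates here are hypothetical, not any provider's sheet.

TIERS = [  # (cumulative upper bound in GB, USD per GB for that slice)
    (10 * 1024, 0.09),       # first 10 TB
    (50 * 1024, 0.085),      # next 40 TB
    (150 * 1024, 0.07),      # next 100 TB
    (float("inf"), 0.05),    # everything beyond 150 TB
]

def tiered_egress_cost(total_gb: float) -> float:
    """Walk the tiers, billing each slice of traffic at its tier rate."""
    cost, billed = 0.0, 0.0
    for upper, rate in TIERS:
        if total_gb <= billed:
            break
        slice_gb = min(total_gb, upper) - billed
        cost += slice_gb * rate
        billed += slice_gb
    return cost

print(f"${tiered_egress_cost(200 * 1024):,.0f}")  # 200 TB/month -> $14,131
```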
Infrastructure providers are evolving network architectures to support the high-throughput, low-latency data access that modern AI workloads require. High-speed interconnects within regions are receiving increased investment for data-intensive workloads, alongside continued enhancements to backbone capacity. Edge networking also plays a role by preprocessing data closer to its source, reducing the volume that must traverse core networks. Pricing models, meanwhile, evolve to incentivize intra-region data movement while discouraging cross-region transfers, reinforcing localized execution patterns. Enterprises adapt by restructuring workflows to cluster compute tasks around primary datasets, reducing dependency on inter-region communication. Interconnect economics therefore increasingly shapes how AI systems scale, emphasizing efficiency over raw expansion.
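One way enterprises operationalize this clustering is a locality-first placement rule: run each task in whichever region already holds most of its input bytes. The sketch below shows the idea against an invented dataset inventory; the names, sizes, and regions are assumptions for illustration.

```python
# Minimal sketch of locality-first placement: each task runs in the region
# that holds the largest share of its input data, so only a minority of
# bytes ever crosses a region boundary. Names and sizes are illustrative.

from collections import defaultdict

# dataset -> (home region, size in GB); an assumed inventory, not a real API.
DATASETS = {
    "clickstream": ("eu-west", 40_000),
    "embeddings": ("us-east", 25_000),
    "labels": ("us-east", 2_000),
}

def place_task(inputs: list[str]) -> str:
    """Pick the region that minimizes cross-region bytes for this task."""
    bytes_by_region = defaultdict(int)
    for name in inputs:
        region, size_gb = DATASETS[name]
        bytes_by_region[region] += size_gb
    return max(bytes_by_region, key=bytes_by_region.get)

print(place_task(["embeddings", "labels"]))   # -> us-east
print(place_task(["clickstream", "labels"]))  # -> eu-west
```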
Sovereign Data Meets Distributed AI Workloads
Regulatory frameworks governing data residency add complexity to the already constrained movement of AI datasets across regions. Governments enforce localization requirements that mandate storage and processing within specific jurisdictions, particularly for sensitive data such as financial records or healthcare information. AI development often benefits from diverse datasets, which may be distributed across multiple geographic regions depending on availability and regulatory constraints. Cloud providers offer region-specific infrastructure configurations, often described as sovereign stacks, to help organizations meet residency and compliance requirements. These stacks enable model training on region-specific data without violating residency laws, although they may limit access to broader datasets. The result is a segmented AI ecosystem where compliance considerations directly shape architectural decisions.
This regulatory landscape encourages the development of federated learning approaches and distributed training methodologies that reduce the need for centralized data aggregation. Models can train across multiple regions using localized datasets, sharing learned parameters instead of raw data. Such techniques align with sovereignty requirements while preserving the ability to build globally relevant AI systems. Nevertheless, these approaches can introduce additional complexity in areas such as synchronization, model consistency, and performance optimization. Cloud providers invest in orchestration tools that manage distributed training workflows across sovereign boundaries, ensuring compliance without sacrificing efficiency. Meanwhile, enterprises must balance legal constraints with operational objectives, often prioritizing regulatory adherence over performance gains. Consequently, sovereign data policies are becoming an important factor influencing the evolution of AI infrastructure strategies.
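A minimal sketch of the parameter-sharing pattern, assuming a toy linear model and synthetic per-region data: each region runs local gradient steps on data that never leaves it, and only the resulting weights are averaged, in the style of federated averaging. Production systems layer on secure aggregation, differential privacy, and fault tolerance.

```python
# Toy federated averaging: regions fit a shared linear model on data that
# never leaves their jurisdiction; only parameters cross boundaries.

import numpy as np

rng = np.random.default_rng(0)
TRUE_W = np.array([2.0, -1.0])   # ground truth the regions jointly recover

def local_update(w, n, lr=0.1):
    """One round of local gradient steps on region-resident data."""
    X = rng.normal(size=(n, 2))                      # raw data stays in-region
    y = X @ TRUE_W + rng.normal(scale=0.1, size=n)
    for _ in range(5):
        grad = 2 * X.T @ (X @ w - y) / n             # least-squares gradient
        w = w - lr * grad
    return w                                         # only parameters leave

region_sizes = [400, 300, 300]   # illustrative per-region sample counts
w_global = np.zeros(2)
for _ in range(10):              # communication rounds
    updates = [(n, local_update(w_global.copy(), n)) for n in region_sizes]
    total = sum(n for n, _ in updates)
    w_global = sum(n * w for n, w in updates) / total   # FedAvg weighting
print(w_global)  # converges toward TRUE_W without pooling raw data
```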
Storage Becomes the New Control Plane
The increasing centrality of data in AI workflows has transformed storage systems from passive repositories into active orchestration layers. Object storage platforms, data lakes, and vector databases increasingly influence how workloads are scheduled and executed, particularly in data-intensive environments. These systems maintain metadata, access patterns, and dependency relationships that guide compute allocation decisions. As AI pipelines grow more complex, orchestration logic shifts closer to the data layer, enabling more efficient resource utilization. Storage platforms integrate with scheduling frameworks to ensure that compute operates in proximity to required datasets. Consequently, control over data placement increasingly translates into control over where and how workloads execute.
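To make "storage as control plane" concrete, here is a small sketch in which catalog metadata, rather than a compute scheduler, drives a placement decision: whether a dataset's remote access pattern justifies a regional replica. The catalog fields and thresholds are assumptions for illustration.

```python
# Sketch of storage metadata acting as a control signal: the catalog tracks
# access patterns and decides whether a dataset is hot enough to justify
# replicating it into a compute region. Fields and thresholds are assumed.

from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    region: str                  # where the dataset is anchored
    size_gb: float
    reads_30d: int = 0           # access pattern kept by the catalog
    readers: set = field(default_factory=set)  # regions issuing reads

def should_replicate(e: CatalogEntry, to_region: str,
                     min_reads: int = 100) -> bool:
    """Replicate only if a remote region reads the dataset often enough
    that a local copy beats repeated cross-region transfer."""
    return (to_region != e.region
            and to_region in e.readers
            and e.reads_30d >= min_reads)

corpus = CatalogEntry(region="us-east", size_gb=80_000, reads_30d=240,
                      readers={"us-east", "eu-west"})
print(should_replicate(corpus, "eu-west"))  # True: hot remote reader
print(should_replicate(corpus, "apac"))     # False: no access history
```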
This transition redefines the traditional hierarchy of cloud architecture, where compute once dictated system behavior. Data-centric orchestration approaches enable more dynamic workload placement by incorporating data locality and access patterns into scheduling decisions. Vector databases play an important role in many AI applications that rely on embedding retrieval and similarity search. These systems require tight integration with compute resources to maintain performance, reinforcing the importance of co-location. Additionally, storage-driven control planes enable more granular optimization of resource allocation, reducing inefficiencies associated with generic scheduling models. However, this shift also demands new operational expertise, as managing data infrastructure becomes as critical as managing compute clusters.
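The co-location pressure from vector workloads follows from the access pattern itself: every similarity query scans or probes index-resident vectors. A brute-force sketch with synthetic embeddings makes the point; the dimensions and index size are illustrative.

```python
# Toy embedding retrieval: brute-force cosine top-k over an in-memory index.
# Every query touches index-resident vectors, which is why vector stores and
# the compute issuing queries are usually co-located. Sizes are illustrative.

import numpy as np

rng = np.random.default_rng(1)
index = rng.normal(size=(100_000, 384))               # stored embeddings
index /= np.linalg.norm(index, axis=1, keepdims=True) # pre-normalize once

def top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar embeddings (cosine)."""
    q = query / np.linalg.norm(query)
    scores = index @ q                                # one pass over the index
    return np.argsort(scores)[-k:][::-1]

hits = top_k(rng.normal(size=384))
print(hits)  # nearest-neighbor ids; network hops here would dominate latency
```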
The evolution of cloud infrastructure reflects a fundamental shift from fluid compute allocation to fixed data-centric architectures that anchor workloads in place. AI datasets introduce persistent gravitational forces that resist movement, compelling compute resources to align with data rather than operate independently. This transformation challenges some traditional assumptions about cloud elasticity by highlighting the growing importance of data locality and movement constraints. Cloud providers are adapting their regions, networks, and orchestration models to better support data-intensive and latency-sensitive workloads. Enterprises, in turn, develop strategies that prioritize data locality as a primary factor in infrastructure planning. The result is a new paradigm where scalability depends not on moving workloads freely but on positioning them effectively within established data gravity wells.
