The Data Problem: Who Owns the Fuel That Runs the AI Revolution — and Did Anyone Actually Ask?

by - DK on - 2:18 PM

Inside The Machine

Authored by Neal Lloyd · Daily AI Series

Day 10

Data · Power · Privacy

Training data, consent, power concentration, and the privacy reckoning nobody planned for. The battle over data is the battle over AI’s future.

Neal Lloyd

Author · Inside The Machine · May 2026

10 min read

“Data is the new oil.” It is one of those metaphors that contains just enough truth to be useful and just enough distortion to be dangerous. Oil is finite. Oil pollutes. Data grows when used. Data can be in multiple places simultaneously. Data, in the right hands, compounds in value in ways oil cannot. The metaphor is catchy. The reality is considerably stranger and more consequential. The battle over data is the battle over AI’s future — and it is being fought largely without the people whose data is at stake.

What AI Actually Runs On

The Fuel Nobody Talks About Honestly

Modern AI systems have been trained on hundreds of billions of words of text, billions of images, and vast repositories of code. The internet, in many ways, is the training set. Every Wikipedia article, every digitised book, every Reddit thread: all of it has contributed to the models we interact with daily. Where did that data come from? Largely from people who had no idea their words, images, and creative work would be used to train systems sold as commercial products by companies worth hundreds of billions of dollars.

The human beings who produced the raw material of the AI revolution were, in the overwhelming majority of cases, not consulted, not compensated, and not informed. Their contribution was extracted rather than purchased. This is the foundational economic arrangement of the AI industry.

Neal Lloyd · Inside The Machine, Day 10

The Consent Problem

Did Anyone Actually Ask?

Platform terms of service were written before large-scale AI training existed as a concept. Treating acceptance of those terms as consent to AI training means stretching language written for one purpose to cover a categorically different use. And even if scraping public data is technically legal, is it right? When someone writes a personal essay, posts it on a platform, and that essay becomes part of the training data that teaches an AI to simulate emotional depth, did they consent to that use?

⚡ The Terms of Service Gap

Most platform terms of service were written before large-scale AI training existed. Treating acceptance of these terms as consent means stretching language written for one purpose to cover a categorically different use. Whether courts accept that stretch will define the legal landscape for decades.

Who Controls the Data Controls the Future

The Power Asymmetry Nobody Wants to Name

The organisations that control the largest, highest-quality datasets have a structural advantage that compounds over time. The competitive moat in AI is not primarily algorithmic: algorithms can be replicated, and training data cannot easily be. The moat is data. And the organisations that control the most comprehensive datasets will have disproportionate influence over what AI systems know, what perspectives they reflect, and what biases they embed, for a very long time. The battle over data is not over. The precedents being set now will determine who benefits from AI and who provides the raw material without sharing in those benefits.

— Neal Lloyd
Inside The Machine, Day 10 · May 2026

About The Author

Neal Lloyd

Author · Series Creator

Neal Lloyd writes about technology, human adaptation, and the uncomfortable questions nobody wants to answer at dinner. Inside The Machine is his ongoing daily series on AI.

By The Numbers

∞ · Amount of internet text used to train AI without explicit consent of the people who wrote it.

Compensation received by the overwhelming majority of people whose work forms AI training data.

Yrs · How long the legal battles over training data will take to resolve. Technology moves considerably faster.
