Publishers are generating roughly €10 billion in AI training value every year.

They can’t see it.
They can’t track it.
They definitely can’t invoice it.

I’ve said before that this amounts to stolen revenue, but that’s only half right.

Rather, it’s dark revenue: value created from their content that never shows up on a statement.

Like dark energy or dark matter in physics, we only know it exists because of what it does to the system around it: AI products that mysteriously talk like seasoned journalists, or produce excerpts verbatim from paywalled content.

I’ve had hundreds of conversations about this with publishers, AI engineers, and media execs. Two in particular stuck with me.

Both were with major financial publications.

We spoke under the Chatham House Rule, but you’d recognise them instantly.

I asked them both the same question:

How do you know when your content shows up somewhere it shouldn’t? And what do you do about it when you see it?

What I wasn’t expecting was for both to give almost identical answers.

Someone on the team happens to see their paywalled story on another site.
Or a reporter pastes an AI chatbot answer into Slack and says, “I’m pretty sure this is ours.”
Maybe a subscriber emails them a screenshot.

That’s the “monitoring system” at some of the most sophisticated newsrooms on earth: luck, screenshots, and the kindness of strangers.

No systematic way to discover where their work appears online or inside LLMs.
No internal playbook for what happens next.
No budget to fix it, because you can’t price a problem you can’t measure.

If that’s where they are, imagine the rest of the industry.

This is the dark revenue problem. And it’s where publishers are fighting the wrong war.

They’re trying to block AI, when they should be building toll roads.

Why text isn’t music (and why that matters)

Everyone points at Spotify and says, “Music solved piracy. We’ll just do that.”

Not quite.

If it were that easy, everyone would do it.

Music has a clean attribution unit. When you stream a track, one work gets credited. Anyone can understand who wrote “Yesterday”, and recognise a cover version.

Text in AI is a blender.

Ask a model for an explainer on climate risk in banking and the answer might pull from:

  • ten of your articles

  • fifty from competitors

  • a wire service

  • some NGO’s PDF

  • blog posts and forum threads from 2014

Hundreds of fragments. Maybe thousands. One answer.

Who gets paid?
The outlet that broke the story?
The one that explained it best?
Everyone? No one?

This is why “just do what Spotify did” does not transfer directly.

It’s an analogy from which to draw inspiration.

What is useful from music is the sequence:

  • First: build infrastructure that can track usage.

  • Then: send invoices.

ASCAP didn’t wait for the internet or streaming. They built detection and collection mechanisms when radio was the supposed existential threat. By the time the format changed, the rails were already there.

Publishers are trying to skip the boring step. They want streaming royalties without stream-level infrastructure.

That’s not strategy. That’s a wish.

The incentive architecture problem

The late, great Charlie Munger liked to say: “Show me the incentive and I’ll show you the outcome.”

Right now, AI companies are looking at three options:

  1. Simply scrape whatever is publicly reachable and hope the lawyers can stretch “fair use” far enough.

  2. Attempt hundreds of bespoke licences with individual publishers, each on different terms and tech.

  3. Work with intermediaries who can provide large, structured corpora with clear rights and predictable pricing.

No prizes for guessing which one is easiest to execute.

The first option is cheap and fast.
The second is expensive and slow.
The third only exists in a few verticals.

So they scrape, settle, and move on.

What they’re absolutely not going to do, despite what some in the industry appear to think, is patiently sit on their hands and hope “fair use” magically stretches to cover everything.

No.

Suing them might feel satisfying, but it doesn’t change the incentive structure. The music industry tried the “see you in court” strategy for a decade. Nobody looks back on that era and says it was their finest commercial decision.

What actually changed behaviour was when legal access became cheaper and simpler than cheating.

That’s the real job here: create conditions where paying publishers is the low-friction, rational move.

Litigation changes the risk calculation at the margin, sure – the New York Times suit against OpenAI and Microsoft, the French cases against Meta, and the $1.5B+ Anthropic settlement all matter.

What those cases don’t supply is a way to pay thousands of publishers consistently.

For that, you need something blunter: a setup where paying for clean, trackable content is cheaper, safer, and simpler than gambling on scraped data.

Not because anyone suddenly finds their conscience, but because the spreadsheet says so.


Digital fingerprints: where the toll roads start

To get that flip, you need industrial infrastructure. Boring, but effective.

Toll roads don’t stop people driving; they simply charge those who use the road most.

To change IP incentives, publishers need one boring, essential thing:

Digital fingerprints for text.

Every article, column, and long-form piece needs a durable machine ID and ownership metadata that can survive paraphrasing, summarisation, and blending.

When an AI system trains on, retrieves, or remixes that content, those fingerprints should show up in the exhaust:

  • which outlets contributed to the answer

  • roughly what share of the answer came from where

  • how often that material is being hit

Think of it less like DRM and more like metering.

Not a wall to keep everyone out, but a turnstile that counts the people coming through.
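To make the metering idea concrete, here’s a minimal sketch of one common building block for paraphrase-tolerant text matching: word shingles and overlap scoring. Everything here is hypothetical and simplified – the articles, thresholds, and outlet names are invented, and a real system would use far more robust techniques (MinHash over large corpora, embeddings, watermarking) – but it shows the shape of the “exhaust” a fingerprint layer could emit:

```python
# Toy sketch of paraphrase-tolerant text fingerprinting via word shingles.
# Illustrative only: all articles and outlet names are hypothetical, and a
# production system would need something far more robust.

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Break text into overlapping k-word shingles (the 'fingerprint')."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(answer: str, article: str, k: int = 3) -> float:
    """Fraction of the answer's shingles that also appear in the article."""
    a, b = shingles(answer, k), shingles(article, k)
    return len(a & b) / len(a) if a else 0.0

# Hypothetical corpus: two outlets' fingerprinted articles.
corpus = {
    "outlet-a/climate-risk": "banks face rising climate risk as regulators tighten stress tests",
    "outlet-b/green-bonds": "green bonds offer investors exposure to climate friendly projects",
}

ai_answer = "regulators tighten stress tests as banks face rising climate risk"

# The 'exhaust': which fingerprints show up in the answer, and how strongly.
usage = {doc_id: overlap(ai_answer, text) for doc_id, text in corpus.items()}
for doc_id, score in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(f"{doc_id}: {score:.0%} shingle overlap")
```

Note that the answer above is a reworded version of outlet A’s sentence, yet it still scores a high overlap – that reordering-tolerance is exactly what makes shingle-style fingerprints more like a turnstile count than like DRM.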

Once usage is trackable, a lot of “impossible” conversations become straightforward commercial questions:

  • How much is this domain worth to you in this vertical?

  • Do you want exclusivity, or depth?

  • Are we pricing per token, per request, per seat, per sector?

That’s the infrastructure gap. Until it’s plugged, dark revenue stays dark.
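Once those usage counts exist, invoicing really is just arithmetic. A toy example of per-token metering – every rate, volume, and outlet name here is invented purely for illustration, and real deals would layer on minimums, exclusivity, and per-vertical terms:

```python
# Toy metering ledger: once usage is counted, invoicing is arithmetic.
# All rates, volumes, and outlet names are invented for illustration.

# Metered usage for one AI customer, one billing period:
# tokens of each outlet's content retrieved or remixed into answers.
usage_tokens = {
    "outlet-a": 40_000_000,
    "outlet-b": 12_000_000,
}

# Hypothetical negotiated rates: price per million tokens, by vertical.
rate_per_m_tokens = {
    "outlet-a": 18.00,  # finance vertical treated as "must-have", priced higher
    "outlet-b": 6.50,   # general news
}

def invoice(usage: dict[str, int], rates: dict[str, float]) -> dict[str, float]:
    """Line items: metered tokens x negotiated rate per million tokens."""
    return {outlet: (tokens / 1_000_000) * rates[outlet]
            for outlet, tokens in usage.items()}

lines = invoice(usage_tokens, rate_per_m_tokens)
total = sum(lines.values())
print(lines)                     # per-outlet line items
print(f"total: EUR {total:,.2f}")
```

The point of the sketch is the dependency order: none of these numbers can be negotiated, audited, or even argued about until the `usage_tokens` side of the ledger exists.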

AI economics are being written now

Here’s the part that should bother anyone who signs budgets or sits in a board meeting.

Deals are already being signed.

While most of the industry is still arguing about prompts, AI companies are quietly locking in supply deals with whoever can deliver:

  • large volumes of quality content (scale)

  • clear rights

  • clean data

  • simple technical integration

  • boring but reliable reporting

Sometimes that’s a publisher.
Often it’s not. It’s a platform, a data broker, or a weird aggregator you’ve never heard of.

Those early contracts are setting the reference points for value, establishing the baseline economics for the next decade:

  • which sectors get treated as “must-have”

  • what high-quality content is worth in those sectors

  • how granular attribution has to be to unlock serious spend

If you arrive after those numbers are already baked into product roadmaps and budgets, you’re not a strategic partner. You’re now a line item.

So what do you do with this?

If you work in publishing, this isn’t a think-piece. It’s a checklist.

This quarter, you still have time to:

  1. Ask the embarrassing question internally
    Send a one-line email to product, legal, and editorial:
    “When our content shows up where it shouldn’t – cloned or inside AI tools – how do we know, and what happens next?”

    The answer will tell you how exposed you are.

  2. Put dark revenue in your financial model
    Add a line labelled “AI licensing / royalties” to your three-year P&L, even if the number is fuzzy. Zero is still a decision.

    It says you believe your content won’t matter enough for anyone to pay for it.

  3. Start looking for infrastructure, not ‘AI products’
    Ignore anything that sounds like another shiny dashboard. Look for specifics: fingerprinting, attribution, clearing, payment.

    Whether that’s what we’re building at Writers’ Bloc or someone else’s stack is less important than getting on something that can talk to AI systems at scale.

  4. Use this piece as a litmus test
    Share this with your CEO, your legal head, or whoever leads AI/strategy and watch their reaction.

    If they shrug, you’ve learned something useful about how seriously your organisation takes its own future revenues and royalty line.

If you’ve read this far, you’re already ahead of most of the industry.

You’ve seen the outline of the missing money; the dark revenue is already there.

The question is whether you’re going to help design the meter, or wait for someone else to hand you their version of the bill.

If you want help with any of this, or simply have a question, feel free to book a call or reply to me directly.
