The Publisher Paradox: Media on the Outside, Data on the Inside

On paper, you run a media company.

You commission stories, fight over headlines, obsess about open rates and renewals. Your survival still depends on producing and distributing high-quality journalism into a very human market.

But under the bonnet, the market has quietly reclassified you.

To AI labs, search platforms and infrastructure companies, you are no longer a “media business”. You are an audience business with a data asset attached: an archive of language to be strip-mined into tokens and fragments.

That’s the publisher paradox:
You must keep making great media to stay alive…
while the real money is shifting to people who treat your work as data.

And right now, a lot of that money is showing up in someone else’s P&L as dark revenue.

Articles vs tokens: how the market really sees you

You still sell wholes: articles, issues, books, subscriptions.

AI buys parts: tokens and fragments.

That’s what’s happening, and here’s why:

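Large language models are next-token predictors. The idea descends from Markov chains, a concept in probability that says “what happens next depends only on the state of affairs now.” In a modern model, the “state of affairs” is the whole window of text it can see, but the principle is the same: predict what comes next from what is there now.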

The models don’t “read” your feature; they slice it into tiny statistical units – words, pieces of words, punctuation – and learn which token tends to follow which. It’s industrial-scale auto-complete, not actual comprehension.

Markov chains also underpin Google’s PageRank, one reason Google catapulted to the top of the search market in the 2000s. The same idea drove predictive text, and today it drives AI chatbots.

Chatbots are basically predictive text on steroids.

When a user asks a question, the model doesn’t go “ahhh yes, that great longform from last March.” Instead, it assembles an answer out of Lego-sized chunks from everywhere, including you.

Humans pay for stories.
Machines pay for patterns.

And markets have a long history of following the smallest measurable unit. Singles beat albums. Impressions beat full-page spreads. Securitised slices beat whole loans. Once someone can meter the fragments, the fragments become the asset.

AI is doing that to text. The paradox is you’re still pricing the album.

The world is already treating you like a data company

Look at where the serious money is moving:

AI labs are paying to clean up their mess.
Anthropic has agreed to a proposed $1.5 billion copyright settlement with authors and publishers over pirated books used in training, and to destroy the pirated copies. It’s the largest reported copyright class action settlement to date – effectively a retroactive fee for unlicensed fragments.
Infrastructure is putting meters on fragments.
Cloudflare’s Pay Per Crawl and AI Crawl Control let site owners answer an AI bot with HTTP 402 “Payment Required”, plus a price per request. Every crawl can now be a billable event rather than a silent scrape.
Collectives are building AI-specific pipes.
The Copyright Clearance Center (CCC) has announced an AI Systems Training License so organisations can lawfully use large catalogues of works as training data – a collective licence explicitly for turning text into model input.
The text vs music gap is embarrassing.
Text collectives like US-based CCC and UK Publishers’ Licensing Services (PLS) together manage around $250 million a year, while music’s ASCAP alone reported $1.835 billion in 2024 revenue and nearly $1.7 billion distributed to rightsholders.
In a nutshell, music has cohesive infrastructure and widespread industry buy-in. Publishing, simply put, does not.
Indie publishers are sounding the alarm.
The Independent Publishers Alliance warns that roughly one-third of sites in its network could shut down within 15 months if AI search keeps diverting traffic and regulators don’t intervene. Its June EU antitrust complaint over Google’s AI Overviews makes a simple argument: our content, your summary box, our collapsing business model.

So yes, congratulations and hard luck: your newsroom has become an unpaid R&D lab for systems that now sit above your links and answer your readers’ questions for you.

That’s dark revenue in its purest form: value created by your journalism, captured by someone else’s balance sheet.

The real paradox: you still need great media

There are enough “AI doom” takes out there, but most miss the point entirely:

The fact that models consume tokens doesn’t make articles irrelevant. It makes good articles more important.

AI labs are in a quality arms race. Better outputs require better inputs. A model trained on SEO slurry will sound like SEO slurry. High-quality, well-edited reporting is premium training data.

So the paradox isn’t “media is dead”.

It’s that you still need to do excellent, expensive journalism to survive with humans AND act like an audience + data company when you negotiate with bots and machines.

If you ignore the second part, you keep all the costs of being a media business, whilst handing off the upside to whoever is savvy enough to treat your work as a structured data asset.

Provenance and blockchain: boring, necessary plumbing

If the market is going to pay you as a data company, it needs to know what data is yours.

That starts with:

Digital fingerprints: hashes that identify your sentences and paragraphs when copied verbatim, plus similarity fingerprints or watermarks that can still match them when lightly paraphrased.
Provenance: who created it, who owns it, and under what terms it can be used.
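A cryptographic hash only matches exact copies, so fingerprinting usually pairs it with a fuzzier similarity measure. The sketch below assumes two simple techniques: SHA-256 over a normalised paragraph for verbatim matches, and Jaccard overlap of three-word “shingles” for light paraphrase. Production systems tend to use scalable variants such as MinHash or SimHash; the example sentences are invented.

```python
import hashlib
import re


def normalise(text: str) -> str:
    # Lower-case and drop punctuation so trivial edits don't change the hash.
    return " ".join(re.findall(r"\w+", text.lower()))


def exact_fingerprint(paragraph: str) -> str:
    # Matches verbatim copies (after normalisation) only.
    return hashlib.sha256(normalise(paragraph).encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    # Every run of k consecutive words in the normalised text.
    words = normalise(text).split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    # Jaccard overlap of shingle sets: 1.0 = identical, 0.0 = nothing shared.
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "The council approved the new housing plan on Tuesday after a long debate."
verbatim = "The council approved the new housing plan on Tuesday, after a long debate!"
paraphrase = "After a long debate, the council approved the new housing plan on Tuesday."
unrelated = "Shares in the airline fell sharply after the profit warning."

print(exact_fingerprint(original) == exact_fingerprint(verbatim))  # True
print(round(similarity(original, paraphrase), 2))  # 0.69
print(similarity(original, unrelated))  # 0.0
```

The paraphrase defeats the exact hash but still shares 9 of 13 distinct shingles with the original, which is the kind of signal a provenance record can point back to.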

This is where blockchain quietly stops being a conference punchline and becomes plumbing.

AI is the perfect stress test:

Trillions of tiny, machine-to-machine uses
Across multiple jurisdictions
With different rights, exemptions and output rules

A tamper-evident ledger that records “this fragment came from here, under these terms” is not speculative in that world. It’s how you clear transactions, calculate royalties, and audit usage after the fact.
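The “tamper-evident” part is less exotic than it sounds. This sketch assumes the simplest version: a single-writer hash chain in which each entry’s hash covers the previous entry’s hash, so editing any past record breaks verification. It deliberately leaves out what makes a real blockchain distributed (multiple writers, consensus, signatures), and the fragment IDs, publisher name, terms strings and per-use rate are hypothetical. A royalty roll-up shows how metered uses could be attributed to registered owners.

```python
import hashlib
import json
from dataclasses import dataclass, replace

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class Entry:
    fragment_id: str  # hypothetical ID scheme, e.g. "<article>-p<n>"
    owner: str
    terms: str
    prev_hash: str
    hash: str


def digest(fragment_id: str, owner: str, terms: str, prev_hash: str) -> str:
    # Hash the record together with the previous entry's hash, chaining them.
    payload = json.dumps([fragment_id, owner, terms, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def append(self, fragment_id: str, owner: str, terms: str) -> Entry:
        prev = self.entries[-1].hash if self.entries else GENESIS
        entry = Entry(fragment_id, owner, terms, prev,
                      digest(fragment_id, owner, terms, prev))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash; an edited or reordered entry breaks the chain.
        prev = GENESIS
        for e in self.entries:
            if e.prev_hash != prev:
                return False
            if e.hash != digest(e.fragment_id, e.owner, e.terms, e.prev_hash):
                return False
            prev = e.hash
        return True

    def royalties(self, uses: dict[str, int], rate: float) -> dict[str, float]:
        # Attribute metered uses of each fragment to its registered owner;
        # uses of unregistered fragments are ignored.
        owners = {e.fragment_id: e.owner for e in self.entries}
        totals: dict[str, float] = {}
        for fragment_id, count in uses.items():
            owner = owners.get(fragment_id)
            if owner is not None:
                totals[owner] = totals.get(owner, 0.0) + count * rate
        return totals


ledger = Ledger()
ledger.append("housing-2025-p0", "Example Gazette", "ai-training: licensed")
ledger.append("housing-2025-p1", "Example Gazette", "ai-training: licensed")
print(ledger.verify())  # True

ledger.entries[0] = replace(ledger.entries[0], terms="ai-training: free")
print(ledger.verify())  # False: the edited entry no longer matches its hash
```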

This is no silver bullet; blockchain doesn’t “fix” AI. But it does structure the market around what AI actually consumes: fragments with provenance.

Path out of the paradox: turning dark revenue into recurring revenue

This isn’t about feeling clever and doomed. I won’t sugarcoat anything or pretend it’s all roses, but equally I won’t proclaim the end of the world. The goal is to turn a structural insult into a recurring revenue line.

Here’s a practical path that respects your intelligence and your constraints:

Name the paradox internally.
Stop talking about “our AI policy” and start saying:

“We still produce media for our readers, but a large market segment now values us as an audience and data company. Where is that data being monetised, and by whom?”
Once you name the publisher paradox, people stop treating it as a side issue.
Map your dark revenue.
Run simple tests: ask major AI assistants about your key stories, brands and bylines. Where are your words showing up with no click, no credit, no cheque? That isn’t anecdote; it’s your dark revenue report.
Separate human vs machine rights.
In contracts – with freelancers, syndication partners, platforms – treat training and machine-use rights as their own line, not a footnote buried under “digital”. If someone wants your archive as AI training data, that’s not a freebie bundled with web rights.
Design for fragments in your stack.
Work towards a CMS where paragraphs, pull-quotes, and charts are addressable objects with IDs and metadata, not just a wall of HTML. That’s how you later attach fingerprints, provenance and, eventually, pricing.
Plug into the new rails, on your terms.
Experiment with tools like Pay Per Crawl and 402 “Payment Required” responses instead of binary “allow or block”. Pair that with emerging collective licences (like CCC’s AI Systems Training License) so you’re not negotiating every fragment manually.
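Designing for fragments and separating human from machine rights can start small. The sketch below assumes a hypothetical schema, not any real CMS’s data model: it splits an article body on blank lines, gives each paragraph a stable ID, a SHA-256 fingerprint and an owner, and keeps reader/syndication terms and AI training terms in separate fields. Pull quotes and charts would get their own `kind` values; the article ID, publisher name and licence strings are placeholders.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Fragment:
    fragment_id: str
    article_id: str
    kind: str  # "paragraph" here; pull quotes and charts would get their own kinds
    position: int
    text: str
    fingerprint: str  # SHA-256 of the text, for matching copies later
    owner: str
    human_licence: str  # hypothetical field: terms for readers and syndication
    machine_licence: str  # hypothetical field: terms for AI training and machine use


def fragment_article(article_id: str, body: str, owner: str,
                     machine_licence: str = "not-licensed") -> list[Fragment]:
    # Split on blank lines and give every paragraph a stable, addressable ID.
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    return [
        Fragment(
            fragment_id=f"{article_id}-p{i}",
            article_id=article_id,
            kind="paragraph",
            position=i,
            text=para,
            fingerprint=hashlib.sha256(para.encode("utf-8")).hexdigest(),
            owner=owner,
            human_licence="all-rights-reserved",
            machine_licence=machine_licence,
        )
        for i, para in enumerate(paragraphs)
    ]


body = "The council approved the plan.\n\nCritics say it is too small.\n\n"
for frag in fragment_article("housing-2025", body, "Example Gazette"):
    print(frag.fragment_id, frag.machine_licence)
# housing-2025-p0 not-licensed
# housing-2025-p1 not-licensed
```

Defaulting `machine_licence` to “not-licensed” mirrors the contract advice above: machine use is its own line, granted explicitly rather than bundled in.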

None of this stops you making great media. It lets you monetise the fact that great media is also high-value data.

Readers will always want stories.
AI will always want fragments.
Markets will always reward whatever can be measured and traded simply.

These are features, not bugs.

The publisher paradox is accepting that you are still a media company to your readers,
and already an audience + data company to everyone else.

The sooner you start charging like the latter,
the easier it gets to keep funding the former.

If you’d like support fingerprinting your content or mapping your dark revenue, Writers’ Bloc can help.

Feel free to reply directly to this email, or book a short 15-minute call here.