AI and Copyright Infrastructure | July 6, 2026 | 7 min read | By Publicator Editorial

AI Rights Reservations Are Becoming Publishing Infrastructure

The EU AI Act implementation process is turning text-and-data-mining rights reservations into machine-readable publishing operations. Journal teams should treat TDMRep, policy URLs, PDFs, EPUBs, and article pages as part of one rights workflow.

A journal can have a carefully drafted copyright policy and still be nearly invisible to the machines that matter. That is the practical lesson behind the European Commission process on text-and-data-mining rights reservations under the AI Act and the General-Purpose AI Code of Practice. The policy question has moved from "do we reserve rights?" to "can a crawler, a model provider, a repository, and a publisher all understand the same reservation without guessing?"

The Commission launched a consultation in December 2025 to support implementation of the AI Act obligation for general-purpose AI model providers to identify and comply with rights reservations expressed by rightsholders. The process is meant to identify machine-readable opt-out protocols that are technically implementable and widely adopted. STM, the association of scientific, technical, and medical publishers, reported on June 15, 2026, that a first workshop had taken place on June 2 and that TDMRep had emerged as the protocol with the most support among rightsholders.

For scholarly publishers, this is not only a European legal development. It is a signal that AI-era rights governance is becoming part of publishing infrastructure: origin servers, article templates, PDF metadata, EPUB packages, platform migrations, license pages, and editorial ownership all have to line up.

The Rights Signal Has To Leave The Policy Page

Many journal sites still treat AI crawling and text mining as a paragraph in terms of use. A human can read it. A lawyer can point to it. An editor can reassure a society board that the language exists. But the operational problem is different. The AI Act process is focused on whether model providers can identify and comply with reservations in a machine-readable way. A buried paragraph is weak infrastructure for that job.

TDMRep is relevant because it tries to make the rights signal discoverable. The W3C Community Group final report describes a protocol for expressing TDM rights reservations and for discovering associated licensing policies. It supports several places where the signal can live, including a file on the origin server, an HTTP response header, HTML metadata, EPUB metadata, and PDF metadata.
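
To make those locations concrete, here is a minimal Python sketch that emits the same reservation for three of them: the origin-server file, the HTTP response header, and HTML metadata. The field names (`tdm-reservation`, `tdm-policy`, `location`) and the `/.well-known/tdmrep.json` path reflect a reading of the TDMRep report and should be checked against the current specification before deployment. The policy URL, path patterns, and function names are hypothetical. PDF and EPUB embedding is left out because it depends on format-specific tooling.

```python
import json

# Hypothetical policy URL, for illustration only.
POLICY_URL = "https://journal.example.org/policies/tdm-policy.json"


def well_known_file(rules):
    """Serialize (path pattern, reserved) pairs as a JSON array of rule objects.

    Assumption: this is the shape expected at /.well-known/tdmrep.json.
    """
    entries = []
    for location, reserved in rules:
        entry = {"location": location, "tdm-reservation": 1 if reserved else 0}
        if reserved:
            entry["tdm-policy"] = POLICY_URL
        entries.append(entry)
    return json.dumps(entries, indent=2)


def http_headers(reserved):
    """Build response headers carrying the same signal for a single resource."""
    headers = {"tdm-reservation": "1" if reserved else "0"}
    if reserved:
        headers["tdm-policy"] = POLICY_URL
    return headers


def html_meta(reserved):
    """Build <meta> tags for the <head> of an article page."""
    tags = [f'<meta name="tdm-reservation" content="{1 if reserved else 0}">']
    if reserved:
        tags.append(f'<meta name="tdm-policy" content="{POLICY_URL}">')
    return "\n".join(tags)


# One policy decision, expressed three ways.
print(well_known_file([("/articles/*", True), ("/open-access/*", False)]))
print(http_headers(True))
print(html_meta(True))
```

Generating all three from one function set is a deliberate choice: the surfaces cannot drift apart if they are rendered from the same policy decision.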

That breadth matters. Scholarly content rarely exists as one web page. A single article may have an HTML version, a PDF, supplementary files, an EPUB export, a JATS package, a repository copy, a preprint, and cached representations in indexing systems. If only the website footer carries the rights statement, the journal has not actually governed the article supply chain.

Robots.txt Is A Gate, Not A Rights Record

Robots.txt remains part of the conversation, and the Commission consultation explicitly referred to commitments to respect robots.txt and later versions of that standard. But robots.txt was built for crawl permissions, not for the richer questions journal teams now face. Can content be mined for non-commercial research? Is a license available? Who should a machine or company contact? Does the reservation apply to all content or only some collections? Are PDFs covered as well as HTML?

A journal that relies only on robots.txt risks compressing too many meanings into a coarse technical signal. Blocking everything may protect one concern while damaging discovery, preservation, indexing, accessibility tools, and legitimate research mining. Allowing broad crawling may be useful for discovery while leaving rights-reservation intent unclear. The emerging infrastructure needs more precision than allow or disallow.
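
The coarseness is easy to see in code. The sketch below uses Python's standard `urllib.robotparser` against a hypothetical robots.txt and made-up crawler names. Whatever the journal intended, the only answer a crawler can extract is a yes or a no per user agent and URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and crawler names, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /articles/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

article = "https://journal.example.org/articles/10.1234/abc.pdf"

# can_fetch returns a plain boolean: no licence terms, no contact, no scope.
ai_bot_allowed = parser.can_fetch("ExampleAIBot", article)
indexer_allowed = parser.can_fetch("ExampleIndexer", article)

print(ai_bot_allowed, indexer_allowed)
```

Nothing in that result says whether a license is available, whom to contact, or whether non-commercial research mining is treated differently. Those answers have to live somewhere else.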

This is why rights reservation should be treated as metadata governance, not website administration. The decision belongs across editorial leadership, publishing operations, legal, platform, and production. Someone has to decide the journal-level policy, but someone also has to make sure the policy is expressed consistently at the resource level.

PDFs Are No Longer An Afterthought

Journal teams often modernize article pages first and leave PDFs as static artifacts. That is understandable; the web page is easier to change. But the TDMRep discussion exposes the weakness in that habit. STM says it helped expand TDMRep to cover PDF and EPUB formats. The W3C Community Group report also includes PDF and EPUB mechanisms in its protocol section.

This matters because PDFs are still the version many readers download, share, archive, and feed into reference managers or analysis tools. If the HTML page says one thing about TDM rights and the PDF carries no signal, downstream systems may see an incomplete record. The same issue applies to EPUB exports, special issue compilations, and article packages moved between hosting vendors.

The operational test is simple: pick one recently published article and ask where the rights-reservation signal appears. If the answer is "in the website terms," the workflow is probably too shallow. If the answer includes the article page, server-level declaration, PDF metadata, policy URL, and internal ownership for updates, the journal is closer to treating AI rights as production data.
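
A first pass at that test can be scripted. The sketch below assumes the article HTML and response headers have already been fetched, and that someone has checked separately whether a server-level file exists. It then reports which surfaces carry a signal. The metadata names follow the TDMRep conventions as understood here and should be verified against the specification. The class and function names are hypothetical, and PDF or EPUB inspection would need additional tooling that is not shown.

```python
from html.parser import HTMLParser


class TdmMetaParser(HTMLParser):
    """Collect <meta name="tdm-..."> values from an article page."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name.startswith("tdm-"):
            self.found[name] = attr.get("content")


def audit_article(html_text, response_headers, well_known_present):
    """Report which surfaces carry a reservation signal for one article."""
    meta = TdmMetaParser()
    meta.feed(html_text)
    headers = {key.lower(): value for key, value in response_headers.items()}
    return {
        "html_meta": "tdm-reservation" in meta.found,
        "http_header": "tdm-reservation" in headers,
        "well_known_file": well_known_present,
        "policy_url": meta.found.get("tdm-policy") or headers.get("tdm-policy"),
    }


# Example: a page with HTML metadata only.
page = '<html><head><meta name="tdm-reservation" content="1"></head></html>'
report = audit_article(page, {"Content-Type": "text/html"}, well_known_present=False)
missing = [surface for surface, value in report.items() if not value]
print(missing)
```

In this example the page carries HTML metadata and nothing else, so the report flags the header, the server-level file, and the policy URL as gaps.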

The Policy URL Has To Be Maintained Like Infrastructure

A machine-readable reservation is only useful if the policy it points to is intelligible and current. That policy URL should not be a vague copyright page last updated during a website redesign. It should tell a user or automated actor what is reserved, what uses may be licensed, who can grant permission, whether terms differ by content type, and how exceptions such as open access articles, third-party images, data, or supplements are handled.

The European Commission training-content template for general-purpose AI models points in the same direction from the other side of the table. It requires providers to publish summaries that include information about data sources and scraped online content, and it says enforcement for failures to publish the required summary can begin on August 2, 2026. Publishers should not assume those summaries will be enough to protect their interests. They should make their own rights signals easier to find, interpret, and audit.

A policy URL also needs change control. If a society changes licensing strategy, if a journal converts to open access, if a backfile is moved, or if a platform vendor changes asset handling, the rights-reservation policy may need to change with it. Treating that URL as static legal furniture creates the same drift problems publishers already see with APC pages, license statements, and preservation records.

Where Journal Workflows Will Break First

Platform migrations that move article pages but do not preserve server-level rights files, HTTP headers, or PDF metadata.
Mixed-license journals where some articles are fully open, some contain third-party figures, and older backfiles sit under legacy terms.
Production vendors that generate PDFs or EPUBs without a field for TDM rights metadata or a tested method for carrying it into final files.
Society portfolios where each title has a slightly different policy but the hosting layer applies one generic technical setting.
Repository and indexing feeds that expose copies or derivatives without preserving the publisher-facing rights signal.

None of these failures requires bad intent. They are the normal result of treating rights language as prose while publishing systems operate through files, headers, metadata packages, and handoffs. The more AI crawling becomes a compliance and licensing issue, the less sustainable that split becomes.
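
Migration drift in particular lends itself to a simple regression check. The sketch below assumes a team has recorded, for a sample of URLs, which signals were present before the move and which are present after it. The snapshot format, surface names, and function name are hypothetical.

```python
def lost_signals(before, after):
    """Map each URL to the signals that disappeared or changed value after a migration."""
    problems = {}
    for url, old in before.items():
        new = after.get(url, {})
        changed = sorted(name for name, value in old.items() if new.get(name) != value)
        if changed:
            problems[url] = changed
    return problems


# Snapshots keyed by URL path; values record the reservation seen on each surface.
before = {"/articles/1": {"http_header": "1", "html_meta": "1", "pdf_metadata": "1"}}
after = {"/articles/1": {"html_meta": "1"}}

regressions = lost_signals(before, after)
print(regressions)
```

Here the article page survived the move, but the header and the PDF metadata did not. That is exactly the kind of silent loss a migration checklist should catch.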

A Rights-Reservation Runbook For Journal Teams

The useful response is not to wait for every legal question to settle. The useful response is to map the surfaces you already control and decide how rights intent should travel through them.

Inventory the public surfaces: journal home, article pages, PDFs, EPUBs, supplementary files, XML feeds, repository copies, archive pages, and API responses.
Decide which content classes share the same TDM position and which need separate treatment, such as OA articles, subscription backfiles, data files, images, and third-party materials.
Create or revise a policy URL that is specific enough to support licensing conversations and operational enough for staff to maintain.
Test whether the platform can express rights reservations through the relevant mechanisms, including server files, headers, HTML metadata, PDF metadata, and EPUB metadata where applicable.
Assign an owner for updates during platform migrations, license changes, journal launches, backfile imports, and production vendor changes.
Keep evidence. Record when the policy changed, what technical signals were deployed, which titles or content classes were covered, and who approved the change.
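
The content-class decision and the evidence step can share one small record. The following sketch is one possible shape, not a standard: every class name, field name, title, and URL in it is an assumption chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ContentClass:
    """A group of content that shares one TDM position."""
    name: str
    reserved: bool
    policy_url: Optional[str] = None


@dataclass
class ChangeRecord:
    """Evidence of one rights-reservation change: what, where, when, and who."""
    changed_on: date
    approved_by: str
    titles: List[str]
    signals_deployed: List[str]
    classes: List[ContentClass]

    def to_json(self):
        record = asdict(self)  # nested dataclasses become plain dicts
        record["changed_on"] = self.changed_on.isoformat()
        return json.dumps(record, indent=2)


record = ChangeRecord(
    changed_on=date(2026, 7, 6),
    approved_by="Publishing operations lead",
    titles=["Example Journal of Methods"],
    signals_deployed=["well_known_file", "http_header", "html_meta"],
    classes=[
        ContentClass("subscription-backfile", True, "https://journal.example.org/policies/tdm"),
        ContentClass("oa-articles", False),
    ],
)
print(record.to_json())
```

Stored alongside release notes or in version control, records like this let a team answer later what was deployed, for which titles, and on whose approval.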

The Strategic Point Is Not To Block AI

Some publishers will use rights reservations to create licensing routes for commercial AI use. Others will allow certain mining activities for research while reserving rights for other uses. Open access publishers may decide that broad reuse is part of the mission, but still need to distinguish article text from third-party media, datasets, or content under different terms. The right answer will vary by journal, business model, jurisdiction, and community expectation.

What should not vary is the discipline of expression. A journal should know what it intends, where that intent is encoded, and whether downstream article formats carry the same story. In an AI-mediated discovery environment, ambiguity is not neutral. It shifts work to libraries, vendors, platforms, and eventually editors who have to explain why the public signal did not match the journal policy.

Practical Takeaway For Journal Leaders

Before the next platform roadmap or publishing operations meeting, ask for a one-article proof: show the rights-reservation position on the article page, in the PDF or EPUB where relevant, in any server-level or header-based signal, and at the linked policy URL. Then ask who owns that chain when the journal changes license model, migrates hosts, or republishes a corrected file. If nobody can show the chain end to end, AI rights reservation is still a policy aspiration, not publishing infrastructure.