What a disappointment
POORLY WRITTEN AND BADLY TRAINED

Scribd taking down the Mueller Report is the future the EU has voted for

By David Yanofsky

A convenient feature of nearly all documents created by the US government is that they have no copyright. As such, the Mueller Report exists in the public domain both colloquially (it’s everywhere!) and legally (no rights restrictions!).

However, in a taste of what's to come under the EU's approach to copyright enforcement, the online document portal Scribd took down multiple copies of the Mueller Report, saying its algorithms had identified it as a copyrighted work.

Scribd thought the Mueller Report was copyrighted because there was no one to think otherwise—the company uses an algorithm to make determinations about intellectual property violations.

Update (5:36pm ET): A spokeswoman for the company says that “a leading global publisher” released the report as a book, fooling Scribd’s systems into thinking the report was owned by the publisher. The company identified 32 copies of the document, all of which were removed and then reinstated, she said.

There are still many copies of the report on Scribd, though searches on Scribd for text known to be in the report reveal that few of those copies are anything more than unsearchable images of the document's scanned pages.

A separate request Quartz made to reinstate its own versions of the document was resolved 17 minutes after we sent it, seemingly after an employee review.

Users affected by the takedown received an email from the company indicating that “Scribd’s BookID copyright protection system has disabled access” to their documents, even though the email admits, “This does not necessarily mean that an infringement has occurred” or that the uploaders “have done anything wrong.”

The email continued, “Like all automated systems, it will occasionally identify legitimate content as a possible infringement. Unfortunately, the volume of content in Scribd’s library prohibits us from reaching out for verification before BookID disables content. Scribd frequently updates BookID in order to reduce false positives.”

In summary:

  1. Scribd admits there's no way to make an automated system that is 100% correct all the time…
  2. …has decided that it's too popular a service to have humans vet its algorithm's decisions, and…
  3. …will update its systems after the fact when things go awry.

The Council of the European Union recently approved the EU's new copyright directive, including its provision known as Article 13. It requires internet platforms to police content for copyright violations before it goes up, rather than only after it is reported as infringing by a third party, as they do now. In practice, it is expected to lead to more scenarios like this one: public-domain works and other legal uses of copyrighted material get blocked or taken down by unaccountable corporate dragnets, built on code so poorly written and algorithms so poorly trained that they mistake the most talked-about and most widely shared public-domain documents for copyrighted works.
