How the NYPL confirmed 480,000 books were in the public domain


The New York Public Library (NYPL) has spent the past year on an impressive project to figure out which books published between 1923 and 1964 are no longer protected by copyright (TL;DR: 480,000 books). People interested in this sort of thing have long known that books published before 1923 are out of copyright. Because copyright law is complex and has changed many times over the years, the first books published after 1922 to reach the end of their full copyright term only started to exit copyright protection this year. The short version is that 1923 + 95 years of protection = 2018, so books published in 1923 started exiting copyright protection in 2019. For the long version, read the NYPL's excellent blog post U.S. Copyright History 1923–1964.
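For readers who want to check the arithmetic, here is a simplified sketch. It assumes a book published with notice in the U.S. between 1923 and 1963 and ignores the many edge cases the NYPL post covers. It also relies on two facts not spelled out above: the original 28-year term under the 1909 Act, which lapsed if it was not renewed, and the 95-year total term for renewed works. The function name is just for illustration.

```python
# Simplified sketch of the U.S. term arithmetic described above.
# Assumption: a work published with notice in the U.S. between 1923 and 1963,
# where the only question is whether its copyright was renewed.

INITIAL_TERM_YEARS = 28   # first term under the 1909 Act; lapsed if not renewed
RENEWED_TERM_YEARS = 95   # total term for renewed works (28 initial + 67 renewal)


def public_domain_year(pub_year: int, renewed: bool) -> int:
    """Return the year on whose January 1 the work enters the public domain.

    Terms run to the end of a calendar year, so the work becomes free
    at the start of the following year.
    """
    if not 1923 <= pub_year <= 1963:
        raise ValueError("this sketch only covers works published 1923-1963")
    term = RENEWED_TERM_YEARS if renewed else INITIAL_TERM_YEARS
    return pub_year + term + 1


if __name__ == "__main__":
    print(public_domain_year(1923, renewed=True))   # 2019
    print(public_domain_year(1950, renewed=False))  # 1979
```

The second call is why the renewal question matters so much: an unrenewed book fell out of copyright decades earlier than a renewed one.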

To figure that out, the NYPL created a database of all the data published in the Catalog of Copyright Entries, a series of books published until 1977 (after which the records were kept digitally), and cleaned the data so they could determine which books had been renewed and which had not. These records have been searchable online before, such as through Stanford and Google, but those search interfaces could only show that a copyright had been renewed; they couldn't ensure that it had not been. Typos, OCR problems, and spelling differences between when a book was first published and when it may have been renewed made it difficult to know for sure that a book had never been renewed. By analyzing this data, the NYPL was able to determine that only about 25% of all books had their copyrights renewed during this important 40-year period. That means roughly 75% of all books published between 1923 and 1963 are out of copyright, or roughly 480,000 books.
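The renewal check described above is essentially a fuzzy-matching problem. The sketch below is not the NYPL's method. It is a hypothetical illustration of how normalizing text can tolerate typos and spelling variants, assuming each record is a dict with "title" and "author" fields and using Python's standard-library difflib with an arbitrary 0.85 similarity threshold.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Fold accents, lowercase, turn punctuation into spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def record_key(record: dict) -> str:
    # Hypothetical schema: every record has "title" and "author" strings.
    return f"{normalize(record['title'])} {normalize(record['author'])}"


def best_renewal_match(registration: dict, renewals: list, threshold: float = 0.85):
    """Return the most similar renewal record, or None if nothing clears the threshold."""
    reg_key = record_key(registration)
    best, best_score = None, 0.0
    for renewal in renewals:
        score = SequenceMatcher(None, reg_key, record_key(renewal)).ratio()
        if score > best_score:
            best, best_score = renewal, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    renewals = [{"title": "The Great Gatsbey", "author": "Fitzgerald, F. Scott"}]
    registration = {"title": "The Great Gatsby", "author": "Fitzgerald, F. Scott"}
    # The misspelled title in the renewal record still matches.
    print(best_renewal_match(registration, renewals) is not None)
```

A registration with no match above the threshold is only a candidate for "not renewed", which is why having the complete, cleaned dataset mattered so much.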

There's still a lot of work to be done to make all of these books accessible. For one, only books through 1937 have Library of Congress Control Numbers (LCCNs), which allow them to be easily linked to specific books published online via HathiTrust and Google Books. More work will be needed to link the rest of the books to the versions already scanned and put online, fully unlocking them for users to access. See the NYPL blog post U.S. Copyright History 1923–1964 for more details on this amazing project.
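Linking by LCCN is, at heart, a join on a shared identifier. The sketch below assumes hypothetical record dicts with an "lccn" field on both sides. The scans list stands in for whatever metadata HathiTrust or Google Books provides, and no real API is called. Identifiers are normalized following our reading of the Library of Congress's LCCN normalization rules: remove blanks, drop anything after a slash, and zero-pad the serial number after a hyphen to six digits.

```python
def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN string, e.g. '85-2' -> '85000002' (our reading of LoC's rules)."""
    lccn = "".join(raw.split())      # remove all blanks
    lccn = lccn.split("/", 1)[0]     # drop a trailing '/...' suffix
    if "-" in lccn:
        prefix, serial = lccn.split("-", 1)
        lccn = prefix + serial.zfill(6)  # left-pad the serial number to six digits
    return lccn


def link_by_lccn(cce_records: list, scans: list):
    """Pair catalog records with scanned copies that share a normalized LCCN.

    Returns (linked, unlinked): linked is a list of (record, scan) pairs;
    unlinked holds records with no LCCN or no matching scan.
    """
    scans_by_lccn = {normalize_lccn(s["lccn"]): s for s in scans if s.get("lccn")}
    linked, unlinked = [], []
    for record in cce_records:
        lccn = record.get("lccn")
        scan = scans_by_lccn.get(normalize_lccn(lccn)) if lccn else None
        if scan is not None:
            linked.append((record, scan))
        else:
            unlinked.append(record)
    return linked, unlinked
```

The unlinked pile corresponds to the work the post says is still ahead: books without an LCCN need some other way of being matched to their scans, such as comparing titles and authors.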

