Copyright and the Progress of Science: Why Text and Data Mining Is Lawful
This Article argues that U.S. copyright law provides a competitive advantage in the global race for innovation policy because it permits researchers to conduct computational analysis, known as text and data mining (TDM), on any materials to which they have access. Japan's amendments to its copyright law and the European Union's recent addition of copyright limitations that legalize some TDM research implicitly acknowledge the competitive benefits of the fair use provision of U.S. copyright law.
Focusing only on U.S. law, this Article makes two general contributions to the literature on fair use: (1) in cases involving archiving, the user's security precautions are relevant under the first fair use factor and should not be treated as an unenumerated factor or as part of the market harm analysis; and (2) good faith should not be a factor in fair use analysis, but even if courts do consider good faith, TDM research conducted on infringing sources, such as Sci-Hub, is still lawful because the research provides transformative benefits without causing harm to the markets that matter. This Article also revisits the issue of temporary copies to argue that certain steps in TDM research do not make copies that "count" under U.S. law and that it is possible to design cloud-based TDM research that does not implicate U.S. copyright law at all. This Article addresses the needs of many audiences, including policymakers, courts, university counsel, research libraries, and legal scholars, who seek a thorough legal analysis supporting the lawfulness of TDM research.