Providers of texts in electronic format often gather data on their readers, as is the case with e-readers.[1] In the case of academic publishers, web tracking is part of a general trend towards data-driven management of research and higher education, in which the data are collected and sold by private companies.
In the 2010s, major commercial publishers began performing data analytics in addition to providing content. This is the case in particular for Elsevier, Pearson and Cengage.[2] In 2018, the three leading research data analytics vendors were Clarivate, Digital Science (a division of the Holtzbrinck Publishing Group), and Elsevier (a publisher).[2] These companies sell research intelligence tools to universities, research funders, and governments.
For example, Elsevier sells the information system Pure to universities, claiming that it provides a "comprehensive overview of all their research activities" by aggregating "information from all their data sources".[3] The 2020 partnership between Elsevier and Dutch research institutions bundles a Publish and Read contract with research intelligence services.[4] In 2018, Elsevier won a contract to collect data for the European Commission's Open Science Monitor.[5][6] The Irish Science Foundation bases its strategy on data it purchases from Elsevier.[7]
Tracking readers allows publishers to improve services, for example by providing targeted reading suggestions, or by adapting search results to personal profiles.[8][9]
The acquisition of research workflow tools by big publishers has been attributed to a strategy of research workflow embedment, in other words vertical integration of academic infrastructure.[10] For example, Elsevier acquired the reference manager Mendeley in 2013 and the preprint server SSRN in 2016.
It has been theorized that this integration leads to a data-driven organization of research.[11] The focus is no longer the scholarly article, but the individual researcher, whose online behaviour generates valuable data.[12]
The pirate website Sci-Hub threatens the subscription revenues of publishers. Sci-Hub downloads articles from publishers' websites using genuine university credentials. Some publishers have claimed that this is a threat to universities' network security, and have founded the Scholarly Networks Security Initiative to combat it.[13] The initiative advertised tools for tracking users[14] before declaring in 2021 that it does not advocate the use of spyware.[8]
Academic publishers use standard methods of web tracking.[15] They gather information on users who connect to their websites, such as login data, browser fingerprints or IP addresses. Extra information can be provided by third-party cookies that publishers insert in users' computers. A 2019 study of 15 publisher websites found an average of "18 third-party assets being loaded on their article pages".[15]
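The kind of count reported in such studies can be illustrated with a minimal sketch. The code below is not the method of the cited study: it assumes that a "third-party asset" is any script, image or iframe loaded from a host that is neither the page's own host nor one of its subdomains, and all host names in the sample page are hypothetical. A real audit would also need to observe assets loaded dynamically by scripts and to compare registrable domains using a public suffix list.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags whose "src" attribute makes a browser fetch a resource.
# Assumption: only these three tags are inspected in this sketch.
ASSET_TAGS = {"script", "img", "iframe"}


class AssetCollector(HTMLParser):
    """Collects the absolute URLs of assets referenced by a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        if tag not in ASSET_TAGS:
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative and protocol-relative URLs.
                self.assets.append(urljoin(self.page_url, value))


def third_party_hosts(page_url, html):
    """Return the set of asset hosts outside the page's own host."""
    page_host = urlparse(page_url).hostname
    collector = AssetCollector(page_url)
    collector.feed(html)
    hosts = set()
    for asset in collector.assets:
        host = urlparse(asset).hostname
        if not host:
            continue
        # Simplification: subdomains of the page host count as first-party.
        if host == page_host or host.endswith("." + page_host):
            continue
        hosts.add(host)
    return hosts


# Hypothetical article page; none of these hosts refer to real services.
PAGE_URL = "https://publisher.example/article/1"
SAMPLE_HTML = """
<html><body>
  <script src="/static/app.js"></script>
  <img src="https://cdn.publisher.example/logo.png">
  <script src="https://analytics.tracker.example/t.js"></script>
  <img src="//ads.example.net/pixel.gif">
</body></html>
"""

if __name__ == "__main__":
    for host in sorted(third_party_hosts(PAGE_URL, SAMPLE_HTML)):
        print(host)
```

In this sample, the two assets served from the publisher's own host and its subdomain are treated as first-party, while the analytics script and the tracking pixel are reported as third-party.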
Data collection is facilitated by tools that are ostensibly designed to help readers access the literature, such as GetFTR, an academic implementation of single sign-on.[1]
Data on individual users can be aggregated using "audience tools", i.e. commercial software from companies such as Adobe, Oracle or Neustar.[15]
Systems for managing academic libraries, such as Alma (from Ex Libris) or those provided by OCLC, can perform data collection. Libraries can become dependent on such systems.[1]
A 2021 petition demanded that publishers "stop tracking science", and asked research institutions to sign the DORA declaration.[16]
A 2021 statement by the Invest in Open Infrastructure organization, also supported by other organizations, called for more oversight and regulation of Clarivate after its acquisition of ProQuest, with the aim of reining in "surveillance capitalism" in scientific research.[18]
Siems, Renke. "Das Lesen der Anderen: Die Auswirkungen von User Tracking auf Bibliotheken". O-Bib. Das Offene Bibliotheksjournal (VDB). doi:10.5282/o-bib/5797.
Aspesi, Claudio; Allen, Nicole Starr; Crow, Raym; Daugherty, Shawn; Joseph, Heather; McArthur, Joseph Thomas William; Shockey, Nick (2019-04-03). SPARC Landscape Analysis: The Changing Academic Publishing Industry – Implications for Academic Institutions. Center for Open Science. doi:10.31229/osf.io/58yhb.
Posada, Alejandro; Chen, George (2018-06-15). Inequality in Knowledge Production: The Integration of Academic Infrastructure by Big Publishers. OpenEdition Press. doi:10.4000/proceedings.elpub.2018.30.
Herb, Ulrich (2018-04-26). "Zwangsehen und Bastarde". Information - Wissenschaft & Praxis (Walter de Gruyter GmbH) 69 (2-3): 81–88. doi:10.1515/iwp-2018-0021. ISSN 1619-4292.
Moore, Samuel A. (2020-07-28). "Individuation through infrastructure: Get Full Text Research, data extraction and the academic publishing oligopoly". Journal of Documentation (Emerald) 77 (1): 129–141. doi:10.1108/jd-06-2020-0090. ISSN 0022-0418.