
KEYNOTE - Surveillance capitalism in our libraries

Formal Metadata

Title
KEYNOTE - Surveillance capitalism in our libraries
Title of Series
Number of Parts
14
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
In the transition from industrial to informational capitalism, much of our lived experience has gone from physical to digital, including library services. As publishers, library vendors, and other informational service providers have become internet-based companies, their business models have transitioned from analog services to data-based services. In short, our traditional library service providers are becoming data analytics companies, dabbling in, or diving into personal data brokering. From RELX to ProQuest, major library vendors are finding new ways to extract and monetize people's personal data. Researchers are finding surveillance software like ThreatMetrix in their research databases, and data analytics companies like Clarivate are trying to acquire ProQuest, a major library service platform provider to exploit library patrons' data to create more academic metrics to sell grant funders and research institutions. All of these corporate decisions are part of a trend of our vendors collecting library patrons' personal data. The increasing surveillance capitalism in our library spaces makes open access more important than ever.
Computer animation
Transcript: English(auto-generated)
Sarah is a professor at the School of Law, which is part of the City University of New York. While primarily researching and teaching information law and policy, she also holds a master's degree in library science
and is a senior fellow at SPARC, the Scholarly Publishing and Academic Resources Coalition. Her research and publications cover information access, surveillance and privacy, and informational capitalism.
In addition to her official roles, she works in data justice projects with environmental and immigrant groups. Most recently, she dived into the field of publishers, library vendors, and other information service providers.
Before I hand over, you are encouraged to post questions during the talk. Please use the Town Square channel in Mattermost. We will collect them and I will read them aloud for the stream listeners.
Now, the stage is yours, Sarah.
Thank you. It's really, really great to be here. I know that it's always nice to meet in person, but it's also pretty amazing that there's this opportunity for people from literally all over the world to meet in one place.
So I'm really grateful to see everybody, even though we don't get to physically see each other this year. So I am going to, I think I'm going to be using slides, so bear with me if there are any slide issues,
but you should all see my first slide. So I'm going to be talking a little bit about the issue that, as Joachim stated, is really a hot topic right now: surveillance capitalism in our libraries.
So I'm going to first introduce kind of the idea of surveillance and tracking products in our library products and then dive into kind of what they're being used for, what the goals of the companies that are using them are, and then kind of leave us with an opening for some discussion that I hope we get to have after my brief presentation.
So let's begin. So this probably, I mean, in a sense, this has always been true, right? There's always been some curiosity about what people are researching and interest in surveilling people as they research.
That's not new, but it is new that we are using a lot of third party platforms, right? We're using a lot of platforms and services that we buy from library vendors. And more and more, those products are collecting our data and following us as we use their products. And they're doing it for different reasons, for a whole host of reasons that I'll explain in a few minutes.
But the fact of the matter is we have surveillance companies and data analytics companies that collect our data running a lot of our library databases and services.
So the companies that own our library resources are no longer publishers. So we were always used to calling Elsevier a publisher and Springer a publisher. These companies, less and less, are considering themselves publishers. They're more considering themselves data analytics companies or providers of intelligence solutions
or companies that provide predictions and prognoses, not just the content from their journals and from whatever it is that they're selling us in their databases, right? They're selling that content plus more products and services. And a lot of those products and services happen to be data analytics products and services
that kind of meld together their published content with personal data about users and about researchers. So the first question I always like to answer is, how did we get here, right? How did we get to this place where traditional publishers that publish science journals and humanities journals
and other scholarship are now data analytics companies? Because in some sense, it's the same thing. They're providing information, right? But in other senses, that's quite different, right? Publishing journals and also transitioning to selling data about people
and about what people are doing with research. So this transition from publishing to data analytics actually started when digitization started, right? And I mean, the first database was made long before we decided to start digitizing different things en masse.
But the first fruits of digitization as we know it today started in the 1990s when we figured out how to turn all sorts of different media into kind of a binary code that we could put on a single USB drive or on a single hard drive. We found a way to take music, images and writings and put them all in the same place, right?
We no longer have to have a separate storage facility for film reels as we do for journals and other texts, right? We can put a set of encyclopedias in a file right next to, you know, a Bach concerto, right next to a Monet, right?
They can all be on the same little disk drive. And that was an innovation that gave a lot of opening to publishers to kind of collect and platform and publish more different types of content. And around the time that that mass digitization happened, we also had this ability, new ability to share materials more easily than ever.
So we could share sounds and images and writings with each other digitally and electronically instead of, you know, having to go to a physical place or wait by the mail, right, for a physical copy of something. So those two things, those two events, digitization and the advent of the Internet and then the advent of Wi-Fi which allowed us to take Internet everywhere with us,
you know, in our phones and on our watches, that all began in the 1990s. And if you'll notice, if anyone's familiar with the publishing industry, that's also when the publishing industry kind of started to undergo big changes, mass consolidation and kind of in some places collapse, right, as these companies started merging with each other and kind of went into this, like, bonanza of acquisitions and mergers.
And that was all kind of the effects of companies figuring out how to monetize and profit from digitization and the Internet.
So major content publishers like Elsevier and Springer and all of, you know, kind of the oligopoly of publishers in academia, were excited, right, that they had all these new opportunities to platform and purchase more and more content, right. They could get all sorts of different types of journals. They no longer had to focus on several journals.
They could acquire thousands of journals, right, a whole copyright, excuse me, portfolio of copyrights. Right. That's what I call it. No longer was each journal kind of a unique thing that had to be taken care of separately. Now they could kind of get hundreds of them and platform all of them. It was just this endless ability to platform as much academic content as they could get.
So at this point, library vendors stopped considering just publishing, right. They could have just said, you know, we're going to keep just publishing journals and we're going to maintain our roles as traditional publishers.
Instead, they said, you know, I think there's more possibilities here. I think there's other things we can do with our content. There are more ways for us to make money with our platforms. So they kind of veered off course from publishing into data collection. And some of this is, you know, these big data projects that are useful for academics. Right. They help us find each other and find new research products, find grant opportunities.
But some of it is a little opportunistic and exploitive. Right. Some of it's not as good for researchers, but all of it is good for Elsevier's profits and ProQuest and Clarivate's profits. Right. So. When these companies kind of veered off the road from traditional publishing to data collection,
they started buying up more and more types of content and also more and more types of data analytics products and services. So here's one example. Clarivate. Right.
You can see this is a timeline that Marshall Breeding made of Clarivate's mergers and acquisitions to become a gigantic company that is now on the precipice of actually acquiring ProQuest, which is another huge company. You can see all of these smaller companies kind of getting spun up into Clarivate and becoming part of Clarivate.
You can see a lot of Thomson and Reuters companies, and then Thomson and Reuters merged and became Thomson Reuters. And you can see just a whole, you know, kind of collection of different standalone companies that are now under Clarivate's corporate umbrella.
Similarly, Reed Elsevier LexisNexis acquired dozens of companies. This just reflects between 1993 and 2018, but there were acquisitions before and there continue to be a lot more acquisitions. So Reed Elsevier and LexisNexis acquired dozens of companies.
And, I mean, Reed Elsevier merged with LexisNexis in the 90s, too. So there's just a lot of consolidation in both library services platforms like Clarivate and companies like Elsevier and other major library providers. So one thing to point out, this is kind of a close-up of part of that RELX list that you saw in the last slide.
I put red stars by all of the data analytics products that the company bought versus all of the academic products. And you can see only one, the yellow circle, is the only academic-focused product that is in this cluster.
Right. They got Mendeley in 2013. But most of Reed Elsevier LexisNexis's acquisitions have been in data analytics. So they're developing these robust data analytics products. And I'm going to explain briefly what data analytics means in this context, because data analytics is kind of a vague, broad term.
So this is how Clarivate describes its data analytics products and programs. So Clarivate sells access to Web of Science, which is an index.
But really what that index is, it's not a place where you can necessarily go and get the articles you need. You know, it's not an open access or even a closed access resource. It's really a tool to see how often people's research is looked at and who's looking at it. So it's collecting data about the way academics and grant funders and other people look at academic research.
So the way Clarivate describes its data analytics is that at the bottom of its pyramid of value, kind of the foundation, is all of its raw data assets. So that's all of the data it's collected under, you know, Web of Science and all the clicks it collects,
everything it collects around just raw data about, I mean, both the content and then how people are using the content. Then, you know, one kind of level up is smart data. So that's taking that data, categorizing it, creating links between it, just kind of organizing and sorting that data out.
And then where the value is really added for companies like Clarivate and RELX is through running the organized data through data analytics products that make predictions and prescriptions. So they predict what might happen next in academic research and they tell people what to do about those predictions.
So they give suggestions to grant funders and to researchers and to institutions who are just trying to decide who to give tenure, you know, and who to hire, what labs to support. Right. So they make those predictive analytics and prescriptive analytics with new data analytics products that they're developing or that they've acquired.
So similarly, RELX describes its publishing platform as no longer being based in publishing. So here's just a little description from a Reed Elsevier LexisNexis article
where one of the CEOs of RELX describes the company as having a heritage in publishing. So as being a former publisher that now has managed down its publication streams from 50 percent of what it does to 10 percent.
I'm sorry, its print publishing sales. Sorry. My whole family was struck with an illness this weekend. So we are all going through this together. So I apologize for my tissue. So they've reduced their print sales from 50 percent to just 10 percent of their business.
And they're at the same time ramping up their data analytics products. And they are also redefining what they call themselves. So RELX no longer calls itself a publisher. It doesn't publish, you know, academic work or the law or news anymore.
It now focuses its energy on being a business services company that provides insights and intelligence to businesses. And those businesses include grant funders and academia, if you want to look at academia as a business, but they don't consider themselves publishers.
So when we call Elsevier a publisher, that's a misnomer, right? That's not what they call themselves anymore. They're a business services company. Clarivate has a similar description. Clarivate. This is from a description of some PR, some press about how they're acquiring ProQuest, which is a major library services platform company in the US.
But it also collects a lot of data about users, right? ProQuest helps libraries organize their content and manage their subscriptions. But it also collects every click that a library user makes in ProQuest.
So there's a huge amount of data that ProQuest collects, and Clarivate plans to use ProQuest library patron data to broaden their analytical offerings. So they also describe themselves as making predictive and prescriptive analytics and talk about how ProQuest will broaden those analytical opportunities.
They are working on building their Clarivate Research Intelligence Cloud, which is just a massive data collection and data analytics business service. So, interestingly, these companies are providing both the traditional library services that we need, right?
Search functions and organizing indexing functions and content. But they're also providing a whole different range of services to non library providers or non library businesses.
And I'll get to that in a moment. So here is just one more image I like to share. This is the visualization of what RELX describes its current business model as. And I like it because it looks like an extruder, like a clay extruder that, you know, you would put a bunch of different pieces of clay through and then they come through as, like, a tube or, you know, spaghetti hair or something.
So they basically extrude all sorts of data. And that includes primary research. That includes like Elsevier journals and publications. It also includes public records. So those are public records about people and events, people's personal data and proprietary data that they have.
News articles, contributory databases, unstructured records, which is just like data dossiers, personal data dossiers, et cetera, and then structured records. And they run that all through. You see that magic circle image in the middle. And they come out with predictions and prescriptions that they sell to pharmaceutical companies, academic institutions,
even like law enforcement and places like that, as I'll get to in a bit. So you can see that they do entity resolution, link analysis, clustering analysis, which is grouping things together, and then complex analysis, which is like heat lists of people who, according to their predictive analytics,
may commit a crime or may not be able to pay rent, may have an opiate addiction, all sorts of different things that they do not sell to libraries. They sell to other entities like policing, financial data and legal analytics purchasers.
So this slide kind of lays out, you know, the academic metrics that they sell, the ones that we use, like SciVal, Pure, Plum Analytics, things that we use to maybe judge the value of our collections or rank researchers,
you know, pull out what articles might be the most popular. But at the same time, RELX is also creating products for law enforcement and government data brokering. So local, state, federal law enforcement, including, in the United States, Immigration and Customs Enforcement, intelligence agencies,
and then the Department of Defense and also agencies that kind of divvy out social services like Social Security Administration and other agencies that decide if somebody should get welfare benefits, unemployment benefits, you know, just social support that the government gives out.
They also sell financial data products to landlords, benefits programs, insurance providers. RELX has a booming insurance data business where it sells data to insurance companies about whether people are, you know, a good or a bad bet to insure, how to insure people, health care systems.
So doctors, hospitals and also employers, the people who hire and fire you, including obviously academic institutions. They also sell legal analytics products to lawyers, judges, what have you, to help law enforcement and government agencies decide how their case will fare in court,
what the likelihood of victory is, you know, how much money you might get out of a legal case. So they kind of game the legal system with those data analytics products. So they sell a lot of data analytics products across a lot of sectors, not just the academic sector.
And here are some examples of things that these non-library analytics products do. They do public records data, so that's like data dossiers about all of us. They have millions of files of data about people all over the world. They link people and locations together so they identify where people connect with places,
where people connect with each other. They can notify people and entities of when somebody's status changes, so kind of give a progressive look at what somebody's doing, where they're going, whether they got hired or fired, whether they got arrested, you know, whether their name changed or what have you.
They can ID people. They can ID people on a map, so put people in geospatially where they're located. They can make things like hot lists for police and for landlords. They can make lists of people who shouldn't be insured, who shouldn't be given certain types of medication in the health care process.
And they help law enforcement agencies share information with each other. So, like, RELX has a pool of over a thousand law enforcement agencies where the law enforcement agencies can just see each other's data through Lexis. So they kind of create a fusion center, but they're like a private vendor instead of a government entity providing that service.
So not only can their data analytics products do a lot of different things, they're also very robust. So they have very deep wells of information about us. They have over 283 million active LexIDs. A LexID is a data dossier which has, like, thousands of data points updated in real time about a single person.
They have over 1.5 billion bankruptcy records, 77 million business contact records, 330 million unique cell phone numbers, 11.3 billion unique name and address combinations.
And so they have a huge wealth and depth of data, personal data about people all over the world, and that includes researchers through their research products. So in the United States, a librarian named Shea Swauger actually managed to get a copy of his Thomson Reuters data dossier,
which is similar to RELX's data dossier. And it was, I think, 41 pages long. And one thing he noticed about it was that it contained inaccuracies and errors. And he saw that on the front page, the cover sheet that he got, like the letter he got with the dossier,
the companies don't vouch for their own data quality. And this is actually one of the biggest problems with these companies' data analytics products and projects. The data inside of the products, they don't vet it. They depend on the downstream data brokers, so people who they buy the data from, to vet it for them. And that's usually not something that happens.
So one of the worst things about these data collections is that they're inaccurate and they're used to make very official big decisions about our lives. But they're not always right. So people get hurt because they can no longer get insurance or they're not able to buy a car or get the healthcare they need because their RELX or Thomson Reuters data is incorrect and there's no way to correct it
because the companies get data from over 10,000 sources. So it's really, really hard to figure out where the data went wrong downstream and to then go and try to correct it.
So if a RELX data dossier is bad, if it has bad data and that data hurts you, there's oftentimes very little recourse that you have, which is really frustrating for consumers, for people, for all of us. So that's one thing to note about the data collections.
And so I guess this is just kind of another indication of how data analytics is taking up more and more of these businesses. So this is from 2020. This is from RELX's annual report. You can see that science, technical and medical materials, which is in large part journals,
still makes up the biggest chunk of the pie. But their risk services, which are their data analytics products, are starting to take over more and more of what they do. And that pie piece, you know, will likely continue to grow. So, I mean, Clarivate's whole business is basically data analytics.
But companies like RELX, who are publishers, more and more they are also doing data analytics, and that will continue to increase as their data analytics capabilities, as the world's data analytics capabilities, increase and more data gets collected.
So Alejandro Posada and George Chen had a report, I think, I want to say 2019, but the citation's on the next slide. It's an amazing, excellent report. And I advise you all to read it or write it down on your to-read list if you haven't read it yet.
But what they did is they studied not the growth of the risk products in RELX, but they really focused on Elsevier and found that as more things are becoming datafied and as we're kind of moving into this data analytics world, Elsevier's found a way to spread way beyond its traditional publishing role in library and academic services.
Right. And we all see this happening. Right. We can see that not only do they publish journals, which used to be their only role. Right. They published and sold journals. Now they also collect data from preprints and collect preprints and platform preprints before publication.
And then they also sell data analytics after the research is published about who's clicking on research, how popular it is. And they're using that to create services to network grant funders, academics, institutions, and to evaluate research based on impact metrics.
Right. We're all familiar with that. So the data analytics service, just in the academic realm alone, has really created more financial and profit possibilities for the company. Right. They can now monetize every part of the research process instead of just publication. Right. And in that sense, that means that the preprints and what happens afterwards, we never really own that anymore.
Right. It belongs to Elsevier and Clarivate and these other companies who are buying it and making money off of it and making it their proprietary content. Right. It takes kind of the ownership and ability to handle things away from us and gives it to them. And so this is the same image as the last slide, but with the actual products kind of overlaid onto it.
So you can see what that looks like with all the products. You probably recognize some of these products. So, yeah, I really advise you to go read that report.
It's been very formative for me. And I think it's kind of eye-opening for all of us. So I made a similar one. It's not nearly as lovely or well researched. But this is kind of how I visualize the way that data analytics are being monetized beyond Elsevier. Right. So generally, I start at the top circle. We give our personal data and our subscription
money to get past their paywalls and to get subscriptions to all the journals we need. We give both of those things, money and data, to Thomson Reuters, RELX, other data companies. Right. Sorry, it should say Clarivate. I often talk about Thomson Reuters and RELX because they also happen to be a duopoly in the U.S.
legal information sector, which is a whole different problem, but it's the same companies. So then the R&D, all that research and development that they do to develop more and more data analytics.
They all have big multimillion dollar technology labs that are staffed with thousands of technologists that are building machine learning, A.I., you know, algorithmic products for them and ideas. So they sell those to landlords, creditors, insurers, employers, law enforcement.
Right. They sell those to all of their users. And that ends up putting more people either in like the way I usually describe it is, you know, in the justice system, right, in the criminal or civil justice system, getting them kind of entangled with government or,
you know, in the case of academia, you know, your grant funding, your preprints, your hiring, all of it. All of that becomes part of the system that then goes back to us continuing to need to fund these companies. Right. We have to continue funding them and we fund them even more because with more grant funders using these companies,
with more law enforcement using these companies, we create more people who depend on these systems. So it's this kind of cycle where they manage to profit by selling content that we all buy. And then by developing data analytics to make more money off of that content and then by selling this data analytics, which leads to us buying more content or having to do more research.
Right. So they really have found a lot of ways to monetize all of this. So one question that people often ask, especially library workers and information science people, is: where does library data go?
So very practically, what does it mean when your library patrons come and use ProQuest or Elsevier or Clarivate or what have you in your library? What happens to their data? So the short answer is we aren't sure. Right. But we do have some indications that surveillance products are ending up in our research platforms.
So Wolfie Christl, who's a researcher that has done a really great job of seeing where these products get used, how these data analytics products get used, kind of how data brokering is working, found at least one instance where RELX, Reed Elsevier LexisNexis, is putting ThreatMetrix,
which is actually a policing surveillance product, in their ScienceDirect research platform. And we can't be 100 percent sure why the company is doing this. I mean, likely they would say it's for data security, to make sure that nobody is illegally downloading their copyright assets.
Right. Their journals without permission. So that could be one reason. But the fact is that, no matter what the rationale for collecting the data, ThreatMetrix is collecting users' data when they use ScienceDirect.
And that should be a little concerning to all of us without more transparency, more explanation of why a company like RELX would be surveilling researchers with ThreatMetrix. A little bit of a concern. Right. So that's something to ask for more transparency on or wonder about at the very least.
And it's also a good kind of example of how these days, because we depend on platforms to get our research and to get our library services, we haven't just handed over the keys to our collections to RELX and to these other companies. We're also handing over our data. Right. Because we have to use the company's platforms in order to get the research.
We can't just go, you know, online and get a copy of the journal article and read it most of the time. Usually we have to go through our institution, which puts us on one of these vendors' platforms. Right. And the second we're on there, they will begin collecting our data. Right. So that's kind of one of the problems of the kind of walled garden ecosystem of academic publishing online.
With the paywalls and with the barriers to access, in order to get access, we kind of have to cede our control of our data. Right. We have to say, oh, OK, I really need to look at this journal article. You can look at everywhere I click, you know, you can look if I click on the author's name, you can look at that.
If I click on related articles, you can look at that. We just kind of accept that as part of the deal. So we're not just paying a paywall, we're giving our data. Right. And even if the companies claim they don't use our data now. So when we had this problem with legal research platforms in the US, at one
point, RELX sent an email to every law faculty in the country saying we promise we're not selling. We're not working with ICE. We're not working with Immigration and Customs Enforcement. Thomson Reuters is doing that, but we're not doing that, right. And they never publicly promised that they would never sell our data or use library research data, patron data.
They never promised there's no pronouncement, ensuring that they're never going to use our data in ways that violate our privacy. And that's because depending on where the profits go, the companies will change their mind.
Right. So here's the letter that we got from RELX back in, what, 2019. They said we are not providing jail booking data to ICE. We're not working with them to build data infrastructure. That's in bold. Right. They said another company is doing that, by which they meant Thomson Reuters. But then in 2021, when they finally got the offer,
they signed a 16.8, I think, million dollar contract with ICE to provide data infrastructure. Right. Because they got offered that money. So without more transparency and more assurance from these companies that they're going to handle our data responsibly,
it's really, I think, important to recognize what our relationship with them is. They're no longer publishers. They're definitely collecting data. They're definitely selling data to a lot of people who we'd likely be uncomfortable with having our library data go to.
And we have no assurance that they're going to keep our data private in the ways that we would want them to. So I think this is a good point to open it up to discussion. I'm happy to take questions. And I'm really grateful that you are considering this topic because I think it's a timely and important one.