Jumping the Paywall
Formal Metadata
Number of Parts | 85
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/38123 (DOI)
Transcript: English (auto-generated)
00:13
Our next talk is about the deconstruction of academic paywalls. I'm sure everybody here knows and has seen those specific kinds of paywalls,
00:25
and as you also might have noticed, those academic paywalls differentiate themselves, unfortunately, from other paywalls on the web, not by making themselves transparent for the disadvantaged, as one might expect due to the importance of the proliferation of knowledge,
00:40
but rather by even more exploitative pricing, which is why I'm very excited for Storm Harding and his talk, Jumping the Paywall. Thank you so much. Welcome. Thank you, guys. Thank you. Thank you for coming. Hi, everyone. I'm Storm. I'm from the internet. Today, I'm here to talk to you guys about paywalls.
01:03
Paywalls, very basically (you're probably familiar with the concept), are when we have some piece of knowledge, some piece of information, and we cannot get to it without paying. Today, we'll be talking about how we jump those paywalls. We're going to break it down into kind of the theoretical approaches that academics have taken to this problem.
01:23
Then we're going to actually talk about practical solutions to how we extract content from pay castles, and then how we remove any potential traitor tracing or watermarking of that content. Before we start, we, of course, have this disclaimer where any particular tenses I may use, any particular tenses you may hear are not indicative of any kind of injunction to action.
01:44
We're all operating on a purely imaginative framework. With that in mind, let's look at the theoretical overtures, at how academics have grappled with the problem of paywalls. This guy, Gary Hall, in 2009 proposed what we'll term a legalist approach to dealing with paywalls.
02:06
Here you could see some of the approaches that he proposed, such as asking for permission to share an article after it's been published, or adopting a don't ask, don't tell policy where if you publish an article in a pay journal, you could then kind of, on the sly, put it online yourself.
02:24
This last point is particularly interesting to me, and what we're going to be focusing on throughout the talk is, if we can put articles online for free, why should that permissibility be restricted to the figure of the author? Or in other words, why do we need to repeat the so-called dry chants of the legalist?
02:42
In other words, why do we need to engage in legal discourse when we talk about unethical acts? And so onwards, later on, Strifras and McCleary developed this policy that any strategy, as you see up here, for contesting the law should proceed through more than just legal channels. And so that's what we'll be exploring today,
03:02
is extralegal or non-legalist modes of intervention in the copy fight. And then finally, I'm assuming most of you guys are familiar with Aaron Swartz and his Guerrilla Open Access Manifesto, which we'll be more or less adhering to today. And finally, a final note that this is not a talk in defense of copyleft,
03:22
as these talks also often are. Instead, today when we jump the paywall, we view copyleft as, in fact, a much more malignant enemy than traditional copyright. The reason for this is because copyleft presents a sense of acceptability. It makes copyright palatable. This seems contradictory.
03:41
The problem is that copyleft does not question intellectual property itself. It merely changes the directive from thou shalt not, the traditional copyright injunction, to thou shalt, the foul permissibility of copyleft. What it doesn't then question is, why should anyone dictate that thou in the first place?
04:02
So what we're going to be doing today is challenging the notion of intellectual property at its core. Copyleft does not do that. In fact, copyleft entrenches it all the further. And finally, things like open access, which are, again, fundamentally reformist, nonsensical injunctions to, again, propagate intellectual property,
04:21
because here we see some of you may be familiar with PLOS, the Public Library of Science, and on the one hand they claim that they stand for unrestricted access and unrestricted reuse, but in the next paragraph in their mission statement they say that they apply the Creative Commons attribution license. Licensing, of course, is inherently a restriction.
04:41
The particular terms of the license do not matter. What matters is that someone feels they have the right to set a license in the first place, and that is what we're fighting against today. And thus we reject copyleft, we reject open access, and we embrace the copy fight. So that was kind of the theoretical standpoint
05:00
that I'll be coming from. Now let's talk about how do we actually liberate content. There's a set of rules to follow. The first is always be pirating. Always steal books from the library. Never check those books out. The question is why, right? We've all been brought up to be good citizens
05:23
who rent books from the library, return them on time, don't pay our fines. What this does is create a convenient tracking database which can then of course be correlated to your online distribution activities. So let's say you're fond of a particular forensics journal that a local community college library has, and you always borrow it and return it on time,
05:42
but while you're doing that you also scan a copy and post it somewhere else. Let's say Elsevier, one of the owners of the journals, then decides to start checking library records, and oh, who checked out this particular journal and these particular issues which then went online. Now of course you may be thinking, but other sites may be keeping records too.
06:02
The difference is that your library record is usually, unless you took the precaution of using false identification to register, linked to your real identity and can lead to source neutralization. That's one of the main problems that we'll be talking about today through particular case studies: source neutralization, which is when the adversary neutralizes
06:22
the source of content, right? In other words, by shutting them down, by sending a lawsuit, by arrest, sometimes through grimmer circumstances such as suicide. So like I said, don't use the library unless you have to, and if you do have to, always steal from the library. Going further then into alternative digital vectors,
06:42
there's of course Library Genesis. The current mirror is .io, though, as you may or may not know, the actual URL bounces around in a round robin sequence, so it may change to .biz, .org, and so on. So exactly how big is Library Genesis? Well, most recent studies show that it contains
07:00
38 million academic articles totaling 28 terabytes, which constitutes 36% of all indexed academic articles that have a DOI, or digital object identifier, which is kind of like an ISBN for journal articles. So that's one of our main resources. A related resource is SciHub.
07:23
SciHub is a round robin sequence of .edu and .ac.uk proxies, which you feed an article DOI or other URI like the URL, and then you could go and get the article. So basically what SciHub does is what we've been doing throughout the 90s where you go and you find public .edu proxies that have access
07:43
to particular journal subscriptions, and then you write a basic scraper to collect all the content and distribute it otherwise. SciHub automates this task by automatically mirroring any particular article that you access on the LibGen archives. So these are the two main resources that we should use
08:01
in lieu of dangerous physical meatspace libraries. There's also a growth lately in crowd-sourced resources. The subreddit r/Scholar has a request section and a fulfillment section where you can post the DOI that you want and someone can find it. On Twitter, there's also the recent hashtag #icanhazpdf where you make a tweet with, again,
08:21
the DOI or the URL of the article you want, and anyone who has access will send you the link. And there's a couple more or less obvious resources that we nonetheless should not overlook, such as Google Scholar, which oftentimes leads to open versions of articles that are otherwise paywalled on other mirrors, and then you should also
08:41
always check the personal pages of any particular author because sometimes they put articles online there. And again, going back to the dreaded library, if everything else fails and you have access to a university, go and try to procure the article from there, but be sure to use open login terminals, okay? Some of these may be non-obvious.
09:02
For instance, if you're faced with just a basic catalog screen, sometimes tapping something like the Windows key and then right-clicking, going to desktop, and you can basically escalate privileges to obtain access where you don't require registration to view articles in an educational setting. And so now what we should talk about is this last resort, right?
09:22
Let's say that we can't find what we're looking for online. We actually have to trek out to a local, at least Wi-Fi hotspot that has EDU access. We need to practice good operational security when we're actually on adversarial territory, okay? Some of you may be familiar with the case of Aaron Swartz.
09:42
If not, so Aaron was essentially arrested for downloading a few articles from JSTOR, which is a particularly popular academic database, from a server rack at MIT. What Aaron did was he went to a particular server closet over and over again, plugged in his own hard drives, and then liberated a few million articles.
10:01
What led to Aaron's arrest was that he went back to the same closet, right? So in other words, the admins noticed regular activity coming out of this random server rack, and then they set up CCTV surveillance in that space, okay? So the first rule, as always, is to never return to the same feeding hole, right?
10:23
To always pick a different source if you're practicing actual ops, like in the vicinity. Another particular item to keep in mind is do not create any record of your existence. If the particular facility that you're accessing requires swipe-through or smart card access,
10:40
you could try to social engineer access into the facility by, for instance, taping black electrical tape over the mag stripe and then complaining to a security guard that your card just doesn't seem to work and they will more often than not just swipe you in. Okay, so by now, at this point, we should have procured some articles, right?
11:01
Before we actually start sharing them, we now need to engage in content defanging, or removing all of the actual poison that the venomous publishers inject into these articles, before we can share them to, again, prevent target neutralization. So there's three basic types of so-called bad things
11:23
that publishers can put in articles. Content protection, watermarks, and metadata. So let's go through how can we potentially deal with each one. So let's start with content protection. So content protection is very basically stuff that prevents you from doing stuff to the article.
11:41
Sometimes things like copying it, like printing it, or reading it. And again, there's very, very many easy-to-use tools that we could deploy to defeat content protection. One such tool is Advanced PDF Password Recovery Pro, which can also brute force passwords to PDFs if they're not just content protected,
12:00
but also password protected. And again, this would work for very basic protection. For more advanced protection, such as Adobe's more recent LiveCycle program, which requires connecting to a server in order to get a temporary certificate to view the article, what we can, in fact, do is spoof the server to localhost, and I'm not going to go through that because there are existing, very detailed guides.
12:21
The point to take away is that this is very easily done if you just look up how to do it. So that was the content protection. That's usually the overt form of content fanging, or in other words, of protecting content. It's very obvious when you cannot copy a particular article. A much more nefarious latent form is usually watermarks.
12:42
Watermarks function by, once again, the content protection being embedded into the actual article. And these can be things like marginal notes, like an article would say that this was downloaded from wherever, at whatever time, from whatever IP. So this will be the first kind of watermark that we're looking at, and again, this is relatively straightforward.
13:02
These are things that you could see in the marginalia of an article. So let's get rid of it. A basic tool to use would be, on first glance, Briss. Briss is a cropping tool where you can input a PDF, select the margins, and crop out the potential watermark. So here we have a censored article,
13:22
where on the left we have before Briss, the sensible watermark marginalia on the left-hand side, and then after, seemingly without it. The problem, though, is that Briss performs what is in fact known as a non-destructive crop. This means all it does is adjust the actual margins. It does not delete the content.
13:41
So even if you download Briss and crop out the margins, forensics examiners will still be able to retrieve the watermark that will be outside the printable margins, but will still be embedded in the PDF. Instead, what is necessary to do then is entirely reprint the article, not simply recrop it, and select the margins within the printer parameters.
14:01
Once we do that, we find that actually printing it gets rid of the marginal watermarks more than Briss does. So that was the very basic kind of watermarking that we can encounter. There are other much more sneaky watermarks that publishers may potentially put in. The first is known as natural language watermarking,
14:22
or NLW. The way that NLW functions is instead of adding extraneous information into the article, it modifies the actual syntax of the article itself. So a very basic example you see up here would be one iteration of an article would say, I ate a green cupcake yesterday, another one would say, yesterday I ate a cupcake that was green,
14:42
or yesterday I ate a green cupcake, and so on and so forth. Once any given number of sentences are modified, the particular tracking algorithm can then deduce which version of the article was watermarked, or which source it came from. And of course, the flip side of this is that this is very trivial to defeat, right? We're performing a simple difference analysis
15:03
between two copies of the article. There's then a potential third kind of watermarking that we should be conscious of as well, which is spatial watermarking. The way that this functions is modifying the actual spacing between sentences, between words within the sentence, between lines, between pages, between page numbers, and so on.
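The natural language watermarking idea described a moment ago, with the green-cupcake variants, can be sketched in a few lines. This is only an illustration under stated assumptions, not any publisher's actual scheme: the `VARIANTS` table, the second sentence pair, and the function names `embed`, `extract`, and `differing_slots` are all hypothetical. Each copy gets one phrasing per sentence slot, the combination of choices encodes a copy number, and comparing two copies shows which sentences carry the mark.

```python
# Hypothetical variant table: each slot lists interchangeable phrasings.
# The first slot reuses the example from the talk; the second is invented.
VARIANTS = [
    [
        "I ate a green cupcake yesterday.",
        "Yesterday I ate a cupcake that was green.",
        "Yesterday I ate a green cupcake.",
    ],
    [
        "The results were significant.",
        "Significant results were obtained.",
    ],
]


def capacity():
    """Number of distinct copies the variant table can tell apart."""
    total = 1
    for options in VARIANTS:
        total *= len(options)
    return total


def embed(copy_id):
    """Choose one phrasing per slot so the choices encode copy_id (mixed radix)."""
    if not 0 <= copy_id < capacity():
        raise ValueError("copy_id does not fit in the variant table")
    sentences = []
    for options in VARIANTS:
        copy_id, choice = divmod(copy_id, len(options))
        sentences.append(options[choice])
    return sentences


def extract(sentences):
    """Recover the copy number from the phrasing chosen in each slot."""
    copy_id, weight = 0, 1
    for options, sentence in zip(VARIANTS, sentences):
        copy_id += options.index(sentence) * weight
        weight *= len(options)
    return copy_id


def differing_slots(copy_a, copy_b):
    """Indices of sentences that differ between two copies of the same text."""
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]
```

With this toy table, two slots distinguish six copies; a real scheme would need many more slots, and the comparison in `differing_slots` is the "simple difference analysis" the talk refers to.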
15:21
And again, the good thing is that once we get rid of the content protection, this is again very trivial to remove by dumping most of the article into plain text, which will get rid of particular spacing minutiae preserved in PDF files. Finally, the third kind of component that publishers often use to track you is metadata.
15:41
So metadata is again basically data about data. In our instances, it's things like who the article's author is, the time that it was generated, the particular time zone it was generated in, the mysterious UUID field, which we'll talk about in a second. So if you're using something like Adobe Acrobat, they again have ostensibly a metadata scrubbing tool built in.
16:03
And here on the screenshot, you can see they claim this will discard document information and metadata. This is what is known as a lie. If we open the metadata of a PDF that has been scrubbed with Acrobat's own scrubber, we find that it still has UUID parameters,
16:23
which we can view if we open it in a basic hex editor. These are again a formative list of bad things. And remember, our goal is to get rid of bad things to share the good thing, which is the knowledge. So what is this particular universally unique identifier, or UUID? Adobe's XMP specifications, which lay out the metadata that Acrobat uses,
16:42
don't actually conveniently tell us what it is. They say that that's up to the printer. The PDF printer can set its own UUID parameters. Best practices in the RFC specs that are there dictate that this should be at least partially a random number generator, but earlier versions of UUID used the MAC address.
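The difference between the older and newer UUID versions can be checked with Python's standard `uuid` module. The sketch below is an illustration, not part of the talk: the helper name `describe_uuid` is hypothetical, and the sample value is the well-known DNS namespace UUID, which happens to be a version 1 identifier. Note that a version 1 node field is not guaranteed to be a real MAC address, since the specification also allows a random node ID.

```python
import uuid


def describe_uuid(text):
    """Report a UUID's version and, for version 1, its 48-bit node field.

    In version 1 UUIDs the node field was traditionally the MAC address of
    the generating machine; version 4 UUIDs are random and carry no node.
    """
    parsed = uuid.UUID(text)
    info = {"version": parsed.version, "node": None}
    if parsed.version == 1:
        raw = f"{parsed.node:012x}"  # last 12 hex digits of the UUID
        info["node"] = ":".join(raw[i:i + 2] for i in range(0, 12, 2))
    return info


# A version 1 UUID exposes its node field...
print(describe_uuid("6ba7b810-9dad-11d1-80b4-00c04fd430c8"))
# ...while a freshly generated version 4 UUID does not.
print(describe_uuid(str(uuid.uuid4())))
```

The version digit is the first character of the third group, so a value like `xxxxxxxx-xxxx-1xxx-...` is time-and-node based, while `xxxxxxxx-xxxx-4xxx-...` is random.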
17:01
In fact, this is how, for instance, the author of the Melissa virus eventually got caught, was that the UUID used to spread the Melissa virus on some Word documents matched some other random files that someone had uploaded online at one point, which turned out to be the friend of the guy who wrote the virus. In other words, the UUID is dangerous.
17:21
Adobe's specifications did not dictate that you needed to use the latest UUID implementation, which is a random number generator. So in theory, any potential PDF printer that you use could be using UUIDs that will, again, allow traitor tracing. So in other words, they need to be taken care of. If you're editing your document in something like Adobe Acrobat,
17:40
they will not be taken care of, even if you select the scrub tool, which means that to go back, you need to go into the document in a hex viewer and actually remove or modify the parameters there. And of course, this was all talking about if we want to modify potential metadata. We would open it in a hex editor, change the time zone, for instance.
18:01
If we simply wanted to wipe the data, in other words, we didn't want to insert spoof data, but we wanted to simply erase it all, we could use a very easy tool known as the Metadata Anonymization Toolkit, where you feed an input and it produces a cleaned output. The problem with simple wiping, of course, is that then your adversary will know that the data was erased.
18:20
In other words, they will know that you are privy to the modifications. So if you have the actual time to go in and start modifying values instead of just erasing them, that will potentially lead the adversary down a goose chase rabbit hole. So at this point, what we've done so far is
18:40
we've found sources where we can procure articles. We've discussed how we could remove protection from the articles. Now, how can we finally share them? The first kind of the very fundamental principle would be not to use your own IP, not to use the IP of any university you may be affiliated with, and to use Tor, but, of course, not to use Tor from your university network,
19:04
because then it would, of course, be obvious if you're the only person using Tor at that given time and there's a Tor upload that matches that timestamp. So in other words, not to use your network entirely. The second thing to do is to wait. So let's say you purchase a book from Amazon at 5 o'clock on Friday
19:22
and then you put it on LibGen at 5:01 on Friday, and let's say you do this over and over again. Amazon may very easily then conduct time correlation attacks because LibGen, of course, preserves the file upload date and time. So the second thing to do, other than not using your own network, is to wait before you upload stuff
19:41
to be able to spoof file correlation attacks, and you may further be able to spoof these by, again, modifying the data within your document. So if you downloaded something on Wednesday the 5th, you could change the metadata to say downloaded on Tuesday the 4th or even earlier, or potentially even later. And then finally you could use various file hosting solutions
20:04
which are more or less friendly to the type of content that we want to share. Some of these are the following. And that's pretty much it. Now what we'll do is open it up to questions, but the last thing I want to say is remember that this is serious business.
20:20
This is why we started off with a formal disclaimer: people are getting arrested for effectively simply sharing information. So in other words, be safe and be careful when you guys do this, and remember, we are at war. At the time that I'm speaking right now, Elsevier, one of the biggest publishers, has filed a John Doe lawsuit in New York against SciHub.
20:43
LibGen is also under attack in that, for instance, the High Court has recently blocked it in the United Kingdom. So these are very serious issues. I may have addressed some of them glibly as a way of getting them across, but keep in mind this is very serious business. Thank you guys, and if you have any questions.
21:04
Okay, thank you very much. We do have time for questions. I would ask everybody to please line up at the mics. We do have a question already. Please go ahead. I do have a question. Your injunction to steal books from the library is very strange.
21:26
In particular, it violates most ethical positions and the Golden Rule, and it ignores the fact that librarians are very protective of patron privacy, both on a historical basis and in individual cases.
21:40
If you heard Brewster's talk a couple days ago, he talked about the National Security Letter, which the Internet Archive got and fought, dropped. Many libraries have a long tradition of resisting law enforcement demands for patron records. So I think you're deeply misguided in suggesting that people steal books from libraries.
22:02
I'm sorry, I'm deeply what? Misguided. Misguided, okay. So that wasn't really a question, I suppose, but I will respond to it in kind. Anyway, to address your first point about the fact that many librarians are protective of user privacy, librarians can be served with orders where they're not allowed to state
22:21
that they have received orders to turn over loaner records. That's a fundamental fact of at least U.S. law. However, even if that were not a fact, putting trust in another entity increases the entropy. If you're trusting the librarian not to hand over the records, that bridge does not need to be there if you simply take the book in the first place.
22:42
In other words, you are putting yourself needlessly at risk. Can I point out that you can also read the book in the library and not borrow it and create no record of the book. Or you can photocopy it with your phone without any record of your being there at the library
23:01
and not deprive other people of this library resource, the shared resource. Okay, let's back up and take your points one by one. Yes, you can read a book in the library, assuming that you have physical access to the library. What we are fighting for is making knowledge globally accessible to people who do not have the privilege to be in a particular building.
23:22
Second of all, to address your second point about taking a photo or photocopying or taking a photo with your phone, yes, you can do that if you want very crappy low resolution images. If you didn't want to take the book from the library, you could, for instance, a more prudent solution would be to use their fancy scanners. But going even further than that, you seem to be assuming that in the action of taking a book from the library,
23:45
the book would have otherwise not been taken out. But what of traditional patrons? They took books out from the library. The difference is that when we do it, we put the books online for anyone to then see and then we can bring them back to the library as opposed to a general patron who takes the book out,
24:00
reads them for themselves, a fundamentally selfish act, and then brings them back. So I don't particularly see the problem unless you're assuming that we won't put the books back when we take them or that we won't put the books online, which are the two fundamental imperatives of this mission. Yes. In order to steal a book, you need physical access to the library. Instead of stealing the book, you could read the book, you could photocopy the book.
24:23
The quality of the photocopy that you make is unlikely to be noticeably different for usability. Of course, you could steal the book and deny it. I don't even know how to go into this. I'm sorry. I think we do have some more questions. Thank you anyway.
24:41
Please go ahead over here. If you have a massive quantity of PDFs you want to change the metadata on, do you have any recommendations for tools to batch process? The question is whether your intention is to actually modify the metadata or simply to wipe it. If you simply want to wipe it, that's a lot easier to do, because you could simply batch process
25:04
using the Metadata Anonymisation Toolkit (MAT), which can batch process and wipe out the data. If you actually want to go through and spoof the data in every single one, at this moment there aren't yet, unfortunately, any tools available to automate that. They're being worked on vaguely in people's free time, but unfortunately I don't know of any to do mass-scale spoofing at this point.
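As a rough illustration of the batch-wiping case described in the answer above, here is a minimal Python sketch. It was not shown in the talk. The names `find_pdfs`, `batch_clean`, and `clean_one` are hypothetical, and the actual wiping step is deliberately left to whatever function you pass in, for instance a wrapper around the Metadata Anonymisation Toolkit.

```python
from pathlib import Path
from typing import Callable


def find_pdfs(root: Path) -> list[Path]:
    """Return every .pdf file under root (case-insensitive), in sorted order."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def batch_clean(root: Path, clean_one: Callable[[Path], None]) -> dict[str, list[Path]]:
    """Apply clean_one to each PDF under root.

    clean_one is a placeholder for the real metadata-wiping step; this
    function only handles the batching and keeps going if one file fails.
    """
    results: dict[str, list[Path]] = {"cleaned": [], "failed": []}
    for pdf in find_pdfs(root):
        try:
            clean_one(pdf)
            results["cleaned"].append(pdf)
        except Exception:
            # Record the failure so the file can be inspected by hand later.
            results["failed"].append(pdf)
    return results
```

Keeping the cleaner as a parameter means the same loop could later drive a spoofing step instead of a wiping step, should such a tool become available.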
25:26
Okay, thank you very much. We have another question on the other side of the room. You mentioned two potential problems to solve, the thing with the green cupcake and the thing with the spacing. Have you seen either of those problems in the wild or have you heard of them?
25:43
Yeah, that's actually a very good question. I've looked at, I think, something like 20 major publishers of articles by now. None of them use these systems presently, but these systems do exist, so my assumption is it's only a matter of time before they're widely adopted,
26:00
but that's a very good question. In the wild, like I said, on the slide that says seven, by now I've done more like 20. I have not actually found that in the wild anywhere yet, so these are at this point only theoretical attacks and counterattacks, but I think it's good that we start thinking about them early, as long as they don't scare us into inaction but instead prompt us to action to remove them.
26:22
Okay, thank you very much. Thank you. Are there any more questions in this aisle? Please get up to the microphone if you can. Right here in, yes, okay.
26:42
Please go ahead. It's a bit off the original purpose of the presentation, but in terms of complementing this data liberation strategy with a strategy that also embraces the authors of the publications that are to be liberated themselves,
27:02
I think a lot has been done historically for quite a while with the social convention of a pre-publication or pre-formatted copy. There is a law in the UK that requires researchers to do that, and I believe to some extent you can put it on your university website,
27:27
and Academia.edu and ResearchGate are, I guess, two platforms that are helping to get around it. I'm also just a bit concerned that maybe the guerrilla tactics on the one side
27:41
are not possibly going to win the favor of authors. It's different when you come to journals. I mean, I don't think many people get pissed off when journals are pirated. I'm sorry, we are very short on time. Do you think you can answer that? Just very briefly to address your question,
28:01
there are things like Academia.edu and ResearchGate, which are, again, legalist modes of praxis where authors can voluntarily put content online, and we should absolutely use those, but kind of my point today is that we shouldn't limit ourselves to simply those kinds of legalist modes of attack. In other words, we should certainly use the authors if they're willing to join us,
28:21
but we shouldn't restrain ourselves to their consent. Okay, great. So, I think we have time for one very quick question, please. I simply wanted to add to the previous speaker that I read a lot of academic literature, and US universities normally put things online nowadays,
28:42
on the open Internet, whereas in Germany the problem is very strong, and publishers don't allow you to put anything on the Internet if you want to publish it in a magazine, so I think this problem should be addressed at a future conference. Thank you. Thank you.
29:00
That was more like an annotation than a question, so I think we're good, right? Sorry? I think we're good? Yeah, I think we're good. Then thank you very much, Tom Harding, for a great talk.