Papis: a simple, powerful and extendable command-line bibliography manager
Formal Metadata
Number of Parts | 542
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/61449 (DOI)
FOSDEM 2023, 329 / 542
Transcript: English (auto-generated)
00:05
Hello FOSDEM, my name is Alejandro and today I am going to talk about Papis, a simple, powerful and extendable command-line bibliography manager that I have been developing during the last seven years. I will be explaining some of the main considerations of the project and demoing
00:24
some of its basic use cases. First of all, let me introduce myself. I currently work as a physicist at the Technical University of Vienna in Austria. We develop massively parallel algorithms in order to calculate properties of molecules
00:41
and solids from a theoretical point of view. You can find me on Mastodon or around the web; don't hesitate to contact me. So, what is Papis? Papis started as a simple bibliography manager built around the command line.
01:02
It should make it possible to manage papers or books at scale or for small curated libraries. It is therefore important to implement a simple data model and use an approachable programming language such as Python so that users can interact easily with Papis's many features.
01:27
In addition, Python also encourages contributions from researchers in the academic world since nowadays many researchers are exposed to this language.
01:41
Papis strives to be and build a community, and various plugins have appeared thanks to that community. There are plugins for the major text editors such as Neovim and Emacs, and partial support exists for VS Code and Vim.
02:04
Additionally, we have lately been working on the web application for Papis and I will be showing some of its features in this talk. But you may be asking yourself: why Papis? We think that it should be possible and simple to perform complex tasks on a whole library.
02:26
This is made possible through a rich command-line interface. You can add papers from a DOI or from a variety of websites supported by Papis. You can explore sources like Crossref from the command line, or download information
02:44
about the citations of a publication, or check which publications cite the current publication. You can take notes that play well with tools like Vim or Emacs Org Mode. You can version control your documents and export to the most common formats.
03:04
You can spend countless hours curating and improving your library's notes, metadata and PDF documents without fear of losing your data to an API change or the end of life of Papis, since your data is stored in a very simple but flexible format.
03:27
I want to emphasize the fact that one of the main goals of Papis is enabling the user to be independent of Papis itself. A researcher, academic or not, spends an enormous amount of time searching, reading and annotating
03:45
publications. For us Papis maintainers, it is important that a person comfortable with any scripting language should be able to retrieve the totality of their Papis data by writing a script
04:00
in an afternoon. In order to accomplish this, an extremely simple library structure was chosen. The library structure relies on having one folder per library document. This means, for instance, in the case of the shown publication of Turing, the folder
04:23
includes a YAML file containing the metadata of the publication and an additional PDF file with the publication itself. In this example library we would have an additional document under the folder 1-document where
04:43
we find two PDF files in this case. A document in a Papis library is any folder containing a YAML file named info.yaml. The contents of the YAML file are in principle up to the users to determine; however, in
05:04
practice there are some conventions used in Papis. Inside the info.yaml file, the key "files" contains a list of related files in the document's directory. These files might be PDF files or any other kind of files relevant to the document.
05:28
In the case of the Turing publication, "files" therefore lists a single PDF document, paper.pdf. The key "ref" is used for exporting to BibTeX files and is the reference of the document when
05:46
using bibliographic tools outside of Papis. The YAML key "type" is also used for BibTeX exporting and is the type of document, whether a book, an article, a monograph, etc.
06:02
There is also built-in support for tags, which may be added as a list of space-separated keywords. We chose the YAML format due to its ease of writing and reading, and because most programming languages are provided with libraries that can read these files.
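Because a document is nothing more than a folder holding an info.yaml file, a library can be scanned without Papis at all. The following is a minimal sketch of that idea using only the Python standard library. The function names, the demo folders and the sample metadata are made up for illustration, and the search is a plain text match rather than real YAML parsing.

```python
import os
import tempfile
from pathlib import Path


def find_documents(library_dir):
    """Return every folder below library_dir that contains an info.yaml file."""
    return sorted(
        Path(root)
        for root, _dirs, files in os.walk(library_dir)
        if "info.yaml" in files
    )


def crude_search(library_dir, needle):
    """Return document folders whose info.yaml mentions needle (case-insensitive)."""
    needle = needle.lower()
    return [
        folder
        for folder in find_documents(library_dir)
        if needle in (folder / "info.yaml").read_text(encoding="utf-8").lower()
    ]


def make_demo_library(root):
    """Write two made-up documents so the sketch can run anywhere."""
    samples = {
        "turing": (
            "title: Computing Machinery and Intelligence\n"
            "author: Turing, Alan\n"
            "ref: turing1950\n"
            "type: article\n"
            "tags: classics\n"
            "files:\n- paper.pdf\n"
        ),
        "einstein": (
            "title: On the Electrodynamics of Moving Bodies\n"
            "author: Einstein, Albert\n"
            "ref: einstein1905\n"
            "type: article\n"
            "tags: physics classics\n"
            "files:\n- paper.pdf\n"
        ),
    }
    for name, info in samples.items():
        folder = Path(root) / name
        folder.mkdir(parents=True)
        (folder / "info.yaml").write_text(info, encoding="utf-8")
        (folder / "paper.pdf").touch()  # empty placeholder, not a real PDF


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_library(tmp)
        for folder in crude_search(tmp, "turing"):
            print(folder.name)
```

For anything beyond a quick lookup, a proper YAML parser would be the safer choice, since a text match cannot tell a title from a tag.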
06:21
Of course, given the simplicity of the library model, it is possible to write a crude finder with just the Unix grep and find commands. All functionality in Papis can be customized through a configuration file in the .ini format. Papis can define multiple libraries through the configuration file and all Papis settings
06:46
can be independently configured for each library. You can define default settings under the settings section, which will be common to all libraries. A library is simply defined as a section with a dir key which contains a path to the library
07:05
directory containing all documents. You can then customize this library, in this case a library named papers, and set the default opener tool to the PDF viewer Evince.
07:20
If you happen to want an additional library of books holding mostly EPUB-formatted books, you could define the opener to be Calibre instead. You can read about all the configuration settings in the documentation page, where you will see a description of their function and their default values.
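To make the configuration layout concrete, here is a small sketch that parses an example file of the shape just described with Python's standard configparser. The paths are invented, and the key names opentool and default-library are written as I understand them from the Papis documentation, so they should be checked there; only the settings section and the dir key come directly from the talk.

```python
import configparser

# Hypothetical configuration: one shared section plus two libraries.
EXAMPLE_CONFIG = """
[settings]
default-library = papers

[papers]
dir = ~/Documents/papers
opentool = evince

[books]
dir = ~/Documents/books
opentool = calibre
"""


def libraries(config_text):
    """Return {library name: its settings} for every section that has a dir key."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        name: dict(parser[name])
        for name in parser.sections()
        if parser.has_option(name, "dir")
    }


if __name__ == "__main__":
    for name, settings in libraries(EXAMPLE_CONFIG).items():
        print(name, settings["dir"], settings.get("opentool", "(default opener)"))
```

The settings section is skipped because it has no dir key, which mirrors the rule that a library is any section that defines one.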
07:42
With this introduction, let us take a look now at a common workflow to add an article from a journal page. Here is a common view of an article in a browser. We can see lots of information and the easiest way of adding this article to PAPIS will
08:02
be by locating the DOI of the article in the page. In this case, we locate the DOI in the URL of the article and we copy it to our clipboard to paste it in the terminal. The command for adding a paper is papis add, and papis add comes with quite a few options.
08:26
In general, when adding a document, Papis will try to download metadata from various sources and, if possible, download PDF documents if they are freely and legally available.
08:42
Here, we see that I am using the edit flag. This flag instructs the papis add command to open the editor with the info.yaml file before adding the document to the library. Similarly, the open flag instructs the command to open the attached files, if any, before
09:03
adding the document to the library. We are also telling the command through the from flag to retrieve information exclusively from the DOI. We can also preset some metadata through the command line. In this case, we are adding the tags classics and DFT.
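The invocation described here can be assembled as an argument list. This is a hedged sketch: the option spellings --edit, --open, --confirm, --from and --set are my reading of the flags named in the talk and should be confirmed with papis add --help, and the DOI is a placeholder, not the one used in the demo. The code only builds and prints the command; it does not run Papis.

```python
import shlex


def build_add_command(doi, tags):
    """Assemble an argument list for the `papis add` call described in the talk."""
    return [
        "papis", "add",
        "--edit",                          # open info.yaml in the editor first
        "--open",                          # open attached files before adding
        "--confirm",                       # ask for confirmation (mentioned later in the talk)
        "--from", "doi", doi,              # fetch metadata from the DOI only
        "--set", "tags", " ".join(tags),   # preset the space-separated tags
    ]


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    print(shlex.join(build_add_command("10.1000/example-doi", ["classics", "dft"])))
```

Keeping the command as a list makes it easy to hand to subprocess.run later without worrying about shell quoting of the tag string.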
09:27
Let's go ahead and run the command. Papis will now try to download metadata and a PDF file from online sources. In the current configuration, we are greeted with an interactive prompt to add, split
09:45
or reject the metadata retrieved from Crossref. We choose to accept the metadata. The interactive session now shows us a retrieved PDF document and asks us if this is the document
10:01
that belongs to the publication. At this point, we can inspect the document and we realize that we indeed want this PDF file, so we press Y. Now all the information is in place and we can see a preliminary version of the info
10:20
file since we passed the edit flag. We can see that a lot of information could be retrieved: detailed author list information, volume and pages, among others. Our tags have also found their way into the YAML file correctly.
10:43
A confirmation prompt subsequently appears since we passed the confirm flag to the command. We agree to it and therefore the document gets added to the library. We can now fetch information about the publications cited in this article.
11:01
The command for this is citations, and we pass to it the fetch-citations flag, which first checks for information in our library and then heads to Crossref to retrieve relevant information about the references appearing in our newly added document.
11:21
If we now open the directory where the document has been stored, we see that the PDF file has been correctly stored alongside the info.yaml file and the newly generated citations.yaml file.
12:02
If we inspect the citations file, we see that it has the format of a list of YAML documents, where every element, separated by three dashes, represents bibliographic information about one citation. This can be used for scripting, for browsing the citations, or for easily visualizing
12:24
them through the web application. This demo will show how to leverage the Papis API in Python to write one of the simplest scripts you can write. You can find more information in the documentation together with other more complex example scripts.
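The layout of the citations file, several YAML documents separated by lines of three dashes, is simple enough to split with the standard library alone. The sketch below assumes flat key: value entries and uses invented sample data; a real script would more likely use a YAML parser, for example yaml.safe_load_all from PyYAML.

```python
# Invented sample in the shape described in the talk: entries separated by "---".
EXAMPLE_CITATIONS = """\
title: First cited work
year: 1927
---
title: Second cited work
year: 1930
"""


def split_yaml_documents(text):
    """Split a multi-document YAML stream on lines consisting of three dashes."""
    documents, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            if current:
                documents.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        documents.append("\n".join(current))
    return documents


def top_level_value(document, key):
    """Crude lookup of a top-level `key: value` line; returns None when absent."""
    prefix = key + ":"
    for line in document.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


if __name__ == "__main__":
    for entry in split_yaml_documents(EXAMPLE_CITATIONS):
        print(top_level_value(entry, "year"), top_level_value(entry, "title"))
```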
12:45
First of all, let us add a bigger library to our demo. For this, we need to edit the configuration file and add an additional library. After adding the library, we can list the directories with the list command, which shows
13:06
us the interactive interface to select documents. Most Papis commands accept a query argument as an input. In this case, we can query for documents whose author includes the string
13:21
Einstein. We can also use the all flag to apply a Papis action to all documents matching the query, in this case listing the full paths of the folders. Other commands like open, edit, or update work in a similar fashion.
13:47
Next, we will write a simple Python script to scan all the documents in the library and add the tag to the document whenever the substring this appears in the title of the document.
14:02
To do this, we can use the papis.api submodule, and we can obtain all documents in the current library with the function get_all_documents_in_lib. Next, we loop over all documents and we deal with each document as if it were a Python
14:21
dictionary.
14:48
The method save saves the document. I will comment out the save call since I don't want it to overwrite the library.
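A script along the lines just described might look like the sketch below. The tagging logic is kept in a plain function that works on any dictionary, so it can be tried without Papis installed; the substring and tag are parameters because the talk's exact values are not clear from the transcript. The main function is an assumption about how the pieces fit together: it relies on papis.api.get_all_documents_in_lib and on the document's save method as named in the talk, and it defaults to a dry run so nothing is overwritten.

```python
def add_tag_if_title_matches(doc, substring, tag):
    """Add tag to doc when substring occurs in its title; return True if doc changed."""
    title = str(doc.get("title", ""))
    if substring.lower() not in title.lower():
        return False
    tags = doc.get("tags", [])
    if isinstance(tags, str):
        # Tags may be stored as space-separated keywords; normalise to a list.
        tags = tags.split()
    if tag in tags:
        return False
    doc["tags"] = list(tags) + [tag]
    return True


def main(substring, tag, dry_run=True):
    """Tag every matching document in the current library (requires Papis)."""
    import papis.api  # imported here so the helper above works without Papis

    for doc in papis.api.get_all_documents_in_lib():
        changed = add_tag_if_title_matches(doc, substring, tag)
        if changed and not dry_run:
            doc.save()  # writes the document's info.yaml back to disk


if __name__ == "__main__":
    sample = {"title": "Quantum theory of solids", "tags": "physics"}
    add_tag_if_title_matches(sample, "quantum", "qm")
    print(sample["tags"])
```

Note that a document whose tags were stored as a single string ends up with a list after tagging, which may or may not match the convention used in a given library.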
15:02
Let's run the script and see that it works. And indeed, it works. The last demonstration will concern the web application. The web application is quite useful if you would like to self-host Papis or access it from a portable device.
15:22
We can run the web application using the serve command, to which we can pass a port, here 8888. Directing our browser to the URL localhost:8888, we see the starting page of
15:43
the web application where we are presented with a simple query prompt. Other pages include listing all the documents in the library, listing all the tags and browsing a different library.
16:00
Let us again enter the author Einstein query into the prompt. The result page includes a handy timeline with the results of the query and a simple multiline list of the results. In this timeline, we can see for instance directly the Annus Mirabilis of Einstein
16:21
together with a couple of other publications further right. We could click on the title of the timeline and go to the respective document page. In the results for the document, we see a left block with some basic information and the PDF links.
16:41
On the right-hand side, we see the citation references and several external links for the document.
17:08
Let us look for the first paper we added at the beginning of this presentation. It is worth noting that we can click on the tags of the documents to get the results for the given tags.
17:36
If we click on the arrow, we will navigate into the document page. The red notifications advise us of small problems with the data in our document.
17:46
However, I will not fix those now. The document page is a multi-tab page where the first tab presents most of the information of the document in an HTML form fashion.
18:02
Additionally, we have access to the raw info file, where we can modify and overwrite its contents. We have added a BibTeX tab for LaTeX users. This document has a single file attached and we can preview it in the browser
18:22
thanks to the PDF.js library. We can also download the document or open the document in a new window. In the next tab, we can visualize the citations file that we generated previously.
18:41
This tab also has a timeline like the search results and the documents with the green reference indicate that these documents exist in our library and we can open them.
19:03
Let us open this article page.
19:26
For this article, we have also generated citations but we can also use the Harvard ADS service.
19:46
In the case of articles citing the current article, we have not generated this file and therefore we get an embedded page from ADS by default.
20:01
In the last tab, we can edit the notes from the browser. Furthermore, clicking on the tags and library pages, we can see what these
20:22
interfaces look like. Thank you very much for your attention. For further information, visit the project's page over at GitHub.
20:44
Of course, Papis is only alive because of its community. I'd like to thank all the users and contributors over the years. I would especially like to thank the co-maintainers of Papis, Alex Fikl and Julian Hauser, for their hard work in the last year.
21:04
I hope you enjoyed the presentation and I'll be answering your questions shortly.
21:23
Fantastic, thank you so much for that really, actually quite interesting talk Alejandro. I really quite, I felt inspired and thinking, wow, can I run this like with my own publications as a way of collating stuff and also sharing it with the world.
21:40
It's always nice when you watch a talk and you immediately think, yes, I'm going to use this as well. So I have a few questions here. I think my first one perhaps might be, I historically have used Zotero. I had to think very carefully and not place a note for that. Is it easy for me to migrate if I was inclined to migrate from Zotero or other plugins?
22:10
Yes, so over the years quite a lot of people have developed plugins to interface Zotero and Papis.
22:20
So you can export and create Papis libraries. But I am also aware of some people that actually use both. So they have a workflow to export this dynamically the whole time. So it is in principle compatible and there are a couple of projects that do this.
22:41
This is all coming from the community. Thanks. That's actually really appealing to me. Because I was watching the way you added with the DOI and I thought that was really cool. There's still like seven command-line flags and I really like the button that says add this to Zotero.
23:00
Yeah, so the thing is, I should maybe have given some more examples of adding documents. In principle it is also possible to add a document just by the URL. There is some automatic recognition in Papis.
23:24
So most URLs are recognized and it could in this case even rely for URL. Or within the HTML page.
23:44
I noticed that there is a lot to use. Sorry, your connection has gone just rough. We have implemented a lot of testing, testing.
24:09
Do we still have you now? Yes, that is much better. Could you maybe just repeat your last two or three sentences because it was just a bit hard to hear. So there are some, Zotero has implemented also quite a lot of metadata fetchers from many sources.
24:33
And there is also a project that tries to reuse these metadata fetchers from Zotero for their use in Papis.
24:44
And maybe also in the web application in Papis. So this might also happen in the future. But in general, it's much easier to add documents than what I showed in the video. Cool. Thanks. It's also really nice to hear how interoperable you all are.
25:03
So we have a couple more questions in the chat. So Paul says, does the YAML format follow bibliographic standards of any type? So we try to use most of the BibTeX keywords when they are applicable.
25:25
And in general, the YAML format is really free for the user to use. So you might want to use a particular convention in your YAML files. But keywords are mostly motivated by BibTeX.
25:44
That's the only one. It still sounds like there's some decent interoperability there, which is really nice. Celia asks, who are the main users of Papis in terms of discipline, students, researchers, etc?
26:01
Well, I know that's a good question. A lot of biophysics and biosciences, so bioinformatics. I know quite a lot of people that use it. Physics, mathematics and computer science, I would say. These are the ones. But for instance, Julian is one co-maintainer and he's a philosopher.
26:26
So it really helps if, of course, you have to be a little bit acquainted with the command line. Maybe, hopefully, through the web application in the future, this will change.
26:40
But it's in general people that really care about their libraries, the metadata that they have in their libraries. And they really want to have a very clear representation of their data. They don't want some abstruse database somewhere stored.
27:01
So they really want everything in plain text. Yes, I think you demonstrated so beautifully how accessible your own data is. And it's surprisingly rare. Do you have any training for Papis so that people who maybe are a bit less confident could learn more about it?
27:21
Sadly, not right now. Maybe that's something, if enough people are interested, that's something that we could certainly look into. But we have the discussions in GitHub, so quite a lot of people ask questions there. So there are also frequently asked questions there.
27:41
And we have also a Zulip chat. Also, we are on Libera, but right now not so many people are there. And yeah, so just drop by and ask whatever you want. Super, thank you. So we have about 45 seconds left.
28:02
That's time to squeeze in one last question. So we have one here from Paul. And we've got some love for the timeline, which I agree. I was like, oh, I want that. Do you plan any other visualizations, like maybe publication networks from citation data? Yes, actually, yes, because I realize I really like these visualizations.
28:26
I plan some, like with the citations, some trees and stuff like this. But I would like to have more feedback from users to really know what's really sensible and useful.
28:40
Thanks. That's a great point. So we have three seconds left. Thank you so much. Thank you. Thank you, all of you. All right. I think we're off the live stream. I'm so going to be going back and like setting my own Papis up with a web server. Thank you so much. Yeah, thank you. Thank you.
29:00
Okay, I'm going to hop to the next talk. Bye.