Papis: a simple, powerful and extendable command-line bibliography manager - TIB AV-Portal

Gallo, Alejandro

Formal Metadata

Title

Papis: a simple, powerful and extendable command-line bibliography manager

Number of Parts

542

Author

Gallo, Alejandro

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/61449 (DOI)

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Managing references for research papers or general works efficiently is of paramount importance for scholars and students across the spectrum. The common tasks of such a user range from retrieving information about a publication to easily tagging and searching the user's own library. Several libre and proprietary software packages exist. The package Papis consists of an extendable Python library, a flexible command-line interface and a simple (but powerful) data model. This, in turn, empowers the user to curate her library metadata in a future-proof and privacy-respecting manner. Papis users are encouraged and empowered by a clear API to write scripts and libraries that extend the core functionality. All major text editors have an interface to Papis, and a web application for remote access to the user's libraries is also in place. Additionally, during the last 7 years we have built a vibrant community of academic and industry researchers who have become happy users and avid contributors.

Transcript: English (auto-generated)

00:05

Hello FOSDEM, my name is Alejandro, and today I am going to talk about Papis, a simple, powerful and extendable command-line bibliography manager that I have been developing for the last 7 years. I will be explaining some of the main considerations of the project and demoing

00:24

some of its basic use cases. First of all, let me introduce myself. I currently work as a physicist at the Technical University of Vienna in Austria. We develop massively parallel algorithms in order to calculate properties of molecules

00:41

and solids from a theoretical point of view. You can find me on Mastodon or around the web; don't hesitate to contact me. So, what is Papis? Papis started as a simple bibliography manager built around the command line.

01:02

It should make it possible to manage papers or books at scale as well as in small curated libraries. It is therefore important to implement a simple data model and to use an approachable programming language such as Python, so that users can easily interact with Papis's many features.

01:27

Python also encourages contributions from researchers in the academic world, since nowadays many researchers are exposed to this language.

01:41

Papis strives to be, and to build, a community, and various plugins have appeared thanks to it. There are plugins for the major text editors such as Neovim and Emacs, and partial support exists for VS Code and Vim.

02:04

Additionally, we have lately been working on a web application for Papis, and I will be showing some of its features in this talk. But you may be asking yourself: why Papis? We think that it should be possible, and simple, to perform complex tasks on a whole library.

02:26

This is made possible through a rich command-line interface. You can add papers from a DOI or from a variety of websites supported by Papis. You can explore sources like Crossref from the command line, or download information

02:44

about the citations of a publication, or check which publications cite it. You can take notes that play well with tools like Vim or Emacs Org mode. You can version-control your documents and export them to the most common formats.

03:04

You can spend countless hours curating and improving your library's notes, metadata and PDF documents without fear of losing your data to an API change or to Papis reaching its end of life, since your data is stored in a very simple but flexible format.

03:27

I want to emphasize that one of the main goals of Papis is to enable users to be independent of Papis itself. A researcher, academic or not, spends an enormous amount of time searching, reading and annotating

03:45

publications. For us Papis maintainers, it is important that a person comfortable with any scripting language should be able to retrieve the totality of their Papis data by writing a script

04:00

in an afternoon. To accomplish this, an extremely simple library structure was chosen: one folder per document. This means, for instance, that in the case of the Turing publication shown here, the folder

04:23

includes a YAML file containing the metadata of the publication and a PDF file with the publication itself. In this example library we would have an additional document under the folder 1-document, where

04:43

we find two PDF files in this case. A document in a Papis library is any folder containing a YAML file named info.yaml. The contents of the YAML file are in principle up to the user to determine; however, in

05:04

practice there are some conventions used in Papis. Inside the info.yaml file, the key files contains a list of related files in the document's directory. These might be PDF files or any other kind of files relevant to the document.
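Because a Papis document is simply a folder holding an info.yaml file, finding every document in a library needs nothing beyond Python's standard library, which is what makes retrieving one's data "in an afternoon" realistic. The sketch below is our own illustration, not Papis code, and find_documents is a hypothetical helper name. It walks a library directory recursively and yields each folder that contains an info.yaml file; actually reading the metadata would additionally need a YAML parser such as PyYAML.

```python
from pathlib import Path
from typing import Iterator


def find_documents(library_dir: str) -> Iterator[Path]:
    """Yield every folder under library_dir that contains an info.yaml file.

    Hypothetical helper: it relies only on the convention described in the
    talk (one folder per document, metadata in info.yaml), not on Papis.
    """
    root = Path(library_dir).expanduser()
    # rglob searches recursively, so nested sub-folders are found as well.
    for info_file in sorted(root.rglob("info.yaml")):
        if info_file.is_file():
            # The document is the folder, not the YAML file itself.
            yield info_file.parent
```

For example, list(find_documents("~/Documents/papers")) would return the document folders of a library stored at that (assumed) path; the files listed under the files key live inside those folders.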

05:28

In the case of the Turing publication, files therefore lists a single PDF document, paper.pdf. The key ref is used for exporting to BibTeX and is the reference of the document when

05:46

using bibliographic tools outside of Papis. The YAML key type is also used for BibTeX export and is the type of document, whether a book, an article, a monograph, etc.

06:02

There is also built-in support for tags, which may be added as a list of space-separated keywords. We chose the YAML format because it is easy to write and read, and because most programming languages provide libraries that can read these files.
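To make the roles of ref, type, files and tags concrete, the sketch below writes an info.yaml-style entry as the Python dictionary a YAML loader would return, then renders a naive BibTeX entry keyed on ref and type. The field values are illustrative assumptions, not a copy of the slide shown in the talk, and to_bibtex is a hypothetical helper that performs no escaping; Papis has its own exporter for real use.

```python
from typing import Any, Dict

# Illustrative metadata shaped like a parsed info.yaml file.
# Values are assumptions for the example, not taken from the talk's slide.
doc: Dict[str, Any] = {
    "ref": "Turing1936",
    "type": "article",
    "title": "On Computable Numbers, with an Application to the Entscheidungsproblem",
    "author": "Turing, Alan M.",
    "year": 1936,
    "files": ["paper.pdf"],
    "tags": "classics computability",  # space-separated keywords
}

# Keys that describe the Papis document rather than the publication itself.
NON_BIBTEX_KEYS = {"ref", "type", "files", "tags"}


def to_bibtex(doc: Dict[str, Any]) -> str:
    """Render a minimal BibTeX entry (hypothetical helper, no escaping)."""
    entry_type = doc.get("type", "misc")  # e.g. article, book, ...
    fields = [
        f"  {key} = {{{value}}},"
        for key, value in doc.items()
        if key not in NON_BIBTEX_KEYS
    ]
    # ref becomes the citation key used by LaTeX and other tools.
    return "@{}{{{},\n{}\n}}".format(entry_type, doc["ref"], "\n".join(fields))


print(to_bibtex(doc))
```

Keeping the Papis-specific keys (files, tags) out of the exported entry is a design choice of this sketch; the YAML file itself can hold any keys the user likes.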

06:21

Of course, given the simplicity of the library model, it is possible to write a crude finder with just the Unix grep and find commands. All functionality in Papis can be customized through a configuration file in the INI format. Papis can define multiple libraries through the configuration file, and all Papis settings

06:46

can be configured independently for each library. You can define default settings under the settings section, which will be common to all libraries. A library is simply defined as a section with a dir key, which contains the path to the library

07:05

directory containing all documents. You can then customize this library, in this case a library named papers, and set its default opener tool to the PDF viewer Evince.

07:20

If you also want a library of books holding mostly EPUB-formatted books, you could set the opener to Calibre instead. You can read about all the configuration settings on the documentation page, where you will find a description of their function and their default values.
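The configuration layout described above can be illustrated with Python's standard configparser module. The section names papers and books, the paths, and the key names dir, opentool and default-library reflect our understanding of the Papis configuration format and should be checked against the documentation. The get_setting helper is hypothetical: it only imitates the idea that a library section overrides the shared settings section, and is not Papis's own loader.

```python
import configparser

# Example configuration in INI format. Key names are assumptions to verify
# against the Papis documentation.
CONFIG_TEXT = """
[settings]
default-library = papers
opentool = evince

[papers]
dir = ~/Documents/papers

[books]
dir = ~/Documents/books
opentool = calibre
"""


def get_setting(config: configparser.ConfigParser, library: str, key: str) -> str:
    """Return a library-specific value, falling back to [settings].

    Hypothetical helper mimicking the override behaviour described in the talk.
    """
    if config.has_option(library, key):
        return config.get(library, key)
    return config.get("settings", key)


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# Treat every section other than [settings] that has a "dir" key as a library.
libraries = [
    name for name in config.sections()
    if name != "settings" and config.has_option(name, "dir")
]
for name in libraries:
    print(name, config.get(name, "dir"), get_setting(config, name, "opentool"))
```

Here the papers library inherits Evince from the shared section, while books overrides it with Calibre, matching the two libraries described in the talk.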

07:42

With this introduction, let us now take a look at a common workflow: adding an article from a journal page. Here is a common view of an article in a browser. We can see lots of information, and the easiest way of adding this article to Papis will

08:02

be to locate the DOI of the article on the page. In this case, we find the DOI in the URL of the article and copy it to our clipboard to paste it in the terminal. The command for adding a paper is papis add, and it comes with quite a few options.
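Picking the DOI out of an article URL, done by hand here, can also be scripted. The sketch below uses a regular expression modelled on the pattern Crossref has suggested for matching most modern DOIs (the prefix 10. followed by four to nine digits, a slash, and a suffix). extract_doi is our own hypothetical helper and the example URL is made up; Papis performs its own URL recognition when adding documents.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Modelled on Crossref's suggested pattern for modern DOIs:
# "10." + 4 to 9 digits + "/" + a suffix of letters, digits and -._;()/:
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_doi(url: str) -> Optional[str]:
    """Return the first DOI-like substring of a URL, or None if there is none."""
    # Decode percent-escapes such as %2F so encoded DOIs are still found.
    match = DOI_PATTERN.search(unquote(url))
    if match is None:
        return None
    # Trailing punctuation is rarely part of the DOI itself.
    return match.group(0).rstrip(".;:")


print(extract_doi("https://journals.example.org/doi/10.1234/abc.2023.001?download=true"))
```

The query string is dropped because "?" is not in the allowed suffix characters; DOIs containing unusual characters may still need to be copied by hand.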

08:26

In general, when adding a document, Papis will try to download metadata from various sources and, if possible, download PDF documents if they are freely and legally available.

08:42

Here we see that I am using the edit flag. This flag instructs the papis add command to open the editor with the info.yaml file before adding the document to the library. Similarly, the open flag instructs the command to open the attached files, if any, before

09:03

adding the document to the library. We are also telling the command, through the from flag, to retrieve information exclusively from the DOI. We can also preset some metadata through the command line; in this case, we are adding the tags classics and DFT.
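For scripting the same import, the command can be assembled as an argument list. This is a sketch under assumptions: the flag spellings (--from doi, --edit, --open, --confirm, and --set for presetting a key) are our reading of the flags named in the talk and should be checked with papis add --help for your version, and build_add_command is a hypothetical helper. The DOI below is a made-up placeholder.

```python
import shlex
from typing import List, Sequence


def build_add_command(doi: str, tags: Sequence[str] = (), *, edit: bool = True,
                      open_files: bool = True, confirm: bool = True) -> List[str]:
    """Assemble a `papis add` call that imports metadata from a DOI.

    Hypothetical helper; flag names are assumptions to verify with
    `papis add --help`.
    """
    cmd = ["papis", "add", "--from", "doi", doi]  # use only the DOI importer
    if edit:
        cmd.append("--edit")     # edit info.yaml before adding
    if open_files:
        cmd.append("--open")     # open attached files before adding
    if confirm:
        cmd.append("--confirm")  # ask for confirmation before adding
    if tags:
        # Preset metadata from the command line: tags as space-separated words.
        cmd += ["--set", "tags", " ".join(tags)]
    return cmd


cmd = build_add_command("10.1234/abc.2023.001", tags=["classics", "DFT"])
print(shlex.join(cmd))
```

On a machine with Papis installed, the list could be passed to subprocess.run(cmd, check=True); building a list rather than a single string avoids shell-quoting problems with the tag value.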

09:27

Let's go ahead and run the command. Papis will now try to download metadata and a PDF file from online sources. In the current configuration, we are greeted with an interactive prompt to add, split

09:45

or reject the metadata retrieved from Crossref. We choose to accept the metadata. The interactive session now shows us a retrieved PDF document and asks us whether this is the document

10:01

that belongs to the publication. At this point, we can inspect the document, and we see that we indeed want this PDF file, so we press Y. Now all the information is in place, and we can see a preliminary version of the info

10:20

file, since we passed the edit flag. We can see that a lot of information could be retrieved (a detailed author list, volume and pages, among others) and that our tags have made their way into the YAML file correctly.

10:43

A confirmation prompt subsequently appears, since we passed the confirm flag to the command. We agree, and the document gets added to the library. We can now fetch information about the publications cited in this article.

11:01

The command for this is citations, and we pass it the fetch-citations flag, which first checks for information in our library and then heads to Crossref to retrieve relevant information about the references appearing in our newly added document.
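The lookup order described here, checking the local library before querying Crossref, is a common cache-first pattern. A minimal sketch, assuming the library is indexed by lowercase DOI and that the remote fetcher is any callable (in practice it would query Crossref); resolve_citations and fake_fetch are hypothetical stand-ins, not Papis internals.

```python
from typing import Callable, Dict, List, Optional

Metadata = Dict[str, str]


def resolve_citations(
    cited_dois: List[str],
    library_by_doi: Dict[str, Metadata],
    fetch_remote: Callable[[str], Optional[Metadata]],
) -> List[Metadata]:
    """Resolve each cited DOI, preferring the local library over the remote source."""
    resolved: List[Metadata] = []
    for doi in cited_dois:
        local = library_by_doi.get(doi.lower())  # DOIs are case-insensitive
        if local is not None:
            resolved.append(local)  # found locally: no network request needed
            continue
        remote = fetch_remote(doi)  # e.g. a Crossref query in practice
        # Keep at least the DOI if the remote source has nothing either.
        resolved.append(remote if remote is not None else {"doi": doi})
    return resolved


# Usage with an in-memory library and a fake fetcher that records its calls.
library = {"10.1234/local.1": {"doi": "10.1234/local.1", "title": "Already in library"}}
calls: List[str] = []


def fake_fetch(doi: str) -> Optional[Metadata]:
    calls.append(doi)
    return {"doi": doi, "title": "Fetched remotely"}


result = resolve_citations(["10.1234/LOCAL.1", "10.1234/remote.2"], library, fake_fetch)
print([entry["title"] for entry in result])
print(calls)
```

Only the citation missing from the library triggers a remote call, which is the benefit of checking the library first.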

11:21

If we now open the directory where the document has been stored, we see that the PDF file has been correctly stored alongside the info.yaml file and the newly generated citations.yaml file.

12:02

If we inspect the citations file, we see that it is formatted as a list of YAML documents, where every element, separated by three dashes, represents bibliographic information about one citation. This can be used for scripting, for browsing the citations, or for easily visualizing

12:24

them through the web application. This demo will show how to leverage the Papis API in Python to write one of the simplest scripts you can write. You can find more information, together with other, more complex example scripts, in the documentation.

12:45

First of all, let us add a bigger library to our demo setup. For this, we need to edit the configuration file and add an additional library. After adding the library, we can list the directories with the list command, which shows

13:06

us the interactive interface to select documents. Most Papis commands accept a query argument as input. In this case, we can query for documents whose author includes the string

13:21

Einstein. We can also use the all flag to apply a Papis action to all documents matching the query, in this case listing the full paths of the folders. Other commands like open, edit, or update work in a similar fashion.
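To make the query idea concrete, here is a deliberately simplified matcher over documents represented as dictionaries. It assumes a key:value form in which the value must appear, case-insensitively, inside that field, and treats a bare word as a search across all fields. This is our own stand-in, not Papis's query parser, whose exact syntax and behaviour depend on the configured database backend; matches and filter_documents are hypothetical names.

```python
from typing import Any, Dict, Iterable, List


def matches(doc: Dict[str, Any], query: str) -> bool:
    """Simplified query test: 'key:value' searches one field, a bare word all fields."""
    key, sep, value = query.partition(":")
    if sep:  # key:value form, e.g. "author:einstein"
        field = doc.get(key.strip())
        return field is not None and value.strip().lower() in str(field).lower()
    needle = query.strip().lower()
    return any(needle in str(v).lower() for v in doc.values())


def filter_documents(docs: Iterable[Dict[str, Any]], query: str) -> List[Dict[str, Any]]:
    """Return every matching document, like acting on all matches at once."""
    return [doc for doc in docs if matches(doc, query)]


docs = [
    {"author": "Einstein, A.", "title": "Zur Elektrodynamik bewegter Körper", "year": 1905},
    {"author": "Turing, A. M.", "title": "On Computable Numbers", "year": 1936},
]
print([d["title"] for d in filter_documents(docs, "author:einstein")])
```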

13:47

Next, we will write a simple Python script to scan all the documents in the library and add the tag to the document whenever the substring this appears in the title of the document.

14:02

To do this, we can use the papis.api submodule, and we can obtain all documents in the current library with the function get_all_documents_in_lib. Next, we loop over all documents and treat each document as if it were a Python

14:21

dictionary.

14:48

The save method saves the document. I will comment out the save call, since I don't want it to overwrite the library.
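A version of the script described here might look like the following sketch. papis.api.get_all_documents_in_lib is the function named in the talk. The substring and tag are placeholders, because the recording does not make them clear; case-insensitive matching and accepting tags as either a space-separated string or a list are our own assumptions. Saving is left as a commented-out call, as in the demo: the talk uses a save method on the document, while newer Papis releases may expect papis.api.save_doc instead, so check the API documentation for your version. The tagging logic lives in a plain function so it can be tried on ordinary dictionaries without Papis installed.

```python
from typing import Any, MutableMapping

SUBSTRING = "quantum"  # placeholder: the substring used in the talk is unclear
TAG = "quantum"        # placeholder tag name


def add_tag_if_title_matches(doc: MutableMapping[str, Any], substring: str, tag: str) -> bool:
    """Add tag to doc["tags"] when substring occurs in the title (case-insensitive).

    Works on any dict-like object, including Papis documents, which the talk
    treats as Python dictionaries. Returns True if the document changed.
    """
    if substring.lower() not in str(doc.get("title", "")).lower():
        return False
    tags = doc.get("tags") or []
    # Assumption: tags may be stored as a space-separated string or as a list.
    stored_as_string = isinstance(tags, str)
    tag_list = tags.split() if stored_as_string else list(tags)
    if tag in tag_list:
        return False
    tag_list.append(tag)
    # Write back in the same style the document already used.
    doc["tags"] = " ".join(tag_list) if stored_as_string else tag_list
    return True


def main() -> None:
    import papis.api  # requires Papis to be installed

    for doc in papis.api.get_all_documents_in_lib():
        if add_tag_if_title_matches(doc, SUBSTRING, TAG):
            print("tagging:", doc.get("title"))
            # doc.save()  # left commented out, as in the demo, to avoid writing
```

Calling main() on a machine with Papis configured would print each document that would be tagged, without modifying the library until the save line is enabled.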

15:02

Let's run the script and see that it works. And indeed, it works. The last demonstration concerns the web application. The web application is quite useful if you would like to self-host Papis or access it from a portable device.

15:22

We can run the web application using the serve command, to which we can pass the port 8888. Directing our browser to the URL localhost:8888, we see the starting page of

15:43

the web application, where we are presented with a simple query prompt. Other pages list all the documents in the library or all the tags, or let us browse a different library.

16:00

Let us again enter the author Einstein query into the prompt. The results page includes a handy timeline of the query results and a simple multiline list of the results. In this timeline, we can directly see, for instance, Einstein's Annus Mirabilis papers

16:21

together with a couple of other publications further to the right. We can click on a title in the timeline and go to the respective document page. In the results for the document, we see a block on the left with some basic information and the PDF links.

16:41

On the right-hand side, we see the citation references and several external links for the document.

17:08

Let us look for the first paper we added at the beginning of this presentation. It is worth noting that we can click on the tags of the documents to get the results for the given tags.

17:36

If we click on the arrow, we navigate to the document page. The red notifications warn us of small problems with the data in our document.

17:46

However, I will not fix those now. The document page is a multi-tab page where the first tab presents most of the document's information as an HTML form.

18:02

Additionally, we have access to the raw info file, where we can modify and override its contents. We have added a BibTeX tab for LaTeX users. This document has a single file attached, and we can preview it in the browser

18:22

thanks to the PDF.js JavaScript library. We can also download the document or open it in a new window. In the next tab, we can visualize the citations file that we generated previously.

18:41

This tab also has a timeline like the search results and the documents with the green reference indicate that these documents exist in our library and we can open them.

19:03

Let us open this article page.

19:26

For this article, we have also generated citations but we can also use the Harvard ADS service.

19:46

For the articles citing the current article, we have not generated this file, and therefore we get an embedded page from ADS by default.

20:01

In the last tab, we can edit the notes from the browser. Furthermore, by clicking on the tags and library pages, we can see what these

20:22

interfaces look like. Thank you very much for your attention. For further information, visit the project's page over at GitHub.

20:44

Of course, Papis is only alive because of its community. I'd like to thank all the users and contributors over the years. I would especially like to thank the co-maintainers of Papis, Alex Fikl and Julian Hauser, for their hard work in the last year.

21:04

I hope you enjoyed the presentation and I'll be answering your questions shortly.

21:23

Fantastic, thank you so much for that really interesting talk, Alejandro. I felt inspired, thinking: wow, can I run this with my own publications, as a way of collating stuff and also sharing it with the world?

21:40

It's always nice when you watch a talk and you immediately think, yes, I'm going to use this as well. So I have a few questions here. My first one might be this: I have historically used Zotero. I had to think very carefully and not place a note for that. Would it be easy for me to migrate from Zotero or other tools if I were inclined to?

22:10

Yes, so over the years quite a lot of people have developed plugins to interface Zotero and Papis.

22:20

So you can export and create Papis libraries. But I am also aware of some people who actually use both; they have a workflow that exports dynamically all the time. So it is in principle compatible, and there are a couple of projects that do this.

22:41

This is coming from the community. Thanks. That's actually really appealing to me, because I was watching the way you added a paper with the DOI and I thought that was really cool. But there are still, like, seven command-line flags, and I really like the button that says "add this to Zotero".

23:00

Yeah, so the thing is, I should maybe have given some more examples of adding documents. In principle, it is also possible to add a document just by its URL; there is some automatic recognition in Papis.

23:24

So most URLs are recognized, and in this case it could even look for a DOI in the URL or within the HTML page.

23:44

I noticed that there is a lot to use. Sorry, your connection has just gone rough. We have implemented a lot of testing, testing.

24:09

Do we still have you now? Yes, that is much better. Could you maybe just repeat your last two or three sentences, because it was a bit hard to hear. So, Zotero has also implemented quite a lot of metadata fetchers for many sources.

24:33

And there is also a project that tries to reuse these metadata fetchers from Zotero for use in Papis.

24:44

And maybe also in the web application in Papis. So this might also happen in the future. But in general, it's much easier to add documents than what I showed in the video. Cool. Thanks. It's also really nice to hear how interoperable you all are.

25:03

So we have a couple more questions in the chat. Paul asks: does the YAML format follow bibliographic standards of any type? So we try to use most of the BibTeX keywords when they are applicable.

25:25

And in general, the YAML format is really left free for the user. So you might want to use a particular convention in your YAML files. But the keywords are mostly motivated by BibTeX.

25:44

That's the only one. It still sounds like there's some decent interoperability there, which is really nice. Celia asks: who are the main users of Papis, in terms of discipline, students, researchers, etc.?

26:01

Well, that's a good question. A lot of people in biophysics and the biosciences, so bioinformatics; I know quite a lot of people there who use it. Physics, mathematics and computer science, I would say. These are the main ones. But Julian, for instance, is one of the co-maintainers, and he's a philosopher.

26:26

Of course, it really helps to be a little bit acquainted with the command line. Hopefully, through the web application, this will change in the future.

26:40

But in general, it's people who really care about their libraries and the metadata they have in them. They really want a very clear representation of their data; they don't want it stored in some abstruse database somewhere.

27:01

So they really want everything in plain text. Yes, I think you demonstrated so beautifully how accessible your own data is, and that's surprisingly rare. Do you have any training for Papis, so that people who are maybe a bit less confident could learn more about it?

27:21

Sadly, not right now. If enough people are interested, that's certainly something we could look into. But we have discussions on GitHub, where quite a lot of people ask questions, so there are also frequently asked questions there.

27:41

And we also have a Zulip chat. We are also on Libera, but right now not many people are there. So just drop by and ask whatever you want. Super, thank you. So we have about 45 seconds left.

28:02

That's time to squeeze in one last question, from Paul. We've got some love for the timeline, which I agree with; I was like, oh, I want that. Do you plan any other visualizations, like maybe publication networks from citation data? Yes, actually, because I realize I really like these visualizations.

28:26

I'm planning some, for example trees built from the citations, and things like that. But I would like more feedback from users to know what is really sensible and useful.

28:40

Thanks. That's a great point. So we have three seconds left. Thank you so much. Thank you. Thank you, all of you. All right, I think we're off the live stream. I'm so going to go back and set up my own Papis with a web server. Thank you so much. Yeah, thank you. Thank you.

29:00

Okay, I'm going to hop to the next talk. Bye.