New Approaches towards User Research and Software Architecture in Research Software Engineering: A Humanities Example - TIB AV-Portal

Gesellschaft für Informatik e.V. (GI)

Pohl, Oliver; Notroff, Andrea

Formal Metadata

Title

New Approaches towards User Research and Software Architecture in Research Software Engineering: A Humanities Example

Title of Series

deRSE 2019 - Konferenz für ForschungssoftwareentwicklerInnen in Deutschland

Number of Parts

60

Author

Notroff, Andrea

Contributors

de-RSE e.V. - Gesellschaft für Forschungssoftware

License

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/42509 (DOI)

Publisher

Gesellschaft für Informatik e.V. (GI)

Release Date

Language

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

The "Census of Antique Works of Art and Architecture Known in the Renaissance" is a research database containing cultural heritage information. Currently, we are working towards a reconception of the "Census" and face the following two challenges. First, we know very little about the Census users' behavior. Second, it is necessary to steer towards a paradigm shift regarding the software architecture design of the current Census web application. In our talk we would like to present our findings and thoughts about user research in Digital Humanities (esp. focussing on art history) and discuss an approach to modular and sustainable software architecture for object- and image database application in the Digital Humanities.

deRSE 2019 - Konferenz für ForschungssoftwareentwicklerInnen in Deutschland, 20 / 60

1

08:12

Introducing: GERMAN INFORMATICS SOCIETY (GI)

2

44:17

Research Computing and Computing for Research

3

04:23

Potsdam Institute for Climate Impact Research (PIK) welcomes you to the 1st deRSE conference

4

05:36

A warm welcome from the GFZ German Research Centre for Geoscience

5

04:46

Welcome from AWI (Forschungsstelle Potsdam des Alfred-Wegener-Instituts)

6

15:56

7

22:49

HIFIS Software Services, the Competence Cluster for a Sustainable Software Development in the Helmholtz Association

8

29:46

Empfehlungen für bessere Forschungssoftware

9

22:58

Neukonzeption des DLR Software-Katalogs

10

23:49

PALM – a story of developing and maintaining a scientific model system

11

24:16

Integrierte Entwicklungs- und Publikationsumgebung für Forschungssoftware und Daten am Helmholtz-Zentrum Dresden-Rossendorf (HZDR)

12

16:39

Software for autonomous astronomical observatories

13

24:40

Help me help you

14

22:52

Challenges and Opportunities of Open-Source Software: the case of SU2

15

16:52

Parallel-in-Time integration with PFASST: from prototyping to applications

16

28:35

The Quest For Better Tests In Scientific Computing

17

23:59

Umsetzung effizienter plattformunabhängiger App-Entwicklung in einer bestehenden Forschungssoftware-Landschaft

18

38:53

deRSE 2019 - Poster Lightning Talks

19

27:41

Zwischen Digital und Humanities, oder die digitale Kirche im Dorf lassen

20

26:32

New Approaches towards User Research and Software Architecture in Research Software Engineering: A Humanities Example

21

30:27

The Debian Astro Project

22

25:39

ESM-TOOLS: a tool for Earth-System-Modellers

23

22:39

Develop, License, Test, Curate - Mathematical Optimization In The Real World

24

17:12

Building a healthy and vibrant volunteer driven community: The Bio-IT project

25

25:32

Von Closed zu Open Source

26

30:10

Portable Container zum Entwickeln, Erstellen, Verteilen und Ausführen von komplexer wissenschaftlicher Software

27

18:37

Die Hard 1.1024.0: backward compatibility of a search engine with persistant Ids

28

22:35

Softwareentwicklung zwischen Forschungscode und Industriereleases

29

26:31

GUI-Architektur für interaktive Datenanalyse

30

26:25

Building research software communities

31

26:26

Curious Containers: Framework zur Reproduzierbarkeit von digitalen Experimenten

32

22:11

The Research Software Engineering Landscape in Germany

33

27:03

NICOS - ein Steuerungsframework für Großforschungsgeräte

34

27:13

Eine virtuelle Werkstatt für die Digitalisierung in den Wissenschaften

35

17:03

Breaking down scientific mono cultures by cross-disciplinary software development

36

24:33

ediarum - from bottom-up to generic programming

37

23:27

Forschungssoftware als digitale Ressource erhalten

38

17:02

GitLab pipelines for every need: testing, documentation, and writing a paper

39

09:50

deRSE19 and de-RSE e.V. - Society for Research Software

40

09:56

How to save a scientist’s career with data classes?

41

35:43

deRSE 2019 - Closing Session

42

19:42

Building scientific communities - Lessons learned with AIRR

43

19:17

Development of research software at DLR - role and status in practice

44

29:34

Keynote: Delivering on the promise of Research Computing

45

26:52

Entwicklung der Forschungssoftware RCE im DLR

46

28:47

Decentralized software engineering and CI/CD in a European joint research project

47

19:11

Against Schematisation – Mapping the Choreographic Vector Space

48

49:56

Keynote: Sustainable Research Software – as Code, as Paper, as Book

49

45:34

Keynote: RSEs together - Building careers, collaborations, groups and communities

50

13:42

Evaluation of the semantic research data management system CaosDB in glaciology

51

30:55

Data mining made easy, reproducible and open-source

52

21:25

The art of giving and receiving code reviews

53

22:03

54

16:57

Generation of a wrapper library for MPI - MeDiPack

55

13:47

Rückblick: Herausforderungen für die nachhaltige Entwicklung, Bereitstellung und Pflege von Forschungssoftware in Deutschland

56

27:10

Rahmenbedingungen für einen nachhaltigen Umgang mit Forschungssoftware am Helmholtz-Zentrum Potsdam - Deutsches GeoForschungsZentrum GFZ

57

23:28

Lebensverlängernde Maßnahmen für Fortran-Codes

58

15:57

Auto-scaling deadline-constrained workloads

59

15:57

Automated Deadline-Based Scaling of Experiments in the Cloud with MiCADO

60

1:01:26

Paneldiskussion "Nachhaltigkeit von Forschungssoftware in Deutschland" (in German)

Transcript: English (auto-generated)

00:00

Thank you. And after corpus linguistics and aerodynamics and partial differential equations, I think I'm kind of the exotic bird here in the room or maybe also in this conference because I don't come from the STEM side. I'm a research software engineer in the

00:23

humanities. Well, the humanities can be a little broad. In this project here, we are focusing on art history, so very exotic maybe. And I work for the Census LOD project, and no, it has nothing to do with the big controversy back in the 80s where the German

00:46

state tried to inquire a lot about the German population. No, it is about art-historical reception data. Basically, what did artists in post-antique times, for example, in the

01:04

Renaissance, know about art in antique times? For example, ancient Greece, so we can basically collect data to answer questions like what kind of artworks did Michelangelo see and how

01:23

did it influence his work? And this database dates back to the early 80s in the U.S. and then later migrated back to Germany and has been in a, well, for some people

01:40

usable state, but for some people also a not usable state, because this has been a long-running project. At least the funding phase was, I think, about 20 years, and then the funding ended. The people contributing to the project applied

02:05

for another grant to continue this database, collect more data and enrich it, but the grant was rejected. And one of the reasons was: well, your application is running on a commercial system that is kind of out of date, and the

02:25

graphical user interface is very difficult to use. This might look very familiar to you: this is how websites maybe looked in the 90s or early 2000s, with a lot of fields where you can put stuff in and get stuff out. However, people have been

02:46

Google-fied and are now accustomed to, let me say, easier approaches to information retrieval. So what I want to talk about now is how we are going to fix

03:07

that problem with the kind of database I just showed you, because we are in the fortunate position that the city of Berlin granted us another three years to come up with new concepts and recommendations on how to go on from there and to apply for that kind of grant

03:26

once more. And that grant is not for three years, not six years, but 25 years. So we have to think about the software consequences in the long term. And unfortunately, my colleague Andrea, who cannot be here today, focuses on the user-centered design

03:45

in art history or in general user research, but I will try to summarize her key points anyway. And later on, I will also talk a little about the software architecture evaluation we did for this project and where we go from there. So, we had this database.

04:07

There were some users, and it has been used. The user numbers are not very high, but in the art-historical community across the world, our database was always seen as a kind of

04:22

treasure for what you can actually do with the data: how to use it, for example, for your own research, for publications and so on. But for some users, it was not always clear what we actually collect, what is in there and why it is in there.

04:46

And this kind of conclusion we could only reach by doing some user research at conferences, talking to art historians about what their goals, their actions and their means were for doing their research. And I have... can you read those papers?

05:09

It's just the notes from my colleague. And, whoops. But basically, times are changing and people

05:29

are getting accustomed to more modern solutions, to looking things up on their phones, and they actually want to have the data in a FAIR format, which in the humanities is, well, we're getting there,

05:43

but we are not there in every sub-discipline. So, we have talked to a lot of people about what they know and what they want from us, so we can take this into consideration. But since we are, well, not really software engineers per se, that's not our background,

06:07

I come from information science, my colleague actually from gender studies, so we both just kind of wiggled our way in there and learned software engineering on the go.

06:25

We were kind of like: well, now we have this information, what do we do with it? And since our time is short and resources are always kind of scarce, we also thought: well, if we don't have the expertise, maybe somebody else has,

06:41

and we did something rare for a humanities project: we actually got in contact with UX designers, and now we're working with them to understand our users better, to observe them, and to have them accompany us along the process of

07:06

creating new prototypes, applications, user profiles and so on, so we get to where we want to go. But, unfortunately, I cannot summarize it as well as my colleague would have, so I

07:25

will skip to the software evaluation part, because, well, where did we come from and where do we go now, right? As I said, the database is based on a commercial product provided by

07:41

a company based in Berlin called Programmfabrik, so basically "program factory" if you want to translate it literally, and they offer a digital asset management solution called easydb.

08:03

Our easydb instance has been modified to the max, so it's really hard to upgrade it to newer versions and to develop new things from there, and on top of that it's not even open source. So, well, I can understand why some reviewers of our grant application said, well, you should,

08:25

you should go open source, dude. And so what we basically did was ask: well, what are other people doing, right? This is usually the first thing you do when you're diving into a new subject.

08:41

We looked at what kind of database solutions our colleagues in the fields of digital humanities and art history were using, and a few of them are listed on the left side there, and basically we just started trying them out. And I don't mean: well, I am now a user and I click

09:03

here and I click there. No, it actually means: how easy is it to install this on our local server infrastructure? Is it easily deployable? Is there well-documented code, and is there long-term support? Who are the maintainers? Is there a big community?
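The evaluation questions listed above could be recorded in a small scoring sheet. The following is only a minimal sketch: the criterion names paraphrase the talk, while the weights, the 0 to 3 rating scale, and the names `Candidate`, `weighted_score` and `rank` are assumptions made up for illustration, not the procedure the project actually used.

```python
from dataclasses import dataclass, field

# Criteria paraphrased from the talk; the weights are hypothetical.
CRITERIA_WEIGHTS = {
    "installable_locally": 2,
    "easily_deployable": 2,
    "well_documented_code": 2,
    "long_term_support": 3,
    "known_maintainers": 1,
    "active_community": 2,
}

MAX_RATING = 3  # assumed scale: 0 (poor) to 3 (good)


@dataclass
class Candidate:
    """A software system under evaluation with one rating per criterion."""
    name: str
    ratings: dict = field(default_factory=dict)


def weighted_score(candidate: Candidate) -> float:
    """Return the weighted rating scaled to 0..1; missing ratings count as 0."""
    total = sum(
        weight * candidate.ratings.get(criterion, 0)
        for criterion, weight in CRITERIA_WEIGHTS.items()
    )
    best = MAX_RATING * sum(CRITERIA_WEIGHTS.values())
    return total / best


def rank(candidates):
    """Sort candidates from highest to lowest weighted score."""
    return sorted(candidates, key=weighted_score, reverse=True)
```

A sheet like this does not replace the hands-on installation tests described in the talk; it only makes the outcome of those tests comparable across candidates.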

09:22

And so on and so forth. The Software Sustainability Institute also provides a good guidebook and catalog of criteria on how to evaluate software for these kinds of projects, which helped us a lot. There were other factors as well, because

09:42

one focus of the project was also whether we should maybe move to a graph database or linked open data, so we also looked at the capabilities there. And, well, there were some solutions like ResearchSpace, which provided a very, very nice user interface and queryability,

10:05

if that's the word, where you can query complex graph data in a very open format. However, there were basically one or two big walls in the way of actually

10:22

testing it the right way. And this was porting the data we have. Well, it is in a Postgres database, but if we just want to take our data and put it into a different system, we would have to do a lot of adaptations so that it actually works there. So this was also not a

10:47

good way to go in the long run. Well, what could we do now? Just a few days ago, we had a meeting, and we kind of have to scrap that evaluation approach and somehow do it differently. Because, well,

11:09

the recommendations we make now have an impact for the next 25 years. And I will... how am I doing with time? Okay. Okay. If there's still time tomorrow,

11:23

I will showcase some of the system, but I think that's not the main point here. And, well, maybe one or two of you have delved into DevOps or something, and if you do that, you come across the Rugged Manifesto. And I'm usually not a big fan

11:41

of reading quotations out loud in presentations, but since this is being recorded and people can listen, I will do it anyway. So the third point in the Rugged Manifesto says: "I recognize that my code will be used in ways I cannot anticipate, in ways it was not

12:01

designed, and for longer than it was ever intended." And I think we've all been there, that we've used some application not as it was intended, totally differently. And I have anecdotes from the humanities. Maybe I just drift off here, because

12:21

I'm... okay. So where do I go? Yes, for example, in another project, two colleagues of mine are working very hard on creating a digital edition and geographical information

12:42

system of historical data about the Prussian state, or more precisely about the royals in Prussia and, so to speak, the important people. And since they wanted to start entering

13:03

data right away, they did not use some kind of database, or even XML, or any structured format, no. They said: okay, we want to create organigrams, what do we do? We take Excel and create clip art. And now, if you want to use clip art for a web application,

13:24

for example, well, at least it's XML-ish, so you can go from there and start migrating the data. Or another example is people creating transliterations of Quran codices in Word documents,

13:47

using, well, some kind of template, but it's not always very clean. It's like, okay, it works, but it doesn't really work if you want to have FAIR data, data following the FAIR principles. So, well,

14:01

this is basically the problem. How do we actually guard against this kind of misuse, and how do we make sure that the data and the applications are robust enough to last at least 25 years? Because, well, if you think back, who was computing 25 years ago?

14:28

One, one person, right? And it has changed a lot, right? Well, and so I don't know what the future brings. I can't even tell like what the new features of Android phones will be like in

14:42

the next two years. It is very hard to anticipate how technology changes and how this technology again changes user behavior. So, what are the recommendations we can actually provide? You know, everybody's very tense now, right? And there are only a few.

15:04

Well, although we have this runtime of 25 years, the grant will only contain a developer position of, what's it called, a half-time or maybe two-thirds position. So, while

15:22

the actual art historians are collecting and collecting data, doing their research and so on, there's this one gal or guy who has to keep this service up and running. And it is actually not an option to just keep the old version of this database up and running. So, what do we do?

15:44

Okay, we could of course upgrade it to the newer version, which would mean, first, that it costs a lot of money. Well, it is a little more extensible, but we're still faced with a problem. Oops, still not open source. So, that's basically a no-go. And so, it has to be maintainable for

16:07

basically one research software engineer in the humanities, who has to be in the mindset of both an art historian and a proper research software engineer or a

16:22

software engineer in general. So you kind of have to take a more modularized approach to your whole software. I don't think that having a monolithic software architecture for web applications is appropriate anymore for these kinds of runtimes. And I'm

16:43

probably not telling some of you anything new here. So, basically, if we started, for example, migrating... where's the... so basically

17:02

by a modularized approach, I mean: just break it up into different components and then replace a component when it needs to be replaced. Right now, everything is one application: basically, the database is connected to the CMS, which directly outputs the data. We have no API, no means of extensibility. But if we had to start

17:26

redesigning the software tomorrow, an idea would be: well, we keep the current easydb up and running, since it is configured to enter the data properly in the current way, and it works. And then we start maybe to create a REST API or GraphQL API,

17:46

talking directly to the database, providing APIs for the open source community and the humanities community, and a more usable front end for, well, the art historians and interested people in

18:09

general. Because implementing the front end is actually not the hard part, but since this is such intricate data, redeveloping all the features to

18:24

enter all the data we need in the proper way for the researchers in the project would just take another two years. So it's not a sensible approach to start there, I think. So going from there, we always have to keep in mind a kind of trade-off between sustainability

18:51

and up-to-dateness: to always take the approach of minimal effort in the long run so it stays maintainable, while you also always want to have an up-to-date code base with the best

19:04

principles and a test suite and such. So it's very hard to find the right balance and to really stay up to date with the code. And I'm really looking forward to seeing how this will evolve in the coming years, in the hope that the grant will be accepted.
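The modular idea described above (keep the existing data-entry system, add a separate read-only API that talks directly to the database) could look roughly like the following. This is a minimal sketch under stated assumptions: SQLite stands in for the project's Postgres database, and the `records` table, its columns, the sample rows and the `/records` routes are hypothetical, not the actual Census schema or API.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

COLUMNS = ("id", "label", "location")  # hypothetical schema


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the existing database."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, label TEXT, location TEXT)"
    )
    conn.executemany(
        "INSERT INTO records (id, label, location) VALUES (?, ?, ?)",
        [(1, "Example statue", "Rome"), (2, "Example relief", "Florence")],
    )
    conn.commit()
    return conn


def route(conn: sqlite3.Connection, path: str):
    """Map a GET path to (HTTP status, JSON-serialisable body). Read-only."""
    parts = [p for p in urlsplit(path).path.split("/") if p]
    if parts == ["records"]:
        rows = conn.execute(
            "SELECT id, label, location FROM records ORDER BY id"
        ).fetchall()
        return 200, [dict(zip(COLUMNS, row)) for row in rows]
    if len(parts) == 2 and parts[0] == "records":
        try:
            record_id = int(parts[1])
        except ValueError:
            return 404, {"error": "not found"}
        row = conn.execute(
            "SELECT id, label, location FROM records WHERE id = ?", (record_id,)
        ).fetchone()
        if row is None:
            return 404, {"error": "not found"}
        return 200, dict(zip(COLUMNS, row))
    return 404, {"error": "unknown resource"}


def make_handler(conn: sqlite3.Connection):
    """Build a request handler class bound to one database connection."""

    class ApiHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = route(conn, self.path)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return ApiHandler


def serve(conn: sqlite3.Connection, port: int = 8000) -> None:
    """Serve the API on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), make_handler(conn)).serve_forever()
```

Calling `serve(open_demo_db())` would expose `/records` and `/records/1` on port 8000. Because the routing logic lives in `route`, the front end or the database behind it can be replaced later without touching the other component, which is the point of the modular approach.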

19:25

So where do we go from here? We're currently in the process of creating our own project website, where we will share our findings. We are hoping to get more insight into UI and UX in complicated research applications,

19:44

and we will start prototyping our new research approaches, or rather information retrieval approaches, in a few weeks with our UX designers, so we might be able to produce the first prototype at the end of the year or, since we have to consider other projects as well, maybe at the start of

20:04

next year. And with that I want to thank you for staying this late, and I'm thankful for the questions that you hopefully have. Thanks. Thank you very much for the talk, and we have

20:23

two questions, I see. So yeah, I'm working in Mainz in an archaeological research institute, so you're not the only humanities guy here, and I know all the problems you mentioned. And I just want to know one thing. You should be part of the Union of Academies in Germany, right?

20:43

So did you get in touch with the Academy of Sciences and Literature in Mainz, with the Digital Academy there? Because there are a lot of people, and they have a lot of ideas about all those things. Have you talked to them somehow? We have been talking

21:05

to them a lot, because I'm also working with Torsten Schrade and Sarah Pittroff on different projects, and yeah, so we are in contact. Thank you very much. One more comment or

21:21

question. I know the humanities are not like physics, where you have a lot of money, but still, if it's a 25-year project, and you just outlined how much work has to be done, and you have a bus factor of less than one, shouldn't you kind of say: well, this is actually a large project, we want to have two people? Which I think is kind of impossible in the landscape of

21:46

research funding, because it seems like so much work, and also work for the actual 25 years. Yes, we are still in the process of deciding what to recommend, and the current state is that they actually want to put in a two-thirds position, but we urge them to do

22:09

more, because in the long run the architectures and the whole application will get even more complicated, because I don't know what will be there,

22:21

like VR or AR, that might be where people are in a few years. And unfortunately, just from experience within the humanities and digital humanities world in the

22:41

Union of Academies, there are now projects that have been accepted with one full-time position, but usually, if you apply for more, they just say nah, because the main focus in those projects is not on the digital side but still on the

23:02

humanities side: identifying the art-historical data, doing publications and completing a corpus of, let me just make something up, all antiquities situated in a

23:20

specific part of Italy. So you have to find a trade-off. And another thing is that the person employed might not be there for the whole 25 years, since the contracts are not necessarily bound to the whole runtime of the project but to

23:46

evaluation phases, which take place every three or six years. So, in order that other people can also adapt, you have to take a good approach to your code

24:02

to make it maintainable, so that other people might take over in the future. Does that answer your question? Yeah, okay. We have another question. Yeah, [name unclear] from Humboldt University. I just wondered why you're talking just about application sustainability and giving a time

24:23

frame of 25 years. I would just talk about data sustainability, because it's for sure that you will have to change your applications every 10 years at the latest. I mean, looking back 25 years, it was 1994, and if you try to use software from 1994, that won't be too successful

24:42

today, I suppose. So why not just talk about data sustainability and then change your software stack every 10 years, as is normal for many infrastructure organizations? Well, data sustainability, or research data management in general, is also a thing we

25:02

have on our minds, and at the academy we now have two people focusing on that, so the ball is rolling. And yes, changing your application every decade or so, well,

25:21

this is kind of normal, but the problem that I personally experience within my team of, I think, still 15 people like me in different kinds of projects is that, as the funding of these projects runs out right now,

25:46

we are still inclined to keep those services up and running, so we have to keep up some kind of maintenance, and we cannot change the basic UI and UX because there is no time for that.

26:03

So these are basically the challenges we face there. Of course, if we have a proper research data management plan and the corresponding repositories, and export the data in a human- and machine-readable format, we can go from there, but this will also be, I think,

26:21

a step that has to be taken within these 25 years, to port this data into that kind of format. Actually, we have this on our minds.
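The export step mentioned in this answer, getting the data out of the application in a human- and machine-readable format, could be sketched as follows. This is only an illustration: the record fields and the function names `export_json` and `export_csv` are hypothetical, and the sketch says nothing about which formats or repositories the project will actually choose.

```python
import csv
import io
import json


def export_json(records) -> str:
    """Serialise records as indented UTF-8-friendly JSON (readable by people and machines)."""
    return json.dumps(records, ensure_ascii=False, indent=2, sort_keys=True)


def export_csv(records, fieldnames) -> str:
    """Serialise records as CSV with a header row, one record per line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Plain-text formats like these are independent of the application that produced them, which is what makes it possible to replace the software stack every decade or so while the data stays usable.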