Fraunhofer Publica - GoLive with DSpace CRIS7
Formal Metadata
Number of Parts | 9
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/57544 (DOI)
Production Year | 2022
Production Place | Berlin
Transcript: English (auto-generated)
00:03
DSpace Praxistreffen 2022, that's finally the Praxistreffen where we can say it is released, it is there: DSpace 7.0 is out of the door, and DSpace-CRIS 7 is available for all of you to use. And now we will have the presentation that I think many people are waiting for, a presentation about going live with DSpace-CRIS 7.
00:22
We have Giuseppe and Dirk here, the stage is yours. Thank you, Pascal, for the nice introduction and for the invitation to present. As you said, we are in the final stages of the go-live. So, the presentation today will
00:42
be a short run over the last years, which we have been waiting for and working hard on. And maybe it's more about the pressure we still have, because in 10 days the go-live will be there and we are hoping that it will go well.
01:01
So, my name is Dirk Eisengräber-Pabst, I'm working for the Fraunhofer Competence Center Research Services and Open Science, based in Stuttgart at the IRB. And today, I will try to talk a little bit about a project we had about DSpace-CRIS and our new Publica based on that system.
01:29
And I will try to be a little bit fast so that we have more time at the end for questions, and maybe you ask for a preview. I have some screenshots, but you won't see anything, it's only an impression, but maybe
01:45
we have time and I can switch to the server so I can show you something. So, let's dive deep into it. Maybe the marketing part is a little bit important for today, because Fraunhofer is Europe's largest RTO (research and technology organisation), and having some figures and facts in mind may explain to you why we
02:09
constructed or organized our infrastructure, publication infrastructure, service infrastructure, as we did. Fraunhofer was founded in 1949, and until today it keeps on with the mission
02:29
of combining the research part, the educational part and the RTO part. That means collaborating with industry and other economic partners. So, to understand our two
02:47
sides: on one side, we are well connected to the universities in Germany. Normally more than 90% of our directors hold a chair at a university or a university of applied sciences.
03:03
And this is the basis of our old publication infrastructure, because we have similar tasks or requirements as a normal university has. On the other side, we are well connected with industry and our partners there. I
03:26
think you all know inventions like MP3, I don't want to dive deep into this. But that means that IP management and patents, and linking patents and IP management to publications, projects, etc., is quite important for us.
03:45
And we had some linking in the old databases. And it was a very important focus in our specifications to bring this special Fraunhofer need to the new system.
04:01
This slide shows you in a nutshell what Fraunhofer is. It's widely distributed over the whole of Germany. It's about 76 institutes and research units. That means we have 76 submission workflows, because each institute is individual and has its own library staff, for example.
04:21
We have more than 30,000 colleagues today. That means, talking in DSpace language, more than 30,000 possible researcher profiles. And we have a revenue, just so that you see, of nearly 3 billion. 70% comes from projects with industry partners and publicly funded projects. That's the main focus we have.
04:49
With the picture from this slide in mind, I want to switch over to the Publica relaunch project, which started some years ago.
05:04
We had an old Publica, a normal repository system. There are about a quarter of a million publications in there. Some patents have been in there. The data went through one prior data migration about 20 years ago.
05:25
And for sure the idea was to get a new system, which should be a change from a normal database with more bibliographic tasks and the normal literature repository
05:42
to a new platform which could serve as a data hub for linking and interchanging data, a base data hub for all kinds of research output. I think that's the general output we talk about if we talk about the trinity, like literature, research data, research software, patents, etc.
06:12
And to really get a toolkit for science and open science. The idea of the data hub is not really a revolution. I think we are all working in the same sector.
06:26
It should be a data hub to connect external databases with our internal Fraunhofer databases. Thinking of and connecting it to the talks we heard this morning about Kibana and reporting:
06:43
Sure, this reporting is done over a data lake. We have an internal data lake at Fraunhofer, and Publica, the DSpace system, is only one source of this data lake. So you don't really have the need to make something like that. But here you see we are part of a data lake like that. We are a source. And the idea is to be the open part of Fraunhofer in the universe of different databases, projects and initiatives.
07:12
So, as I said, in 10 days we will have our go-live. We would say the phase before is something like phase one.
07:20
And this was the early adopter phase. I will talk a little bit later about that. And our main focus was the migration, because we had a lot of old data in different states of quality, which we had to move to a new data model. We are talking about hierarchy, linking, connections, all this.
07:43
And so we cut down to some minimal requirements in this phase to get the go-live done. And now, after the go-live, we will work on the data quality and add more research output objects. And we have a lot of other projects in mind which had to wait because the go-live, as you can imagine, was originally planned about a year earlier.
08:08
Okay. Let's have a short look at DSpace-CRIS. Why did we decide to take DSpace-CRIS as our system? When we started the project, we had a lot of specifications, for sure.
08:23
We have special Fraunhofer needs. But I think there are four main reasons why we decided to take the chance to take DSpace-CRIS, and DSpace-CRIS 7 in the end, as the code base for the new system. First, we wanted to have something which is open source. I don't have to discuss this in this round.
08:44
I think we are all on the same side in that regard. Second, we had a special need for a very flexible data model. Third, we had experience with DSpace 6. We implemented our research data repository Fordatis on the basis of DSpace 6 in collaboration with The Library Code.
09:06
Thank you, Pascal. And fourth, we had a good collaboration with 4Science on DSpace-CRIS 5.8 for an internal FIS system. These are the main reasons why we decided, yeah, let's go the DSpace-CRIS way.
09:22
Now, why at this time? I would say we had two possibilities. It was three years ago, it was 2019, and we could call it early adopter or on the snail trail. We waited. We worked on that. It was a hard time. And we thought possibility one is, okay, let's make a migration to DSpace-CRIS 5.8 and migrate later to DSpace-CRIS 7.
09:51
Or take the risk of being an early adopter and say, yeah, we go with DSpace-CRIS 7. And with this decision, we are able to push the project forward.
10:07
We wanted to be an active part. And this was the main reason why we decided to take the risk and the chance. There are benefits to being an early adopter. And we did it by sponsoring special features and cooperating with 4Science.
10:22
A really good cooperation, I can say. And Giuseppe, who will take over the talk, will talk a little bit about some features we did together, what they did for us: the special workflow part, the DOI and the deposit license part.
10:41
But Giuseppe, that's your stage later. I try to gain some time so that we have time later to discuss, and maybe I can show the system a little bit. Yeah, this is only a small impression of some of our pages.
11:00
You won't recognize anything. It's too small. The main story of the slide is that the layout matrix is one of the features we sponsored too. It helped us a lot to work on our Fraunhofer screen design and to accomplish special needs we had from our central division department.
11:30
The central idea is that we have four research entities. You can see one for the research output, one for the projects, one for the researchers and one for our institutes.
11:43
This is the only thing I wanted to show here. Maybe if we have time, I can show you more later on the live system. Infrastructure: that is our current working focus. As we are in the final stages of the go-live, we are still testing what is the best possibility,
12:06
the best configuration for launching DSpace 7 as a production system, because so far we have a lot of test systems. We are lucky that we have access to an internal Fraunhofer server cloud, so we can create, rebuild and change infrastructure very, very fast.
12:31
We are using Ansible for that. But I think we will probably start with the following configuration in 10 days.
12:45
We will have two Angular servers, load balanced, and a well-equipped REST server, and we will try to use SolrCloud with ZooKeeper.
13:01
And the digital objects are saved in S3. That will be our configuration, I think. But we can talk later about that if you are interested. The lessons learned: I think that is something I was thinking about a lot when I built this presentation.
13:25
I think there is one key lesson I would like to share with you. We had the problem, as I said before, that we have a lot of data of really heterogeneous quality.
13:41
And we had a change of the data model. And as you all know, the way an author is presented and linked to a person item in DSpace-CRIS is well organized. A link means it's a well-known person item.
14:04
In our old system, and now we are talking about the habits of our users, all our authors were linked. But clicking the link only triggered something like a facet search. So we were really thinking about how to present this old data, and how to change this data and the new data
14:28
that enters through the submission forms or through the other sources, and to present them in the same layout using the matrix layout.
14:42
And as you can see here on the right side, we have a lot of authors. Maybe they are not linked because they have no person item in the CRIS system. But maybe they are authors of many other publications. And so we decided, as a solution, to build a magnifier button on the right side.
15:07
And maybe it's not a solution forever, but only until we have a data quality across all data that makes it possible to use only the linking as we know it in DSpace-CRIS 7, as linking to a specific item.
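The fallback described here can be sketched in a few lines of Python. This is an illustration only, not the Publica implementation: the `Author` class, the `author_links` function, the `/entities/person/...` path and the `f.author=<name>,equals` query shape are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class Author:
    name: str
    # Set only when a person item exists for this author (hypothetical field).
    person_item_id: Optional[str] = None


def author_links(author: Author) -> dict:
    """Return the links to render next to an author name."""
    # Magnifier button: a facet-style search on the plain name string.
    # It is always available, even for legacy authors without a person item.
    # The query parameter shape is an assumption for this sketch.
    query = urlencode({"f.author": f"{author.name},equals"})
    links = {"magnifier": f"/search?{query}"}
    if author.person_item_id is not None:
        # Linked author: additionally point to the person item itself.
        # The path below is hypothetical.
        links["profile"] = f"/entities/person/{author.person_item_id}"
    return links
```

An unlinked legacy author gets only the magnifier search, while an author with a person item gets both links, so users keep the lookup they know from the old system while the data quality catches up.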
15:24
And so we have the possibility for users to look up the publications of an author by way of a facet search, as they are used to. Thank you. That was the fast run, and I pass over to Giuseppe so that he has
15:43
time, and in the remaining time we can discuss; please ask, and maybe I can show you the system. Thanks, Giuseppe. Thank you, Dirk. Hi, I'm Giuseppe Digilio from 4Science, and now I would like to focus more on the
16:06
collaboration between 4Science and Fraunhofer in this project and what this collaboration brought to the DSpace-CRIS 7 code.
16:20
We started collaborating with Fraunhofer in 2019, with ongoing support that results in a continuous alignment of the Fraunhofer Publica project code with the DSpace-CRIS 7 code.
16:46
And now we are also supporting Fraunhofer in the go-live phase. As we said, this collaboration brings many benefits to DSpace-CRIS 7, because with the help of this project,
17:14
we were able to improve the matrix layout, which is the engine used by DSpace-CRIS 7 to provide a configurable item detail page.
17:33
We also ported some functionality that we implemented for this project into DSpace-CRIS 7.
17:46
For example, the correction request, which allows a submitter to request a change to an archived item if something is wrong in the item.
18:07
Or, for example, the possibility to create an entity during the workflow phase.
18:23
Next, please, Dirk. Thank you. This collaboration was a real challenge for 4Science.
18:41
We needed to take care of the large amount of data that this project has. And this resulted in better performance that we now have in the DSpace-CRIS 7 code.
19:10
For example, we worked a lot to improve the performance of the import phase with the DBMS feature, and also of the Solr indexing.
19:28
Let's also consider in more detail some of the main customizations that we have built for this project.
19:44
For example, the workflow configuration and the DOI reservation functionality are two of them. Can we switch to the next, please? Thanks.
20:02
In this slide, we can have an overview of how the workflow of the Fraunhofer Publica is configured. As Dirk said in his presentation, there are a lot of libraries involved in the workflow.
20:29
The default workflow present in DSpace-CRIS 7 cannot meet the needs of the project.
20:47
So, we needed to build some customizations by means of the configurable workflow that is provided by default in DSpace 7 and DSpace-CRIS 7.
21:08
Just to have a quick overview, the workflow is divided into steps.
21:21
Once the submitter creates and deposits an item, the submission goes to the library of the institute the user belongs to.
21:43
The library team can approve or reject the submission. After the approval, there is a further step where the central library can also approve.
22:05
This means that the item is finally published or rejected. A rejected item can go back to the institute library or, for a submitter who belongs to the institute library, go back directly to the submitter.
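The steps just described can be read as a small state machine. The Python sketch below is an illustration under stated assumptions, not the DSpace configurable workflow API: the state names, the action strings and the `next_state` function are hypothetical, and the talk does not say where a rejection by the institute library leads, so returning it to the submitter is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    WITH_SUBMITTER = auto()
    INSTITUTE_LIBRARY = auto()
    CENTRAL_LIBRARY = auto()
    PUBLISHED = auto()


def next_state(state: State, action: str,
               submitter_is_library_staff: bool = False) -> State:
    """Return the state that follows `action` taken in `state`."""
    if state is State.WITH_SUBMITTER and action == "deposit":
        # A deposited item goes to the library of the submitter's institute.
        return State.INSTITUTE_LIBRARY
    if state is State.INSTITUTE_LIBRARY:
        if action == "approve":
            return State.CENTRAL_LIBRARY
        if action == "reject":
            # Assumption: the talk does not state the target of this rejection.
            return State.WITH_SUBMITTER
    if state is State.CENTRAL_LIBRARY:
        if action == "approve":
            return State.PUBLISHED
        if action == "reject":
            # Back to the institute library, or directly to the submitter
            # when the submitter is a member of that library.
            if submitter_is_library_staff:
                return State.WITH_SUBMITTER
            return State.INSTITUTE_LIBRARY
    raise ValueError(f"action {action!r} is not allowed in state {state.name}")
```

With 76 institutes, each institute library would be one instance of the `INSTITUTE_LIBRARY` step with its own reviewer group, while the central library step is shared.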
22:31
Giuseppe, we have a comment left. Yeah, okay. We can just switch to the final slide.
22:41
This is another feature that we have customized in the Fraunhofer Publica. It is based on the current DOI functionality provided in DSpace-CRIS 7, but it additionally allows requesting a DOI during the submission phase.
23:11
Indeed, the submitter can ask the library team for a DOI, and the library team can pre-generate one and reserve it for the submission.
23:30
After that, the submitter can see this pre-generated DOI and use it wherever it is needed.
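The request-and-reserve sequence can be sketched as follows. This is a minimal illustration, not the 4Science or DSpace-CRIS code: the `Submission` class, both function names, the random suffix scheme and the placeholder prefix `10.5555` are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    title: str
    doi_requested: bool = False
    reserved_doi: Optional[str] = None


def request_doi(submission: Submission) -> None:
    """Submitter side: ask the library team for a DOI during submission."""
    submission.doi_requested = True


def reserve_doi(submission: Submission, prefix: str = "10.5555") -> str:
    """Library side: pre-generate a DOI and reserve it for the submission."""
    if not submission.doi_requested:
        raise ValueError("no DOI was requested for this submission")
    if submission.reserved_doi is None:
        # Hypothetical suffix scheme: a short random token.
        submission.reserved_doi = f"{prefix}/{uuid.uuid4().hex[:8]}"
    # Reserving twice returns the same DOI instead of minting a new one.
    return submission.reserved_doi
```

Once `reserved_doi` is set, the submitter can read it from the submission and cite it, for example in the manuscript itself, before the item is published.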
23:52
Yeah, I think that's all. Now we would like to wrap up with questions from you.
24:04
Before we go to the questions, do you want to show a short live demo or shall we start with the questions? Now we can do both. I'm now on the live demo, so that's our live system. Let's see, Wissenschaftler: the gender police wasn't there, sorry to say that, but we will change it, sure.
24:26
I'm just trying to search for some researcher, like me; then you should find an item, and you see the name in the first step and the ORCID iD.
24:45
You can add a picture; the picture is of course optional and has to be added by the researcher him- or herself. There's a connection to the institute where we are working. As I said, I'm based at the IRB, so we see there's a picture in there, we have contact details.
25:09
And each institute at Fraunhofer is assigned to a group, depending on whether you're working more in the IT part or in the genetics part, so it's all linked.
25:25
So we can really use the new possibilities the system offers us to show how all the different topics are linked. Now let me take the questions, or tell me if there is something that you want to see.
25:44
I have a first question for you. We saw that slide with hardware requirements, several servers, good web configurations. How did you test and scope the necessary hardware requirements? Yeah, we are still working on this. I think Florian is here in the panel too, because it's under his supervision.
26:06
If you're really interested: Florian, do you want to say something about that? We made an assumption and said we want very powerful hardware set up for our go-live, but we are still planning to do stress testing.
26:29
We had a discussion with 4Science and they said one Solr server should be sufficient for the deployment, but we want to be sure that a lot of queries can be handled.
26:46
So we decided to use SolrCloud. In SolrCloud we do not use sharding, we only use replication. We hope that this gives us three times the Solr power of a normal Solr installation.
27:05
The stress testing is not done yet, but we are planning it before the go-live. For now we assume that our current configuration, which is very powerful, will lead to a successful go-live.
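A setup with replication but no sharding corresponds to a SolrCloud collection with a single shard and several replicas. The Python sketch below only builds the Solr Collections API request for such a collection. The base URL is a placeholder, `search` is used as an example collection name, and the replica count of three is an assumption drawn from the "three times" remark, not a confirmed detail of the Publica deployment.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, replicas: int) -> str:
    """Build a Collections API CREATE request: one shard, N replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,                 # no sharding: every replica holds the full index
        "replicationFactor": replicas,  # copies of that single shard
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host; three replicas are assumed for this example.
    print(create_collection_url("http://localhost:8983/solr", "search", 3))
```

Because every replica can answer any query, this layout scales read throughput with the number of replicas, which is the effect the stress test is meant to confirm.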
27:26
Thank you, Florian. Sorry if I missed it, but will the functionality for pre-generating DOIs be added to DSpace-CRIS? If I may answer that: The Library Code has already made a pull request with pre-generation of DOIs for DSpace-CRIS 7.
27:46
I know it is missing some unit tests, that's why it's not merged yet; that's something we have to do. But Giuseppe, maybe you can say whether your pre-generation of DOIs for this project is different from the one we submitted, and whether it will become available for everybody?
28:04
I think that if the community is interested in this feature, for sure we can port it to the DSpace-CRIS 7 main code. Great. So I think something will come up in this way. As I mentioned, we
28:23
have already provided something as open source code, as a pull request against DSpace-CRIS 7. It is publicly available, it is missing some unit tests, and 4Science obviously has a solution. I expect this feature to come soon. There's another question that I would like to move into the lunch break. Would you be available for some more live demos in the lunch break?
28:43
I don't know, five or ten minutes maybe? Yeah, sure. Great. So maybe we can move it there. Then I would say thanks a lot for this great talk. Let's continue a little bit in the lunch break, and thanks again to Giuseppe and Dirk for this presentation.