Open Science and Collaborations in Digital Humanities Part 3

Video in TIB AV-Portal: Open Science and Collaborations in Digital Humanities Part 3

Formal Metadata

Open Science and Collaborations in Digital Humanities Part 3
No Open Access License:
German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties.
Production Place
Dubrovnik, Croatia
So we'll run through for maybe about forty minutes now, then take a coffee break, and then come back to finish off. This session is on open science and collaboration in digital humanities. I know that a few of you who were here in the first week, in the first couple of days, will have had some material on open science already, so great. But this is a sort of humanities spin on what open looks like for the work that we are doing.
So in the first session we're going to cover all these really exciting things on the first slide: copyright, IPR, GDPR, the FAIR principles, which I'll explain in a bit more detail, de-identification and what we've called pseudo-anonymisation, because it's actually pretty much impossible to anonymise things completely, then collaborative tools and methods, data management, reproducibility and, if we've got time, which we may not, a small practical exercise. After that we'll talk about openness more broadly and public engagement, and then we're going to get you to do some work at the end of the day.
I'm aware that nobody gets excited about copyright. It's not as if you run a class on copyright and everyone comes along. But it's really important, because it significantly affects what you can do with your research and what you can publish. I've put that it's a significant issue for digital humanities research, but it's a significant issue for everybody, because it affects what you're able to get access to. It's particularly difficult, I think, for transnational research of the kind that we're talking about in this programme, because of all the different copyright frameworks that exist in different countries, which are sometimes similar, sometimes overlap and sometimes differ, and which cause real problems when you're bringing data together from different sources. Broadly, copyright restrictions apply for something like the death of the author plus seventy years. So if the author wrote something when they were twenty and died at one hundred, you would have one hundred and fifty years when their work is still in copyright and you can't do anything with it without permission. It really restricts certain kinds of access. As an example, the author George Orwell died in January 1950, so his works will only come out of copyright in January next year. That's a real block on what you can do. [Audience question.] Yes, the rights stay within the family, so somebody who is not part of the family can't use the material unless the family gives permission. James Joyce's estate is an example: loads of people want to work with that material, and the family won't let anyone have access to it without paying a lot of money. And quite often you can't even work out who holds the copyright in order to ask, so I'll talk a little bit about that.
And if you've got multiple authors it gets harder. A particularly striking example is film: it's seventy years after the day on which the last contributor died. For a film that might be the person who wrote the score, so you have to know when they died and count seventy years on to know when the film comes out of copyright. Again, it's really tricky to get this information.
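The arithmetic behind those examples can be sketched in a few lines. This is only an illustration, assuming a plain "life plus seventy years" rule in which protection runs to the end of the calendar year and is counted from the death of the last surviving contributor; the function name and the film's death years are hypothetical, and real rules vary by country and by type of work.

```python
def public_domain_year(death_years, term=70):
    """Return the first calendar year in which a work is out of copyright.

    Assumption: protection lasts `term` years after the death of the last
    surviving contributor and runs to the end of that calendar year.
    """
    if not death_years:
        raise ValueError("at least one death year is needed")
    return max(death_years) + term + 1


# George Orwell died in January 1950.
orwell = public_domain_year([1950])            # 2021

# A film with several contributors (hypothetical death years):
# the last death is the one that counts.
film = public_domain_year([1961, 1987, 2003])  # 2074

# An author who wrote a work at 20 and died at 100 (hypothetical dates).
written, died = 1920, 2000
years_protected = (died + 70) - written        # 150

print(orwell, film, years_protected)
```

The point of the sketch is how much depends on one date that may be hard to find: without the last contributor's year of death, the calculation cannot even start.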
In the UK context, before the Copyright, Designs and Patents Act in 1988, unpublished material was in perpetual copyright. That means that a medieval manuscript that was produced in 1300 would still be in copyright today, and you have to get permission from the archive that holds it. That will now expire in 2039 because of the legal changes, but that is still around twenty years during which something produced centuries ago remains in copyright. Because of all this, most digital humanities research is concerned with data from before 1900, and that's why the HathiTrust material that we showed you has a kind of end date of around 1900. Many of you are going to be looking at web archives, and they have really restricted access conditions. In the UK you can't publish a screenshot of a website without getting permission from the owner, and you may not know who the owner is. If that website was archived fifteen years ago, you've got almost no chance of tracking down who the owner is. In the UK and France you have to go into a library to look at the national web archive; you can't access it from home, for example. There are different kinds of restrictions on IP that affect you if you want to research social media data. That's pretty much the property of commercial entities, and they're not there to let people like us have access to their data; they're there to make money out of it. If you've got a huge amount of money, if you're in the US at MIT say, you can probably pay for access to Twitter's archive. That's not going to happen for most of us, so it limits the kind of research you can do to something that's pretty much instantaneous.

You start to lose data from the Twitter search API after about seven days, so if you're doing any kind of historical investigation, or you decide three weeks after something happened that you really want to research it, you've already lost some of that material and you're not going to be able to get access to it. So it's difficult for researchers of the twentieth century and later: for all of that data produced in film, multimedia and born-digital form, access is even worse than for a lot of the older printed material.
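The planning consequence of a short search window can be made concrete with a little date arithmetic. This is a sketch only: it assumes a fixed seven-day window, as mentioned in the talk, makes no calls to any real API, and the function names are hypothetical.

```python
from datetime import date, timedelta

# "About seven days", as stated in the talk; treated here as a fixed window.
SEARCH_WINDOW = timedelta(days=7)


def still_searchable(posted_on, today, window=SEARCH_WINDOW):
    """True if something posted on `posted_on` is still inside the window."""
    age = today - posted_on
    return timedelta(0) <= age <= window


def latest_collection_day(event_day, window=SEARCH_WINDOW):
    """Last day on which material from `event_day` can still be collected."""
    return event_day + window


event = date(2020, 1, 1)
print(still_searchable(event, date(2020, 1, 6)))   # five days later
print(still_searchable(event, date(2020, 1, 22)))  # three weeks later
print(latest_collection_day(event))
```

In practice this means data collection has to be set up before or during an event, not once its research value has become obvious.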
Some of that is for commercial reasons, but a lot of it is around data protection and ethical research, and treating your data subjects in ethical ways. Large digital data sets contain information about people, generated by people, and that brings lots of legal and ethical responsibilities with it. I think we should try to keep that at the back of our minds when doing any kind of quantitative research, which tends to obscure the fact that many of your data points are people. It just looks like information, but you actually need to think about the people it describes. So data management plans, which I hope you're all thinking about, have to take account of where data will be stored, what you discard, and what contains personal information and what doesn't, to make sure that you're meeting your obligations as a researcher. Anonymisation or de-identification of data is difficult, and pretty much impossible really, because you can start to aggregate it with data that you weren't aware of, which then opens up information that you might not have anticipated. So we do need to be careful about what we publish openly. While I'm committed to open access as a principle, it's not possible for everything. The potential for damage to individuals really increases when you start to aggregate and combine data and highlight unexpected connections, things that people would never have anticipated you would be able to infer from the data that you have. And you probably wouldn't be able to see it yourself: it might be somebody else who uses your data to identify something personal about somebody.
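One way to see why combining data raises the risk is to count how many records share the same combination of attributes, which is the idea behind k-anonymity. The sketch below uses invented records and hypothetical column names; it is an illustration of the effect, not a complete privacy assessment.

```python
from collections import Counter


def smallest_group(records, columns):
    """Size of the smallest group of records sharing values in `columns`.

    A result of 1 means at least one person is unique on those columns.
    """
    keys = [tuple(record[c] for c in columns) for record in records]
    return min(Counter(keys).values())


# Invented records; no real people are described.
records = [
    {"postcode": "AB1", "birth_year": 1980},
    {"postcode": "AB1", "birth_year": 1980},
    {"postcode": "AB1", "birth_year": 1975},
    {"postcode": "CD2", "birth_year": 1975},
    {"postcode": "CD2", "birth_year": 1980},
    {"postcode": "CD2", "birth_year": 1975},
]

print(smallest_group(records, ["postcode"]))                # 3
print(smallest_group(records, ["birth_year"]))              # 3
print(smallest_group(records, ["postcode", "birth_year"]))  # 1
```

Each column looks harmless on its own, because every value is shared by three people, but the combination singles out individuals. Linking a released data set to a second source has the same effect as adding columns.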
This is an example from the US a few years ago, when some data about New York City cab journeys was apparently anonymised and released in response to a freedom of information request. Within a few hours a hacker had been able to re-identify the individual taxi drivers, their addresses and lots of information about the journeys that customers were taking. They did that to show that this was not an ethical release of data rather than for malicious purposes, but it really did take them very little time. And that was a case where the data providers had made an effort to anonymise; it just wasn't successful at all. There will always be somebody cleverer who can work out how to link it to something else that will reveal these things.
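The talk does not say how the taxi data had been anonymised. As an illustration of why a well-meant attempt can fail so quickly, the sketch below assumes that identifiers were replaced by an unsalted hash and that they follow a short, predictable format; both the format and the function names are hypothetical.

```python
import hashlib
import string


def pseudonymise(identifier):
    """Replace an identifier with its SHA-256 hash (no salt, no secret key)."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


# Hypothetical identifier format: one capital letter followed by three digits.
# That is only 26,000 possibilities, so every one of them can be hashed.
candidates = [
    f"{letter}{number:03d}"
    for letter in string.ascii_uppercase
    for number in range(1000)
]
lookup = {pseudonymise(candidate): candidate for candidate in candidates}

released_value = pseudonymise("Q417")  # what the "anonymised" file contains
recovered = lookup[released_value]     # what an attacker reads back
print(recovered)
```

Under these assumptions the hash hides nothing, because anyone can rebuild the full table of possible values in well under a second. The weakness is the small, guessable identifier space rather than the hash function itself.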
The born-digital archives that most of you are going to be working with, the Event Registry material and the web archive data, contain information that's generated by people who are largely still alive, so the stakes are much higher than if you're dealing with historical information where people may not be directly affected. It's also the case with digital data that there is a blurring of the boundaries between what's public and what's private, and people's understandings of what's public and what's private are not very clear. An example of that is Twitter, where I think people have the broadcast mode, where they just tweet something, and then they have the other mode, where they're replying to somebody in a conversation. That is completely public, but I don't think most people think of it like that; they think of that conversation as being something that's a little bit more private. So they post very, very personal information in very public spaces. [Audience question.] Yes, the point there is around fair use. You're allowed to quote, but you can't publish huge amounts; you can publish derived data, but not the whole data set. [Audience question.] In the UK it's a percentage: you can copy a certain percentage and distribute it to people, you can quote a certain amount. Again, it's very blurry; what constitutes fair use is not really very closely defined. That can be a benefit as a researcher, because it's not stopping you explicitly, but it can also be a disadvantage, because you don't really know. As Martin said with the HathiTrust examples, they can provide every word for analysis, but only because you can't then start to reconstruct the whole page. So yes, it varies. But fair use doesn't really get you off the hook: it only covers very short extracts rather than whole data sets.
And also, I mean, this is a wonderful thing for researchers. Pretty much nobody ended up in an archive previously. It was only if you had come into contact with government, because you were a criminal, or you were paying your taxes, or you were somebody important in politics, that you would generally end up in an archive. But now there's the potential for all of us to end up in national libraries and archives, through our social media activity for example. We're all producing data as we go through our lives. That's great, because people in the future can find out how ordinary people lived, but not everybody wants every detail of their life to end up in an archive for people like us to look up, piece together and research later. There is a great article by Julian Late, which I can circulate, and she says that taking it as a given that all people want to be remembered by academic, westernised history is a grandiose assumption. I think she's absolutely right. Some people would love for their families to find out about them in the future; other people, not so much. That's where we get to with the right to be forgotten discussion, although there's an archive exemption from the right to be forgotten.
As I say, openness and open science aren't easy. There are all these things getting in the way of being completely open about what we do, but it is really important. It's not just about meeting funder requirements. There are quite restrictive funder requirements on projects like this about making everything as open as you can, so we do have to do that. But there are other reasons. It's not just about a mandate; it's about wanting to share your research and making sure that our work is freely available to the widest possible audience. The most obvious example is medical research. A colleague of mine who was very, very ill was able to research his own condition while he was in hospital and talk to the people who were treating him. That's not going to be the case for most of the work that we do, but it shows you how important and valuable it is for the material that we're working on to be made open. That's how we influence culture and society and make sure that everyone has access to expertise and new knowledge, not just our peers or people who are working in universities that can afford to subscribe to things. It's about making sure that we can influence conversations, and not criticising people for not understanding difficult topics if we haven't been able to provide the answers to those questions openly. It's no wonder that people don't understand climate change, for example, if all of the academic research is locked away behind a paywall. So we need to try and get that out into the open.

Open science can just be a means of publishing your results and depositing your data, and I hope we will all be doing that. But it can be something much more radical: an approach to openness, transparency and sharing which is starting to be discussed as radical openness. It's not just sharing your results and some of your data; it's about working in the open through the lifecycle of the whole project. This is something that universities don't like very much: sharing intellectual property, not trying to look at the ways that you can make money out of it later but opening it up so that other people can use and reuse it; sharing knowledge and expertise; and being open about processes, as Martin was saying earlier, not just results, and talking about what didn't work as well. There is so much pressure on us as researchers to show that we're delivering value for money and doing great things that we don't talk about what didn't work, which can be as useful, if not more useful, to other researchers than reporting on our successes. It's also, really broadly, about listening to the people we want to make use of our research, and having those conversations outside universities and academia. That's a really important thread for this whole programme, which is designed to engage with wider culture and society. It's easier for some kinds of research than others. We've all been talking about each other's research here and not always understanding what everyone else is doing, so it is difficult to communicate complex topics, but it's worth finding time to do it. Have any of you heard about the FAIR principles for data management?
No? OK. It's pretty much a kind of archival data preservation framework, but it breaks down, as you can see on the slide, into findable, accessible, interoperable and reusable. Those are the four key principles for your data management: someone should be able to find your data and access it, it should connect up to other things, and it should be reusable. This is the FORCE11 framework, which sounds very dramatic, and which sets out the various technical requirements that you need to fulfil in order to meet those principles. It's about metadata standards, and I'm not going to talk about that today, but I am going to talk a little bit about a humanities take on those four things.
So, findability and discoverability. There are very well established mechanisms for finding things through publishers. You know which journals you use, which conference proceedings you use, which repositories you go to, and those tend to be run by services that expect you, or expect your university, to pay. So we need to combat that habit of going to our normal places and look more widely to find things that might be useful. The subscription version, sadly, is often much easier to find than the open access version, because publishers put great metadata on things and researchers tend not to put quite such good metadata on things. Search engines are biased towards different kinds of data providers as well. So something might be open but pretty much invisible, because it's not described properly. It's there, and you can tell your funder, 'Great, I made it open', but no one is going to read it, so that's not particularly useful for openness. I think that's going to become even more of a problem: the more open access material there is, the harder it is going to be to find your particular piece of research. So that's something to think about as you go along, with the documentation and metadata standards and where you choose to put your data for other people to find.
That means metadata, it means unique and persistent identifiers, and it means putting your data in as many places as possible, as far as you're allowed to: your institutional repository, somewhere like Zenodo, a personal website if you've got one, a project website. The more places it's in, the more likely people are to find it. If you put it where they're going to look, rather than expecting them to come to you, more people will find your research. But we also need to be a bit more active than that. It's not just about the technical standards; it's about being more active in communicating to non-specialists. It's easy to talk to people who understand your very specific terminology; it's not easy to do that with other people.
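A very small check can catch the most common gap, which is a deposit that is open but undescribed. The sketch below is an assumption-laden illustration: the field names are hypothetical and do not follow any official metadata schema, and the identifier and URLs are placeholders.

```python
# Hypothetical minimum set of descriptive fields for a deposited data set.
REQUIRED_FIELDS = (
    "identifier",     # a unique, persistent identifier
    "title",
    "creators",
    "description",    # written for non-specialists as well
    "licence",
    "landing_pages",  # every place the data can be found
)


def missing_fields(record):
    """Return the required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "title": "Example data set",
    "creators": ["A. Researcher"],
    "description": "",                    # left empty: open but undescribed
    "licence": "CC BY",
    "landing_pages": [
        "https://repository.example.org/item/1",
        "https://project.example.org/data",
    ],
}

print(missing_fields(record))
```

Running a check like this before depositing is cheap, and it is easier than reconstructing a description years later.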
Open access removes technical, financial and geographical barriers, but it's just not enough on its own, and it's too often talked about as an end point: 'This is open access, I don't need to think about it any more.' There are lots of notorious digital humanities examples where projects say, 'We're open, we've got an API', and they don't provide any instructions or documentation about how you use it. Non-specialists in particular are just not going to understand what they need to do, so that's effectively closed. They've ticked the box for their funder, but they haven't really made it open. So publishing without documentation is no use to anyone. In fact, if you went back to your own data after about ten years and it was undocumented, you probably wouldn't be able to use it either. I always used to say that everyone hates doing documentation, and then someone in the room said, 'No, I really like it', so now I just say that I hate doing documentation, but it's really important. Think about what audience you want to reach as well. We've got multiple different audiences for this programme, because of that wider-society aim, so think about who you want to talk to. How accessible is an academic journal article to someone who's a non-specialist and left education at eighteen, say? Probably not very.
So there are lots of services around that help you think about how you describe and present your research in a way that is going to be accessible to a much wider audience. Have any of you heard of Kudos? It's a really great service, which I suspect is going to get bought up by a major publisher quite soon, closed off and become less useful. But for the moment it's a toolkit for researchers to describe your work in a way that will make it more findable and more accessible. As they describe it, the three key things to bring your publications to life are to explain, to share, and then to measure the impact you're having and how people are using and citing your material. It's completely free at the moment, so you can set up an account, and you get metrics back on how people are accessing your work, and hints and tips about what you can do to make it more open and accessible.

As it's described here, it's the opportunity to tell the story of your research and connect together materials from different locations, and you can keep updating your page to reflect your ongoing research, add new materials and demonstrate continued relevance. That continued relevance thing is really interesting, because quite often we do our research, publish it and forget about it. But the area that you are working on might suddenly come to prominence because of a particular news story or topic, and if you go back to it, refresh it and link it to something that's in the public mind, then you're going to get that kind of impact and usage. In the end, it all helps to make it easier for readers to find and understand your work, which is the key thing about open science to me.
And you can find out very easily what people are doing with your work. This is an article I published just recently with Andrew Prescott, 'Negotiating the born-digital: a problem of search', and whether we like it or not, it has Altmetric information about how people are relating to the work online through social media.
If you go through to it, you get an Altmetric score: seventeen. Seventeen what? It's just a number, but it's either higher or lower, so if it's high it's good, and if it's low it's not. Then you can get a bit more information about exactly what people are doing, and whether it's blogs or Twitter. I had no idea that someone had written a blog post about it until I went in and had a look at this. So you can sort of steward your own profile as a researcher, do things to make your work more accessible and see if those work. There are metrics, however fallible, that will help you to do that.
Next question to the room: who has an ORCID iD? Most of you, if not quite all of you. This is, again, a completely free service, and the unique identifier stays with you, not your institution, throughout your whole career as a researcher. You can attach it to anything: research grants, book chapters, articles, blog posts, data sets. It's a way of bringing all of your work together in one place that's transportable with you as you go through your whole career as a researcher, so I'd really encourage you to sign up. It's very simple. This is mine here, which I've sort of filled out with various bits and pieces. There are quite a lot of different places where you get encouraged to put your profile, and that's a pain: 'I don't want to have to do it again.' But this one is a really strongly emerging international standard, so I would very much encourage you to make good use of it. Certainly in the UK, some of our funding bodies have now made it mandatory: they will not give you funding unless you have one of these iDs. I suspect that's going to happen more often as well. But think about it: if somebody searched for you online, what would you want them to find? If you take ownership of it, this is your opportunity to determine what they will find, rather than just allowing the search algorithm to throw up a whole load of random stuff that might not be what you want people to know about you. So if you want your most current research to appear at the top of search engines, put some effort into curating it online effectively. OK, so interoperability is next, which again comes down to standards.
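Going back to the ORCID iD for a moment: it is a 16-character identifier whose last character is a check digit, computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can be caught before it is attached to a grant or a data set. The sketch below is a minimal validator under that rule; the function names are hypothetical, and it only checks the form of an iD, not whether it has actually been registered.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for character in base_digits:
        total = (total + int(character)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if `orcid` has 16 characters and a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# Sample iDs of the kind used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1694-233X"))
print(is_valid_orcid("0000-0002-1825-0098"))  # last digit altered
```

A check like this fits naturally into a deposit form or a metadata pipeline, so that the identifier linking a person to their outputs is at least well formed.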
I'm sure I don't have to tell you all this, but think about interoperability from the start of any research project, because it's very hard to go back and do it retrospectively if you haven't got it in there at the beginning. Realistically you won't do it; it would just be too much work. So think about the appropriate standards for your discipline. In multidisciplinary projects like this that's quite a difficult conversation, because they will not be the same standards in different disciplines. Martin talked a lot about different standards, and that's going to differ depending on whether you're in computer science, data science or digital humanities. They won't always be the same, so have a conversation about how you can make them talk to each other. Think about the services and resources that are a natural fit to link on to your data: how have they been put together, how are they used? Let that influence how you decide to do it. If someone else has done something really well, just copy it rather than trying to reinvent the wheel, because there's probably a reason that a standard has become the most common. There is this tendency to think, 'My project's not quite like anyone else's, so I don't need to follow the same standards that everyone else does, or I need to adapt them slightly.' Humanities researchers are particularly prone to this: 'My project is special; nothing that's around at the moment is going to be any good for it.' But, you know, the main standards have emerged for a reason.
And reusability. As I said, people maybe don't like doing it, but good documentation is essential. Licensing: the more permissive the licence, the more likely it is that people will use your work. That's self-evident really, but people in the humanities have a real problem with it, and they do not like publishing things as CC BY. Do you all know the Creative Commons licensing system? Yes, OK. So the most open one is CC0, which is: I don't care what you do, you don't have to say where this came from, you can just take it and do anything. CC BY is perfectly fine for most purposes. A lot of people tend to put the non-commercial element in there, and that closes off an awful lot of use, and not just by big commercial companies. It might stop your own university putting something online if they run advertisements on the page or they're using it to sell tickets to a conference, depending on how you interpret non-commercial. So use the most open licence you possibly can, and be clear about the licence, because if you publish material and you haven't stated the conditions under which people can access it, they might not use it because they won't know what they're allowed to do. So tell them what they're allowed to do, and try not to close down options; I think that is the key thing with licensing. If you're happy for people to use it, don't try to say you can use it for this but you can't use it for that. Make it as open as you can. There is a pretty famous quote now by Rufus Pollock: the best thing to do with your data will be thought of by someone else. So don't try to predetermine the use that people can make of your work.
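As a rough illustration of the point about being explicit with licences, here is a minimal Python sketch that checks whether a dataset's metadata states a licence at all and flags a non-commercial clause. The metadata layout (a dictionary with a `license` key) and the function name `check_licence` are assumptions made for this example; the identifiers are the SPDX short names for CC0 1.0, CC BY 4.0 and CC BY-NC 4.0.

```python
# SPDX short identifiers for the Creative Commons licences mentioned in the talk.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0"}
RESTRICTED_LICENCES = {"CC-BY-NC-4.0"}


def check_licence(metadata):
    """Return a short verdict on the licence stated in a metadata dict.

    The "license" key is a hypothetical convention for this sketch,
    not a field mandated by any particular repository.
    """
    licence = metadata.get("license")
    if not licence:
        # No stated conditions: reusers cannot tell what they may do.
        return "missing: reusers cannot tell what they are allowed to do"
    if licence in OPEN_LICENCES:
        return "open"
    if licence in RESTRICTED_LICENCES:
        return "restricted: non-commercial clause may block some reuse"
    return "unknown: check the terms manually"


if __name__ == "__main__":
    print(check_licence({"title": "Page scans"}))
    print(check_licence({"title": "Page scans", "license": "CC-BY-4.0"}))
```

A check like this could sit in whatever script prepares a deposit, so that a dataset never goes out without a stated licence.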
So that's a roundup of openness. Is this a good point, Eleanor, to break for coffee? Yes, OK, I'll hand over to Martin in that case, for the collaborative tools and methods. OK, so I'm going to go through some examples of project planning, some data life cycles, and a little bit of rudimentary data management, which might seem like common sense, but a lot of people don't necessarily think about it. I'm probably not going to do the bit about citation, but I will talk about crowdsourcing, because that's more relevant for the machine learning work. So, with project planning and life cycles: basically, you're responsible for your research and for the collaborators and partners that you bring into your research. You need to define exactly what their responsibilities are and how they're going to work together. The same goes for suppliers and infrastructure: you're going to have different levels of access and different requirements for access to computational infrastructure, so it's important to clarify the roles for each of those. There is a huge range of institutional support, so I just grabbed all of the logos for the partners that are involved in this programme, and then all of the logos for the data providers, and I'm sure you've mingled and spoken with a lot of them.
I'm not sure if you've looked at all of them individually to see if they might have data that would be useful for your research project, or just the ones that you thought might be appropriate for you, so reach out and explore even further if you can. And then, as Jane said, there's where you put it and how you make it accessible: put it in places where people can find it, put it in many places where people can find it.
So on CLARIN, this research network, we can publish a lot of research data. GitHub is used more and more for publishing research data; it may not be the best place for it, but it is another location that people can use to find it. OneDrive is our institutional storage; you'll have something similar, which is the cloud storage that you get with your local institution. And there are other organisations, like the European Open Science Cloud and the alliances in digital humanities, that are networks you can plug into to find other opportunities to publish your research data and to find other people who are talking and thinking about these things as well. So, part of the life cycle as well; this will be familiar to a lot of you. You collect data or first create it, then you go through an analysis process, and then ultimately you have to store it somewhere.
There are the anonymisation problems and the copyright problems of storage, which are really about to what degree to publish and how much to publish, and there is usually a lot of work involved in that. Then there's the sharing: putting it in the locations Jane mentioned, to get it out there so that people can find it and use it. And then, I'm not sure that you'll go down this route so much, but there is digital preservation: putting it in an institutional repository or a digital archive somewhere, so that in fifty years' time it's still reusable.
And just an example of a life cycle project. In the digital humanities, for the last decade or so, a lot of the funding has been going to tool building, and one of the resources that was produced by a research project was the Liberalism in the Americas Digital Archive. It digitised (I had to make sure how many volumes there were) one hundred and twelve hundred sixty records, which was about twenty-three gigabytes of page scans. The server cost fifteen hundred pounds a year to host, and the research funding application didn't have any provision for keeping this data online after the research project. So when you think about data management and research funding (I know you're in your PhD programmes now, but in the future) you need to think about what you do at the end of the life cycle with the research data that you're producing.
So we had to migrate this to another repository, our institutional repository. It sounds like a simple process, just lift it from one database and put it into another database, but there was a kind of structural mapping we had to consider with the academics: the meaning changed for the terms of the vocabularies that we used in the metadata. So it took a lot of back and forth and a lot of time. In total it took about ten days of work, but with all of the delays of communication going back and forth it actually took six months to complete. And this was just taking data from one source that we had complete access to and putting it into another source that we had complete access to.
So think about that when you're publishing your data for people to reuse in the future. Getting back to the standards and the openness: they make it easier for other people to pick your data up and work with it.
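To make the structural mapping concrete, here is a minimal Python sketch of a metadata crosswalk between two repositories. Everything in it is an assumption for illustration: the field names, the vocabulary terms and the function `migrate_record` are hypothetical and are not taken from the actual archive migration. The point is only that both field names and vocabulary terms need an explicit mapping, and that anything unmapped should be reported rather than silently dropped, since those are the cases that need a conversation with the academics.

```python
# Hypothetical crosswalk: source field name -> target field name.
FIELD_MAP = {
    "title": "dc:title",
    "author": "dc:creator",
    "doc_type": "dc:type",
}

# Hypothetical vocabulary crosswalk for the document-type field.
TYPE_VOCAB = {
    "pamphlet": "Text/Pamphlet",
    "newspaper": "Text/Newspaper",
}


def migrate_record(record):
    """Map one source record to the target schema.

    Returns (migrated_record, problems). Problems list every field or
    vocabulary term that has no agreed mapping yet.
    """
    migrated = {}
    problems = []
    for field, value in record.items():
        target = FIELD_MAP.get(field)
        if target is None:
            problems.append(f"unmapped field: {field}")
            continue
        if field == "doc_type":
            mapped = TYPE_VOCAB.get(value)
            if mapped is None:
                problems.append(f"unmapped term: {value}")
                continue
            value = mapped
        migrated[target] = value
    return migrated, problems


if __name__ == "__main__":
    source = {"title": "An example pamphlet", "doc_type": "pamphlet", "shelfmark": "A1"}
    print(migrate_record(source))
```

Running the crosswalk over every record first and reviewing the list of problems is one way to surface the back-and-forth questions early, before any data is moved.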
Speaking about radical openness and transparency, I'll just quickly go through a couple of example digital projects that we've run, which are less PhD research projects and more online digital tools. The first is called British History Online, which is run out of the Institute of Historical Research. It is a digital library of primary sources and it's about fifteen years old.
So they've been digitising texts and double re-keying them for that long, so they're 99.9 percent accurate. It was written in ASP.NET and had been running in ASP.NET for a decade, and the project was to transform and rebuild the site. Because this is pinned in as a subscription service, there were modules that plugged into the institution's finance department, there were subscriptions that plugged into the library for access to subsets of the data, and the data itself was in a non-standard format. It was consistent throughout its own set, but the metadata was stored in relational databases and the XML content was on the file system. So there were lots of aspects to consider here. The project phases that we went through were an audit of the existing systems and structures, planning for the reformation of those structures, then the development phase, and then putting the new site into operation afterwards. The business systems that were impacted by each of those were, first, the sales system, because there was revenue coming in quite regularly from this. Second, the editorial process needed to keep going: we didn't want to stall the digitisation process during the rebuild, so we made provisions to keep the old system running and then do a quick switchover once we had prepared for it, but that required some tactics and coordination with the IT department to switch things over. And then, obviously, the technical requirements were shifting from an ASP.NET framework to a PHP-based framework, so there were skills requirements in the teams to take on operations for these sorts of things. This colourisation is my way of representing how this was not a clean waterfall-based process.
It didn't go one at a time; it kind of went step by step, with the yellow informing the brown, or the brown feeding forward, and eventually we got a development server into operation while we were still planning exactly how to change things. So it was not a clean process. Lots of these boxes in these facets of the project plan were feeding off each other, and different people were responsible for different parts of it at different times. What we fortunately ended up with was a relaunch with continuous deployment and continuous operation. So that was quite an expensive process, but it's just an example of the sorts of digital projects that we've gone through.
Another example of the methodologies that we've used in some projects is for the History of Parliament. This is a traditional research print publication process, but what we did with this one was transform it through an XML database. We put the XML database at the centre of the publishing process and pinned the traditional print research life cycle around that, and ran that with quite a lot of short iterations. Once we had received the Word files from the researchers, all signed off to a certain degree of quality, they were put into a Drupal content management system, which had an XML schema for exporting from the relational database.
The XML was not really standardised, but it was tailored towards what InDesign could recognise. InDesign then picked it up and applied style sheets, typographical style sheets, for the transformation, and then it ultimately went through the traditional PDF iterations with the editors for corrections and the like, and went off to print. So what we did with this project is put the XML database in the middle of a traditional publishing process, and what we can do with that at the end is take the XML and ultimately make that open data.
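The export step, going from relational records to XML that a layout tool or an open-data release can consume, can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the project's actual pipeline: the table name `entries`, its columns and the element names are hypothetical, and an in-memory SQLite database stands in for the content management system's relational store.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_entries_to_xml(conn):
    """Serialise every row of the (hypothetical) entries table as XML."""
    root = ET.Element("entries")
    rows = conn.execute("SELECT id, name, biography FROM entries ORDER BY id")
    for entry_id, name, biography in rows:
        entry = ET.SubElement(root, "entry", id=str(entry_id))
        ET.SubElement(entry, "name").text = name
        ET.SubElement(entry, "biography").text = biography
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


def demo():
    """Build a tiny in-memory database and export it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT, biography TEXT)"
    )
    conn.execute(
        "INSERT INTO entries (name, biography) VALUES (?, ?)",
        ("Example Member", "Sat for a hypothetical borough."),
    )
    conn.commit()
    return export_entries_to_xml(conn)


if __name__ == "__main__":
    print(demo())
```

Because the XML is generated from the database rather than from the typeset pages, the same export can feed both the print workflow and a later open-data publication.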
Another example, and I'll stop after this because we should have some coffee: there is a lab, King's Digital Lab at King's College London. They're a digital humanities lab that specialises in producing tools like this, and they are quite open and transparent about the way that they operate and the way that they build maintainability and sustainability into their research projects. It's pretty much in the middle, where the circles are here: it's an agile development process. Pre-project, KDL go through and consult with their clients, the researchers who have funding to build a tool of some kind, in quite a lot of detail: discussing the project with the partners, talking through the outline architecture and prototyping requirements. These are traditional project-management-type activities, things that you will have to do in your own research project. If you've got collaborators, you'll need to coordinate with them in a project management style. Ultimately the two coloured lines, the evolutionary development and the deployment, are what we traditionally think of as tool building, just hacking away and producing code. But what they also have with this is post-project. When you sign up to build a tool and host a tool with King's Digital Lab, they're very clear that what you build will have to be archived and switched off at a certain point. So they require your infrastructure to be running for, I think, five years, and they do that so they can maintain their own sustainability for the teams and their resources. So these are all project techniques, project management methods, that are not quite the same as PhDs, but if you think about your PhD in this kind of procedural way, in a project management way, you'll actually be able to prepare for and publish data in clearer ways. And the methodology and the documentation for running research like this is a by-product, which is also something that I'll talk about, I think, after the coffee break. OK.