Open Science and Collaborations in Digital Humanities Part 1

Dubrovnik, Croatia
We're going to give you a crash course in digital humanities today. This session is mainly about quantitative and qualitative methods, but we'll start off just talking a little bit about what digital humanities is. It's not something that even people working in digital humanities can really agree about. Then we'll go through some of the differences between qualitative and quantitative methods, look at data context and provenance, which is something that's very important to digital humanities research, and then have a practical exercise using some of the historical data that they have at the HathiTrust in the US. The second session will then dive a little deeper into things like data formats, annotation language structures and so on.

OK, so what is digital humanities? The best description of it I've been able to find is actually the one on Wikipedia, and you'll see from this that it's not short. It's a very capacious definition and includes pretty much whatever you want it to include. If you're doing humanities research and it's got a digital element to it, arguably you are doing digital humanities, and I think that's why it's so difficult to define. Basically it's an area of scholarly activity at the intersection of computing or digital technologies and the disciplines of the humanities, and it includes the systematic use of digital resources as well as reflection on their application. I think that sense is really important: it's not just using digital tools and methods, it's thinking about what that does for humanities research, and really how it changes the way that historians, for example, work. And this is just to show you that digital humanities is a pretty new concept, actually.
The thing used to be called humanities computing. This is an n-gram which is run on the UK Web Archive, and the two terms cross over only in 2009. So they were still talking about humanities computing a decade ago, and then suddenly digital humanities took over as the new way of thinking about doing digital research in the humanities.
And just to show you quite how much people argue about what digital humanities means, there is a website called What Is Digital Humanities (whatisdigitalhumanities.com). It refreshes itself every time you click the URL and comes up with a completely different definition, and there are hundreds of them in there, including "digital humanities doesn't actually exist as a discipline". So that's the whole spectrum of people talking about what digital humanities is. I'd encourage you to go and have a look at that, and then you can see the different kinds of definitions that people come up with, although that's not very helpful when someone asks you to explain what it is that you do as a researcher.
So just a little bit of history. I am a historian, so that's always my starting point: to think about how we got here. The beginnings of humanities computing are usually described as being the work of the Jesuit priest Roberto Busa. He began creating an index of words in the writings of Thomas Aquinas in 1949 and had a whole team of punch card operators doing that work for him. That's why I haven't got a picture of Busa up there; I've got a picture of all the women who did the punch card work and who have been rather neglected in the histories of it. But actually it turns out that a literature professor called Josephine Miles got there first.

In terms of thinking about this work, I think a key takeaway for digital humanities people is that the story of one individual suddenly changing everything is not usually the way it goes. It's always a team effort, and there are more people involved than you might at first think.
So, some things that digital humanities might include. Anyone working in the field will probably do three or four of them. Textual analysis is really at the heart of it: that's where it started and it's where most people still focus their attention, and that's absolutely the core for historians, thinking about how we analyse text. Also relevant for you are new media studies and multimedia, which is increasingly becoming something that people are working with. Mapping and spatial analysis is a very big area. In pretty much any digital humanities project, at some point a member of the team will have said, "Can we have a map for that? How can we put this on a map?" Then there is curating digital materials: thinking about digitisation itself, not just as a mechanical process that happens in a library but as something that involves editorial choices about the texts that you include, the methods you use and so on. There is thinking a lot about scholarly communication using digital methods; social media is a big part of that, and we'll be looking at engagement later on today. And using digital tools in teaching is something that humanities researchers are really interested in doing, exploring how that changes the way people learn and the things you can include in classroom teaching.
And digital ethnography: that's not just studying Twitter data, for example, but talking to people about what they think they're doing when they use social media, observing how they do that and what that means. So as researchers, when you're looking at that data, what do you have to take account of? And increasingly now we're having to look at extended reality, VR, AR and so on, and how researchers can analyse that in any kind of quantitative way, and actually whether we even preserve it for future analysis.

Just to give you an example of a digital humanities project or two: they range from the very, very large to the very small. Again, it's really this sort of capacious definition of what falls under digital humanities. This is a fantastic project called Tracing London Convicts in Britain and Australia, the Digital Panopticon, and it brings together a whole range of historical sources to look at what happened to people who were sentenced to transportation or hanging or another form of punishment in Britain in the nineteenth century. Tracing their journeys through the criminal justice system was not possible without bringing together these different sources, because the records of their arrest, of their court proceedings, and then of what happened to them afterwards were held in libraries across the world, predominantly in Britain and Australia but in other places too. So you could get bits of the picture but you couldn't see the whole journey, and that's what this project has done. They've visualised the results and found some really exciting new things that we didn't realise before. So this represents what happened to people who were sentenced to death, and it turns out that almost none of them were executed. If you'd stopped with the court records you would have thought, "Wow, this is terrible, they were executing so many people," but predominantly they were being transported, or they had a shorter sentence, or in some cases were even let off altogether. That's really at the heart of digital humanities: bringing together different sources and analysing them to come up with new findings that we wouldn't have known about otherwise.
And this is another big project, which has just started at the British Library in the UK, called Living with Machines. The aim of this project is to use all of the digitised newspapers that have been created to investigate people's relationships with machines in the industrial revolution: transport, factories and so on. How were these things described? Did people give any agency to the machines? How did they talk about the effect that industrialisation was having on their lives? And it's doing that at really large scale, using a mixture of data scientists, computer scientists and humanities researchers. So those are the sort of big, collaborative projects that tend to characterise DH.

Then at the other end of the spectrum you have something like this site, which was produced by the Museum of London in 2016 for the 350th anniversary of the Great Fire of London, when large parts of the city burned to the ground. This is aimed at children, and it allows them to reconstruct the Great Fire using Minecraft. There's a game and there are teacher resources, so this is about teaching and public engagement and using digital methods for that. So you've got really high-level research, but you've also got using digital tools to get people interested in research and telling them something they didn't know about already. And it's great fun, with brilliant sound effects, and you can reconstruct how the fire spread, that kind of thing.
So that's just a very brief overview, and we'll try to put a little more detail on that, but I think pretty much everything falls under digital humanities. OK, I'll pass over to Marty, who's going to talk a little bit more about qualitative and quantitative methods.
OK. So a lot of those projects are the end result of a development process, a research programme, and a lot of design iterations go into that. Agile development is all about iterations, research is all about iterations, and the graphic design process is about going around in endless circles.
I'll just quickly go through Aaron Draplin's video description of how he designs a logo. It's not quantitative data, but I think this is more of a qualitative method that's used in the graphic design industry. He starts with a sketch in a sketchbook, with little diagrams of the logos that he's conceiving in his mind, and gets variations on those. Then he kind of transcribes those into vector graphics and starts to put more definition to them. He starts to go through more versions and more iterations in the graphic design software, so what he'd done on paper he's now doing on the computer, and it starts to get more concrete, more defined, more final with his design.

He keeps replicating little objects and keeps iterating on the design, realises he's going down a wrong path, and goes back to his original sketches, which is back to a different medium that he used in the past, to re-inform his next version on the computer. So he's going back and forth between different mediums, he's created multiple versions of logos, and he's referenced old versions and brought old versions back into his final design. So this is a very fluid, a very dynamic, a very undefined process. It's kind of "figure it out as you go": I like the look of that, this is working, this isn't working. It's a very qualitative method. It's difficult to prescribe, but it's very iterative.
On the other side, research projects tend to be very linear: you get questions, you design, you collect your data, you perform an analysis and then you present your results.
And I've noticed at conferences that I've been to that a lot of what's presented is exactly that: a very linear explanation of this entire process that you've gone through previously. But this is not really what you did; this is how you present what you did, and so you're presenting it in a linear fashion. This TrustRank work is an example. They've basically taken the PageRank algorithm and informed the algorithm somehow with, I guess, the authority of news sources, to come up with a different metric for trusting and ranking websites and documents. But again it's just a five-point linear explanation of the research project, which distils it down to something that it really wasn't. There was a lot more to the research project behind the scenes. And so a lot of digital humanities is about explaining your methods as you go and documenting your methods as you go, not just documenting your outputs. So I'll just go quickly through the definitions of quantitative and qualitative.
Does everybody already kind of understand what these are, or have any sense of what they are already? No? OK. So quantitative is about objective phenomena; it's about the facts. It's about converting the phenomena, the observables that you're interested in, into data, and it's about hypothesis testing, confirming your hypotheses, making deductions, et cetera. The qualitative process is really about experience as a source of truth. So there's a distinction here between facts and truth. When we're looking for truth, we're looking for the context, we're looking for the social values, we're looking for the way that people perceive and explain the situations around them. We're not looking at measuring those people. It's about meaning: it's about exploring the meaning and about capturing that somehow. And then you can kind of bring these two methods together in what they call mixed methods, and there are methods of bringing those together in different ways.
But really, quantitative is empirical and it's data-driven. It's about converting phenomena to measurable objects, which is what you might call a unit of analysis, and you can define that in many different ways. It starts to do with statistical techniques once you've got more and more data to analyse, so you're dealing with numerical information and you get into probabilities. What you're doing is collecting data to answer a specific question about your hypothesis, so it's very narrow in that sense. The methods can generally be broken down into descriptive and experimental types. The descriptive type is where you would measure the subject once, and the experimental type is where you would measure the subjects before, conduct a bit of an experiment, and then see what the changes were afterwards. The benefit is that it's answering the what, the when and the where with a statistically significant result. So it's a post-positivist epistemology.
And it's kind of lower cost, because we've got computing technology to automate it, and it takes less time. But it's limited: what it answers is only the research question that you pose, so you're only going to get one answer out of this, and it doesn't answer the why.

The methods involved in capturing data like this are surveys, questionnaires, online polls, and manipulating secondary data, which is a lot of what you will be working with: not just corpuses and files but databases and linguistic corpora. And then statistical programming gets built on top of that.
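The descriptive and experimental types described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented, the function names `describe` and `paired_t` are hypothetical, and the paired t statistic is one common way of testing a before-and-after change, not a method named in the talk.

```python
from math import sqrt
from statistics import mean, stdev


def describe(scores):
    """Descriptive type: measure each subject once and summarise."""
    return {"n": len(scores), "mean": mean(scores), "stdev": stdev(scores)}


def paired_t(before, after):
    """Experimental type: measure the same subjects before and after an
    intervention, then compute a paired t statistic for the mean change.
    Assumes the differences are not all identical (non-zero spread)."""
    if len(before) != len(after):
        raise ValueError("before and after must cover the same subjects")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_change = mean(diffs)
    t_statistic = mean_change / (stdev(diffs) / sqrt(n))
    return mean_change, t_statistic, n - 1  # n - 1 degrees of freedom


# Invented scores for five hypothetical subjects.
before = [10, 12, 9, 14, 11]
after = [13, 14, 10, 17, 13]

print(describe(before))
print(paired_t(before, after))
```

The t statistic and degrees of freedom would then be compared against a t distribution to judge significance. Note that the output says whether the scores changed, not why they changed.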
Qualitative, on the other hand, is asking "is this right?", not "is this true?", so there's a pragmatic element to it. It's about the experience and the observers, which is about the subjects, so it's a subjective epistemology. You're collecting subjective and thematic information, and you have to do the coding of that information in order to interpret it, group it together and make sense of it.

And you're dealing constantly with changing interpretation frames, so it's quite difficult to capture the same information a second time around. If you think about interviews, for example, you could interview someone and then interview them again next week, and they're going to give you different answers to the same questions that you asked them. So there's not really a fact to that; it's an opinion.

So this is phenomenological, which is the person's experiences that you're trying to capture. Ethnographic is how these people fit into the social contexts or the cultural contexts. And reception is really about the way people interpret information or recall information repeatedly. So if someone has had a traumatic experience, for example, and they're recalling that experience multiple times over, they start to re-form that memory, and so you need to be aware of the context of memory recall changing what they're saying as they recall. That's the sort of thing you don't have to worry about with quantitative measurements. So this is really providing the why.
And it's about gathering personal information, and we'll talk this afternoon about anonymisation and the GDPR requirements for those sorts of things. The downsides are that it takes a lot more time and you can't repeat it, as I mentioned. It doesn't yield stats, so there are no measures of empirical significance and it's not probabilistic. And it's much more expensive to conduct.
The ways that you conduct it are really with interviews and focus groups; case studies, which are reports on having conducted interviews and focus groups; ethnographic research, which Jane mentioned earlier; discourse analysis, which is kind of looking across a corpus of literature and interpreting the themes and the ways of argumentation that have been written down in that corpus; observing people; and looking through secondary data. Examples would be diaries or written accounts of the past, and a lot of those are handwritten, so already there's a technical challenge of getting those into some kind of computational text in order to do anything with them. So really a lot of qualitative research could just be reading these texts and looking for themes in them.

[Audience question, inaudible.]

OK, yeah, I guess that would be the difference between a survey and a poll. If you're polling, you'll be asking for more of a one-to-five measure on something, whereas in a survey you'd be asking for descriptive responses to the questions.
OK, and mixed methods is about combining both of these. It's kind of explanation-driven, and there's this term called abduction. So there's induction, deduction and abduction. Abduction is kind of hypothesising an imaginary result and then building it backwards, or reasoning from that, which is kind of in the opposite direction from induction and deduction in terms of analytical methods.
Using both of these gives you the benefits of the statistical analysis, and you get the benefits of the coding and the thematic interpretation, which takes time to do. That's a very iterative process, to label and re-label and classify and taxonomise your content. You can converge on these two studies, and you could do them really in two different ways. You could run qualitative and quantitative methods simultaneously and kind of bounce what you find from one off the other, or you could do them in a sequential way. In a sequential way, you could do a qualitative study first, and then what you find from that could inform the questions that you structure for your quantitative experiment. Or you could go the other way around: you could start to be a bit more open-ended and broad and start asking random questions, and from what you gather from those you could then form more precise questions that you want to go and measure with a quantitative approach.

This kind of adds to the cost, because you're combining both the automated quantitative measure and the more expensive, time-consuming qualitative approach. But you get to bounce off each approach, and you'll discover things that you wouldn't have considered when you thought about a narrow research question with just a pure qualitative approach. And so it really kind of gives you more exploration to conduct that design process that I showed you before. And this is a little diagram that just summarises them. Ideally you'd really aim for the one in the middle, using mixed methods.
I hope that was useful. I'll pass over to Jane to talk about data contexts.
Thank you. Right, and this is the bit that I think most humanities researchers really like to talk about, and that's sources. We do create data, although a lot of humanities researchers will say that they don't, and they think they don't have research data. They have text. But, you know, you've arranged or structured it in some way to try to turn it into data. That might be a transcription of a manuscript, or a database of names and places extracted from other material. But most of the data that humanities people work with is derived from primary sources like newspapers, books, letters, social media. It's taken from somewhere else, and often it's from multiple somewhere-elses. Most historians like me won't work with a single source of data; we'll be bringing together multiple sources of data and trying to analyse across them. And they might be held by different libraries, archives or commercial organisations, be from different periods, and be in different formats. Some of it might be digital already, and some of it, as Marty said, you might have to digitise, in the form of handwritten text or printed text. So the data preparation stage takes considerable time.

And you might be, as I said, combining digitised and born-digital material, and knowing how they have been produced is really important in helping you to do that effectively. Otherwise you flatten out too many of the distinctions between the data types and you end up coming to conclusions that are probably not going to be right.
We're going to talk about open publication and open science later, but this is a particular challenge when you're doing open data. Often in the humanities we have not got the rights to republish the datasets, because they've been collected from different people under all sorts of different copyright standards. To give an example, art historians struggle with this a lot: they can publish text, but it's very expensive to publish the images that they've been working with. So that's a real challenge to working openly when you're working in digital humanities.

I think a fundamental principle of humanities research is that you can't interpret your data appropriately if you don't know where it came from, who produced it, how it was produced and why it was produced. Just getting a dataset that you don't know anything about is of very limited value. The importance of those different elements is going to vary depending on the kind of research questions that you want to ask. For example, you may very well not always be able to know why something was produced, because that's such a subjective piece of information, particularly the further back in time you go. But if you've got an idea of who it was, or how it was done, or where it came from, then that can really help you start to interpret the material that you're working with.

But actually the why is a huge amount of digital humanities research, to do with people's motivations. I know Marty's working with Wikipedia, and the motivations of people to get involved in creating that data are a really interesting area of study just on their own.
And provenance: it's kind of quite similar to context, to where it came from, but in an archival context, when you're working with archives, digital and otherwise, it has a very, very specific meaning. This is a quote from the Society of American Archivists: provenance is a fundamental principle of archives, referring to the individual, family or organisation that created or received the items in a collection. So that's who produced this, where it came from and where it has ended up. And this is the very important bit: the principle of provenance dictates that records of different origins be kept separate to preserve their context. So archives work explicitly to separate out the data that we as researchers then want to put back together again. And they may have reorganised it according to archival principles to reflect this provenance, which is not very helpful for researchers who want to interpret it in different ways, along different axes. And so all of this is kind of going on behind the data before you can even start to work with it.
And this is a really nice quote, I think, which really gets to what we mean by context: that it pertains simultaneously to physical arrangements, social relationships, situational definitions, temporal moments and distinct locales. So what has happened here? Who were the people who were involved? How did they relate to each other? What was the environment like that they were working in, and what were the tools that they had available to them? When did it happen? Over how long a period of time, and would that have changed the way they were working? As Marty said, if you come back to people who are recollecting things after some time, that changes. You could spend a lot of time theorising about this, because we don't always know. But having in our agenda that these are things you might want to consider is something that's very important to digital humanities.

And that context is liable, as social scientists describe it, to collapse, particularly in relation to born-digital data. The idea is that context collapse is where people, information and norms from one context seep into another. So you can't really tell that the way you're reading something is not the way it was intended to be read, perhaps, and you don't know that because you've just got the data without its context. That specific term, context collapse, was coined by danah boyd in the early 2000s. But that concern with context and provenance, and with its absence, runs right through digital humanities research.
So, again coming back to the Digital Panopticon project, they have a very, very large section on the website which describes exactly what records they've been using, how they were digitised, where they've come from, when they were produced and who produced them, because that's the sort of starting point for digital humanities researchers. To me, I don't just want to use this search engine. I want to know what's in there, how it was decided what would be included, and where it's come from. So you can see, just for the criminal registers from 1791 to 1892, that you have the origins and contents but also the strengths and limitations, what you can and can't find, and how it was digitised. And again, the method of digitisation is something that gets studied a lot.
And that context is also not very often visible. Actually, most of the big digital projects go out of their way to hide it, because they want to give you a nice search experience. They want you to have this Google-like search box, and you get an answer that's meaningful. But it may not be representative of what's in there, and it may be hiding some of the problems with the underlying data.

So this is the archive of British newspapers at the British Library, and my immediate problem with this is that it's talking about newspapers, which are physical objects, but it brings lots of different newspapers together, they're arranged in a particular way, and it talks about pages. They say they've got twenty-five million pages, but that's not what the original newspapers looked like. So there's already a decision to break it down to a page or item level, which is changing the way that you think about the newspaper material. It does get a little bit better when you dig into it, and it's saying we've got however many thousand issues, and the year range and so on, for one particular title. But it's changing that physical object and obscuring how it would have been used, and that's something that I as a researcher want to be aware of.
Looking at digitisation and the things that you need to know about it: this is from the Internet Archive, and this is a book about a suffragette, "a sketch of modern life", so that's the late nineteenth or early twentieth century. That's what somebody looking at it sees, and you can read that perfectly. But when you click through to the underlying OCR, the name of the author, the title and large parts of the text have got errors in them, so you can't find things easily. If you do any kind of analysis of frequently occurring terms, that's going to be wrong. You'll think, "I have found five instances of 'suffrage' in here," but maybe there are fifty and you just can't find them. When the workings of the digitisation are hidden, then you're really getting misleading results from using this kind of data. And it's all credit to the Internet Archive that they do let you download the OCR, so you can really see what the problems are and change the research questions that you're asking as a result.

And that brings me on to the HathiTrust, which is one of the big projects in the US. You may know of it as a kind of alternative to Google Books in some ways. I'll pass over to Marty.
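The undercounting problem with OCR text described above can be sketched as follows. This is an illustration only: the two sample sentences are invented rather than taken from the Internet Archive book, the function names are hypothetical, and fuzzy matching with Python's standard `difflib` module is just one possible mitigation, not a technique proposed in the talk.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lower-case the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_count(text, term):
    """Count tokens that match the search term exactly."""
    return sum(1 for tok in tokens(text) if tok == term)


def fuzzy_count(text, term, threshold=0.8):
    """Count tokens whose similarity to the term meets the threshold.
    The threshold of 0.8 is an arbitrary assumption for this sketch."""
    return sum(
        1
        for tok in tokens(text)
        if SequenceMatcher(None, tok, term).ratio() >= threshold
    )


# Invented example: the same sentence as printed and as mis-read by OCR.
clean_text = "The suffrage meeting discussed suffrage reform and the suffrage bill."
ocr_text = "The suffrage meeting discussed sufirage reform and the suflrage bill."

print(exact_count(clean_text, "suffrage"))
print(exact_count(ocr_text, "suffrage"))
print(fuzzy_count(ocr_text, "suffrage"))
```

An exact search over the OCR text misses the two damaged spellings, which is the kind of silent undercount the speaker warns about. Fuzzy matching can recover some of them, but it can also introduce false matches, so the threshold would need checking against a sample of the actual page images.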
So Google digitised a lot of, well, the Google Books corpus. They scanned hundreds of thousands or millions of library books, ran them through OCR and then made the Ngram Viewer, which is what most people might be familiar with: the word frequency viewer for the Google Books corpus. But a replica of all of the scanned material and all of the OCR material was also handed off to the HathiTrust as a trusted digital collection, and what they've been doing with that is providing computational infrastructure and APIs and extracted word features, so that people can conduct literary analysis and use extracted word features from a really large collection of literature. Part of me explaining this is to give you a hands-on experience with the context of breaking down books into pages, as Jane said, and then thinking through what that delineation of words means, and to try to look for some, I guess, interpretations across this corpus. So they're a custodian of the Google Books scans. I think the metadata doesn't cover very much twentieth-century material, but it does contain nineteenth-century literature.
And as I said, they provide the data analysis infrastructure, and the HathiTrust Research Center has produced what they call the Extracted Features dataset, which is for non-consumptive use. What they've done is, from all of the OCR text from all of the books, collected the metadata from the books, decomposed all of the
sentences on each of the pages, and then produced word frequencies for each of the pages of each of the books. So there's an index for every page, there's bibliographic metadata for every book, and there is the n-gram viewer to slice and dice and interrogate this. So there are lots of ways of using this.
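As a rough illustration of what "word frequencies for each of the pages" means, here is a minimal Python sketch. It is not the HathiTrust pipeline: the tokenizer is a deliberately naive assumption, and the two sample pages and the function name page_word_counts are invented for the example.

```python
from collections import Counter
import re

def page_word_counts(pages):
    """Return one Counter of lowercased word frequencies per page.

    The regular expression is a naive stand-in for a real tokenizer.
    """
    token_pattern = re.compile(r"[a-z]+(?:'[a-z]+)?")
    return [Counter(token_pattern.findall(page.lower())) for page in pages]

# Hypothetical two-page "book" (invented text, not from any real volume).
pages = [
    "The custom house stood by the wharf. The surveyor kept his office there.",
    "Nature and life change the mind; the mind changes with nature.",
]

counts = page_word_counts(pages)
print(counts[0].most_common(3))
print(counts[1]["mind"])
```

Once a page has been reduced to counts like these, the word order is gone, which is the property the non-consumptive release relies on.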
So for the extracted features, they've got Python and R libraries, and they've got API access to download the specific extracted features for specific books or for collections of books. You have to burrow around on their website a little bit, but you can construct a collection, say of a genre that you might be interested in, get all of the IDs of
the books, and request from them a download of the extracted features. There are some books that are in copyright and some that are out of copyright, so working with the HathiTrust to get past these sorts of data acquisition problems is something that I think you'll all become more and more familiar with, not just with HathiTrust but any time you're using your own data.
And like I said, they've got the bibliographic metadata, so they've got things like the title, the publication date, the language, the genre, the imprint, the rights attributions, page counts, and the sizes, the physical page sizes, of the books that were scanned.
OK.
So the data is provided as JSON. I'm not going to ask you to work directly with the JSON. I just wanted to give you a little bit of an explanation of the structure behind it, so you get a bit of an idea of the provenance of the data that you're going to work with, at a very high level. So, if you can see that, there's the
ID at the top,
with the metadata subsection. So there's the schema version that they're using here, which is not really going to be that useful for literature analysis; the date that these were created; the rights attribution, which is their own code; and the source institution, which is mostly American universities, so there's a huge proportion of English-
language material in this corpus. But there is enough metadata for you to slice down to individual sub-languages and sub-topics, for instance, if you need to become more familiar with the content that you're looking at. And they have part-of-speech tags, and they have begin and end character
counts for each of the lines. So there's quite a lot of rich metadata describing the OCR text of these pages, page by page, volume by volume, and there's lots of quantitative analysis that could be done on top of this. What they've done with this is that the non-consumptive use allows them to get past copyright.
So they're not releasing the entire sentences, so you can't get linguistic context out of this, but you can get word frequencies out of it. And the reason that they've been allowed to release this dataset, or all these datasets, is that you can't recreate the original copyrighted material, which kind of sucks, but it does allow you to do quite a lot of things.
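To make that structure a little more concrete, here is a minimal Python sketch that reads a record shaped roughly like the one described: an ID at the top, a metadata subsection, and per-page token counts keyed by part-of-speech tag. Every field name and value below is an assumption made for illustration. The real Extracted Features schema should be checked against the HathiTrust Research Center documentation.

```python
import json

# Illustrative record only: field names and values are assumptions based on
# the description in the talk, not a copy of the real schema.
record_json = """
{
  "id": "example.volume0001",
  "metadata": {
    "schemaVersion": "hypothetical",
    "language": "eng",
    "sourceInstitution": "HYPOTHETICAL-UNIVERSITY"
  },
  "pages": [
    {
      "seq": "00000001",
      "lineCount": 1,
      "body": {
        "tokenPosCount": {
          "the": {"DT": 2},
          "scarlet": {"JJ": 1},
          "letter": {"NN": 1},
          "wore": {"VBD": 1}
        }
      }
    }
  ]
}
"""

def word_frequencies(page):
    """Collapse the part-of-speech breakdown into plain word counts."""
    tokens = page["body"]["tokenPosCount"]
    return {word: sum(tags.values()) for word, tags in tokens.items()}

record = json.loads(record_json)
first_page = record["pages"][0]
frequencies = word_frequencies(first_page)
total_tokens = sum(frequencies.values())

print(record["id"], record["metadata"]["language"])
print(first_page["seq"], total_tokens, frequencies)
```

Note that the record holds counts per word and tag but no positions within a sentence, so the original running text cannot be rebuilt from it.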
So for example, this is an example of the sequence, which would be page thirty-three, for example, that has two hundred and seventy-three tokens on it and thirty-six lines.
There's nothing in the page header. So they've actually marked up whether or not there are headers and footers, or whether there are lines and columns, and the size, so you can start to build up with this metadata a profile of the kinds of pages, if you want to do that kind of analysis across the corpus. And then the body text of the page again
has two hundred and seventy-three tokens. So there's kind of a deconstruction of the typographical forms of the page. And this can be used for things like distant reading, word similarity, and topic models, because all you've got is the raw words and the frequencies, and also for emotion analysis and visual structure. So this is an example of
this: there are five copies in this corpus, in the collection, of On the Origin of Species, and someone has analyzed these and noticed that there are two trends. From nineteen twenty-nine onwards, they are on average about a quarter of an inch taller and one eighth of an inch wider than the books published before. So maybe that's a useful measure to
know: that the page sizes are changing as the printing technologies evolved, and the font size is kind of increasing to fill that space up.
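The kind of comparison described here, average page size before and after a cutoff year, can be sketched in a few lines of Python. The volume list below is invented purely so the arithmetic is easy to follow. It is not the data behind the analysis mentioned in the talk, and the function name mean_size_shift is hypothetical.

```python
from statistics import mean

# (publication_year, page_height_inches, page_width_inches)
# Invented values for illustration only.
volumes = [
    (1860, 7.50, 5.000),
    (1900, 7.50, 5.000),
    (1935, 7.75, 5.125),
    (1950, 7.75, 5.125),
]

def mean_size_shift(volumes, cutoff_year):
    """Return (height_change, width_change) between volumes published
    before the cutoff year and those published from it onwards."""
    before = [(h, w) for year, h, w in volumes if year < cutoff_year]
    after = [(h, w) for year, h, w in volumes if year >= cutoff_year]
    height_change = mean(h for h, _ in after) - mean(h for h, _ in before)
    width_change = mean(w for _, w in after) - mean(w for _, w in before)
    return height_change, width_change

height_change, width_change = mean_size_shift(volumes, cutoff_year=1929)
print(f"taller by {height_change} in, wider by {width_change} in")
```

With real metadata the groups would be far larger and noisier, so a difference like this would need to be checked against how the sizes were recorded.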
And you do have the words on the pages, so you could use those, but without context? Yes, that's right. And this corpus was built to get around some of the copyright restrictions on releasing the full
digitized OCR text. So yes, its usefulness is another problem. There are restrictions that shape how they can make it available, but you're already identifying uses that it isn't able to support.
So someone has done topic modeling. Topic modeling is a little bit weird, I don't know if you can interpret the topics, but they've plotted it over the course of the volume as well. So you can see the top one, the topic containing "custom house, office, surveyor, official, general", is happening perhaps in the first chapter of
the volume, but then nothing in the rest of the chapters. And so these kinds of diachronic presentations of topics over time can give you a bit of a sense of what the text itself is about, or how the topics are changing over time, but it's difficult to interpret. So this one at the bottom seems pretty persistent the whole way through, so it's about nature, life,
minds, change, states, and so forth.
And that's The Scarlet Letter.
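To show how a plot like that can be built from page-level counts alone, here is a minimal Python sketch. It does not do topic modeling: it takes two fixed word lists, loosely echoing the topics mentioned above, and measures what share of each page's tokens they account for. The page counts and the function name topic_share_by_page are invented for the example.

```python
from collections import Counter

def topic_share_by_page(page_counts, topic_words):
    """For each page, return the fraction of tokens that belong to topic_words."""
    shares = []
    for counts in page_counts:
        total = sum(counts.values())
        hits = sum(counts[word] for word in topic_words)
        shares.append(hits / total if total else 0.0)
    return shares

# Hypothetical per-page word counts for a three-page volume.
page_counts = [
    Counter({"custom": 2, "house": 2, "office": 1, "the": 5}),
    Counter({"nature": 1, "life": 1, "the": 3}),
    Counter({"nature": 2, "mind": 1, "change": 1, "the": 4}),
]

custom_house_topic = {"custom", "house", "office", "surveyor", "official", "general"}
nature_topic = {"nature", "life", "mind", "change", "state"}

custom_house_shares = topic_share_by_page(page_counts, custom_house_topic)
nature_shares = topic_share_by_page(page_counts, nature_topic)
print(custom_house_shares)  # concentrated at the start of the volume
print(nature_shares)        # spread across the later pages
```

A real topic model would infer the word groupings from the corpus rather than take them as given, which is part of why the resulting topics can be hard to interpret.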
So this is the practical. We'd like you to form into groups, maybe three groups, if that makes sense. I would like you to load up this website, which is Bookworm, which is the n-gram viewer across this Extracted Features dataset for the HathiTrust. The
URL is here. It can be a little bit slow, so just take your time loading it up. Then, if you can form into the groups, what we'd like you to do is find three to five queries that relate to your project, or discuss what you would like to focus on as a group. So, three to five queries: use the filters to facet down into sub-collections and
sub-languages,
and look for three to five queries with multiple keywords that explore a trend of a word's change over time, if that makes sense.
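For anyone who wants to see what sits behind a word-trend query of this kind, here is a minimal Python sketch of the underlying arithmetic: a keyword's count in each year divided by the total words for that year. It does not call Bookworm or any HathiTrust service, and all of the counts and keywords are invented for illustration.

```python
def relative_frequency(keyword_counts, total_counts):
    """Return {year: keyword occurrences per million words}."""
    return {
        year: 1_000_000 * keyword_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0
    }

# Hypothetical corpus sizes and keyword counts per year.
total_words = {1850: 2_000_000, 1900: 4_000_000, 1950: 5_000_000}
keyword_counts = {
    "telegraph": {1850: 40, 1900: 200, 1950: 50},
    "telephone": {1900: 120, 1950: 500},
}

trends = {
    keyword: relative_frequency(counts, total_words)
    for keyword, counts in keyword_counts.items()
}

for keyword, trend in trends.items():
    print(keyword, trend)
```

Normalizing by the yearly total matters because the number of digitized books varies a lot by year, so raw counts alone would mostly track the size of the collection.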
Can you do that at scale? Not really at scale, it starts to become difficult, but you can do it within individual texts. Yes, you have to literally read them.
[unintelligible: group discussion during the practical]


