Open Science and Collaborations in Digital Humanities Part 2

Video in TIB AV-Portal: Open Science and Collaborations in Digital Humanities Part 2

Formal Metadata

Open Science and Collaborations in Digital Humanities Part 2
Alternative Title
Quantitative and qualitative methods
Title of Series
Part Number
Number of Parts
No Open Access License:
German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties.
Release Date
Production Year
Production Place
Dubrovnik, Croatia
OK, so welcome back. We're going to get into the next session now, again on qualitative and quantitative methods, but more about data formats, annotations and units, language structures, and then frames of meaning.
And there will be a practical at the end, again similar to the previous one but building on it, about meaning change in words. So I want to go through layers of abstraction, and hopefully it'll make sense again towards the end. The ladder of abstraction is really about working with data, but the data has to be stored somewhere, and ultimately you write applications that consume it.
So that's the ladder of abstraction, really. We don't use hard disks quite so much anymore, but these are the technologies and the formats and the standards that have come in the past. Does anybody know what this is on the slide?
Yeah, this is core memory. This is one kilobit [unclear] of memory as stored in the 1950s. That's how it was stored: you'd put electrical impulses through each of the wires on the side, and that would polarize a magnet that was wrapped around the intersection of the cables, and that's how you stored a bit. So we're kind of doing that now with natural language processing, just not in physical hardware. Well, almost in physical hardware. And the whole point of this is to recognize that "obsolete power corrupts obsoletely". So at every layer of this abstraction that I'm going to go through now, you've got standards and formats and technologies changing all the way.
And the impact of these changes is, well, what is the impact of these changes on meaning? As Ted Nelson said here. So the next level up is the encoding. A lot of you will be familiar with these. This kind of started with Morse code, which is really dots and dashes, signals and sounds being transmitted along single wires.
We've built that out quite substantially these days. The next layer up from that is formats, file formats. I'm sure you're all familiar with the standard file formats, yeah? But we don't really consider these meaningful changes; we just use them as a utility. We don't consider what the impact of format changes is on the meaning of the text that we use.
Building on formats, we then get into schemas and ontologies, and I know you've done some sessions on the semantic web and the knowledge graph and things like this. This is a simple relational database, but the principles are the same: you're using formats and you're building more structures, more meaningful structures, on top of that. And ultimately we end up with applications.
This is what we build all of these layers of representation for. And when you're working with text, you're really working at the encoding, formats and schemas layers. So you're very familiar with these technologies, these formats.
You know, again, they always change. They always change, and the best way to protect against that change is to recognize that change is always going to happen. Standards help as well. Originally there was the US-ASCII chart, if anybody has seen the 1972 version, but it was very constrained. It didn't allow very many non-English characters.
That was extended by, obviously, extended ASCII for a while, and extended again to Unicode, which is what we all work with today. I'm not sure if the data that you end up processing is Unicode proper. When you work with [unclear] you have to go through transformation processes, and they will affect character changes and interpretations on top of that.
So this is the code point in the Unicode standard: 9731 is the little glyph for the snowman. And on top of that we've got font changes with operating systems as well, and I'm not really sure what the impact of this is on meaning change. So I think I rendered the same code point on the Mac and took a screenshot over here.
It might be that one; the Mac tends to have more colours. I can't remember which operating system that was. But I had no choice over the iconography that was being used for the representation of this code point. For me it still looks like a snowman, but if this is a more meaningful glyph, if it's a character for example, and it's being presented in a slightly different font format, and you don't have control over that at the application layer, then I'm not really sure how to measure the meaning change of that. So this is another layer of abstraction for consideration.
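As a rough illustration of the code point discussed above, the following Python sketch assumes that the number 9731 mentioned in the talk is the decimal form of the snowman code point (U+2603). The variable names are illustrative only. It shows that the code point itself is stable while its byte representation depends on the encoding, and that a 7-bit ASCII conversion cannot keep it at all.

```python
import unicodedata

# Assumption: 9731 is the decimal code point mentioned in the talk.
code_point = 9731
snowman = chr(code_point)

# Hexadecimal form and official character name of the same code point.
print(f"U+{code_point:04X}", unicodedata.name(snowman))

# One code point, different byte sequences depending on the encoding.
utf8_bytes = snowman.encode("utf-8")
utf16_bytes = snowman.encode("utf-16-be")
print("UTF-8: ", utf8_bytes.hex(" "))
print("UTF-16:", utf16_bytes.hex(" "))

# 7-bit ASCII has no slot for the character, so a lossy conversion
# replaces it with a question mark.
ascii_lossy = snowman.encode("ascii", errors="replace")
print("ASCII: ", ascii_lossy)
```

Which picture appears on screen for this code point is decided by the font and operating system, not by anything in the code above.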
And with some of the Twitter work we've been doing, we're extracting multi-byte emoji and emailing those back and forth amongst the project group. I'd extracted, as you can see, these two skin-toned thumbs up. So I copied the string that I'd extracted using Python from UTF-8 strings, pasted it into my Mac Mail application and emailed that to Jane, who opened it up in the Microsoft Word application and replied to me. Microsoft Word interpreted the bit-level encoding and stripped out the multi-byte UTF-8 emoji codes. It basically stripped out these little squares on the side, which are the skin coloration characters. Now, that's just a bug in a piece of Microsoft software, in Microsoft Outlook software that was probably written in the eighties to some old text standard [unclear]. But these days you could probably interpret that as a form of racism: stripping skin colour out of your multi-byte encoding. So this is where it's getting into slippery territory between bugs and the interpretation of meaning and the social context. Apart from that, we're going up again: the next level up is the format.
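The emoji anecdote above can be made concrete with a small Python sketch. It assumes a thumbs up with a medium skin tone as one plausible example, and `strip_skin_tone_modifiers` is a hypothetical stand-in for whatever the mail or word-processing software actually did, which the talk does not specify. The point is only that a skin-toned emoji is two code points, so a tool can silently drop one of them.

```python
import unicodedata

# Assumption: thumbs up plus a medium skin-tone modifier, for illustration.
thumbs_up = "\U0001F44D"   # base emoji
skin_tone = "\U0001F3FD"   # one of the five skin-tone modifier characters
message = thumbs_up + skin_tone

# Each code point is four bytes in UTF-8; the pair renders as one glyph.
for ch in message:
    print(f"U+{ord(ch):05X}", unicodedata.name(ch), ch.encode("utf-8").hex(" "))

def strip_skin_tone_modifiers(text: str) -> str:
    """Hypothetical lossy step: drop the modifier range U+1F3FB..U+1F3FF."""
    return "".join(ch for ch in text if not 0x1F3FB <= ord(ch) <= 0x1F3FF)

stripped = strip_skin_tone_modifiers(message)
print(len(message), "code points before,", len(stripped), "after")
```

After the lossy step the string is still valid text and still shows a thumbs up, which is why this kind of change is easy to miss.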
Everybody is probably quite familiar with XML here.
This is not the same document, but JavaScript Object Notation is a serialization of the same data. Serialization is not really considered a meaningful change; it's more just a structural change. I'm not really sure if serializations have an impact on meaning change. Does anyone have any examples where that has happened? No? I think the most meaningful change that a serialization change will have is on the skills and the tools that you need to learn in order to work with the text, to work with the data. So it's not so much a meaning change in the content itself, but a meaningful change in that you need to learn a whole new set of tools and applications in order to work with the data.
TSVs, tab-separated formats: again, there are lots of libraries and scripts that allow you to work with these, and this is a simple data structure, a tabular structure. You could use commas instead of tabs, for example, to split things out, but again you need to learn a whole new set of tools and techniques and libraries to load these up, iterate through the structures and split them. So it's not a meaning change, but it's a meaningful change in the methods and tools and techniques that you need to use to work with the data.
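To illustrate the point about serializations, here is a minimal Python sketch that writes the same records as JSON and as tab-separated values and reads both back. The records and field names are hypothetical, chosen only for the example. The values are kept as strings because a TSV file carries no type information.

```python
import csv
import io
import json

# Hypothetical records; the field names are illustrative only.
records = [
    {"id": "1", "title": "Dubliners", "year": "1914"},
    {"id": "2", "title": "Ulysses", "year": "1922"},
]

# Serialization 1: JSON.
as_json = json.dumps(records, indent=2)

# Serialization 2: tab-separated values with a header row.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["id", "title", "year"], delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerows(records)
as_tsv = buffer.getvalue()

# Each format needs a different library call to load it,
# but both yield the same content.
from_json = json.loads(as_json)
from_tsv = [dict(row) for row in csv.DictReader(io.StringIO(as_tsv), delimiter="\t")]

print(as_tsv)
print(from_json == from_tsv)
```

The content survives the change of serialization; what changes is the toolset needed to load and iterate over it.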
And RDF. Now, RDF is not really a format, it's more of a schema.
And the way that standards and schemas help us is that we learn a specific tool and technique and software library, and can then work more rapidly with additional data of this format. So if we're all familiar with the HTML standard, we're able to work with web pages quite rapidly. And if you're working with web archives, you'll be familiar with the tags available in HTML, so you know this is one way of extracting the [unclear]. What you're not going to be quite so familiar with is how a specific web page used that tag, which is, I guess, a context to consider. And RDF again: you've probably been exposed to RDF in the knowledge graph. It's kind of difficult to get your head around for most people that are [unclear] in their thinking.
So yeah, anyway.
I guess, who works with triples in Turtle format? No one? Who knows what triples in Turtle format are? [unclear] This serialization is representing the same information, and it's more difficult to visualize. Then you kind of [unclear] using your data based on the serialization.
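For readers who have not met triples before, the following Python sketch shows the idea of subject, predicate and object statements and one simple line-based way of writing them out (N-Triples, which is also valid Turtle). The example.org identifiers and the choice of the Dublin Core `title` and `creator` properties are assumptions made for illustration, and the rule that any value starting with `http` is an identifier is a simplification for this sketch, not how an RDF library decides.

```python
# Hypothetical data: example.org IRIs and Dublin Core properties are
# chosen for illustration only.
EX = "http://example.org/"
DC = "http://purl.org/dc/terms/"

triples = [
    (EX + "book/1", DC + "title", "A Portrait of the Artist as a Young Man"),
    (EX + "book/1", DC + "creator", EX + "person/joyce"),
]

def escape_literal(value: str) -> str:
    """Escape the characters that may not appear raw in a quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )

def to_ntriples(rows) -> str:
    """Write one '<subject> <predicate> object .' statement per line."""
    lines = []
    for subject, predicate, obj in rows:
        # Simplification for this sketch: http(s) values are treated as
        # identifiers, everything else as a plain text literal.
        if obj.startswith(("http://", "https://")):
            obj_term = f"<{obj}>"
        else:
            obj_term = f'"{escape_literal(obj)}"'
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines)

serialized = to_ntriples(triples)
print(serialized)
```

The same two statements could equally be written as RDF/XML or JSON-LD; as with the earlier formats, the information stays the same while the tooling changes.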
These are also not human readable. RDF is not human readable. This is RDF/XML; it's very verbose. There are a lot of attributes in the XML markup that link through to the ontologies that each of the field values is supposed to represent. It's not human readable. What they've done in digital humanities is create the Text Encoding Initiative, TEI, which is much more focused on lean markup, much more focused on the content rather than machine verbosity, if that makes sense.
So, for example, the TEI spec has a section for prose tags, and you can see the tags used in the XML here are very lean: there are only two characters to represent a clause, and an s tag to represent a sentence. So they're very lean tags used for syntax markup, so you can get quite into the text without being overwhelmed by the [unclear].
And there is another kind of sub-module in the TEI standard for marking up verse. So there is an attribute here for the type of verse, so there's a sonnet or a quatrain or tercet, the different types of verse, and an l tag for each line of text. So what I'm getting at here is that RDF is used for machines. It's very verbose and it's not really human readable, but if you're writing scripts that will link things together, then it makes a lot of sense, because you're attaching identifiers to ontologies to relate pieces of data. TEI is much more about the interpretation and markup of the content [unclear].
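The lean TEI tags mentioned above can be read with a standard XML parser. The following Python sketch uses a small TEI-like fragment written for this example (it is not the document shown on the slides) with `lg` and `l` for verse and `s` and `cl` for sentence and clause, and assumes the usual TEI namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment, written for this sketch.
# <lg> groups lines of verse, <l> is one line, <s> a sentence, <cl> a clause.
fragment = """
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <lg type="quatrain">
      <l>First line of verse</l>
      <l>Second line of verse</l>
      <l>Third line of verse</l>
      <l>Fourth line of verse</l>
    </lg>
    <p><s><cl>When the rain stopped,</cl> <cl>we went outside.</cl></s></p>
  </body>
</text>
"""

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(fragment)

# Verse: the type attribute names the verse form, <l> holds each line.
verse = root.find(".//tei:lg", TEI)
verse_type = verse.get("type")
verse_lines = [line.text for line in verse.findall("tei:l", TEI)]

# Prose: clauses nested inside a sentence.
clauses = [cl.text for cl in root.findall(".//tei:s/tei:cl", TEI)]

print(verse_type, len(verse_lines))
print(clauses)
```

Because the tags stay close to the text, the markup remains readable by a person, while a script can still pick out lines, sentences and clauses.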
And what happened, I guess, with TEI over the years is that it was an XML-based standard, but it expanded to include all sorts of modules for all sorts of specialist literary interpretations. It's mostly English based [unclear]. There are simple analytic mechanisms, and there are some linking and segmentation tags that can be used. Each of these modules within the TEI works a little bit like ontologies do, in that you borrow tags from one module or another and you put them together into the markup of the document that's interesting for you.
It's also been used, I guess, for spatial arrangements as well, and some tools have been built on top of it. This is for, I guess, genetic markup of texts, where you can annotate annotations. So you can see the boxes on the side here are actually marking up, with TEI tags, a handwritten correction and then a correction to the handwritten correction, and they contain the coordinates of the position of these marks on the page. That is useful for some kinds of analysis, but mostly it's for the benefit of the researcher going through the thinking process of the author who was handwriting these things and correcting the document as part of the revision process: why corrections said certain things, why they were marked in certain positions. So it's more difficult to see how markup like this could be used computationally, but it's very useful for the thinking and the qualitative process, to go through the interpretation of the document when you've got very specific tags and standards to help constrain the way that you're interpreting and digitizing texts. [Inaudible exchange with the audience.] And this helps you to place the different contributions and the stage in the process at which they were made, which is the reconstruction of the group effort that was involved in the production of texts.
So with multiple modules of standards, with ontologies that you can borrow bits and pieces from, which forms of annotation do you pick and choose, and which units do you then pick to mark up and to annotate? So I'm going to try to go into a little bit more linguistics now and talk about measurement units, formats and frames, and some of the literary techniques that have been used in digital literary analysis.
Matthew Jockers has recently released a book, and this is his plot of a sentiment analysis: the fluctuation in sentiment along the plot trajectory of Joyce's Portrait of the Artist. I think this is simple sentiment analysis; it's just doing token matching, it's not doing any grammatical parsing. But you can see that a graph like this is not very good. It's measured, but it's difficult to interpret. What he's looking at here is what's called the syuzhet, the linear progression of a narrative, which helps to understand how the author presents events to the reader. So this is over the course of the story arc. And he was speaking, I think, to somebody from CERN who suggested that he perform a Fourier transformation on that data, which transforms it into a plot like this, which looks like it's more meaningful and gives you a sense that there are some positive things happening at the start, there's a disaster in the middle of the story, and then everything ends up happy at the end.
What the Fourier transformation also does is bring everything from multiple volumes into the same kind of narrative time, so from zero to one hundred, regardless of the length of the volume. The previous diagram shows you this is like five thousand pages; this is bringing it into a percentile form, so that you can then start to compare the narrative across multiple volumes. So what this has done is transform the measurement and format for comparison at a higher level of interpretation. He did this in February 2015, and in April 2015 he was criticized for changing the meaning of the interpretation of the text.
So what's been done here is to plot the Fourier transformation in green, the rolling mean in blue and the original sentiment points in black behind them. You can see the Fourier transformation peaks here at about seventy, but actually that peak is probably influenced by the narrative time around eighty, eighty-five. So it's actually shifted narrative time through this Fourier transformation. And while it might seem, at a higher level, more useful to compare multiple texts with the Fourier-transformed version of the literary narrative, it's actually shifting things back and forth in time when you do that. At a lower level of interpretation, when you're reading a text more closely, perhaps it makes more sense in contextual terms to know that it happened towards the end of the volume rather than, say, the chapter before.
So then, on to units. I assume everybody's familiar with word vectors here.
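The Fourier smoothing of a sentiment trajectory discussed above (before the turn to word vectors) can be sketched in plain Python. This is not the code used in the study: the sentiment values are invented, and the function names, the number of retained frequencies and the window size are assumptions for illustration. It applies a low-pass filter (keep only the slowest frequency components), a rolling mean for comparison, and a rescaling of the result onto one hundred points of "narrative time".

```python
import cmath
import math

def dft(values):
    """Naive discrete Fourier transform (fine for short toy series)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def low_pass(values, keep):
    """Keep the `keep` lowest frequencies (and their mirror terms), zero the rest."""
    n = len(values)
    spectrum = dft(values)
    filtered = [c if (k <= keep or k >= n - keep) else 0 for k, c in enumerate(spectrum)]
    # Inverse transform; the imaginary parts cancel for a symmetric filter.
    return [
        sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

def rolling_mean(values, window):
    """Centred moving average that shrinks the window at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def to_percent_scale(values, points=100):
    """Resample to a fixed number of points so texts of any length line up."""
    n = len(values)
    return [values[min(n - 1, int(i * n / points))] for i in range(points)]

# Invented sentence-level sentiment scores with a positive burst near the end.
raw = [((i * 7) % 5 - 2) / 2 for i in range(60)]
for i in range(48, 52):
    raw[i] += 3.0

smoothed = low_pass(raw, keep=3)
rolled = rolling_mean(raw, window=9)
narrative_time = to_percent_scale(smoothed)

# Compare where each method places its highest point.
peak_smoothed = max(range(len(raw)), key=smoothed.__getitem__)
peak_rolled = max(range(len(raw)), key=rolled.__getitem__)
print("low-pass peak at index", peak_smoothed, "| rolling-mean peak at index", peak_rolled)
```

Comparing the two peak positions on real data is one way to check how far the smoothing has moved an event in narrative time before building an interpretation on it.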
so another way of chongqing of texts are thinking about the layers of abstraction building on top of this word to check came out i think about twenty fifth day.
which used to skip graham so building on top of individual word mattresses it's now kind of taking a word and looking for the the contexts by grams next to a study to build up another sector representation on top of that and then taking that to the next extreme his perch as everybody had a play with but.
BERT is a new architecture, Bidirectional Encoder Representations from Transformers, that is essentially building on the n-gram, skip-gram, bigram kind of idea. Instead of only looking at forward contexts, it drops out words in the sentence and then builds probabilistic models on top of that. And it does that not just with every other or every second word. It builds further on top of that by dropping out the next sentence, filling it with a random sentence, and building a probability distribution over whether or not the sentence that follows should follow it, probabilistically or not. And then you can feed in seed text and it will use this probabilistic matrix to start to generate text. But again, all of this is theoretically based on word tokenization and word frequencies. I'm not sure, I haven't looked into the details of the BERT architecture, whether it's using phonemes or morphemes or any other grammatical structures in the way that it's representing text contexts probabilistically. So I think the idea here is that we're just building abstractions upon abstractions and starting to lose some of the more meaningfully grounded linguistic contexts.

So, getting back to the high level of word frequencies, this is another HathiTrust example, from Ted Underwood's book Distant Horizons, which is about distant reading. Here he has plotted the frequency of colour terms in a random sample of fiction, and he's noticed that after 1800 the frequency of colour words starts to increase. He had a bit of a hunch about whether or not that represented a decline in third-person narration in fiction, that it was shifting from third-person narration to a more descriptive way of writing. He looked for another study and found one from Stanford in 2012, from Heuser and Le-Khac, who were looking at the rising frequency of concrete, object-related words in nineteenth-century novels: so colours, names, body parts, not just colours but other related descriptive terms, the physical descriptions. He gathered not just their data but HathiTrust data, plotted a longer time period, and then decomposed it by genre, so he could start to use this as a correlation metric for his hunch from the previous diagram, that literature after 1800 was starting to become more descriptive. What he found is that in fiction the frequency of the colour words rises a lot more than it does in biography, which suggests that the genre and the form of writing in fiction is changing over time. And this is just with very high-level word frequency analyses, taking two studies and correlating them to form an interpretation.

What we don't know from this is what the annotation unit was. How has he plotted these? He's obviously not using web pages, but he could be going down to the paragraph level in terms of his aggregations. I'm pretty sure it's word frequencies, but is he using, say, HathiTrust data and word frequencies by page? I'm not really sure that would have too much impact on how he's built up the frequency analysis here, but it is another decision that you need to make when analysing data: what is a document? What is your definition of a document? Is it going to be the entire book, is it going to be just a single tweet, is it going to be, say, just a chapter of a book? The impact of these decisions on the pipeline that you build for analysis is difficult to measure in some ways.
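To make the "what is a document" decision concrete, here is a minimal sketch, not the method used in the book, that computes colour-term rates for the same text with the document defined either as the whole book or as a paragraph. The lexicon `COLOUR_TERMS`, the function names and the sample text are all assumptions made for illustration.

```python
import re

# Hypothetical colour lexicon; a real study would use a curated list.
COLOUR_TERMS = {"red", "blue", "green", "white", "black", "yellow"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def colour_rate(text):
    """Return colour-term occurrences per 1,000 tokens (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in COLOUR_TERMS)
    return 1000 * hits / len(tokens)


def rates_by_unit(book, unit="book"):
    """Compute colour rates with the 'document' defined as book or paragraph."""
    if unit == "book":
        docs = [book]
    elif unit == "paragraph":
        # Assumption: paragraphs are separated by a blank line.
        docs = [p for p in book.split("\n\n") if p.strip()]
    else:
        raise ValueError(f"unknown unit: {unit}")
    return [colour_rate(doc) for doc in docs]


sample = "The red door stood open.\n\nShe said nothing at all to him."
print(rates_by_unit(sample, "book"))
print(rates_by_unit(sample, "paragraph"))
```

The same text gives one moderate rate at book level and one high plus one zero rate at paragraph level, which is the kind of difference the choice of unit can introduce before any interpretation starts.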
So, language structures. I'll talk a little bit about internal and external models. Who's heard of psycholinguistics? Yeah, OK. Who's heard of sociolinguistics? OK. There's a rough correlation here between internal and quantitative, and external and qualitative. It's not direct, but that's the connection I'm trying to draw.

Psycholinguistic language models, and psycholinguistics, are really about the psychology and the brain science of the interpretation of language. It looks at how long-term and short-term memory affect our understanding and interpretation of the words that we hear. It looks at the way that words are encoded, including the phonemes and the morphemes that we hear, and whether or not they have priming effects. It can also look at the kinds of glitches that the human brain exhibits. So there are things called spoonerisms. Has anyone heard of spoonerisms? I've got an example. They're slips of the tongue: if you mean to say "the dear old queen", you say "the queer old dean". You've actually done a segment switch between the "d" and the "qu". People produce this without any prescriptive form of language; this is just the way that we naturally speak. There's word switching that happens, so "the rules of word formation" can be uttered as "the words of rule formation". People exhibit these sorts of expressive errors, which give a bit of insight into the way the brain is actually doing language processing. Another one is morpheme switching. The example sentence is "I'd forgotten about that", and you might accidentally say "I'd forgot abouten that". These little examples that humans produce give a bit of insight into the linguistic structure, and these are the sorts of things that psycholinguistic language modelling is looking to interpret. I guess a lot of this is difficult to grab onto.
Sociolinguistic models are looking at the languages themselves, the dialects and the regional influences over time, the literature and the verses and the songs that get transmitted between them, the ethnography of language use and how that affects the culture of the people that use them. This is an example from a project from the Digital Humanities conference in 2019, which was just a few months ago, on digital folkloristics. They're looking at the northern regions, at charms and songs, and the way this culture and information was transmitted. Part of the analysis they did was on the epic songs. They realised that the personal pronouns, which are often treated as stop words and dropped out of standard analytical pipelines for simple kinds of processing, actually had a fairly marked impact on the categorisation of the epic songs in the region, and those correlated to different places. So [unclear] Karelian, south-eastern and Finnish charms were identified by personal pronouns as a feature in the language.

Dropping stop words is usually used as a way to expedite, a way to simplify, some of the analytical processing. But when you're thinking about social contexts it's really useful to know if someone is expressing something in the first person or the third person, and you can then draw conclusions from that as well. So the linguistic models, and the way you strip things out, can have an impact on the way that your script runs and your results are interpreted. Stemming, for example, obviously affects the meaning of words, but it's a computational technique to simplify your data representations.
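The effect of these two preprocessing choices can be sketched in a few lines. This is a toy illustration under stated assumptions: `STOP_WORDS` is a tiny hand-made list, and `naive_stem` is a deliberately crude suffix stripper, not the Porter algorithm or any library's stemmer.

```python
import re
from collections import Counter

# Small illustrative stop list; real lists are much longer.
STOP_WORDS = {"i", "you", "he", "she", "we", "they", "the", "a", "and", "of", "to"}
PRONOUNS = {"i", "you", "he", "she", "we", "they"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip a few common English suffixes. A toy, not a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def pronoun_counts(text, drop_stop_words):
    """Count personal pronouns, optionally after stop-word removal."""
    tokens = tokenize(text)
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(t for t in tokens if t in PRONOUNS)


line = "I sang and she listened to the songs"
print(pronoun_counts(line, drop_stop_words=False))
print(pronoun_counts(line, drop_stop_words=True))
print(naive_stem("meaning"), naive_stem("means"))
```

With stop words dropped, the first-person and third-person signal disappears entirely, and the stemmer maps "meaning" and "means" to the same form, which is the simplification and the loss described above.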
You then have multilingual models to work with. There are no standards that I'm aware of yet, I might be wrong, but standards for the reuse of data models across multilingual software tools are starting to appear, though I'm not sure there are too many standards yet. Many of you might have worked with spaCy, which is a Python natural language processing library, and they're very good about exposing their multilingual models through easy downloads. But again, I think they've only got one multi-language model; otherwise they're individual language models, and I'm not sure whether or not they're providing all the deep linguistic features that you might be used to using. In which case, if you download the French one listed here and it's just word embeddings but you need something else, you're going to need to produce an additional model for the types of analysis that you want to do in French. So there's a massive fracture of models that you need to learn to use in the software tools that you bring together into natural language processing pipelines.
So these are some of the models that are available. Has anybody used any of these? GermaNet is a German-language equivalent of WordNet, and WordNet is used quite extensively. The Linguistic Data Consortium has quite a lot of data in its repositories that you can sort through and browse around. You need to check what the provenance of all that data is before you start to use it, but I guess that's the challenge of doing interesting research. What I'm trying to say here is that there are a lot of slippery standards that are starting to come into general use, but part of your research process should be to evaluate which standards are available for the sort of tasks that you're planning to do, and to try to adopt a standard so that others can reuse your research, or the models that you produce, later on.

Back on linguistic production, I found this blog post, Dimensions of Dialogue, by Joel Simon. He's talking about semantic spaces and meaning alignments, and what he's done is use collaborative machine learning. He called it collaborative, but it was a variant of adversarial machine learning, which is two machine models, one producing examples and one critiquing examples, and they both learn from each other in this competitive way. He's called that collaborative rather than competitive. Exactly how it's implemented I'm not sure, but what he's done is use neural networks to create emergent language isolates, which are these small pictorial representations of what these words mean. He's started to see in these language isolates that similar concepts are producing similar images. So they're grounded in the lexicon, an English lexicon, but they don't have the semantic linguistic context, just the lexicon. It's starting to bring out, even with these generic methods, similarities in the word-form representations that are produced.
There is another technique, another neural network model, called hierarchical temporal memory. It comes from a company that I haven't written on the slide, called Numenta. They've built hierarchical temporal memory as a neural network model based on the neuroscience of the structures of the neocortex of the human brain. They've been looking at the way that neurons are structured into columns, which they call cortical columns, and there are several layers in these cortical columns. They've built this new model to represent the way that information flows across and between these columns in the human neocortex, which is a slightly more advanced way of representing neurons. This technology has ended up with a company called Cortical.io, which has produced what it calls semantic fingerprints, which is again another representation of the semantic spaces of these words. You can see here the example they've got, where one fingerprint has a bit of a cluster down on the side, the other does as well, and they've both got a cluster towards the top [unclear]. So when you're doing pixel-level comparisons you're starting to get a different representation. It's not quite a perfect comparison, but it's a pictorial-level representation of the kinds of semantic meaning changes and semantic relationships of words over time.

There are lots of semantic meaning changes in words, and a lot of these I don't know the meaning of myself, but I do know that these are the sorts of language interpretations that we're just struggling to get a grasp on with computational linguistics.
So, idioms and translations, for example.

There's an example in Germany, where it was a bit of a meme, where they started to print postcards and so on with sayings such as "I think I spider", which is a literal text translation. I don't speak German, but if you translate the words directly it says "I believe I spider". Actually it's more of a colloquialism, so the literal word translation was quite funny to native speakers but completely lost in other cultures. There are a few more examples here: "my English is under all pig", which means your English is really bad. So if you had to represent those sorts of meaning changes with semantic fingerprints, or with cortical learning algorithms, it's still kind of grasping at these sorts of things.
This is an example from [name unclear] called Smoke and Mirrors. What they've done with this is take a vocabulary of words from a previous research project which were seen to represent vagueness in English, and they essentially did word frequency analysis across corpora, I think some literature but also speeches from Putin to NATO and speeches from NATO to Putin, and plotted the vagueness over time. So diachronically they crunched these corpora and noticed that over time Putin's language is getting more and more vague and NATO's language is getting more and more concrete. But how you interpret that, I guess, relates to the exercises we did earlier, which is that it's at the word frequency level. What they've done is word frequency analysis to see that the lexicon of vague terms is increasing over the time of the speeches. Again, that's just at the word level; there's no syntactic understanding going on there.

And then at the iconographic level we're starting to see weird things happen online, in the same way, with the representation of meaning change. This came up on Twitter a few months ago: why is the save button in Microsoft Excel represented by a picture of a vending machine? People who don't know what floppy disks are are starting to lose what this icon means as a floppy disk. So we've got social reinterpretation occurring because, obviously, the technology is disappearing from the minds of society. So that's another tricky thing to grab on to.

An even trickier one would be doublespeak. This says: "I'm currently economically inactive due to being offered an early retirement opportunity as a result of my previous employer's human resources redundancy elimination initiative." Which is basically: "I'm unemployed because the company was firing people and I got fired." So those two sentences mean the same thing. How you would deconstruct and represent those meanings in order to detect the same thing in different texts is a pretty complicated problem. I'm just raising problems.
But what's happening with all of these layers of abstraction in the evolution of these techniques and methods and technologies is that we're starting to use combined methods. We start with n-gram frequencies, go through to collocations, the likelihood of co-occurrences, and on to word embeddings [dates unclear in the recording]. And then some of these neocortical algorithms might be other methods to use, as an alternate representation to combine.
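As a small illustration of the step from n-gram frequencies to collocations, the sketch below counts bigrams and scores each one with pointwise mutual information (PMI), one common way of expressing how much more often two words co-occur than chance would suggest. It is an assumption that PMI is a reasonable stand-in here; the talk does not say which co-occurrence measure the projects used, and the function names are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_pmi(text):
    """Return (bigram counts, PMI score per bigram) for adjacent word pairs."""
    tokens = tokenize(text)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Simple relative-frequency estimates of the probabilities.
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return bigrams, scores


counts, pmi = bigram_pmi("new york is big and new york is old")
for pair, score in sorted(pmi.items(), key=lambda item: -item[1])[:3]:
    print(pair, counts[pair], round(score, 2))
```

On a corpus this small PMI strongly favours rare pairs, which is one reason real studies combine it with frequency thresholds or with other representations, as the combined-methods point above suggests.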
And this example is called Disentangling a Trinity. It's another presentation from the Digital Humanities conference in Utrecht a few months ago. They were using those three methods to analyse and crunch a Dutch newspaper corpus to look for network relationships, meaning relationships, between the words "modernity", "civilization" and "Europe". With each of those decade segments they plotted out how each of these words related to each other in the newspapers over time. So that was a combination of these three different methods.
So yeah, I guess the more methods you can use, and the more open you can be about how you're using them in your interpretation, the better.
And now I'll pass over to Jane.

OK. Historians are very, very interested in concepts over time. It's that temporal aspect which is something that we spend a lot of time thinking about. There is even, believe it or not, a society for the history of concepts, which has a very big international conference every year, so it's a big field. Over time the meanings of words and concepts can change. There are lots of different factors working alongside each other to affect this: different social and cultural contexts, and within and between different languages, some of what Martin was just talking about. That's the kind of work this group is dealing with, these very difficult multilingual contexts. And it's not only over the sort of extended two-hundred-year period that we were talking about earlier this morning. Language changes very, very rapidly now, partly because of social media and the exposure to lots of different contexts and cultures. There was a wonderful project funded by the Digging into Data scheme a few years ago which looked at the spread of new words on social media and found very interesting national and specifically cultural variations in how words spread in North America, where they came from, and which ones were successful and which ones weren't.

It's particularly difficult to unpick all of this because when a new meaning emerges you don't lose the old one. They coexist for long periods of time, and you may end up with those meanings always coexisting, or some of them may start to fall out of usage, and the rate at which that happens varies wildly. It may also be a generational thing. The example we're going to use a bit later on is "sick", which people my age would still never use to mean good. Younger people in the UK would use it to mean something that's really great. So there are some groups who never pick up this new meaning, and others who use it all the time and stop using the old meanings.
It's much easier to identify neologisms, so new words, the first occurrence of a term, than it is to trace those shifts in meaning between words over time.
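Finding a first occurrence really is the simple case, as a short sketch shows. It assumes a corpus already reduced to `(year, text)` records; the records below are invented for illustration, and the helper names are hypothetical. The sketch only handles single-word terms and says nothing about which sense of the word is in use.

```python
import re
from collections import Counter


def mentions(text, term):
    """True if the single-word term appears as a whole token in the text."""
    return term.lower() in re.findall(r"[a-z]+", text.lower())


def yearly_share(records, term):
    """Share of records per year that mention the term at least once."""
    totals = Counter()
    hits = Counter()
    for year, text in records:
        totals[year] += 1
        if mentions(text, term):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def first_occurrence(records, term):
    """Earliest year in which the term appears, or None if it never does."""
    years = [year for year, text in records if mentions(text, term)]
    return min(years) if years else None


# Invented records, standing in for dated pages from a web archive.
records = [
    (1996, "A steampunk reading list"),
    (1996, "Campus news"),
    (2007, "Steampunk fashion"),
    (2007, "More steampunk novels"),
]
print(first_occurrence(records, "steampunk"))
print(yearly_share(records, "steampunk"))
```

Reporting a share of records per year rather than a raw count helps when the number of records collected varies a lot from year to year, although it still cannot show that the term did not exist earlier outside the corpus.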
There is a really good project at the University of Glasgow, the Historical Thesaurus of English, which has been mapping these changes and identifying the points at which meanings started to shift. They found a very, very complicated picture. This is for the word "sick", as I mentioned, and it finds forty-seven different meanings and usages of that term within its corpus, wildly differing. Some of them are to do with mental health, as you would expect, physical health, inferiority and [unclear] understanding. Some are meanings that wouldn't mean anything to us any more, because they've lost their historical meaning. But all of these exist alongside each other, complicating whatever analysis of the text you're going to do.

Incidentally, the people who make the most use of this resource are actually historical novelists, because they want to write using words that are appropriate for the period that they're setting their research in. So this has broken out of the university space and become something that gets used much more widely in popular culture. I think it's very interesting that even people writing fiction are starting to get interested in how meanings change, and trying to make sure that the dialogue they're using is appropriate for the period they're working in.
Another project, at the Alan Turing Institute in the UK, was using the UK Web Archive, and again they started out being interested in tracking changes of concepts over time, using that Historical Thesaurus as a way to try to identify this. As you might see from the description of the project up there, they abandoned that approach fairly quickly because it was just too complicated to do, even with a fifteen-year period of data. So they started to look at polarity rather than getting into those more subtle differences: is this a negative or a positive concept, rather than how has this word changed, where is it being used, what context is it appearing in.
Some of you looked at this earlier. It's the n-gram viewer interface for the UK Web Archive, and this is to show, really, that we had all those complicated, messy and difficult-to-interpret graphs earlier when we were looking at the n-grams, but this one is fairly straightforward, and it's to do with tracing the first occurrence of a term in this corpus. A colleague at the British Library, Jason Webber, came up with "steampunk" as a good example that would work well for data from this period. It was coined by the science fiction writer K. W. Jeter, I think first in 1987, so it's a very new word. It appears in 1996, which is the first date that we've got in the corpus here, and there are only sixteen instances of it in more than 866,000 records. Almost all of those are on university websites, so there are sixteen examples of this new term appearing on university websites. As you get through to the later period it really starts to take off in 2007, and it peaks towards the end of the period. As you'd expect, it starts to become really mainstream in fiction and media, and people are using it more commonly.

So you can see that something interesting has started to happen. What you can't say is that it emerged at this point and became really popular at this point. What you can say is that there are only sixteen occurrences of this in 1995-96. There's more going on; you can get a sense of a trend, but you can't really rely on the figures that are emerging.

And this is the word "sick" that I was showing you earlier, with its forty-seven different meanings.
You can see its collocations and how often it appears, but you have absolutely no idea which versions of that word you're looking at here. So that's really not telling you anything at all other than that this word appears roughly this number of times. You can't read anything meaningful into that linguistically.

You can start to dig down a little bit more. We didn't talk about this earlier, but there is a way of penetrating some of that n-gram obscuring of what's going on, in that you can identify parts of speech. These are the different parts of speech that the Google Ngram Viewer will let you look at: adjectives, adverbs, pronouns and so on. Again, you don't know how they've decided that this is an adjective or a noun, but still it gets you some part of the way into the complexity of language.

It's worth mentioning that most of you are probably used to excluding stop words from any kind of textual analysis that you're doing. For some humanities research the stop words might be exactly what you want to look at, to identify the author of a particular text, because it may be somebody who uses a particular basic formulation a lot. So the choice that you make to strip stop words out because that simplifies the processing might actually complicate the work of a researcher who's interested in precisely that aspect of the text. All of those decisions really start playing into this.

This is the part-of-speech qualification in the Google Ngram Viewer, and this is the word "tackle", which is the example they give as being very straightforward to disambiguate. You've got "tackle" as a verb and "tackle" as a noun. But "tackle" as a verb of course doesn't just have one meaning. It can mean tackling somebody in a football match or it can mean tackling a job, and I'm sure there are various other interpretations. So even the example that's given as a very simple way of getting through some of this linguistic complexity really isn't doing quite that.
This idea of using appropriate language for the period that you're interested in studying is something that's been picked up by a lot of researchers. This is an American academic, Ben Schmidt, who has spent a lot of time studying the scripts for television programmes to see how many anachronisms there are in them. How many of you know the British television programme Downton Abbey? Yeah. It's the sort of early-twentieth-century period drama, and he's gone through all of the scripts for the series covering the years 1912 to 1921 and found thirty-four phrases that would not have been used at the time but have been used in the scripts. Some of these are wonderful, things that you think sound very period-specific: [unclear] luncheon, which is a very formal, old-fashioned-sounding phrase, but that was not in use at the time. Things like "dress fittings", "a wartime marriage", "want grandchildren", "realistic prospects". These are all things that have been used by the authors that nobody at the time, as far as we know from our analysis of digitised contemporary material, would ever have used. He goes on to find another twenty-six phrases that would possibly have been used in the 1910s, but only really rarely, and they're much, much more common now. So again, they're disproportionately present for that early period. Those are things like "likely outcome", "hospital costs", "off limits", "overall charge", "the basics", those kinds of things. I think this is a really interesting way of, on the one hand, humiliating scriptwriters for not having done their research, but it's also an interesting approach to using these methods to tell us something about the way people communicated and how we should think about and represent that.
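The core comparison behind this kind of anachronism hunting can be sketched very simply, assuming you already have a table of when each phrase is first attested in a reference corpus. This is not the method used in the study described above. The table below uses placeholder phrases and invented years purely for illustration, not real attestation dates, and `find_anachronisms` is a hypothetical helper.

```python
def find_anachronisms(script_phrases, first_attested, setting_end_year):
    """Split phrases into those first attested after the setting and unknowns.

    first_attested maps a lowercase phrase to the earliest year it is found
    in some reference corpus. Phrases missing from the table are reported
    separately, because absence of evidence is not evidence of anachronism.
    """
    flagged = []
    unknown = []
    for phrase in script_phrases:
        year = first_attested.get(phrase.lower())
        if year is None:
            unknown.append(phrase)
        elif year > setting_end_year:
            flagged.append((phrase, year))
    return flagged, unknown


# Placeholder phrases with INVENTED years, for illustration only.
first_attested = {
    "placeholder phrase one": 1890,
    "placeholder phrase two": 1955,
}
script = ["Placeholder phrase one", "Placeholder phrase two", "Unseen phrase"]

flagged, unknown = find_anachronisms(script, first_attested, setting_end_year=1921)
print(flagged)
print(unknown)
```

A check like this only catches phrases that are entirely absent before the setting. The second group described above, phrases that existed but were rare, needs relative frequencies for the period rather than a single first-attested year.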
The final example of research using this sort of method is from an English literature professor in the UK called Martin Eve, who's done some amazing work. You'll all know the novel Cloud Atlas by the author David Mitchell, which was also made into a film. He's just published a book on that, which he calls Close Reading with Computers. It's about using these quantitative methods, which are often applied to a really large corpus, to look in great detail at a single text, which he would also be able to work with qualitatively, but trying to identify what quantitative methods can get you when you're working with a much smaller text. This particular novel has multiple fictional characters contributing different chapters to it, and one of those is written in the style of a nineteenth-century biography. So that was what he wanted to look at: how appropriate is the language that the author has used to reconstruct this apparently nineteenth-century material? And it turns out that it's very, very good. In the whole chapter there are only three words that would not have been used during the period, which is from 1851 to 1910: [unclear], which dates from 1934; "Latino", which is from 1946; and "lazy eye", which didn't start to be used until 1960. So the author has only got that wrong three times.

But I think more interesting from the humanities perspective is that the author has overused racist and colonial terms compared to how often they would have been used in the nineteenth century. So the author had an idea, a sense, of literature from that period as using lots of colonial, imperial, racist-related terms, and he's used them, and they are appropriate for the age, but he's used them far more than contemporary writers would have done. So that's an interesting, different way of looking at it. It's not just about term occurrence but about the type of term and how often you're using it, and what that means for studying language as well. So this research is being used at a huge scale, but also to get different insights into much smaller texts and to complement the qualitative research process. And again, that moving backwards and forwards between the two is something that we do in digital humanities a lot.

So those are just some examples of how the techniques that Martin was talking about are used to try to get at the complexity that's involved here.

OK, so what we want to do is build on what we did earlier using the n-gram viewer, again with all of its problems, but this time we want to introduce word senses. So if you go to ht.ac.uk, it's the Historical Thesaurus.
What we'd like you to do is form groups, different groups from last time if you can, please, and we want you to mingle with each other. Pick some words, look them up in the Historical Thesaurus, look at the senses of the use of the word over time, and then try to find trends that reflect those in the HathiTrust n-gram viewer. [unclear exchange]

OK, so this morning we looked for trends in the HathiTrust n-gram viewer. Now what we'd like you to do is pick a couple of words, look them up in the Historical Thesaurus, look at how the senses of the word have changed in the thesaurus over time, and then try to find those trends in the word frequencies. A word like "hysterical", for example, was at one point applied very much to women's health, and women's mental health problems in particular, then moved more broadly into mainstream psychiatry, and then it also has this meaning of something being hysterically funny. So there is a very time-bounded shift in the meaning of that term. If there are other words that you're aware of, search for them and see how they might have changed or had very large shifts in meaning, and see if you can actually identify that from the tools that are available for this kind of analysis. We'll give you ten or fifteen minutes. Rearrange tables, mingle with people you haven't met before, and we'll come around and talk with you as it goes on.