Terminology and classification in the Prosecution Project
Formal Metadata
Title 
Terminology and classification in the Prosecution Project

License 
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 

Release Date 
2018

Language 
English

Content Metadata
Abstract 
A recording of a presentation by Mark Finnane for the December 2018 AVSIG meeting.

00:00
So, Mark, welcome. Please introduce yourself a bit more, if I've missed important things, and your project.
Yeah. This is a complete change from the more technical side that was so expertly presented there, even if there's a sense of my being a bit distant from the research domain, because I'm sort of a [inaudible] researcher: I'm a historian, a criminologist and professor of history at Griffith University. For the last five years I've been directing this project, the Prosecution Project, which is a history of the criminal trial in Australia. What's unique about it is that we're building a database of, as far as we can get them, all criminal prosecutions in Australian criminal jurisdictions, which are mainly the six states and the Northern Territory, over very long periods of time: we have records dating from 1788 through to the 1960s.
00:53
This has been a digital project that has relied on partnerships with archives that provide the data. Typically the data comes from original court registers, and we extract that data and transcribe it, because it is mostly manuscript data and there is no way of accessing it with machine technologies at the moment. So I've had to organize transcription, by the research team and the volunteer community, into a database that we built with eResearch Services at Griffith University. On this topic we probably really should have somebody from our eResearch team here today to talk about some of the issues likely to be of most interest to this group, but you've indicated an interest in this new type of research, so I might just introduce a little bit about it and show you some of the tools we have, particularly around what we do with the data once we get it.
02:20
Let me say there are two types of users of this kind of data. There are researchers like ourselves, who may be interested in telling the individual stories or, in conventional social science terms, in looking at aggregated data and analyzing it in terms of some of the factors that shape how a criminal trial develops and what its outcomes are. At the individual level, we also have a very large community of people involved in family history, genealogy and so on who access our database. Those users are really interested in individual stories, and in descriptions of events and individuals as they were originally recorded, not classified into some higher aggregate. But for the purpose of thinking about patterns in the events we're talking about, visualization of our data is becoming quite important, and it's at that point that we have to think about how we aggregate into meaningful categories that respect historical forms but also make sense in terms of [inaudible].
03:55
So this public search page, which I think you can all see here, just outlines the project. We have "Search historical trials" here, which has a basic keyword search that works across a select number of attributes of our data and simply searches, in an uncontrolled way, for any term that somebody might choose to investigate. Somebody might come in wanting to know about a particular individual and type that in, or they may want to know about some particular offense and, without having to go into the advanced search, see whether we've got anything on forgery, and there's plenty of material there for them to look at. But if they have more information about the area they want to search, they are able to search across a number of our attributes. This is a select number of attributes for a specified period of time, which is constrained by archive access conditions: some of our records are from closed periods or are under restricted access of various kinds, such as children's court material. Most of the records we have are [inaudible] across this range of attributes, and at the moment, as we get to a more complete data set, we're starting to consider releasing a bit more of our data.
05:06
So how do we derive these things? In terms of applying any kind of principles of classification, the original data challenges are at the transcription level: getting accurate terminology off the page. First name and surname are significant challenges, and it's very important for our data that they be as accurate as possible. The offense category is one where we have the possibility of both an original transcription and considering how it might be categorized for our purposes, and I'll show you that in a minute. Most of the other terms we have available we simply transcribe from the original record, and we have an open search that enables people to establish whether somebody was found guilty of offenses in New South Wales [inaudible]. So that's just how that search function works.
07:11
Well, I might just draw attention to what lies behind this, which is probably of more interest to a lot of people. Our first challenge was that we were dealing with a number of jurisdictions in which terms that we'd regard as common to them might have been represented differently in the original records, and in any case the records vary in the extent to which they cover all aspects of the criminal process: Queensland and Victoria, for example, are particularly rich datasets because they include the earlier stages of the trial as well as the later ones. So we had to develop a process that would enable the researchers to define the different registers, as we call them, for the different state jurisdictions and the particular courts we were accessing data from, and an approach that would allow us to add attributes as they emerged over time and to have registers with different numbers of attributes, while at the same time respecting the original data.
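As a rough illustration of the data model described in this part of the talk (registers per court and jurisdiction, each with its own attribute list that can grow over time, values kept as transcribed, and a basic keyword search over a select subset of attributes), here is a minimal Python sketch. It is an assumption-laden toy, not the project's actual implementation, which the talk does not describe: the class names `Register` and `Record`, the function `keyword_search`, the attribute names and the sample records are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Register:
    """Hypothetical: one court's register in one jurisdiction, with its own attribute list."""

    name: str
    attributes: list[str] = field(default_factory=list)

    def add_attribute(self, attr: str) -> None:
        # New attributes can be added as they turn up, without changing other registers.
        if attr not in self.attributes:
            self.attributes.append(attr)


@dataclass
class Record:
    """Hypothetical: one transcribed entry; values are stored exactly as transcribed."""

    register: Register
    values: dict[str, str]

    def __post_init__(self) -> None:
        # Reject attributes that this record's register does not define.
        unknown = set(self.values) - set(self.register.attributes)
        if unknown:
            raise ValueError(f"not defined for {self.register.name}: {sorted(unknown)}")


def keyword_search(records: list[Record], term: str, searchable: list[str]) -> list[Record]:
    """Uncontrolled, case-insensitive substring match over a select set of attributes.

    Records whose register lacks a searchable attribute simply don't match on it.
    """
    needle = term.lower()
    return [
        r for r in records
        if any(needle in r.values.get(attr, "").lower() for attr in searchable)
    ]


# Made-up registers and records, for illustration only.
qld = Register("QLD Supreme Court (hypothetical)", ["surname", "first_name", "offense", "verdict"])
nsw = Register("NSW Supreme Court (hypothetical)", ["surname", "first_name", "offense"])
qld.add_attribute("committal_place")  # added later, QLD register only

records = [
    Record(qld, {"surname": "Smith", "first_name": "John", "offense": "Forgery", "verdict": "Guilty"}),
    Record(nsw, {"surname": "Brown", "first_name": "Mary", "offense": "Larceny"}),
]
hits = keyword_search(records, "forgery", searchable=["surname", "first_name", "offense"])
print([r.values["surname"] for r in hits])  # ['Smith']
```

Keeping each register's attribute list separate is what lets registers with different numbers of attributes coexist side by side; a real system would hold this in a database rather than in memory, and the sketch only shows the shape of the idea.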
08:49
So this is a typical example: the Queensland Supreme Court, with 67 attributes here. Some of these attributes will be shared with other states and others not. Some of the data is available in the original sources; other data is very inconsistent. Indigenous identity, for example, is very important in this area, but for the most part these records don't contain it, and it tends to be derived from other reports such as historical newspapers, which can be searched through a Trove API that we link to our records.
09:37
I'll just show you quickly how this looks in practice, again with examples from Queensland. A key thing for us is verifying the data, and for most of our states the system enables us to check the data extracted against the original record. That's very important because our data has been prepared both by researchers on the research team and, as I mentioned, by quite a large number of volunteers. This record itself has been entered by a volunteer [inaudible], and we are able to check its accuracy; this is a pretty experienced transcriber [inaudible]. Thank you.
10:36
One of the key classification challenges for us is making sense of this offense here: "breaking open a locked showcase and stealing therefrom", which is a very specific definition of an offense; if you looked at crime statistics you wouldn't find it under that description. So we've done quite a lot of work over the last couple of years coding our offense data in particular, to enable us to visualize the records. So out of that... wonderful. Back on the main page, people are able to visualize their records through this facility, and here, as I say, we've run a code over... well, one second.
11:26
Sorry, Mark, you just cut out for about a sentence there. If you could run just that last sentence again, please.
11:55
Yes. So the visualization is a product of the work we do on aggregating, particularly, our offense categories, because this is obviously a key area of interest for people looking at this in social science or historical terms. We run a code over our offense data for whole jurisdictions over long periods of time and generate levels of aggregation through that code. The classifications are pretty familiar to people working in criminal justice: anybody who has looked at criminal statistics since the 19th century would recognize these as generally the kind of categories that are used, and really across national borders now as well. So a lot of work has gone into that. We have meta-level categories (homicide offenses, property offenses, person offenses) and then, within those categories, more refined aggregations that still have their reference point in historical statistics, or now in contemporary criminal justice statistics of the kind you see [inaudible].
13:06
The other areas are pretty much drawn directly from our data, although we do aggregate some of the fields again, sentences in particular, because there are some interesting [inaudible] during this period when the death penalty was still in place: occurrences in which the death penalty was in fact applied, particularly in the nineteenth century. For the trial place and the committal place we just use the original data.
13:47
We're involved in some mapping exercises at the moment, where we [inaudible]. We've got an ARC [grant] to look at more detailed studies of interpersonal violence over long periods of time, using this database and extending it, and we'd be very interested in geocoding crime events if we can get more specific information. So look, that's sort of what we're about, and as much as I think I can say at this point. I'm very happy to answer any questions.
Thank you very much, Mark. That's very, very interesting stuff.
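The offense coding described in the talk, where code run over the original offense descriptions generates levels of aggregation (meta-level homicide, property and person categories, with more refined categories inside them), can be sketched as an ordered set of rules mapping each transcribed description to a (broad, refined) pair. This is a hypothetical illustration only: the rule patterns, category labels and the functions `code_offense`, `code_records` and `aggregate` are invented for the example and are not the Prosecution Project's actual coding scheme. The original transcription is kept next to the derived codes, in line with the talk's emphasis on respecting the original data.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical coding rules: (regex on the lower-cased original text, broad, refined).
# The first matching rule wins, so more specific patterns are listed first.
RULES = [
    (r"\bmurder\b|\bmanslaughter\b", "Homicide", "Homicide"),
    (r"break.*\bsteal", "Property", "Breaking and stealing"),
    (r"\bforg(ery|ed|ing)\b|\butter(ing)?\b", "Property", "Fraud and forgery"),
    (r"\bsteal|\blarceny\b", "Property", "Theft"),
    (r"\bassault|\bwound", "Person", "Assault"),
]
UNCODED = ("Uncoded", "Uncoded")


def code_offense(original: str) -> tuple[str, str]:
    """Map one transcribed offense description to (broad, refined) categories."""
    text = original.lower()
    for pattern, broad, refined in RULES:
        if re.search(pattern, text):
            return broad, refined
    return UNCODED


def code_records(offenses: list[str]) -> list[dict[str, str]]:
    """Keep the original transcription and store the derived codes beside it."""
    coded = []
    for original in offenses:
        broad, refined = code_offense(original)
        coded.append({"original": original, "broad": broad, "refined": refined})
    return coded


def aggregate(offenses: list[str], level: int = 0) -> Counter:
    """Count offenses at level 0 (broad) or level 1 (refined) of the coding."""
    return Counter(code_offense(o)[level] for o in offenses)


if __name__ == "__main__":
    sample = [
        "Breaking open a locked showcase and stealing therefrom",
        "Forgery",
        "Larceny",
        "Wilful murder",
    ]
    for row in code_records(sample):
        print(row)
    print(aggregate(sample))           # counts at the broad level
    print(aggregate(sample, level=1))  # counts at the refined level
```

An ordered first-match rule list is only one possible design. It is easy to audit, but real historical offense descriptions would need far more rules, plus manual review of anything left as `Uncoded`.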