Terminology and classification in the Prosecution Project
Formal Metadata
Number of Parts | 23
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/34208 (DOI)
Transcript: English (auto-generated)
00:00
So Mark, welcome. Please introduce yourself a bit more, if I've missed important things, and your project.
Yeah, thanks very much, Nick. And this is a complete change from the more technical side that Sophie's very expertly presented there, even if there's the sense of her being a bit distanced from the research
00:25
domain, because I'm a sort of meat-and-potatoes researcher. I'm a historian, a criminologist; I'm Professor of History at Griffith University. For the last five years, I've been directing this project called the Prosecution Project, which is a history of the criminal trial in Australia.
00:43
And what's unique about it is that we're building a database of, as far as we can get them, all criminal prosecutions in Australian criminal jurisdictions, which are mainly the states (the six states and the Northern Territory), over very long periods of time.
01:00
So we have records starting from 1788 through to the 1960s. This has been a digital project that has relied on partnerships with archives that provide the data. So our typical data is from original court registers.
01:22
And we extract that data and transcribe it, because it's mostly manual data, so there is no way of accessing the data by machine technologies at the moment. So we've had to organise transcription, using the research and the volunteer community, into
01:41
a database that we built with the research services at Griffith University. And on this topic today, we probably really should have somebody from our research team here to talk about some of the issues that are likely to be of most interest to this group.
02:00
But yeah, I mean, you've indicated an interest in this new type of research. So I might just introduce a little bit about it and show you some of the tools we have, and particularly the issue around what we do with the data once we get it.
02:21
Because, let me say, there are two types of uses of this kind of data. There are researchers like ourselves who may be interested in telling individual stories or, in kind of conventional social science terms, looking at aggregated data and analysing that in terms of what the factors are that shape how a criminal
02:47
trial develops and what its outcomes are. At the individual level, we also have a very large community of people involved in family history and genealogy and so on, who also access our database.
03:05
And those sorts of users are really interested in individual stories, and really in descriptions of events and individuals as they were recorded originally, not reclassified into some
03:25
sort of higher aggregate. But for the purpose of thinking about patterns of the events that we're talking about, then visualisation of our data is becoming quite important.
03:43
And it's at that point that we have to think about how we aggregate into meaningful categories that respect historical forms, but also make sense in terms of the social science possibilities of analysis.
04:00
So this is the public search page, I think you can all see that here, and it just outlines the purposes of the project. And so we have "search historical trials" here, which has got a basic keyword search, which works across a select number of attributes of our data and simply searches
04:28
in an uncontrolled way for any term arising that somebody might choose to investigate.
04:40
Somebody coming in might want to know about a particular individual and they type that in, or they may want to know about a particular offence. And without having to go into more advanced search, they may wish to see whether we've got anything on forgery, and there's plenty of stuff there for them to look at.
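The basic keyword search described above can be sketched roughly as follows. This is only an illustration of an uncontrolled substring match across a few selected attributes; the attribute names (`first_name`, `surname`, `offence`) and the sample records are hypothetical, since the talk does not give the project's actual schema or search implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical attribute names; the project's real schema is not given in the talk.
SEARCHABLE_ATTRIBUTES = ("first_name", "surname", "offence")


def keyword_search(
    records: List[Dict[str, object]],
    term: str,
    attributes: Sequence[str] = SEARCHABLE_ATTRIBUTES,
) -> List[Dict[str, object]]:
    """Return records where `term` occurs, case-insensitively, in any searchable attribute."""
    needle = term.strip().lower()
    if not needle:
        return []
    return [
        record
        for record in records
        if any(needle in str(record.get(name, "")).lower() for name in attributes)
    ]


# Invented sample rows, for illustration only.
records = [
    {"first_name": "John", "surname": "Smith", "offence": "Forgery and uttering", "year": 1910},
    {"first_name": "Mary", "surname": "Forge", "offence": "Larceny", "year": 1895},
    {"first_name": "Ann", "surname": "Lee", "offence": "Assault", "year": 1902},
]

hits = keyword_search(records, "forge")
```

Because the match is uncontrolled, the term "forge" here picks up both an offence (forgery) and a surname (Forge), which is the trade-off of a simple keyword search compared with searching a specific attribute.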
05:04
But if they've got more information about the area in which they want to search, then they are able to search across a number of our attributes. Now this is a select number of attributes for a specified period of time, which is
05:22
constrained by archive access conditions. Some of our records are from closed periods or under restricted access of other kinds, such as children's court material. But for most of the records we have, people can search across this range of attributes
05:41
and we're in the process at the moment, as we're getting to a more complete data set, of starting to consider releasing a bit more of our data. So how do we derive these things? I think, in terms of any kind of application of principles of classification, the original
06:08
data challenges are just at the transcription level of getting accurate terminology off the page of the data. So first name and surname are significant challenges, so it's very important for our
06:22
data that they be as accurate as possible. The offence category is one where we have the possibility both of an original transcription and of considering how it might be classified for our purposes, and I'll show you that in a minute.
06:44
Most of the other terms we have available we simply transcribe from the original record, and we have an open search that enables people to establish whether, you know, somebody was found guilty of offences in New South Wales in 1910. I should get some results from that,
07:07
I think. Yes. So that's just how that search function works. Well, I might just draw your attention to what lies behind this, and this is probably of more interest to a lot of people.
07:23
Our first challenge was that we were dealing with a number of jurisdictions in which terms that we'd regard as, you know, common to all of them might have been represented differently in the original records, and the records in any case vary in the extent
07:47
to which they cover all aspects of the criminal process. So, you know, Queensland and Victoria are particularly rich data sets in terms of including earlier stages of the trial as well as later, but we had to develop a process that would
08:08
enable the researchers to define the different registers, as we call them (the different state jurisdictions and the particular courts from which we were accessing data), and have
08:27
an approach that would allow us to add attributes as they emerged over time, and to have registers that had different numbers of attributes, while at the same time
08:45
respecting the original data. So this is a typical example, Queensland state, Supreme Court: we've got 67 attributes here.
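One way to picture registers that carry different, growing sets of attributes is sketched below. It is a minimal model under stated assumptions, not the project's actual database design: the `Register` class, its method names and the attribute names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Register:
    """A hypothetical register: one jurisdiction and court with its own attribute list."""

    jurisdiction: str
    court: str
    attributes: List[str] = field(default_factory=list)

    def add_attribute(self, name: str) -> None:
        # Attributes can be added as they emerge over time; duplicates are ignored.
        if name not in self.attributes:
            self.attributes.append(name)

    def new_record(self, **values: str) -> Dict[str, Optional[str]]:
        # Reject values for attributes this register does not define.
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise KeyError(f"Attributes not defined for this register: {sorted(unknown)}")
        # Attributes missing from the source stay empty (None) rather than being guessed.
        return {name: values.get(name) for name in self.attributes}


def shared_attributes(a: Register, b: Register) -> Set[str]:
    """Attributes that two registers have in common."""
    return set(a.attributes) & set(b.attributes)


# Invented attribute lists, for illustration only.
qld = Register("Queensland", "Supreme Court", ["surname", "offence", "verdict", "committal_place"])
vic = Register("Victoria", "Supreme Court", ["surname", "offence", "verdict"])

qld.add_attribute("sentence")
record = qld.new_record(surname="Smith", offence="Larceny")
common = shared_attributes(qld, vic)
```

Keeping the attribute list on each register, rather than in one fixed schema, is what lets one jurisdiction hold many more attributes than another while shared attributes remain comparable across them.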
09:01
Some of these attributes will be shared with other states and others not. Some of the data is available in the original sources; other data is very inconsistent. Indigenous identity, for example, is very important in this area, but
09:24
for the most part these records don't contain that, and it tends to be derived from other reports, such as historical newspapers, which can be searched through a Trove API that we link to our records. I'll just show you quickly how this looks in practice with, again, examples from Queensland state.
09:50
So a key thing for us is verifying the data, and the system for most of our states enables us to check the data extracted against the original record, and that's
10:09
very important because our data is being prepared both by researchers on the research team, and as I mentioned, by quite a large number of volunteers, and this record itself has been
10:22
entered by a volunteer just in the last day or two. So we're able to check the accuracy of this record, and this is a pretty experienced transcriber, and I'd be expecting an accurate representation of what's on the data page.
10:44
One of the key classification challenges for us is making sense of this offence here, "breaking open a locked showcase and stealing therefrom", which is a very specific definition of an offence; if you looked at crime statistics, you wouldn't find a category
11:03
for that, and so we've done quite a lot of work over the last couple of years coding our offence data in particular to enable us to visualise the records. So back on the main page, people are able to visualise our records through this
11:33
facility, and here we, as I say, we've run a code over...
11:46
One second, one second there, Mark. Sorry, Mark, you just cut out for about a sentence there. If you could repeat just that last sentence, please.
Yes, so the visualisation is a product of work we do on
12:06
aggregating particularly our offence categories, because this is obviously a key area of interest for people looking at this in social science or historical terms. We run a code over our offence data for whole jurisdictions over long periods of time
12:24
and generate levels of aggregation through that code, and the classifications are pretty familiar to people working in criminal justice and anybody looking at criminal statistics
12:42
since the 19th century will recognise these are generally the kind of categories that are used and really across national borders now as well. So there's a lot of work gone into that, and we have both meta-level aggregates looking at homicide
13:06
offences and property offences, personal offences, and then within those categories looking at more refined aggregations that still have their reference point in historical statistics
13:23
and now in contemporary criminal justice statistics of the kind you see from the ABS. The other areas are pretty much drawn direct from our data, although we do aggregate, again, the verdict fields and sentences particularly, because there's some interest in considering,
13:46
during this period when the death penalty was still in place, those occurrences in which the death penalty was in fact applied, particularly for the 19th century. For the trial place and committal place, we just use the original data at the moment. We're involved in some mapping exercises
14:09
at the moment, where we've got an ARC grant to look at a more detailed study of interpersonal
14:22
violence over long periods of time using this database and extending it, and we'll be very interested in geocoding crime events if we can get more specific information, as we hope. So that's sort of what we're about, and as much as I think I can say at this
14:44
point. I'm very happy to answer any questions.
Thank you very much, Mark, that's very, very interesting stuff.
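The offence coding mentioned in the talk, where a code is run over transcribed offence descriptions to produce broader and finer levels of aggregation, could look something like the sketch below. The broad category names (homicide, property, personal) come from the talk; the rule patterns, the finer category labels and the `code_offence` function are hypothetical and far simpler than any real coding scheme.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical coding rules: (pattern, broad category, finer category).
# The first matching rule wins, so rule order matters.
RULES: List[Tuple[str, str, str]] = [
    (r"\b(murder|manslaughter)\b", "homicide", "murder and manslaughter"),
    (r"\b(steal\w*|larceny|theft)\b", "property", "theft"),
    (r"\bforg\w*\b", "property", "forgery"),
    (r"\b(assault\w*|wounding)\b", "personal", "assault"),
]


def code_offence(text: str) -> Dict[str, str]:
    """Assign broad and finer categories while keeping the original wording intact."""
    lowered = text.lower()
    for pattern, broad, fine in RULES:
        if re.search(pattern, lowered):
            return {"original": text, "broad": broad, "fine": fine}
    # Anything the rules do not recognise is flagged rather than forced into a category.
    return {"original": text, "broad": "uncoded", "fine": "uncoded"}


# Invented offence descriptions, apart from the showcase example quoted in the talk.
offences = [
    "Breaking open a locked showcase and stealing therefrom",
    "Murder",
    "Forgery",
    "Common assault",
    "Vagrancy",
]

coded = [code_offence(offence) for offence in offences]
broad_counts = Counter(item["broad"] for item in coded)
```

Storing the coded categories alongside the original transcription, instead of overwriting it, serves both audiences described in the talk: family historians still see the offence as recorded, while aggregate counts such as `broad_counts` can feed visualisation.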