...a bossy sort of voice

Video in TIB AV-Portal: ...a bossy sort of voice

Formal Metadata

Title
...a bossy sort of voice
Subtitle
Using Natural Language Processing to quantify gender bias in the Harry Potter series
Alternative Title
A Bossy Sort of Voice: Uncovering gender bias in Harry Potter with Python
Title of Series
Author
Contributors
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
The Harry Potter series is an incredibly popular franchise that shaped a generation, but it’s also been critiqued in the media and by academics for its sometimes sexist portrayal of female characters. This talk uses Natural Language Processing techniques and Python to do the first quantitative analysis of gender bias in the language used to describe women and girls in the series, with a focus on Hermione Granger, the unsung hero of the story. Attendees will see techniques for reading and parsing large text files, leveraging grammatical rules to isolate the right words for the analysis, and data visualization techniques, using Python, the Natural Language Toolkit (NLTK), and Matplotlib. After the talk, the audience will be able to get started on using the “magic” of programming to isolate biased language in any piece of text.
(bright music)
- Everyone, thanks for coming to this talk. I know 5:00 is late and it's been a long day. I wanna preface everything by saying I am not gonna show you a bunch of code. I'm gonna talk more about how I transformed data, so I'm gonna do it without showing you any code. If you want code, though, you can go to that URL, and there you'll find a GitHub repo with a Jupyter Notebook that walks through all of this stuff, so if you really wanna see some code at 5:00 p.m. on a Monday, feel free. So this talk is sort of near and dear to my heart because it arose out of something that was happening in both my personal and professional life. So my professional life is I'm right now a group product manager at Zendesk in San Francisco. I've been in tech in the Bay Area for about 10 years, most of that as a product manager, but pretty much the whole time I've been in a leadership role, at tech companies that varied from teeny tiny startups to really big tech companies that you've definitely heard of, and the common thread though, throughout my entire experience was what a lot of women experience, which is just kind of little tiny bits of disrespect all the time, and they were usually in the form of very subtle, very coded language. And this has always bothered me. I love my work. I love what I do. That's why I still do it. But it's hard to take and it does wear on you. So while that was happening, some significant things happened in my personal life.
Among them, I had some kids, and one year my son, who is extremely into books and reading, got "Harry Potter and the Philosopher's Stone" as a Christmas gift. And this came from my sister-in-law, who loves Harry Potter, has read every one about 10 times, and insisted that this was great for him. I didn't really know much about it. I sort of missed it, being part of the Star Wars generation. And so I didn't really know much about it. Honestly, I'd seen the movies. I think I fell asleep in number five at one point. So yeah, I had a very vague understanding of what Harry Potter was, but, you know, I thought I'd give it a try. She would know. And for those of you who aren't familiar, it's a series of books, and it's really a franchise now of a lot of other things, but originally it was a series of seven books, and it's about this kid named Harry Potter, and it follows him through ages 11 to 17 as he attends this boarding school in Britain called Hogwarts where he learns how to be a wizard, and he meets up with other young people going through that same process, so, like, going through puberty and learning how to do magic, which is a horrible combination of things. (audience laughs) I could not have been trusted with that. But at the same time that this is happening, Lord Voldemort, who is, like, a super, super evil guy who wants to take over the world, reappears after everyone totally thought he was dead and thought that Harry was the only person who ever faced him and survived. So that's the premise of the books. So it sounds a little bit scary for a four-year-old, but I was like, okay, you know, I guess we can give this a shot. So we started with Philosopher's Stone, and one of the things I noticed about this book which was really interesting was even though everyone told me about how much I would love Hermione, who's one of Harry's friends. He's got two friends, Ron and Hermione, and they're in all the books. They help him out with stuff all the time.
Even though Harry's the center, they're really important characters. And one thing I found very striking about it was I really felt like the books could've been about Hermione, because she's the one who's got everything under control. She works really hard at school. She studies, she applies herself. She's really, like, she's conscientious. She cares about other living things to a really large degree. And I found this graphic which I thought really encapsulated how I felt. What if these books were actually about Hermione? My favorite is "Hermione Makes Two Useless Friends." That was my favorite one. These are great, by the way. If you get a chance to read them all, the slides are online. But one thing that was really unfortunate to me, and this is where my personal and professional sort of situations sort of collided, was in how Hermione was described. So the title of this talk is "A Bossy Sort of Voice," and that's taken from the very first time Hermione is introduced by J.K. Rowling, who's the writer of the Harry Potter series. "She had a bossy sort of voice." Before she's even named. And to me that really kinda cut, 'cause growing up in the '80s and '90s, being called bossy was not a good thing, and it was almost always applied to girls, and frequently to me. So that kinda hurt. And as I read it more, these are just two examples I could sort of think of off the top of my head. I kept seeing these things like "bossy know-it-all," "shrill voice" to describe women and girls. So that was kind of a let-down, 'cause I really wanted to read this to my son to get him thinking about strong female characters and characters who didn't look like him generally.
So I decided to investigate this and try to turn it into a conference talk. I'd like to say that even though I was part of the Star Wars generation, I was not unaware of how popular the series was, so when I would tell people as I did this project, "I'm doing a conference talk about sexism in Harry Potter," the reaction was pretty much this every single time, like, how dare you. So that was a little bit disconcerting to anger the Harry Potter fan group. But this wasn't just me.
Other people have noticed this, too. And being, you know, kind of maybe a little bit like Hermione, I did some research at the library, and I found a bunch of books that, whoops,
that cover this in some detail. Whoops.
Little slide malfunction there. That cover this in some detail. And these chapters in these books are written by literary scholars, and they write about a lot of really interesting things, including gender bias in Harry Potter, including race in Harry Potter, magic culture in Harry Potter, so a lotta really cool topics, so if you're interested, I recommend you check it out. But one thing I noticed is they approach it all, always very much like literary scholars do, which is talking about specific situations, talking about how those situations are woven into a bigger fabric, and that makes a ton of sense, but one thing that I didn't really see much of is really analyzing the language that Rowling used to describe the female characters. It happened, but it wasn't done in kind of a broad, programmatic way. And this kinda makes sense.
There were seven books at the time, and altogether they had about 1.1 million words. It was actually more than that. So this was not something, you know, your average, you know, literature professor is going to do. So I thought this was a great opportunity to apply some code to the problem. Another way the personal and professional kinda collided on this one. So I wanna talk next about what I actually did,
how I went about this, because I think this is actually, more so than the code, really the most interesting part of the project.
So the toolbox I used, pretty straightforward. Python, Jupyter Notebook, which you'll see in the GitHub repo. I used the Natural Language Toolkit, NLTK, and I didn't use the most powerful parts of it. I used some pretty sort of standard functions in it that were incredibly useful, though. And then I used pyplot from Matplotlib, because what is a textual analysis without a lotta word clouds, so brace yourself for some word clouds coming right up.
So figuring out how to approach this was hard, because if you've read the Harry Potter books, there are seven of them, and they span, like, seven years, and there are a lotta characters. So I was kinda like, you know, what do I do? I went through this whole process trying to figure out, how do I find all the female characters? I was on all these wikis trying to, like, pull data off. It was sort of a mess. But I decided to simplify a little bit and to look at as my hypothesis that Hermione was gonna be described by J.K. Rowling in words that are used to describe women in a pejorative way or highlight really sort of effeminate qualities in a not so great way. So that was what I decided to do, and I decided to compare her to Harry and Ron, and the main reason I did this was that she's mentioned quite a bit, as are Harry and Ron, and so even though she's mentioned the least of the three, not surprisingly, right? She is mentioned in every single book and she matures with them and they're all about the same age, so I thought, you know, that's a pretty good comparison. What did this mean for my analysis? Well, first of all, it meant I wanted to focus on narration, not dialog. So I didn't really care that there was a sexist character. What I cared about was how the narrator, who's supposed to be a little bit objective, how the narrator was sort of, was describing Hermione. Then I wanted to look at actions rather than the descriptions, because there aren't really that many descriptions, first of all, and it's really hard to parse some of them and attribute the correct adjectives to the right character. So I decided that I would concentrate only on verbs and adverbs, and I'll talk more about this a little bit later. And then I decided I only wanted to compare the three principal characters, like I mentioned. Only Hermione versus Harry and Ron. 
So there were three steps I used to analyze the text, and once again, you can go look at the code if you really, really, really wanna look at code right now. First I had to get the text into a format that Python could use. Then I had to isolate the parts of the text I wanted to analyze, and then I wanted to find and summarize the relevant words within them. So to illustrate that, instead of using code, I'm going to use an actual text snippet. So this is, if you're not familiar with the Harry Potter books and you haven't read them multiple times, which may or may not, I don't know, maybe some of you have. This is from the Philosopher's Stone, and this is shortly after that first introduction of Hermione I mentioned when all three of the principal characters are on the train. And the reason I used this one was that it's got both dialog and narration and it mentions all three of the characters, so I think it's a pretty good illustration of what I did. So here's the text. These first steps are really straightforward. I just read it into Python using the open function. It's a giant string now. And then what I did was I used the word_tokenize function in NLTK, which basically just takes that giant string of text that Python is now able to grok, takes that string of text and breaks it up into essentially a giant list of words and punctuation, and it looks like that. Now, if you've done text processing yourself, you might wonder, well, hang on, don't you wanna take all the punctuation and stop words out? Punctuation would be things like the quotation marks, the commas, et cetera, and the stop words are common English language words. And the answer is no, because the punctuation and the order of the words and the proximity of the words was actually really important for my analysis, so I'll show you why in a second, but that's why I didn't do that. First order of business was remember, I want narration.
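A minimal sketch of the read-and-tokenize step described above. Everything in it is an assumption for illustration: the talk uses NLTK's word tokenizer, while this version swaps in a small regular expression so it runs with the standard library alone; `load_text`, `tokenize`, and the sample sentence are hypothetical and not taken from the speaker's notebook or from the book.

```python
import re

# Stand-in for NLTK's word tokenizer (an assumption made so the sketch runs
# without downloads): words and each punctuation mark become separate tokens,
# kept in their original order.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def load_text(path):
    """Read a whole book into one string; `path` is a hypothetical file name."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def tokenize(text):
    """Split text into word and punctuation tokens without dropping anything."""
    return TOKEN_PATTERN.findall(text)


# Made-up snippet for illustration, not a quotation from the book.
snippet = '"Are you sure?" said Hermione. Harry looked at Ron.'
tokens = tokenize(snippet)
print(tokens)
```

Quotation marks and full stops survive as their own tokens, which is what the later steps rely on: the quotes mark where dialog starts and stops, and word order shows which verb sits next to which name.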
I wanna see how this objective narrator describes Hermione in ways that are different from Harry and Ron. So one thing I learn whenever I do text processing is English is a really, really inconsistent language, but one thing that is consistent about it, and is consistent in J.K. Rowling's novels, is when characters speak, their dialog is in quotation marks. So I wrote a function to pull everything between quotation marks and stick it into this list of dialog, so then I just have the dialog from that snippet, and then another list of narration. So every single narrative passage gets a little sub-list. So that was the first step. And remember I wanna focus on the narration, so from here on in, discarding dialog, looking only at narration. The next thing I did was look for mentions of Harry, Ron, and Hermione in the narration, and I made a dictionary where the key is the character name and the value was the list of all the times they were mentioned. Now, one thing you might be wondering is, well, what did you do if it has something like, she said angrily? How did you handle that? I didn't, just because it was harder to tell who that she might've been referring to. So what I decided to do was just get rid of that and only use the instances where I was really, really sure that those words applied to that protagonist, so just to be extra, extra sure. So those were the keys with the character names. So then what I did was I used the pos_tag function in NLTK to identify the part of speech that each word represented. So you can see that here, so you've got, if you look at the Harry key, you can see Harry and then NNP. So NNP is a tag to represent the part of speech. NNP is a proper noun. And then you've got looked, which is a verb, so, and so on. So this basically let me isolate the important nouns and the verbs and adverbs around them, which again is what I wanted to look at.
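The dialog/narration split and the per-character dictionary could be sketched roughly as follows. This is an illustration under stated assumptions, not the speaker's function: it assumes dialog is delimited by double quotation marks (an edition that uses single quotes would need different handling), the names `split_dialog_and_narration` and `mentions_by_character` are hypothetical, and the token list is made up.

```python
# Tokens treated as dialog delimiters. The backtick and double-apostrophe
# forms are included because NLTK's default tokenizer rewrites double quotes
# that way.
QUOTE_TOKENS = {'"', '“', '”', '``', "''"}


def split_dialog_and_narration(tokens):
    """Return (dialog, narration), each a list of token sub-lists."""
    dialog, narration = [], []
    current, in_quote = [], False
    for token in tokens:
        if token in QUOTE_TOKENS:
            # A quote mark closes the current passage and flips the state.
            if current:
                (dialog if in_quote else narration).append(current)
            current = []
            in_quote = not in_quote
        else:
            current.append(token)
    if current:
        (dialog if in_quote else narration).append(current)
    return dialog, narration


def mentions_by_character(narration, names):
    """Map each name to the narration passages that mention it by name."""
    mentions = {name: [] for name in names}
    for passage in narration:
        for name in names:
            if name in passage:
                mentions[name].append(passage)
    return mentions


# Made-up tokens for illustration, not a quotation from the book.
tokens = ['"', 'Are', 'you', 'sure', '?', '"', 'said', 'Hermione', '.',
          'Harry', 'looked', 'at', 'Ron', '.',
          '"', 'No', ',', '"', 'muttered', 'Ron', '.']
dialog, narration = split_dialog_and_narration(tokens)
mentions = mentions_by_character(narration, ["Harry", "Ron", "Hermione"])
print(dialog)
print(narration)
```

Matching on the literal names only, and ignoring pronouns such as "she", mirrors the conservative choice described in the talk: fewer matches, but each one is attributable to a specific character.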
So this is a little bit of a sidebar, but this was a really important part of the project, and I think another reason why the process was kind of more interesting than the code in a lot of ways. So I had to look at patterns in language in order to find the right words, right? So there are a few patterns that J.K. Rowling and most novelists use in putting verbs and adverbs around character names. So one of them is verb noun, yelled Harry, right? Another one is verb noun adverb, so said Harry happily. Another one is noun verb, Harry saw. Another one is noun adverb verb. This is actually harder than it might sound. And, like Harry whispered softly. So those are the patterns that I found, and that makes sense. They're very common patterns in English. So how I translated that into code
and what I did with it. Well, what I did was I wrote a function that basically looked around, looked at these sequences here. These are all noun verb or verb noun, but just, it's a simple example, but to give you some context for it. So basically, I looked for those patterns that you can see there, and then I made a dictionary where the key is the character name and the values were all of those words that I found that were verbs or adverbs. So in this particular example, you can see that said is used by both Harry and Hermione, so that would mean in the next step where I basically got rid of all of those repeated words, that said would be removed from both Hermione and Harry's list. So what you learn from this dictionary is only Harry used the word looked, or only looked was used to describe Harry, muttered was only used to describe Ron. Sounds about right from the books. And there was no unique verb or adverb there for Hermione. So that's the process I went through in a nutshell.
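One way the pattern search and the removal of shared words might look, as a hedged sketch rather than the speaker's code. It assumes input shaped like the output of NLTK's part-of-speech tagger, a list of (word, tag) pairs with Penn Treebank tags where verb tags start with "VB" and adverb tags with "RB". The pairs below are hand-tagged and made up, and the function names are hypothetical.

```python
from collections import Counter


def is_verb(tag):
    return tag.startswith("VB")


def is_adverb(tag):
    return tag.startswith("RB")


def words_around_names(tagged, names):
    """Collect verbs and adverbs that sit directly next to a character name."""
    found = {name: [] for name in names}
    for i, (word, _tag) in enumerate(tagged):
        if word not in found:
            continue
        # Pattern "verb noun": a verb immediately before the name.
        if i > 0 and is_verb(tagged[i - 1][1]):
            found[word].append(tagged[i - 1][0].lower())
        # Patterns "noun verb", "noun adverb verb", "verb noun adverb":
        # take up to two following tokens while they are verbs or adverbs.
        for next_word, next_tag in tagged[i + 1:i + 3]:
            if not (is_verb(next_tag) or is_adverb(next_tag)):
                break
            found[word].append(next_word.lower())
    return found


def exclusive_words(found):
    """Keep only words used for one character, with their frequencies."""
    result = {}
    for name, words in found.items():
        others = set()
        for other, other_words in found.items():
            if other != name:
                others.update(other_words)
        result[name] = Counter(w for w in words if w not in others)
    return result


# Hand-tagged, made-up tokens for illustration.
tagged = [("said", "VBD"), ("Hermione", "NNP"), (".", "."),
          ("Harry", "NNP"), ("looked", "VBD"), ("at", "IN"),
          ("Ron", "NNP"), (".", "."),
          ("muttered", "VBD"), ("Ron", "NNP"), (".", "."),
          ("said", "VBD"), ("Harry", "NNP"), (".", ".")]
found = words_around_names(tagged, ["Harry", "Ron", "Hermione"])
exclusive = exclusive_words(found)
print(exclusive)
```

A limitation of this simple window: in a sequence like "Harry saw Ron", the verb is collected for both names and then discarded as shared. Changing the list of names is all it takes to run the same comparison on other characters.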
So now the findings.
So get ready for some word clouds. So one of the first things I did was look at how Hermione is described in each book with words that are only used to describe her. So this line has all the books on it, and then when you look at all the word clouds, it gives you a very high-level idea of what's going on here. So some of the words are totally innocuous, like earnestly or something like that. But you can see, say, in book one, you've shrieked, squeaked, and timidly are, like, the first things you see, like, right there on the left. And it really depends on the book, but there are definitely words in each one that are a little bit eh. There's a lotta squealing, just so you know. And I spent a lot of time around little boys growing up. I babysat little boys. 11-year-old boys squeal and shriek a lot, so I don't really know why they weren't at least in the earlier books. So as I mentioned before, one of the things I wanted to do was compare Hermione to Harry and Ron. So you might be thinking, well, okay, you know, some of these words just on first pass look a little bit off, but, you know, what about Harry and Ron? Like, you didn't show us the words that were used to describe only them. I have a slide for that. So I made this table to explain the results. Across the top you've got the book numbers and then down the side you've got Harry, Ron, and Hermione. And so what this shows is the word that was used to describe them exclusively in the book that the column represents and then the frequency. So you might be thinking here the frequency's quite low, and that's because I was really conservative with how I did this. I was only looking for verbs and adverbs and only in cases where they were located around a protagonist's, like, actual name. So it's a very kind of conservative sample. But here are a couple things that stood out to me. For Harry, I think this makes sense. 
Rowling uses him as the lens for the narrative, so a lot of the words that are used only to talk about him are, like, thought and wondered and very sort of internal concepts. Not all, but it's pretty different than the other characters. With Ron, if you're, again, familiar with the books, Ron's a real grouch, like, more so than most teenage boys, and he, there's a lotta stuff that's exclusive to him that's, like, groaning and moaning and complaining and being gruff and stuff like that, which really does fit Ron, right? Don't fight me on that, it's true. (audience laughs) So Hermione, what emerges for her? Well, so going back to what I said at the beginning, I really thought Hermione was a great character. She was so strong. She leads the resistance. She comes up with the whole resistance and makes Harry the leader of it, which is a little bit weird. But anyway, I thought she'd be used with very heroic terms, and a lot of them are neutral, but there are quite a few in there that echo what I said earlier. Squeaking, squealing. In book seven, I highlighted that because I thought it was kind of especially unfortunate. She's got squeaked, shrilly, breathed and breathless and squealing, and, you know, I was just sorta like, in that book she does so many great things, and it was just so unfortunate to see that. So what do I want you to take from this? You can totally go out and analyze Harry Potter by yourself and tell me how I'm wrong if you want. I dare you. (audience laughs) But what I would really like you to take from this is
what I mentioned at the very beginning about coded language and how people who work in tech and beyond really do deal with this all the time, and they're very subtle language cues that I think sometimes we don't even notice. I sincerely think J.K. Rowling did not notice when she was doing this. I really do. I don't think she was trying to do this. But I think that it's so innate to a lot of us, it's just how we sort of think, that it happens. And so what I would ask is that you look for words like the ones on the screen to describe women and people of color, people based on their sexual preference, and just be critical of them, and really be thoughtful when you're consuming media, especially media that you love, things like Harry Potter, which I know a lotta people love, but it's not perfect, and I think we need to kinda keep that in mind, especially when we're reading to our kids. I definitely do. So thank you for coming to this talk. I really do appreciate, once again, that you've stayed till 5:00 p.m. That's really awesome. I do a lot of projects around literature and trying to use programming to understand literature better. I come from a literature family. Both my parents are college English professors, which is probably a whole other talk. But yeah, so I've really enjoyed doing this. I did a different presentation for Python about gothic literature. So if you're interested in this area, I haven't, I feel like, really even scratched the surface with some of the tools that could be used to help folks out who are in those fields. Humanities, social sciences, journalism, all that stuff. If you're interested in the same sort of general realm, please do look me up on Twitter. So thank you very much. (audience applauds) - [Woman] I just happened to be here, but I absolutely loved this talk. - Oh, thank you. 
- [Woman] I'm kind of, I've just personally started doing a lot more writing myself, and been thinking about words and their meanings, because, and some of their preconceived notions, and as, and I did a lightning talk earlier, and the, like, when I kind of think of, when I say here's who I am, it's like, I have ADD and I have, and I'm crazy. These are two words that were really, crazy is such a, been a word that's, like, stigma, and there's been mental disorders in my family that have been labeled things, and they're, you know, they have preconceived notions as being bad, and words like, now I just see these words as, they describe me, but I've changed their definition to me. So where I'm like, hey, I'm crazy, but I'm accepting of that, and I use it to do crazy awesome things, and my ADD means that I don't really feel like doing that right now, so I'm gonna do what I feel like doing, and it charges me up and it lets me do something else. Like, all of a sudden when I procrastinated one time at work, I don't know, my boss isn't here. I was writing. Got the flow. Just purged out this stuff in, like, half an hour as fast as I could type, just typing away. Before I knew it, then I was, like, got into work and got onto my, well, I was at work, but I was doing my work work, and I was in the flow and helped me continue that flow into my work. And so these are just, sorry, this is a long question. All right, so words, preconceived notions and how they change and through time and our definition of things through time and our culture, since society has preconceived notions of those words, so answer however you, that whatever way you want. - Well, so, that's interesting, so thanks for calling out the health and ableism generally. I think that's important, too. So yeah, so I think that one bit of pushback, I guess, that I've had on this was that a few years back, maybe three years ago, there was this campaign to sort of reclaim bossy as a term. 
And so I think you could maybe look at that lens, but I think what's really important, one thing I've learned doing this type of work, like, looking at different literature from different eras, is you do have to put the lens of the time that it was written on it a little bit. So even though bossy may not feel pejorative to somebody now... - [Woman] (speaks off mic) - Yeah, exactly. Yeah, even if it's reclaimed, in the context of the time, it wasn't meant as a compliment. So that, I think that that's really important. But I do also think, I mean, there's nothing inherently bad about squealing. It's just the way that, the context in which that word exists is so negative. - [Man] So first of all, awesome talk. - Thank you. - [Man] This was extremely cool, and I dug it. My question is, would it be easy to apply this process to non-protagonist characters, and the reason I'm asking is because it'd be super cool to see if these trends were consistent among multiple female characters, or if instead it was super specific to the characterization of Hermione. - Um-hum, yeah, that's a great question. So with the code as it exists now, no, it's not hard. Really all you have to change is the names of the protagonists, and then you're good. There's nothing built into the code about what is negative or pejorative, so it's really up for some interpretation. But yeah, you could add female characters, you could do good versus evil kind of a comparison. Could be really interesting, too. It would be very, very easy to do that with any characters. - [Man] (speaks off mic) (audience laughs) - Sure. (audience laughs) Yeah, I get asked that actually quite a bit, and, like, for example, if J.K. Rowling is more sexist or less sexist than some other author, you know, if I can compare it that way. So yeah, you totally could. Haven't got around to it yet. Maybe one day. Thank you. (audience applauds) Thanks, everyone.