
Cats, The Musical! Algorithmic Song Meow-ification


Formal Metadata

Title
Cats, The Musical! Algorithmic Song Meow-ification
Title of Series
Number of Parts
66
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
How are you supposed to sing along with your favorite TV theme song every week if it doesn’t have lyrics? At my house, we “meow” along (loudly). We also code, so I built ‘Meowifier’ to convert any song into a cat’s meows. Join me in this exploration of melody analysis APIs and gratuitous cat gifs.
Transcript: English(auto-generated)
My name is Beth Haubert. This talk is called Cats, The Musical! Algorithmic Song Meow-ification.
Very excited to be here today. My first time speaking at RubyConf, so thank you. So I have a warning that I give at the beginning of this talk, that within the next 20, 25, 30 minutes you're going to likely encounter some poor singing, too many
cat GIFs, and an excessive amount of silliness, so be prepared. Let's get started. So I actually don't remember what my bio says on the website, but I have some changes, life changes that have happened. It's a little bit out of date. So I was
a software engineer at Flywheel when I applied for the CFP working in Omaha, Nebraska, and now I am at Thoughtbot in San Francisco, which I'm very excited about. So if you happen to work at Thoughtbot and I have not met you yet, please come say hi afterwards. And let's get started. So I have been told by a few
people that one of my strengths is thinking outside the box, or as I like to call it, being really good at coming up with stupid ideas. So a few of my ideas have included Fecesbook, a social media website for your poo. Okay, maybe I
should have stopped there. Kombucha, really boozy kombucha. Kombucha is like a fermented tea drink. It's supposed to be healthy, but like does the healthiness outweigh the alcoholicness if you add booze to it? I don't know. Another idea, and I don't have a name for it yet, but it's birth
announcements, but for features that you release at work. Then my favorite is Ameowica. It's a US-based antisocial network for your cats, and I actually built and released this to a smaller market in Omaha. It was called,
and it still exists, it's called Oh Meow-ha. So you can go there and see it. So slight change of subject, but still very relevant. Who here is familiar with Game of Thrones? Okay, cool. Like most popular television show that's on right now? Okay, good. So I need to give you a little bit of
background on me first. So this is me and my husband. We got married in a bar to the disappointment of both of our parents. That's not very relevant. These are our cats, Xiao Gui and Clementine, and Geeta. She's an asshole, and yes, this is a professional photograph of my cat. This is me and my husband on
Halloween a few years ago. We were really into Game of Thrones. Yeah, so you can barely tell the difference, I think. You know. So you're wondering, where are you going with this? Well, I'm going to talk about that. So I'm
going to play you the Game of Thrones theme song right now, if we could get a little louder. Okay. I'm sure most of you are familiar with that.
So you may have noticed, it doesn't actually have any lyrics, which I think is a problem. So one Sunday night, I was watching Game of Thrones, and I just kind of started meowing along. And that's where this whole talk comes from.
So you're all familiar with the Game of Thrones theme song. You know what a meow sounds like. Meow. So we're gonna do something together. We are going to meow the Game of Thrones theme song. I will get us started, and then you are all going to join in. Are you ready? Meow, meow, meow, meow, meow, meow, meow meow, meow, meow,
meow, meow, meow, meow, meow, meow, meow, meow, meow, meow, meow, meow, meow, meow, meow, meow. Meow. Cool. Nice job everybody. We just, thank you. Yeah. Give yourself a round of applause. We just Meow-ified the Game of Thrones theme song.
So, probably set some kind of world record for the most people meowing in a room. So, Meow-ifier, the application is just one idea in a long line of bad ideas that I've come up with over the years. But it also seemed like a really interesting
technology programming problem to solve. And I finally said, what the hell? And I went ahead and I built it. So, how exactly does it work? Well, simply you upload a song's audio file and Meow-ifier outputs a new audio file with that song's melody sung by cats. So, I wish I could say there's such a thing
as a cat choir. There is not. I Googled it, nothing came up. So, it was all up to me to figure out. So, yeah, Google, that's great. In my internet research, I came across some very famous musical cats. Probably my favorite was Keyboard Cat.
And I would love to keep and finish out that video, but I only have 25, 30 minutes here. So, I also discovered I'm not the first person to come up with this idea. This is a cat organ. So, there's some really interesting details
about the cat organ. So, a historian in the 16th century, so this historian described seeing a cat organ, and wait for it, being played by a bear when King Philip II rode into Brussels. So, I honestly have no idea if that's true or not. Like, are all historians trustworthy? I'm not sure.
But yeah, throughout the years, people have mentioned cat organs. So, the consensus is that it actually has never been built and it would make terrible music anyway because cats don't meow in a fixed pitch, which I'll get to later. Well, until now, that is. So, we're gonna do this.
We're gonna dive right in. And what I decided is that I had some big challenges. And I had no idea how I was gonna build this thing, so I decided I'm gonna make some rules up for myself. And I had one thing going for me, which is that I had read POODR, Practical Object-Oriented Design in Ruby, and that's by Sandi Metz.
For those of you not familiar, I'm hoping all of you are familiar with that book. If you've not read it, please do read it. So, thanks to Sandi, I knew I needed to make my classes really stupid, so plug and play. I also decided, based on my experience with my legacy code at my last job, that I wanted this application to be 100% test-driven,
unlike the legacy code at my last job. So, I have these standards set in place. And now it's time to get started. So, I have three really big challenges. And the first one is finding a way to obtain the notes
of only the melody from a song's audio file. So that's a pretty big one. Then there's correcting the meow length to match the length of the note in the melody. And the third one is creating a multi-octave library of meows. So, let's get started with that first problem, the melody.
So, it's pretty easy for a human, especially one with any musical training, to pick up the melody of a song. So, the melody, for those of you who are not musicians, the melody is the principal part of the song. So, like every song you hear on TV or the radio is a polyphonic song, which just means that there's more than one note going on at a time. But if you were listening to Bohemian Rhapsody by Queen,
the melody would be the part that Freddie Mercury is singing. So, this is where the bad singing comes in. So, the melody is this part. It's like the, Mama, just killed a man. Or the, I see a little silhouetto of a man. So on and so forth, that's the melody.
But, you know, there's harmonies and bass parts, and I was like, I don't want all those things muddying it up. I just want the melody. So, what did I do? Well, compared to a human brain, the computers are pretty unintelligent. We have to tell them everything to do. So, writing an algorithm that a computer can understand to extract a melody is incredibly complicated. And maybe you're wondering if I wrote one.
I did not write my own algorithm. I did not even try. I did what most programmers do, and I googled the dark depths of the internet until I found something that I thought could work. And it was so difficult to find. But, I finally did find something. The first tool I found is a tool called Sonic API.
So, it offers professional-grade audio technology and high-quality world-class algorithms, and it was free up to a certain point. So, I'm like, okay, I'll give this a shot. So, I have this simple song parser class, and inside this class, I have a parse method. So, all I needed to do was pass the proper params
through an HTTP call to this API, along with the song file and my key and all that good stuff, and it's supposed to extract the melody for me and send me back a response. So, this is an example, and I realize how small that is. I don't expect you to read it. This is an example of what the responses look like, except imagine literally hundreds,
if not thousands, of lines long. So, this is just a collection. And this actually happens to be the first few notes of the Game of Thrones theme song. So, let's take a closer look at one of these. So, this is what each of those pieces of information has in it. It has four pieces of data per note. And the MIDI pitch here maps to the pitch of a note.
And because, so I'll get into MIDI a little bit later, but because only whole MIDI pitch numbers map to what we think of as standard notes on a keyboard, I had to round the note up or down before I could map it. So, this MIDI pitch would be rounded up to 36. So, let's see what that gets us in MIDI mapping. So, MIDI, which is short for
musical instrument digital interface, that's correct, yes, is a technical standard so that all electronic musical instruments from different manufacturers adhere to the same rules and can communicate with one another. So, 36 lands us on the C, and this C is two octaves down from middle C for those of you that play the piano.
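The rounding-and-mapping step just described can be sketched in a few lines of Ruby. This is only an illustration, not the talk's actual code: the method name `midi_to_note_name` and the example pitch 35.87 are made up, and the octave numbering assumes the common convention where middle C (MIDI 60) is C4, which is the convention under which MIDI 36 comes out as C2.

```ruby
# Hypothetical helper, not the talk's code. Assumes middle C = MIDI 60 = C4.
NOTE_NAMES = %w[C C# D D# E F F# G G# A A# B].freeze

def midi_to_note_name(midi_pitch)
  number = midi_pitch.round   # only whole MIDI numbers map to keyboard notes
  octave = (number / 12) - 1  # integer division: 60 / 12 - 1 => 4
  "#{NOTE_NAMES[number % 12]}#{octave}"
end

puts midi_to_note_name(35.87) # a fractional pitch that rounds up to 36
puts midi_to_note_name(60)    # middle C
```

Middle C is 24 semitones above MIDI 36, which is why the C at 36 sits two octaves below it.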
And here's what that looks like on a keyboard. So, I don't, you may have seen me playing the piano this morning before keynote. That was me. I like the piano a lot. I know nothing about electronic music, so this is all brand new to me. But what I ended up doing is that, like I'm really lucky because MIDI is standardized, right? So, I just went ahead and I made them constants
in my note converter class. So, this comes into play after I've parsed my song. So, here's again MIDI note 36 in its corresponding pitch, C2. And time for a butt scratch break. We've been talking for a little while. All right, you guys aren't scratching your butts.
There you go. We'll move on. So, see that there's an append-array-with-note method here. So, what that method does is it adds the standard pitch to each line in my collection using those constants I just showed you, so the C2.
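A minimal sketch of that append step, under stated assumptions: the hash keys (`:midi_pitch`, `:onset_time`, `:duration`, `:volume`) and the sample values are guesses for illustration, since the talk only says each note carries four pieces of data, and `append_note_names` is a hypothetical name. The lookup table stands in for the constants mentioned above and covers the 88 piano keys (MIDI 21 to 108).

```ruby
NOTE_NAMES = %w[C C# D D# E F F# G G# A A# B].freeze

# Lookup table for the 88 piano keys, A0 (MIDI 21) through C8 (MIDI 108).
MIDI_TO_NOTE = (21..108).map { |n|
  [n, "#{NOTE_NAMES[n % 12]}#{(n / 12) - 1}"]
}.to_h.freeze

# Returns a new collection in which every note hash has a fifth entry, :note.
# fetch raises KeyError for pitches outside the piano range.
def append_note_names(parsed_song)
  parsed_song.map do |note|
    note.merge(note: MIDI_TO_NOTE.fetch(note[:midi_pitch].round))
  end
end

# Made-up sample data, shaped like the response described in the talk.
song = [
  { midi_pitch: 35.87, onset_time: 0.0,  duration: 0.48, volume: 0.5 },
  { midi_pitch: 43.1,  onset_time: 0.48, duration: 1.2,  volume: 0.6 }
]

append_note_names(song).each { |note| puts note.inspect }
```

Using `merge` leaves the original hashes untouched, which keeps the step easy to test in isolation.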
And so, I'm talking about this hash here. So, what this method does is add the note to the end of it. So, now instead of this really long collection with only four pieces of data per line, there's five. So, what's next? Well, if you're looking at this, you might think,
okay, well, we've got the MIDI pitch down, what's next? Maybe onset time or duration, and you would be correct. So, I need to correct the meow length to match the length of the note in the melody. So, a melody's gonna have notes of varying lengths, right, like when you think back to Bohemian Rhapsody,
Freddie doesn't just hold each note for like a half a second each. Some of them are like a quarter of a second or maybe a whole second. It's not like, mama just killed a man. It's mama just killed a man. Some of them are longer, some of them are shorter, but I can't have like a meow folder
with files of every conceivable length. So, instead, I had to find a tool that could either cut or extend a meow to fill the proper amount of time. And, like most programmers, I'm like, I'm gonna see if this already exists. And I found, you know, a little bit of Googling, found a really great tool that I'm sure a lot of you have already used.
It is called FFmpeg. It's been around for like two decades. And the only problem with FFmpeg is that there's almost too many options. Like, I spent an inordinate amount of time Googling, like, how do we do this, how do we do that? But you know what, I finally got it working. It took a really long time,
and so this method right here is embarrassing because it's very long and messy, and I did not refactor it, and it's been two years. So, like I said, it took a long time to write, and I was like, oh, I'm done. But let me walk you through it. So, what happens is we pass in the parsed song, and remember that collection that the API sends back,
and then we append that one right here? So, we pass that in, and at this point, I need to tell you that there is a library of meows living in my application. We'll talk more in depth about that soon, but all you need to know right now is there are approximately 88 short audio files, each with a meow, in a different pitch like the ones you'd find on a piano.
So, this is the part of my code that creates a meow with the correct duration. And this first piece of logic, the if statement, shortens a meow file while the else statement lengthens one. And, you know, short notes are pretty simple. When the length of the extracted note was shorter than the library file,
I just make a copy of that file, and I trim off the end of the file to get the correct length. So, it looks a little something like this. So, if the duration of the note is less than the length of the meow file, then you adjust it. So, we've got a note duration of only 0.48 seconds, while the note to adjust is a whole second. So, all I do is I trim it down,
and actually, those don't actually match. Well, imagine they're both the same. You guys have great imaginations, right? So, ffmpeg gives me the ability to do this pretty easily, granted I passed the correct arguments in. It gets a little more difficult with the longer notes,
and there's so many different ways I could've done this, but the first way, and the way I'm about to show you, what I ended up doing was, if the length of the extracted note is longer than the length of the meow in my library file, I keep duplicating that meow file
until it's longer than the extracted note, and then I combine them to create one audio file, and then I shorten it to match the length of the analyzed note. And so, I will show you guys all this in a second. So, this is the logic for the long notes. These are some of my private methods, and it's actually one of the most interesting parts
of when I was building my application, and it's probably the first part to go when I start refactoring, because I've decided to do this another way. But currently, so the note length is 2.46 seconds, and the file length is only one second long, so first I find the number of loops. I take simple math, so take the note length
divided by the file length, which is 2.46, and then I use the ceil method, which returns the smallest integer greater than or equal to the float, which is three, and then I use ffmpeg in this method, and I loop my file three times. I combine it, and then I crop it and save it.
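The loop-count arithmetic and the trim-or-extend decision can be sketched as below. The method names are hypothetical, and the long-note branch swaps in a different technique from the one described: instead of duplicating the file, combining the copies and cropping, it uses ffmpeg's `-stream_loop` input option (which repeats the input that many extra times) together with `-t` (which limits the output duration). The sketch only builds the argument lists; it does not run ffmpeg.

```ruby
# How many copies of the meow file are needed to cover the note.
def loops_needed(note_length, file_length)
  (note_length.to_f / file_length).ceil
end

# Builds an ffmpeg argument list that trims or extends a meow to note_length.
# Hypothetical helper; assumes ffmpeg is on the PATH when the list is executed.
def ffmpeg_args(source, target, note_length, file_length)
  if note_length <= file_length
    # Short note: keep only the first note_length seconds.
    ["ffmpeg", "-y", "-i", source, "-t", note_length.to_s, target]
  else
    # Long note: loop the input, then cut at note_length seconds.
    extra_plays = loops_needed(note_length, file_length) - 1
    ["ffmpeg", "-y", "-stream_loop", extra_plays.to_s,
     "-i", source, "-t", note_length.to_s, target]
  end
end

puts loops_needed(2.46, 1.0)
puts ffmpeg_args("meow.wav", "short.wav", 0.48, 1.0).join(" ")
puts ffmpeg_args("meow.wav", "long.wav", 2.46, 1.0).join(" ")
```

To actually produce a file, the list could be passed to `system(*ffmpeg_args(...))`. Passing the arguments as a list rather than one string avoids shell quoting problems with file names.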
So, I mean, that's one way to do it, right? Oh, my GIF, this is that GIF with Shaq, and he's doing the wiggle, and it's supposed to be full screen. That's okay. The other way that I thought about doing this is like, oh, what if instead, so right now, my meows are, if you go back here,
so there's actually three meows when it's only supposed to be one meow, so the next step would be to cut out the middle of my meow, and then divide this meow in half and basically fit the middle of the meow
and extend it until it's the right length so that you have the beginning of a meow, multiple middles of meows, and then an end of a meow. Currently, that's not how it works. But all those files that I make get stored in an array, and this last part is the step where I combine
all the files together in my song builder class. I don't expect you to read through all this, but it works, trust me on it. Final words, right? So this last part, this is the last issue I had to solve,
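The talk does not show how the song builder joins the files, so the following is only one plausible way to do it: ffmpeg's concat demuxer, which reads a text file listing the inputs in order. The helper names are hypothetical, the sketch assumes all meow files share the same format, and it ignores any silence between notes.

```ruby
# Text for an ffmpeg concat list: one "file '<path>'" line per input, in order.
# Paths containing single quotes would need escaping; this sketch skips that.
def concat_list(paths)
  paths.map { |path| "file '#{path}'" }.join("\n") + "\n"
end

# Argument list that joins the listed files without re-encoding.
def concat_command(list_path, output_path)
  ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
   "-i", list_path, "-c", "copy", output_path]
end

meows = ["tmp/note_000.wav", "tmp/note_001.wav", "tmp/note_002.wav"]
puts concat_list(meows)
puts concat_command("tmp/list.txt", "meowified.wav").join(" ")
```

In use, the list text would be written to `tmp/list.txt` with `File.write` before running the command.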
is my meow library. So I had to create a multi-octave library of meows, and that's a lot of meows, if you hadn't noticed, and very few cats meow in the bass, baritone, or tenor range, and so I already Googled, there's no such thing as a cat choir, so I had to figure out my own custom meow library
using some interesting tactics. So the first one started out with me sitting at my piano, playing a note, and then trying to sing it, and I'm always a tiny bit flat, and my piano is a bit flat because I need to get it tuned, so I was like, well, this is not really cutting it,
and I was like, okay, well, what's next? So then I got one of those tuner apps on my phone, and I start singing into my laptop's microphone with this tuner app going, I'm like, okay, I'm gonna do this, meow, meow, meow, and I'm like, okay, this is gonna take a while, because, you know, I can't get it on the first time, like, I'm always flat, I promise you, I'm always flat.
So I think I got like half an octave's worth of notes before I realized it wasn't cutting it, but it was enough for a proof of concept. I only recorded five notes, but I knew a song that had five notes in it. Meow, meow, meow, meow, meow, meow, meow, meow, meow,
meow, meow, meow, meow, meow, meow, meow, meow, meow. So my test passed, it works, all right. Obviously, this wasn't gonna cut it, I need more than five notes, so I'm like, okay, back to the internet,
and like I said, I like taking the easy road when I can. So I finally found a few octaves of an auto-tuned man meowing on a free sound website called Freesound.
Who woulda thought? And here is that first library. So if you remember, a keyboard has 88 notes. This is not 88 notes right here. The number of files in this Freesound collection only spans just about four octaves.
And I was like, oh, well, I'll give this a shot. I mean, it means that I don't have to make my own library, but I had an issue. What was I gonna do if one of the analyzed notes fell outside the range of these notes here? So I had to write some workarounds like you do when you're a dev.
So I had to write it, I had to do an octave modifier, so if a note is too low or too high to fit into the range of notes that I have, then I adjust the octave. So if the note is F7, which is like a really low F, but I happen to have F6, which is the octave higher, then I would adjust it up.
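An octave modifier along these lines can be sketched with MIDI numbers, where one octave is 12 semitones. The method name and the example range (MIDI 48 to 95, four octaves) are assumptions for illustration, not the real bounds of the Freesound collection.

```ruby
# Shifts a note by whole octaves until it falls inside the library's range.
# Hypothetical helper; lowest and highest are MIDI numbers.
def fit_to_library(midi_number, lowest, highest)
  # With fewer than 12 consecutive semitones, some pitch classes are missing
  # and shifting by octaves could never land inside the range.
  raise ArgumentError, "library must span at least one octave" if highest - lowest < 11

  midi_number += 12 while midi_number < lowest
  midi_number -= 12 while midi_number > highest
  midi_number
end

puts fit_to_library(36, 48, 95)  # too low: moved up one octave
puts fit_to_library(101, 48, 95) # too high: moved down one octave
puts fit_to_library(60, 48, 95)  # already in range: unchanged
```

The note keeps its pitch class, so the melody's contour changes at the jump but each note is still the right letter name.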
So I was like, okay, that works. But there was another problem with the library that I downloaded. So each note has a pitch and octave designation, and you'll notice some of these notes have a hyphen in there. Well, that actually is their sharp designator, so it'll be like F sharp six. Well, if you remember, I already wrote my constants,
and they used the sharp designator. So I'm like, well, I'll just write another method to patch this in so that I don't have to rename every single file in my meow folder. Okay, so I write a formatting method, I'm lazy, and let's see what that gets us. Meow, meow, meow, meow, meow, meow, meow, meow, meow,
meow, meow, meow, meow, meow, meow, meow, meow, meow, okay, he's on, he's in tune. That's what I can say for that. I was like, I took all that time adjusting my app to work with that library, but I still wasn't happy with it. So I bit the bullet, and I found some meows online,
and I decided to auto-tune 88 meows and make my own cat library. And so now I have a cat meow library spanning almost the whole keyboard. And what that means is that this method that I wrote to format my notes, well, I can get rid of that.
And this octave modifier that I wrote to modify the octaves, well, I can get rid of that. So cleaning up my code, I love deleting code. I think everybody likes deleting code. Because now I had a full library of notes, like, way better.
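For reference, the note-formatting patch that got deleted could have been as small as the sketch below. The exact file naming in the downloaded library is not shown in the talk, so the `F-6.wav` form, the `.wav` extension and the method name are all assumptions.

```ruby
# Maps a note constant such as "F#6" to the hyphenated name assumed for the
# downloaded files, so the files themselves never need renaming.
def library_file_name(note_name, extension = "wav")
  "#{note_name.sub('#', '-')}.#{extension}"
end

puts library_file_name("F#6")
puts library_file_name("C2")
```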
So, I mean, that was my three big problems, right? Like, now it's the moment you've all been waiting for. But first, so this is the Game of Thrones theme song, just to remind you, in case you forgot. And this is what we're going for, okay?
You ready? Okay, we're ready?
Can you turn it off a little bit so everybody can hear my failure a little bit better? I mean, it goes on for five minutes.
So I had some problems. That doesn't sound like the Game of Thrones theme song. I mean, am I wrong? So it turns out that the melody analyzer I used, which was free, you know, wasn't actually world class.
So I was like, I'm gonna try this again. So I found something called Melodia. This guy had written his PhD thesis about melody extraction, and I'm like, finally. And he developed some free software and a command line tool, and I was like, yes, I'm gonna do this. So I had to actually sign a release form so that the university where he did his research
would release me, like, release the software to me. And this is the release form. I'm not sure if they ever read it. Like, I don't know if they read it because they actually sent me the software.
And it was actually really simple. Like, this is my command in the command line. And like, so I was just testing it out, playing around with it, and it ran really fast. And I was like, awesome, yeah. So I have it like save files directly to my desktop while I'm testing it out.
The difference between this software and the previous tool I was using is that this one returns a MIDI file. So I need a new class that can translate MIDI files. So before I write this class out, though, I wanna see what this sounds like. So I go to like some free, like, MIDI analyzer online. And it turns out there might be some issues.
So this is actually Melodia in action. So this is jazz on the far left,
pop in the middle, and opera. And so the blue is the frequency that the computer estimates the melody to be, and the red is the actual melody. So if you can see, it's really good at jazz and really bad at pop and opera, which is really sad for me because it turns out it wasn't even gonna be worth my time
to add this to the app. So I'm like, well, I'm not too happy. I decide to go back to Sonic API, like the first one. And I mean, I'm stuck with this. Wow, wow, wow, wow, wow, wow, wow, wow, wow, wow, wow.
Yeah, yeah, is it perfect? No, it's not perfect. It's far from perfect. So it's a side project, right? Like, I'm not getting paid, there's no due date. So like, I'm gonna keep working on it. And like, I have some ideas. Like, what's next? I mean, I still haven't written a client side for it. It was all command line.
I wanna find a better melody analyzer, obviously. But what else could I do in the meantime? I could pivot slightly. Have you, you've probably all heard of Shazam. It's like a song identifier app, right? So what if I take a reverse melody analyzer, similar to Shazam, to figure out which song
a user has uploaded to my website? And then once I find that song's name, I scrape the internet for a MIDI version of that song, and then turn that into a meowified version. Well, I haven't written that yet. So it's my next thing on the docket. So next time you see this talk, maybe I'll be done, and maybe I won't.
But in the meantime, like, you know, it is what it is. And I've had a great time talking about it, and building it, and sharing all these cat GIFs with you. So you can find my slides on Speaker Deck, and I will be tweeting that link out soon.
And my handle is haubertdashery, all this is here. And with that, I am done. Thank you so much for meowing with me.