Music Theory and MIDI Encoding

Formal Metadata

Title
Music Theory and MIDI Encoding
Number of Parts
19
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Charlie Gillingham (keyboardist for Counting Crows and computer scientist) talks to us about music theory, MIDI, and encoding for NuPIC.
Transcript: English(auto-generated)
Yes, we are. So I'm going to introduce Charlie here: professional musician, and you have a bachelor's in science [...].
Well, I'm just going to talk about music and audio, and this is totally my take. First, I'm just going to say how I look at the problems associated with music. Okay, so I'm going to grab a pen; you can erase all that. All right. So you've got an audio file, right? And then you've got a MIDI file.
And MIDI, as I'm certain you all know, is just a version of Western musical notation. It takes audio and thinks of it in terms of notes, notes that start and stop, right? Which is basically what Western musical notation captures. It's got a lot of other stuff that's not interesting for today, like tempo and pitch change and so on. But okay.
So if you've got a MIDI file, what information can you get out of it, and what can you do with it? First, you can produce helpful things for video and audio editors, right?
That's totally solved. It was solved in the mid-'80s, when they started Pro Tools and so on. Those tools can take MIDI files and let you change the sound of the music. Okay, so the next thing you might want to do is make an audio file, right? That is, take a MIDI file and play it out into audio on a computer. That was only solved around, I'd say, 2000, and it was solved by the companies that make software instruments. Before 2000 there weren't software instruments that could convincingly produce orchestral sounds or piano sounds and so on, and now that's been solved.
There is software you can buy that will produce music that sounds good. Film scores, if you've heard them, are often done using something like the Vienna strings, which is made by ILIO. It's a plug-in. It's very expensive, but it sounds like an orchestra when you play it. Yes, convincing.
So this is solved. That's good. So this is MIDI to audio, right? Okay, so what else can you do? Using MIDI information, you can run machine learning for people who want to classify audio files, like Pandora or Shazam, these kinds of companies. What they need is to know what's in that audio file up there. They need to extract features from that audio file, and then they're going to do machine learning on those features and base their recommender systems on them.
MIDI data would be great for these guys, but the problem is there aren't enough files. It's a sparse-data problem, and the missing link is audio to MIDI, right? This problem is almost solved; it's being solved now.
There are companies that make products that do really bad, unreliable audio to MIDI. But you can't just take, you know, a Beyoncé file and come back with something reliable. You might be able to get the tempo, but you won't be able to get any reliable information about what it's like harmonically, or how exciting it is, or any feature that you, or Pandora, might want to extract. So this is kind of an open problem; it's like half-solved.
What else? Well, what I'm working on is using machine learning to compose music, right? And this is not solved, and I think this is what Matt's interested in: take MIDI files, listen to 500 Bach pieces, and then come out with some Bach that sounds like Bach and is recognizable as Bach.
And that, I think, is what Matt's been working on, and it's unsolved. I'll talk about that a little more in a minute. I want to talk about audio to MIDI a little bit, because I think this is the problem I would love one of you to work on. I'm not qualified, because I don't know enough about audio signal processing, and there's a lot to audio signal processing.
There are several reasons I think this is a good problem for NuPIC. First, there's a lot of data. There are all those audio files out there, and I can produce sets of audio/MIDI pairs. I can produce 10,000 of those by using my music generation software to create both the audio and the MIDI.
It's like having a drummer who can play for hours and hours, for days and days, and if you could turn what this drummer plays, the audio, back into the MIDI file, that would be awesome. So first of all, there's a lot of data available for the problem. Second, the space of the data is kind of well-behaved: it's a measure space. I'm sorry, it's just frequency times time, a two-dimensional space that is well-formed. Also, I think it's very much an early-21st-century machine learning problem. Data-based mathematical optimization is appropriate. It's like everything that everybody's working on right now; this is one of those problems. And because it's almost solved, there are a lot of papers out there and a lot of work out there [...].
So anyway, I think that's a good problem. I'd love to talk to somebody about it if they're interested at all.
How good is the state of the art? Let's say it's only drums, okay, and it's a clean environment.
Sure, you can do that. Say it's a typical recording session: you're recording a new band, and the drummer is not great.
And it's really important that the drummer be great; without a great drummer there's no [...], right? So there are plugins that will take the drum track, find all the note onsets, and map them to the different drums in the kit. Then you sit there in Pro Tools and slide them back, or ask Pro Tools to slide them. So they can take an audio track, "bing," and it comes out on this other track as MIDI onsets. That is basically an audio-to-MIDI problem. And then there's another company, Melodyne. Melodyne can take a guitar track and turn it into the notes, but it's not very reliable; the onsets aren't great.
The onsets aren't exactly right. And the thing is, with music, at least recorded music at a high level, five milliseconds is a lot of time. Five milliseconds is the difference between something seeming sluggish and seeming peppy. It can't be off by 30 milliseconds, because then it's going to sound out of time. It's going to sound clumsy, or lazy, or heavy, or light, or silly. You need to get it right within that 10-millisecond window when it was supposed to happen. And the waveform for a musical sound goes kind of like that, you know. So you've got to figure out where the beat is, right there, and this is like 30 milliseconds. You've got to pick this spot, or sometimes it's this spot, this interesting spot right here. That's where the beat started. And this is the power.
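The onset-picking problem described above is finding where a note starts to within roughly 10 ms. A very simple energy-based onset detector can illustrate it. This is a toy sketch written for this transcript, not what Pro Tools, Melodyne, or any commercial plugin does. It sums signal energy in short non-overlapping frames and reports local peaks in the frame-to-frame energy increase. The function name, the 64-sample frame, and the 0.1 relative threshold are all illustrative assumptions.

```python
import numpy as np


def detect_onsets(y, sr, hop=64, threshold=0.1):
    """Return onset times in seconds, based on rises in frame energy.

    Toy detector: frames are non-overlapping blocks of `hop` samples, so the
    time resolution is hop / sr (4 ms at 16 kHz with hop=64).
    """
    n_frames = len(y) // hop
    frames = np.asarray(y[: n_frames * hop], dtype=float).reshape(n_frames, hop)
    energy = np.sum(frames ** 2, axis=1)
    # Only increases in energy count as evidence of a new note.
    flux = np.maximum(np.diff(energy, prepend=0.0), 0.0)
    if flux.max() == 0.0:
        return []
    flux = flux / flux.max()  # normalize so the threshold is relative
    onsets = []
    for i, value in enumerate(flux):
        prev_val = flux[i - 1] if i > 0 else 0.0
        next_val = flux[i + 1] if i + 1 < len(flux) else 0.0
        # A local peak above the threshold marks an onset.
        if value >= threshold and value >= prev_val and value > next_val:
            onsets.append(i * hop / sr)
    return onsets


sr = 16_000
t = np.arange(sr) / sr
# A decaying 500 Hz tone that starts at 0.25 s.
y = np.where(t >= 0.25, np.sin(2 * np.pi * 500 * (t - 0.25)) * np.exp(-(t - 0.25) / 0.1), 0.0)
print(detect_onsets(y, sr))
```

On this clean synthetic tone the detector finds one onset within a few milliseconds of 0.25 s. Real drum or guitar recordings are much harder. Practical detectors usually work on spectral changes rather than raw energy, for exactly the reasons discussed above.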
So anyway, that's audio and MIDI; maybe we'll talk about it more later. Now I'll talk about composition. Composition is interesting because there are several problems with it, I think, for NuPIC, and we'll see what happens. The first one is that musical data, which is the MIDI, is not a measure space. It's a combinatorial domain.
So let's talk about MIDI data. MIDI keeps track of 128 notes (note numbers 0 to 127), right? And then, in terms of time, MIDI is in clock ticks. There's some number of clock ticks per beat; the least you could get away with is probably 24 clock ticks per beat. So that's a beat, and then events start. This note starts on a beat, so right here you get a note-on message, and that will be associated with a certain velocity, which is connected to volume. Then the note stays on for a little while, and there's a note-off message here, right?
And I was going to show you guys this: this is Bach's Prelude in C major written in that form. It's hard to see, and I can't really show you, but that's the Prelude in C major written just like I wrote it right there. That's the first movement; that's the second movement.
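The note-on/note-off and clock-tick description above can be made concrete with a small data structure and a tick-to-time conversion. This is a minimal sketch. `NoteEvent` and `ticks_to_seconds` are hypothetical names introduced here. The tempo convention (microseconds per quarter-note beat, with 500,000, or 120 bpm, as the Standard MIDI File default) comes from the MIDI file format, not from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteEvent:
    """One sounding note: a note-on/note-off pair collapsed into a span (hypothetical type)."""
    pitch: int       # MIDI note number, 0-127
    velocity: int    # 1-127, taken from the note-on message
    start_tick: int  # tick of the note-on
    end_tick: int    # tick of the matching note-off


def ticks_to_seconds(ticks, ticks_per_beat=24, us_per_beat=500_000):
    """Convert a tick count to seconds at a constant tempo.

    us_per_beat is microseconds per quarter-note beat; 500,000 (120 bpm)
    is the Standard MIDI File default when no tempo event is present.
    """
    return ticks * us_per_beat / (ticks_per_beat * 1_000_000)


def duration_seconds(note, ticks_per_beat=24, us_per_beat=500_000):
    """Length of a note in seconds."""
    return ticks_to_seconds(note.end_tick - note.start_tick, ticks_per_beat, us_per_beat)


# Two beats of middle C at 24 ticks per beat, 120 bpm.
middle_c = NoteEvent(pitch=60, velocity=90, start_tick=0, end_tick=48)
print(duration_seconds(middle_c))  # 1.0
```

At 120 bpm, one tick at 24 ticks per beat is about 20.8 ms. That is coarser than the 10 ms timing window mentioned earlier. At the 480 ticks per beat common in files, one tick is about 1 ms.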
Anyway, so you've got notes starting and stopping, and they make regions of sound, right? So yeah, I was saying it's a combinatorial domain: you're talking about combinations of notes, you know what I mean? So you want to ask yourself a question like: is this a major chord or a minor chord? Is it happy or is it sad? What's happening right here? It's a major chord if it has a certain relationship between these notes, called an intervallic relationship, or the interval: these gaps have to be certain integers. If this one is four half steps and this one is three, it's a major chord. If it's three and then four, it's a minor chord. Sorry. So this is where it gets interesting, I think.
Well, yeah, we're talking about encoding MIDI, so, music. That's what we're headed to.
I'm headed there. I'm headed where you want me to go. Okay.
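The interval rule above can be written as a small classifier. A major triad stacks 4 half steps then 3 above its root, and a minor triad stacks 3 then 4. This is a minimal sketch under assumptions. `classify_triad` is a hypothetical helper. It reduces notes to pitch classes (note number mod 12), so octave doublings and inversions are ignored, and it only recognizes major and minor triads.

```python
def classify_triad(notes):
    """Label a three-pitch-class chord (MIDI note numbers) as 'major', 'minor', or 'other'.

    Works on pitch classes, so octave doublings and inversions do not matter.
    Returns None if the chord does not contain exactly three pitch classes.
    """
    pitch_classes = sorted({n % 12 for n in notes})
    if len(pitch_classes) != 3:
        return None
    for root in pitch_classes:
        # Half steps above the candidate root, e.g. [0, 4, 7] for C-E-G.
        shape = sorted((pc - root) % 12 for pc in pitch_classes)
        if shape == [0, 4, 7]:  # 4 half steps, then 3 more: major
            return "major"
        if shape == [0, 3, 7]:  # 3 half steps, then 4 more: minor
            return "minor"
    return "other"


print(classify_triad([60, 64, 67]))  # C-E-G
print(classify_triad([57, 60, 64]))  # A-C-E
```

Trying every pitch class as a candidate root is what makes inversions work. E-G-C (64, 67, 72) is still recognized as C major.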
Yeah, so we want to ask if this is major or minor. That's a feature we want to extract: whether it's mostly major or mostly minor. Are those binary states, on or off, or is there a volume? There is; that's the velocity number. For me, I think it's better to have the notes that aren't playing be zero, and this one be its velocity, which might be 78, right?
Is that volume? Well, it's how hard the note is hit, which technically translates to intensity. For some instruments, like trumpet, if you hit it hard the timbre changes, right? Say with an electric guitar, hit it hard and the timbre is different. With an electric organ it doesn't matter; there's a volume pedal. MIDI has another message for volume, but for our purposes today I think this is enough.
Do you use aftertouch on the keyboard?
Aftertouch on keyboards, a lot? Do I use it? It's a good feature for doing string arrangements. I have used it on string arrangements, because really high-end software instruments will be able to react to it in various ways. But there's no standard there; aftertouch can mean almost anything, but it could [...] over time.
Yeah. I think we should ignore everything except note-on and note-off, as far as the messages go.
Yeah, that's right. And then we can talk about additional features once we [...]. I use lots of other features. There are all these other signals that MIDI is keeping, and they're all digital, so they're like this. There's volume, there's reverb send, there are hundreds of them. Actually, there are tens of thousands of them if you know how to work your MIDI. So you can run any kind of signal along here. What he just talked about, aftertouch, is the only one of these other signals that actually goes with the note. You can have different aftertouch for different notes, so that's what makes it unique. That's the only one you get that way.
So I wrote (he put it up here) something that takes a MIDI file and turns it into this representation, just for now. If anybody wants to work on it, they can change it. It goes into a NumPy array that's 127 by the number of ticks, and I handle all the weird stuff. Most files have 480 ticks per beat, and I cut that down to 24. I also handle the case where there are two things in the same tick, and all the other stuff that can cause bugs.
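The converter described above is not shown in the talk. The following is a minimal sketch of the same idea, assuming notes have already been parsed into `(pitch, start_tick, end_tick, velocity)` tuples. It builds a pitch-by-tick array of velocities, with zero for silence. Ticks are rescaled from 480 to 24 per beat, and when two notes land on the same pitch and tick, the louder velocity is kept. The function name and the collision and very-short-note rules are my assumptions. The talk mentions a 127-row array; this sketch uses 128 rows so that every MIDI note number from 0 to 127 has a row.

```python
import numpy as np


def piano_roll(notes, src_tpb=480, dst_tpb=24, n_pitches=128):
    """Build a (pitch x tick) velocity array from (pitch, start, end, velocity) tuples.

    Tick positions are rescaled from src_tpb to dst_tpb ticks per beat and
    rounded to the nearest coarse tick.
    """
    def rescale(tick):
        # Integer round-to-nearest from the fine grid to the coarse grid.
        return (tick * dst_tpb + src_tpb // 2) // src_tpb

    events = []
    for pitch, start, end, velocity in notes:
        s, e = rescale(start), rescale(end)
        if e <= s:
            e = s + 1  # keep notes shorter than one coarse tick from vanishing
        events.append((pitch, s, e, velocity))

    n_ticks = max((e for _, _, e, _ in events), default=0)
    roll = np.zeros((n_pitches, n_ticks), dtype=np.uint8)
    for pitch, s, e, velocity in events:
        # Two notes on the same pitch and tick: keep the louder velocity.
        roll[pitch, s:e] = np.maximum(roll[pitch, s:e], velocity)
    return roll


# One beat of C4 followed by one beat of E4, at 480 ticks per beat.
roll = piano_roll([(60, 0, 480, 100), (64, 480, 960, 80)])
print(roll.shape)  # (128, 48)
```

Each column of the result is the velocity vector suggested earlier: zero for notes that aren't playing, and the velocity for notes that are.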
And so that's available. And I guess I'll take questions. Do you want me to talk about more music theory?
So what I want to talk about is the why. First of all, I want everyone to understand why it's not good enough to just encode these notes as scalar values and let the scalar encoder that we already have do its job. The only example we have so far is the first one, which somebody did a long time ago: Pomp and Circumstance. It worked really well and understood the notes, but it wasn't music. The way it was represented was not music. I want to understand how we can represent the difference between a major chord and a minor chord, or between harmony and dissonance.
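The concern about plain scalar encoding can be shown numerically. Below is a from-scratch encoder in the NuPIC style: a contiguous run of `w` active bits whose position tracks the value. This is not NuPIC's own `ScalarEncoder` class. The parameters (n=140 bits, w=11 active bits, range 0 to 127) are illustrative assumptions. The point is that neighbouring semitones share most of their bits, while notes an octave apart, which musicians hear as the "same" note, share none.

```python
import numpy as np


def scalar_encode(value, min_val=0, max_val=127, n=140, w=11):
    """Encode a scalar as n bits with a contiguous run of w active bits."""
    value = min(max(value, min_val), max_val)
    n_buckets = n - w + 1
    start = int(round((value - min_val) / (max_val - min_val) * (n_buckets - 1)))
    sdr = np.zeros(n, dtype=bool)
    sdr[start:start + w] = True
    return sdr


def overlap(a, b):
    """Number of active bits two encodings share."""
    return int(np.sum(a & b))


c4, c_sharp4, c5 = (scalar_encode(p) for p in (60, 61, 72))
print(overlap(c4, c_sharp4))  # 10: a clashing semitone looks very similar
print(overlap(c4, c5))        # 0: a consonant octave looks unrelated
```

Widening `w` changes the numbers but not the ordering. Pitch distance, not harmonic relationship, decides similarity, so chord quality and consonance have to come from some other representation.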
How do we represent that, as far as representation? That's what I don't understand.
Sure. These are features, right? These are features of this basic musical signal. The first one, which you were kind of talking about a little bit, is voice leading. So you have a note that ends here, right? And you have another note that starts maybe here, or it could start here, right? Now the ear will make these into a melody: it goes up and down. It's scalar, like you said; it's one-dimensional. So you have to solve this problem: picking the notes out of that representation to figure out what the melody is. Where's the melody? It depends on a lot of factors. It depends, basically, on where you expect the melody to go, and on what you're paying attention to, what you're listening to. It even depends on how hard it's hit.
That's a good point. When there are several notes on in a MIDI stream at one time, how do we break that down? Which notes are important for the melody? How can you pick out the melody?
Well, you could do a really bad job of it by picking out the top note.
Yeah.
That won't work in general, but it will actually work for more things than you might expect.
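The "pick out the top note" heuristic just mentioned is easy to state over a pitch-by-tick array. In music-information-retrieval work it is often called a "skyline" heuristic. This is a minimal sketch that assumes a roll where nonzero means the note is sounding. `top_line` is a hypothetical name, and -1 marks ticks where nothing sounds.

```python
import numpy as np


def top_line(roll):
    """Return the highest sounding pitch at each tick, or -1 where nothing sounds.

    `roll` is a (pitch x tick) array where nonzero means the note is on.
    """
    active = roll > 0
    n_pitches = active.shape[0]
    # argmax over the pitch-reversed array finds the highest active row.
    highest = n_pitches - 1 - np.argmax(active[::-1, :], axis=0)
    return np.where(active.any(axis=0), highest, -1)


roll = np.zeros((128, 4), dtype=np.uint8)
roll[60, 0] = roll[64, 0] = 90  # a two-note chord at tick 0
roll[62, 1] = 80                # a single note at tick 1
print(top_line(roll).tolist())  # [64, 62, -1, -1]
```

This heuristic fails in exactly the cases discussed next. It cannot follow an inner voice, and it cannot track two voices that cross over each other.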
Really, it's solving voice leading. Take the Bach prelude and fugue: the second part, which I can't really show you, has several melodies that go like this. They generally stay in their ranges, but they occasionally do things that are weird. So it's got that, and each of these is a melody. There are four melodies, right? Your ear solves all these problems. One of the magical things about Bach is that he was able to keep these so that your ear always follows all four melodies. You sit and listen for one of those four melodies, and you can hear that one melody. Your brain, your auditory cortex, is solving this problem, the voice-leading problem. So even when you've got this guy jumping, Bach set it up so that your brain can solve it.
Take two voices that jump over each other. Bach is so good at it partly because he can set up situations where your ear says: oh, that one went down there, and this one went up there. He can do that even when he's playing them on an organ, where there is no distinction between these sounds. If they're singers, you can kind of hear them breathe or swing up a little. But on a pipe organ, all you've got is the timing. Timing is your only clue that this melody went down there. So that's the whole voice-leading problem. The hardest version of it would be taking a four-part fugue and turning it into four melodies. But that's something that anybody can do instantly.
I think everybody can do it just sitting in front of the speaker. You can hear each of the four melodies.
But that leads to a question: can I do that instantly because I've heard music before, or could I do it instantly anyway?
A little of both, I would say. The auditory cortex evolved to detect sound-making objects in your environment, to divide those into separate sound sources, and to try to deduce the character of each sound source. So when your brain hears a four-part fugue, it's trying to hear it as separate events coming from separate objects, which might be threatening or friendly.
You know what I mean?
So couldn't you gather clues? For instance, say you're looking at two sources and trying to discern melody from harmony. One clue would be how frequently the notes change in each of them, because harmony tends to change notes less frequently, for one, right? Harmony also tends to show up and go away, show up and go away, whereas the melody tends to stay constant. So those are clues you can gather from these different attributes.
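The first clue suggested above is that harmony changes notes less often than melody. That clue can be turned into a per-voice number. This is a minimal sketch, assuming each candidate voice has already been separated into `(start_tick, end_tick)` pairs at 24 ticks per beat. `onsets_per_beat` is a hypothetical feature name, and on its own it would only be one input among several.

```python
def onsets_per_beat(notes, ticks_per_beat=24):
    """Average number of note onsets per beat for one voice.

    `notes` is a list of (start_tick, end_tick) pairs for that voice.
    """
    if not notes:
        return 0.0
    first_start = min(start for start, _ in notes)
    last_end = max(end for _, end in notes)
    beats = (last_end - first_start) / ticks_per_beat
    return len(notes) / beats if beats > 0 else float(len(notes))


# Eight eighth notes (12 ticks each) against one chord held for four beats.
melody = [(i * 12, (i + 1) * 12) for i in range(8)]
pad = [(0, 96)]
print(onsets_per_beat(melody), onsets_per_beat(pad))  # 2.0 0.25
```

The second clue, that harmony comes and goes while the melody persists, could be measured the same way. One option would be the fraction of beats in which a voice is sounding at all.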
Right, to discern the melody from the harmony, right? So if you've got somebody going, you know... You've got to plug it in. Oh, I've got to [...], because I turned the computer around.
So you've got a melody that's going like this, right? But for the chords, there's a whole other section in the orchestra or the band going "don, don, don, don" with this chord, right? You're not going to mix up those two voices. In pop music, of course, you always have somebody doing this, so that way you can always separate out the voices. But still, from the computer's point of view, it's got to figure out all these notes. These guys are all in the same range. They're all close to being attached at the ends, but not necessarily attached. And they're doing rhythmic things that are consistent, which is what you were saying. They're also doing harmonic things that are consistent. It doesn't make any sense that a melody would jump from way down here to way up there. It always jumps within a certain range, right? So there's a distribution over the possible jumps, like this. This guy is doing an unusually large jump, and this guy is doing a perfectly expected jump; that's pretty normal.
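The "distribution over the possible jumps" can be estimated directly from example melodies. This is a minimal sketch, assuming melodies are lists of MIDI note numbers. It counts the intervals between consecutive notes, applies add-one smoothing over jumps of up to two octaves, and scores a jump by its surprise in bits. The names, the 24-half-step limit, and the smoothing choice are illustrative assumptions. A voice-leading tracker could prefer the assignment of notes to voices whose jumps are least surprising.

```python
import math
from collections import Counter


def interval_model(melodies, max_jump=24, alpha=1.0):
    """Estimate P(jump) in half steps from example melodies (lists of MIDI pitches).

    Add-alpha smoothing gives unseen jumps within +/- max_jump a small probability.
    """
    counts = Counter()
    for melody in melodies:
        counts.update(b - a for a, b in zip(melody, melody[1:]))
    support = range(-max_jump, max_jump + 1)
    total = sum(counts[j] for j in support) + alpha * len(support)
    return {j: (counts[j] + alpha) / total for j in support}


def surprise(model, jump):
    """Surprise in bits; jumps outside the modelled range are treated as impossible."""
    p = model.get(jump)
    return math.inf if p is None else -math.log2(p)


model = interval_model([[60, 62, 64, 62, 60, 62, 64, 65, 67]])
print(round(surprise(model, 2), 2), round(surprise(model, 12), 2))
```

With this toy training melody, a whole step up is far less surprising than an octave leap. That matches the "perfectly expected jump" versus "unusually large jump" contrast above.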
So we can guess that that's what's going on. So that's the voice-leading problem.
Another question that comes up when we think about encoding in other domains is: what is similar? You have the minor chord there, right? Could you have other patterns that look very different but are actually musically similar? And how do you define similarity?
Yeah, like augmented sevenths, minor chords.
Yeah.
Right, right. And then maybe it'd be interesting to see what happens if you play a full recording.
Right, that's a well-formed problem. That's a good problem, and that's an approach to solving the voice-leading problem. You get a MIDI file, you don't know what the melody is, and that could be a way to figure out what makes the melody. It's also kind of machine-learning-ish, because you're not giving it rules based on classical music, you know what I mean? You're just giving it melodies that have no relation to each other.
Well, that's how we learn music. Anybody who takes up an instrument starts with "Twinkle, Twinkle, Little Star," right?
Well, a classical instrument. I started with "Mary Had a Little Lamb."
Oh, snap, music burn. That's the same.
So that's just MIDI encoding. What does MIDI give you? I'm not familiar with the MIDI format, but I have been a musician. Does MIDI give you the frequency in hertz?
No, but it's easy.
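Getting hertz from a MIDI note number is "easy" because of a fixed convention. Under 12-tone equal temperament with A4 = 440 Hz assigned to note 69, each half step multiplies frequency by 2^(1/12). The helpers below are a minimal sketch of that standard conversion and its inverse. The names are hypothetical, and the 440 Hz reference is an assumption that can be changed.

```python
import math


def midi_to_hz(note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69)."""
    return a4_hz * 2 ** ((note - 69) / 12)


def hz_to_midi(freq, a4_hz=440.0):
    """Nearest MIDI note number for a frequency in hertz."""
    return round(69 + 12 * math.log2(freq / a4_hz))


print(midi_to_hz(69))            # 440.0
print(round(midi_to_hz(60), 2))  # 261.63, middle C
print(hz_to_midi(261.63))        # 60
```

Twelve half steps double the frequency, so notes 60 and 72 are exactly a 2:1 octave apart. That ratio comes back later in the consonance discussion.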
But it does give you a note number. I imagine it's an integer between [...]?
Yes, whatever the pitch is, right. Because the question was about encoding and similarity, you could basically assume a key, get the variance of each note from it, and encode those notes. And you could determine similarity by shifting notes into the key you're comparing against. Then you'd gather your similarity between the notes given the forced key that you're putting everything in, right?
Right, right. Yeah. There is a problem, though, if you're going to take your MIDI files and do anything to them. If you're going to have to go through and pick out the melody, you have a data problem, because you're going to have to go do that by hand.
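The key-shifting idea raised in this exchange compares two passages after moving them into a common key. It can be sketched with pitch-class histograms: count how often each of the 12 pitch classes occurs, try all 12 transpositions, and keep the best cosine similarity. This particular measure is my choice for illustration, not the method proposed in the talk. The function names are hypothetical, and the histogram ignores note order and rhythm entirely.

```python
import numpy as np


def pitch_class_profile(notes):
    """Count how often each of the 12 pitch classes (C=0 ... B=11) occurs."""
    profile = np.zeros(12)
    for note in notes:
        profile[note % 12] += 1
    return profile


def transposed_similarity(a, b):
    """Best cosine similarity between two note lists over all 12 transpositions.

    Returns (similarity, shift), where shift is how many half steps b sits
    above a at the best alignment.
    """
    pa, pb = pitch_class_profile(a), pitch_class_profile(b)
    best_sim, best_shift = 0.0, 0
    for shift in range(12):
        rolled = np.roll(pb, -shift)  # transpose b down by `shift` half steps
        denom = np.linalg.norm(pa) * np.linalg.norm(rolled)
        sim = float(pa @ rolled / denom) if denom else 0.0
        if sim > best_sim:
            best_sim, best_shift = sim, shift
    return best_sim, best_shift


c_major = [60, 62, 64, 65, 67, 69, 71]
d_major = [n + 2 for n in c_major]
print(transposed_similarity(c_major, d_major))  # similarity ~1.0 at shift 2
```

Because the search covers every key, no one has to label the key of each file first. That partly sidesteps the hand-annotation cost raised at the end of this exchange.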
That's going to take a long time. You can get a hundred done in an afternoon. I brought several hundred MIDI files with me. But just digging those up and preparing them for anything other than running NuPIC on them straight will take a lot of work.
I want to define a simple problem, and maybe it's as easy as this. We're playing something simple like "Mary Had a Little Lamb," in chords, not individual notes, right? And then you hit a wrong note somewhere, so there's suddenly dissonance where there should be harmony. If we could identify what's required for NuPIC to detect that anomaly, that would be really useful.
Well, that's the next thing I was going to talk about: dissonance and consonance. Is that what you're thinking about?
Yeah.
Okay. Is that thing the eraser?
Yeah, that's the eraser.
This is an eraser? It looks like a spaceship.
Okay, so dissonance and consonance. Well, let's talk about harmonicity, right? An audio signal has harmonicity if it looks kind of like this, right? There's a lot of energy at what we musicians call the tonic (physicists call it the fundamental). Then there's energy at what we call the first harmonic, the second harmonic, and so on, right? These are integer multiples of that frequency. Everybody knows this, right?
No, no, no, explain this.
Everybody knows the opposite of what you just said.
Did I just say it wrong?
We don't know this. We don't know this.
Okay. What is this called? A frequency response... sorry, a frequency response curve, right? Forgive me if I use the wrong physics name for something. So: one point in time, one note struck, and the analysis of that.
Yeah, right.
So this is a wiggle.
I've got it. I had it all set up on here, but I don't think I can show it. [...] It's the first thing your ear does. Right. Okay. Yeah, so let's talk about [...].
So yeah, what makes a sound harmonic is that it's got these peaks in it, at those multiples. Then there's the timbre, which is the quality of the sound, whatever makes it sound the way it does. That depends on how high each of these peaks is relative to the others. The simplest sound is a sine wave, which looks like that, right? All the energy is at the tonic. Then a square wave or a triangle wave spreads the energy differently. A square wave has all the odd-numbered harmonics, and a sawtooth wave has all of them. It's about how the energy is spread over these peaks. Okay, so that's a sound that has harmonicity.
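The claim about waveforms can be checked with additive synthesis and an FFT. Summing sine harmonics k·f0 with amplitudes 1/k gives a sawtooth-like wave. Keeping only the odd k gives a square-like wave. The peaks of the spectrum then sit only at integer multiples of f0. This is a minimal sketch with illustrative choices: f0 = 100 Hz, 20 harmonics, an 8 kHz sample rate, and one second of signal, so that each harmonic falls exactly on an FFT bin. The helper names are hypothetical.

```python
import numpy as np


def additive_wave(f0, kind, n_harmonics=20, sr=8000, dur=1.0):
    """Sum sine harmonics of f0 with 1/k amplitudes.

    kind="sawtooth" uses every harmonic k; kind="square" uses only odd k.
    (Overall scaling constants are omitted.)
    """
    t = np.arange(int(sr * dur)) / sr
    y = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        if kind == "square" and k % 2 == 0:
            continue
        y += np.sin(2 * np.pi * k * f0 * t) / k
    return y


def harmonic_magnitudes(y, f0, sr=8000, n=6):
    """FFT magnitude at the first n integer multiples of f0."""
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1 / sr)
    return [float(spectrum[np.argmin(np.abs(freqs - k * f0))]) for k in range(1, n + 1)]


saw = harmonic_magnitudes(additive_wave(100, "sawtooth"), 100)
square = harmonic_magnitudes(additive_wave(100, "square"), 100)
print([round(m / saw[0], 3) for m in saw])        # 1, 1/2, 1/3, ... every harmonic
print([round(m / square[0], 3) for m in square])  # 1, 0, 1/3, 0, 1/5, 0
```

The relative heights of these peaks are what the talk calls timbre. Two sounds with the same f0 but different ratios have the same pitch and a different quality.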
Okay, so let's talk about consonance. We have another note out here, with a similar sound, right? This is an octave away, 12 half steps, which is this.
That's not good... We interpret that as the same note. So here's another note, here it is an octave away, right? And what happens here is that these kind of line up, and the sum of these two is still very, very harmonic. Okay. So if you take a sound
that's very close, but not close enough, to that one, and you add these up, you get a phenomenon where the sum has these double peaks or something like that. And then there are other phenomena going on in your ear: you can perceive beats, and you're also perceiving the sum and difference
frequencies of these, which are moving around with the beats, and it's very unpleasant once you get really close. So the octave was consonant, and this is dissonant.
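The "lining up" argument can be checked numerically. This is a simplified coincidence-of-harmonics sketch in the spirit of the beat-based theory described here, not a perceptual model; the 1 Hz matching tolerance and the function names are arbitrary assumptions.

```python
def harmonics(f0: float, count: int) -> list[float]:
    """First `count` harmonics (integer multiples) of f0."""
    return [f0 * k for k in range(1, count + 1)]


def shared_harmonics(f_low: float, f_high: float, count: int = 10,
                     tol_hz: float = 1.0) -> list[float]:
    """Harmonics of f_high that fall within tol_hz of some harmonic of f_low."""
    # Generate enough harmonics of f_low to reach past the top harmonic of f_high.
    n_low = int(f_high * count / f_low) + 2
    low = harmonics(f_low, n_low)
    return [h for h in harmonics(f_high, count)
            if any(abs(h - l) <= tol_hz for l in low)]


def beat_rate(f1: float, f2: float) -> float:
    """Two nearby pure tones beat at the difference of their frequencies (Hz)."""
    return abs(f1 - f2)
```

With these definitions, every one of the first ten harmonics of 440 Hz coincides with a harmonic of 220 Hz (the octave), half of them do for 330 Hz (a fifth), and none do for 233.08 Hz, a half step above 220 Hz, whose partials instead sit close enough to beat.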
Right, and that has something to do with it: it's unpleasant for your ear to try to turn that into a harmonic sound when it looks like that. And like somebody just said, the whole job of the ear is to detect harmonic sounds, sounds that have harmonicity, and it does this by splitting the signal into the frequency domain.
And then by some method or other (somebody else probably knows how this actually works) it says, well, I'm getting energy at all these different points, and that means what I'm looking at right here is a harmonic sound.
Is that true for everybody, all around the world? Do they perceive dissonance? Well, okay, no. But if we define dissonance in terms of the physical sensation of dissonance, yes. If we define dissonance in terms of prohibitions, things that you want less of in music or not, then no.
Okay, because, like in jazz of course, you can go... You can't do that? You can do that. That's very dissonant, right? But it sounds good. Well, we like it. Yeah, because we're used to it, right? So that depends on the culture. We're used to this sound,
but it was almost unheard of in Baroque practice, right? So you're defining the physical... Yeah, this is a physical characteristic. Dissonance and consonance is a physical characteristic, and what I'm kind of showing you here is Helmholtz's theory of dissonance and consonance, which is pretty much discredited, because there are a lot of things it doesn't account for in the actual experiments on people, where they try to figure out whether a sound is consonant or dissonant.
But it's kind of like this, you know. And the best theories (I have a couple of papers if anybody's interested), the best theories have to do with the way neurons sync up. You know, they make a mathematical model of neurons;
I don't know what it's called, you guys probably know, but when two neurons fire together, they like to keep firing together. They like it, and you have to perturb them a lot before they stop. And so it has to do with that, and there are mathematical models of that. There are better mathematical models of consonance that fit psychological experiments better,
if anybody's interested in further reading, any musicians. So that's basically consonance and dissonance in a nutshell: that's really consonant. That's very consonant. That's also very consonant. That's not really that consonant. This is dissonant.
And this is dissonant, right? And so first-year music theory students, when they go to class, take the intervals, which are all these distances between every pair of notes, and they classify each one:
it's either unison, perfect consonance, consonance, or dissonance. Four levels, an integer from zero to three, and that's all you would need to think about today. By the way, that's only some of the theory of it. You can see it's a lot more complicated to actually listen to an audio signal and say, well, is it consonant or dissonant?
It's actually pretty... this is the math there, but you can sum it all up in just this integer value. I think it's definitely something that needs to be encoded as a part of the math for the MIDI. Well, you have to carry it out. Okay, so
right. So here's my major third, right? Maybe my minor chord here. This is seven, right, and seven is a perfect consonance, because seven half steps is an open fifth, which is consonant, right? So if you wanted to try to figure out the consonance or dissonance of this MIDI file,
the best bet for now is to either add these up or, I mean, sum them or average them, right? And try to figure out the total consonance happening now, at this moment. So you could have a feature of this MIDI file where you're tracking the consonance,
whatever that comes out to, and then it stops, right? So that would be a feature of the MIDI file at that point. Right. Yeah. Talk about voice leading, talk about consonance. Oh, I want to talk about voice leading as a feature.
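A minimal sketch of the four-level interval classification and the "sum or average them" feature. The table mapping intervals to levels is an assumption (the talk names the four levels but not the table), and treating the perfect fourth as a perfect consonance is a simplification: in strict counterpoint a fourth above the bass counts as dissonant. Higher numbers mean more dissonant.

```python
from itertools import combinations

# Hypothetical 4-level table keyed by interval size mod 12 (semitones).
# 0 = unison/octave, 1 = perfect consonance, 2 = consonance, 3 = dissonance.
CONSONANCE_LEVEL = {
    0: 0,                             # unison / octave
    5: 1, 7: 1,                       # perfect fourth (assumption), perfect fifth
    3: 2, 4: 2, 8: 2, 9: 2,           # minor/major thirds and sixths
    1: 3, 2: 3, 6: 3, 10: 3, 11: 3,   # seconds, tritone, sevenths
}


def interval_level(a: int, b: int) -> int:
    """Consonance level of the interval between two MIDI note numbers."""
    return CONSONANCE_LEVEL[abs(a - b) % 12]


def mean_dissonance(active_notes) -> float:
    """Average level over every pair of sounding notes (0.0 if fewer than two)."""
    pairs = list(combinations(sorted(active_notes), 2))
    if not pairs:
        return 0.0
    return sum(interval_level(a, b) for a, b in pairs) / len(pairs)
```

Evaluated at every MIDI event, `mean_dissonance` gives the kind of running consonance signal described above; a C major triad (60, 64, 67) averages 5/3, while a cluster of three adjacent semitones scores the maximum of 3.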
When you're talking about voice leading as a feature, you can look at the distribution of voice leading, which is kind of what you were talking about, where you can ask, once you deduce the melody: how often does it jump, and by how much? Right, so you can say, well, there's a one, and then there's a two, and then there's a five,
and you can come up with a distribution. It's like, well, melodies almost always move by one, and sometimes they jump plus two, and very rarely do they jump plus eleven, or maybe it's backwards, minus eleven, plus eleven, right? So
voice leading is a feature. You can extract the feature of voice leading once you've extracted the melody, right? And that can be a signal. Obviously there's no voice leading without a melody, but right, we're not going to be able to know that in the middle of the MIDI stream unless we have identified the melody.
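A minimal sketch of the voice-leading distribution, assuming the melody has already been extracted as a list of MIDI note numbers (which, as noted above, is the hard part). The function names are hypothetical.

```python
from collections import Counter


def melodic_intervals(melody: list[int]) -> list[int]:
    """Signed semitone steps between consecutive melody notes."""
    return [b - a for a, b in zip(melody, melody[1:])]


def interval_distribution(melody: list[int]) -> dict[int, float]:
    """Relative frequency of each signed step size, keyed by step."""
    steps = melodic_intervals(melody)
    counts = Counter(steps)
    total = len(steps)
    return {step: n / total for step, n in sorted(counts.items())}
```

For the melody 60, 62, 64, 62, 60, 67 this gives steps of +2, +2, -2, -2, +7, so the distribution is 0.4 for each of ±2 and 0.2 for +7. Collected over many melodies, the rare large leaps (±11) would show up as small tail probabilities.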
Right. So yeah, I wanted to make sure I said that was a feature. Of the files that you have, are some marked, like, this is a MIDI file, or this is a melody-only file? No. Right. I think what I did was grab a bunch of classical MIDI files from the classical MIDI database.
Okay, yeah, pretty well organized; don't worry about drum kits in there. So it seems like if you're actually listening to this MIDI file, you have a potential of many
notes on at once. Couldn't you just structure an encoder that has one bit per note? Yeah. Well, each one would be a single bit in the SDR, right? Every note here would have meaning, and its bit would be one or zero depending on whether it's on.
Is your intuition that you would want to pre-process and scan for interesting intervals, and say, okay, among my whole...? No, I don't think so. I think what you do is set up your encoder with 128 bits, 0 to 127, and that's your
input into your HTM. But then you train the system exactly like you would train a child, which is: you start with melody lines, and then you start adding notes that make triads, minor triads and major triads. But one bit per note doesn't represent, like, the octave consonance, right? You could... you can do both. So you'd want to represent, I think, the more interesting semantic
relationships between your notes. I think you'd want to try to write an encoder that tries to match what the cochlea does. Okay, so that's why I was getting to similarity, right? So that sounds that sound similar will have similar bits on.
That's one of the basic things you want in any encoder, and it sounds like the cochlea is trying to do that. Like, C's in different octaves are going to be very similar, even though they're very spread apart in the MIDI structure. They're going to share some number of bits that say this is a C, regardless of the octave, and another set of bits that might say which octave.
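A minimal sketch of the "128 note bits plus shared pitch-class bits" idea, so that C's in different octaves overlap while neighboring semitones do not. This is a plain binary vector for illustration, not NuPIC's encoder API; the 12 extra bits (one per pitch class) are a hypothetical choice, and a real SDR encoder would typically use several active bits per feature.

```python
N_NOTES = 128         # one bit per MIDI note number, 0..127
N_PITCH_CLASSES = 12  # extra bits shared by every octave of the same pitch class


def encode_notes(active_notes) -> list[int]:
    """0/1 vector: 128 note bits followed by 12 pitch-class bits."""
    bits = [0] * (N_NOTES + N_PITCH_CLASSES)
    for note in active_notes:
        if not 0 <= note < N_NOTES:
            raise ValueError(f"not a MIDI note number: {note}")
        bits[note] = 1                  # exact note
        bits[N_NOTES + note % 12] = 1   # pitch class, regardless of octave
    return bits


def overlap(a: list[int], b: list[int]) -> int:
    """Number of bits active in both encodings (a crude similarity score)."""
    return sum(x & y for x, y in zip(a, b))
```

Middle C (60) and the C an octave up (72) now share one bit, while 60 and 61 share none, which is the similarity property the discussion asks for.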
Yeah, right. Yeah. Okay. Yeah, let's talk about pitch class. Okay, so these are all the notes, and then you've got: this is a C,
this is a D, that's a D. So there are 12 pitch classes, right? And some people put these in a spiral, so that you've got here's the C, and there's the C below. And there are definitely things that depend on that: if you take this song and you rotate it, right, you get something that most people couldn't tell is different.
If you rotate it a little bit, almost no one could tell. You're talking about playing it in a different key. Yeah, you know, all the keys, yeah. If I change it a little bit, you probably, if I hadn't started with the one, would never have known I was doing anything different, right?
Yes, well, transposing. But so for every note there's a function which gives you the pitch class, which is basically: you just mod it by 12, and that gives you the pitch class. C is 0, C sharp is 1, D is 2.
Right, so you can take a MIDI file and ask what pitch class, and that will be another signal that's also interesting: what pitch class am I looking at? But you have to do it over time, right? You couldn't take one instant in time, with all the notes on, and identify the pitch class? Oh, yeah,
no, you're good. You're good, right? You just say, this is... it's easy. This is 120, and obviously the octave is 10, right, and its pitch class is 0, so it's a C. Right. It's a C in the 10th octave of MIDI.
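The arithmetic for note 120 is just integer division and remainder by 12. A minimal sketch follows; the octave numbering here (note // 12, so MIDI 0 to 11 is octave 0) matches the talk, but note that in one common naming convention MIDI 60 is called C4, not C5, so octave labels vary between tools. The names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(note: int) -> int:
    """0 = C, 1 = C#, ..., 11 = B."""
    return note % 12


def midi_octave(note: int) -> int:
    """Octave index as used in the talk: MIDI notes 0-11 are octave 0."""
    return note // 12


def describe(note: int) -> str:
    return f"{NOTE_NAMES[pitch_class(note)]} in MIDI octave {midi_octave(note)}"
```

`describe(120)` reproduces the example above: a C (pitch class 0) in MIDI octave 10.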
Right. But there's also interesting information this way: you can't reduce it to pitch classes. You need to know, like, just for these jumps,
you need to know that this note was here, not here. This is 12 below, and those are different things. They're similar but different. So if I take a voicing like this and I switch that top note down, okay, you get something that's a different quality, but it's still the same
chord. So it's the same thing along one dimension. Yes, so there are a lot of different features that you could look at at any moment in time. Let's say there's a snapshot in a song. You hear that, and then you hear another snapshot later in the same song or in a different song, and you're trying to figure out
whether they're really similar or really different, and to characterize that. What are the most important features, according to you, that would determine how similar those two sections are?
Even the court system doesn't have an easy answer. This comes up with sampling other people's music: even the courts have tried to figure out when pieces are similar. Pieces can sound very similar.
So, differences... What's the short list of essential features that really capture the nature of a song, enough to prove something? It sounds good. That's like the whole thing; that's a question where we'd just extract those features in these traditional ways.
It's easy to do the temporal pooling: classes of SDR bits corresponding to each of the songs. So as soon as we solve this problem, we'll have solved the problem. Yeah, yeah, exactly. Could you... sorry, I didn't mean to cut somebody off... take advantage of the
way the music's laid out, the chord structure, the differences between those, the relative information? Right, well, okay. Yes, we talked about pulling consonance out. As far as asking, is it a major or minor chord, what's there,
what's going on? It's a combinatorial domain, right? Even if you cut it down to the pitch classes, there are... So suppose this is what's playing now. Here are the pitch classes; there are 12 of them, right? So what's playing right now is this, which is a C major triad, right? A C major chord, it's this.
Right, so that's playing right now. But that's one of, what is it, 4,096 possible things that could be happening right now in pitch-class space. Is that right? Two to the twelfth.
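The 2^12 = 4,096 count follows from treating "what's sounding now" as a subset of the 12 pitch classes, which fits naturally in a 12-bit integer. A minimal sketch, with hypothetical names:

```python
TOTAL_PC_SETS = 2 ** 12  # every subset of the 12 pitch classes


def pc_set_mask(notes) -> int:
    """12-bit mask with bit k set if pitch class k is sounding."""
    mask = 0
    for n in notes:
        mask |= 1 << (n % 12)
    return mask


def mask_to_pcs(mask: int) -> list[int]:
    """Pitch classes present in a mask, in ascending order."""
    return [pc for pc in range(12) if (mask >> pc) & 1]
```

A C major triad in any voicing or octave (60, 64, 67 or 48, 64, 79) maps to the same mask, 0b000010010001 (bits 0, 4 and 7), one of the 4,096 possible values.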
So, yeah, you were talking about it: you've got several things going on here, and we want to know if this has happened again. Any of these... or if you don't classify these, you're not going to fill all the combinations in the combinatorial space. You're not going to fill it, because of
the way we like... there are so many differences between those, right? So it's complicated; the 4,000 is a lot smaller in practice. Oh, you see, I mean, yeah, you don't tend to play this. You've got a certain number of fingers and stretch, as it were, to try to do these notes.
So yeah, you can cut this space down considerably, but in our age of big computers and so on, the space is already pretty small. For machine learning, 4,096 possibilities is not a big number. I think the main thing I've tried to convey is getting the differences between the notes. So going through the differences:
you get the first note, and then there's a difference. So the delta intervals: the deltas of the note numbers, the intervals between them, right? So we have three notes, n1, n2 and n3, right? And what we actually look at is n3 minus n2, and n2 minus n1.
Right, just look at the intervals. Are you talking about simultaneous notes being played, or over time? Sometimes. First, we play it like that, like a chord. Yeah. Yeah. Which would be like the range? No, it would actually record the pattern, yeah.
As it goes over time, moving up and down, it would collapse the space, so that all patterns that are just something moved up or down would be the same. The only awkward thing is you might need to define
the tonic of the key. Yeah, of that consonance: like, there's dissonant energy in C versus dissonant energy in another key. Yeah, you would want to know the difference, right?
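A minimal sketch of the delta-interval idea applied to simultaneous notes: sorting the chord and keeping only the gaps makes every transposition of the same shape identical, while returning the lowest note separately keeps the absolute position available, which the discussion notes you may still need. The function name is hypothetical.

```python
def chord_deltas(notes) -> tuple[int, tuple[int, ...]]:
    """Return (lowest note, gaps between adjacent notes of the sorted chord).

    The gaps are identical for any transposition of the chord; the lowest
    note preserves where the chord sits in absolute pitch.
    """
    ordered = sorted(notes)
    if not ordered:
        raise ValueError("need at least one note")
    gaps = tuple(b - a for a, b in zip(ordered, ordered[1:]))
    return ordered[0], gaps
```

C major (60, 64, 67) and D major (62, 66, 69) both give gaps (4, 3), so the space collapses as described, but they still differ in the first element.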
Well, that brings up another thing, which is that obviously there's a lot of structure in a musical piece. For example, a composer has chosen a scale, and he's chosen a certain chord for a particular small part of the music. I mean, these are subsets of the set of notes. This is a subset of the set of pitch classes, right?
Usually there are seven notes, might be eight or six, and then the chord is some subset of that, right? And so these are features of the signal. Part of the scale is that each scale has a
tonic, and each chord has a tonic, so altogether the harmony, well, what musicians call the harmony of a particular bar or section of a piece, involves two numbers and two sets over the pitch classes, right? Where's the key?
This is the key. The tonic is the key. This is the key, right? But you also call it the tonic of the scale. Sorry, there are ten words for everything. But right, there's a key. So at a certain part, say the beginning of that piece, we already know that we're in tonic C, you know, the
first chord is C major, right? And the scale is C, right? Right. If it was in A minor... I feel like I'm not explaining this very well. But this is the same. Okay, so A minor, this is A minor.
The scale is the same notes; it's the same subset of the pitch classes, with a different note chosen as the starting point. So were you saying, like, the intervals between the notes of the scale are the same, they're just shifted?
Well, these are the same pitch classes in A minor. There are several minor scales, but this is A minor. A minor is the same pitch classes. The only difference is I approach the chords in it differently, because
it's A minor. Does that make sense? Sorry, the key signature is the same; there's no change to that. Okay, but the point I wanted to make, the most important point, is that this is a hidden variable.
Right. It's not in the notes you're seeing. It's not there; nobody's written it out. And in fact, with commercial MIDI files, even when you download a thousand MIDI files, it's not reliable. It's wrong a lot, especially because in a real musical piece
the composer is going to be shifting keys. He's going to shift keys, and he may not bother to tell you that he's shifted keys; he just uses different notations to fill it in. So you said this is a hidden variable: do you mean the key? The harmony, yeah: the key, the scale, the tonic of the chord, and the chord.
So all the data from the harmony is a hidden variable. It's all a hidden variable, because all you see in the file are the notes, right? And so that's the thing our encoding has to represent somehow. Are these hidden values in the music? If you want to represent them... also, this should be said: this is Western music. This is not brain-based; we talked about that earlier. This has nothing to do with the brain; it has to do with Western culture, right?
This is how Western musicians have organized their music, in terms of key, scale and chord; that's how they do it. So if you assume you're doing a Western piece, then... a lot of music uses keys, but only Western music really uses chords the way it does.
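The "two numbers and two sets" description of harmony can be written down directly. The sketch below builds scales from step patterns, shows that C major and A natural minor contain the same pitch classes with different tonics, and bundles the hidden harmony into one record. The step patterns are the standard major and natural-minor ones; the `Harmony` record and all function names are hypothetical illustrations of the talk's description, not an existing library.

```python
from dataclasses import dataclass

MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]          # semitone steps of the major scale
NATURAL_MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]  # semitone steps of the natural minor scale


def scale_pcs(tonic: int, steps: list[int]) -> list[int]:
    """Pitch classes of a scale, listed upward from its tonic."""
    pcs = [tonic % 12]
    for step in steps[:-1]:  # the last step just returns to the tonic an octave up
        pcs.append((pcs[-1] + step) % 12)
    return pcs


def triad_pcs(root: int, minor: bool = False) -> set[int]:
    """Root, third (3 or 4 semitones up) and fifth (7 semitones up)."""
    third = 3 if minor else 4
    return {root % 12, (root + third) % 12, (root + 7) % 12}


@dataclass(frozen=True)
class Harmony:
    """Hypothetical label: two numbers (tonics) and two pitch-class sets."""
    key_tonic: int
    scale: frozenset
    chord_root: int
    chord: frozenset

    def is_consistent(self) -> bool:
        """Tonic lies in the scale, root lies in the chord, chord is a subset of the scale."""
        return (self.key_tonic in self.scale
                and self.chord_root in self.chord
                and self.chord <= self.scale)
```

`scale_pcs(9, NATURAL_MINOR_STEPS)` yields 9, 11, 0, 2, 4, 5, 7, the same set as C major listed from a different starting point. None of these four fields appears in the note events of a MIDI file, which is why they have to be inferred.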
So when you're listening to it, you can tell the difference when it's changed? Oh, yeah, you can tell the difference. Yeah, that's partly cultural. But nobody had to train you; nobody had to say, "The chord's changing now, Charlie, little Charlie." Nobody did that while you were listening to music; nobody taught you to listen to the chord changes.
But we can all hear it, but that's because... And I want to say what the other hidden variable is. I think I should talk about meter, because I think we're almost done.
I've scared everybody off from working on this at all. Music theory is a bit like alchemy. It's pretty simple; it's just combinatorics on a space like this. Yeah, yeah, I mean, this is just a subset and a subset of a subset,
and then a particular note chosen to be special and another note chosen to be special. It's not that complicated. And how many other ways would you take all the notes and cut them down? One thing that's interesting about music theory is that music theory is something that you teach to human musicians, right?
So everything in music theory has small numbers of things, right? Humans can't remember 4,096 combinations. We can't organize 4,096 combinations in a way that makes any sense to us. We have to divide it up: there's one major scale, there's only one, and there are three minor scales that are popular, and two other scales that are used in jazz,
and that's it. We only have eight scales, eight or nine, maybe twelve if you counted every weird scale, right? And chords: they're either major or minor, and then they have extensions. So when you're teaching people chords,
you're really only dealing with twos and threes, very, very small numbers, you know? But the reason for that is not a feature of the space; it's a feature of teaching music to humans. Does that make sense? So maybe we need to show these two, maybe we don't. I think we have to decide: are we going to make a universal encoder or a culturally biased one?
Well, as soon as you put too much of the culture in there, you start just generating random cases within the culture, which actually sound pretty good. I don't know if any of you have ever heard of the Illiac Suite, which was made in the 50s on machines with absolutely no power,
but he just took a couple of features and randomly chose between them, and it sounds great. It's really interesting, beautiful music. You could do that without any learning, so it doesn't take much, if you restrict the space enough, to come up with music that sounds pretty darn good. Yeah, and it seems to me that if you could come up with a basic encoder for music that encodes all the basic characteristics,
and then you play it a lot of Western music, it should be able to learn bits that correspond to the hidden variables. Right. Well, I think meter is another key thing that we've got to talk about. Can I say something? Yeah, sorry, sorry. Can I ask, slash say, something?
Say something. Here's the way I would approach it. I would have a composite encoder, just like you do with your date encoder. In the composite encoder, you would have a scalar value for each part: the attack of the note, the actual note, the duration of the note, the timbre of the note.
And you start with just the small things, so that you're able to quantify exactly what the important qualities of a note are. Each sub-encoder inside the parent encoder would be responsible for its own range of bits that encodes its quality.
So you would have five or six qualities. Start real small. In this composite encoder, each scalar encoder inside would be responsible for characterizing its part of the note. And then once you're able to encode this, you can put it into the HTM, play notes, and it can predict notes.
Then it might not be necessary to try to quantify exactly what a key is, because given what it's already working with, it can kind of infer the key.
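A minimal sketch of the composite-encoder proposal: each note quality gets its own scalar sub-encoding (a contiguous run of active bits whose position tracks the value), and the pieces are concatenated so each owns a fixed bit range. This is a hand-rolled simplification, not NuPIC's `ScalarEncoder`/`MultiEncoder` API; using MIDI velocity as a stand-in for attack, the bit widths, and the value ranges are all assumptions, and timbre is left out because note events do not carry it.

```python
def scalar_encode(value: float, min_val: float, max_val: float,
                  n_bits: int, w: int) -> list[int]:
    """Run of w active bits whose start position tracks value.

    Values outside [min_val, max_val] are clipped to the ends.
    """
    value = max(min_val, min(max_val, value))
    n_positions = n_bits - w + 1
    start = round((value - min_val) / (max_val - min_val) * (n_positions - 1))
    return [1 if start <= i < start + w else 0 for i in range(n_bits)]


def composite_encode(note: int, velocity: int, duration_beats: float) -> list[int]:
    """Concatenate one sub-encoding per note quality; each owns its own bit range.

    Bit widths and value ranges are arbitrary illustrative choices.
    """
    return (scalar_encode(note, 0, 127, n_bits=150, w=21)
            + scalar_encode(velocity, 0, 127, n_bits=50, w=9)
            + scalar_encode(duration_beats, 0.0, 4.0, n_bits=50, w=9))
```

Nearby pitches share most of their note bits while distant ones share none. As the objection that follows points out, the duration field is only known once the note ends, so a streaming version would have to drop it or fill it in later.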
Because it's only going to predict things that it's seen before, things that are relevant to the key it's already been working in. Kind of like that. So, I mean, you said a composite encoder, but then you only explained one part of the composite encoder.
No, no, no: C, A, B, C, D, like qualities of the notes, and then the attack of the note, hard, soft, whatever. Then you have the duration of the note. You have the timbre. We won't know the duration if we're streaming this,
right? At any point in time, it's going to be in a musical stream. I would say you sample it. Take five-millisecond samples: boom, boom, boom, boom. How is it changing over those five milliseconds? Boom, boom, boom. I don't have any idea.
Is the timbre kind of built into your velocity measure? Is that kind of what's in the timbre? That depends on the instrument. All the attack and decay stuff would have to be what you're talking about, right? The whole envelope. Well, I'm talking about just trying to get a lot of quality out of it. I mean, you start simple, though.
Just see how it works. Yeah, we've just started. This is the chord... not the chord, excuse me, this is the note: 1 through 12, not 1 through 128. It's a combination of many notes. Here's how I would think of that:
make a grid of notes, A through G, 12 notes, and then 10 octaves. Yeah, 120. And then, if you shift key, you just ask what kind of energy you have in each of those cells. If I shift key, that's a motion up or down.
And then you can think of it like a vision problem: recognizing the shape of something, whether it's a normal song or not. So you're matching the shape.
I don't know, that shape changing... Sorry, that's about topology. Use a multi-encoder for your composite encoder. Yeah, that's kind of like the voice-leading problem. It's topological: you're trying to follow this note, you're trying to watch this one, and that one's the melody,
and you're watching its 127 space, right? So, yeah. Hey, Matt, I think I've got to talk about meter. Let me talk about meter, because it's the clue to the hidden variables. Okay, so: time, right?
Time in music depends on what I believe is our innate sense of a beat. I believe that built into the cerebellum is a machine that, in the 5-millisecond range, can match a beat. A really powerful machine back there.
And it dictates a lot of what happens in music, in all musics, in all cultures. Well, there's some... never mind. A lot of music has beats, I'll say that. It has beats, right? And so you start with the beat, right? And then,
in most music, you group them into larger units, right? And you subdivide them into smaller units, right? You're off by one in that first part. Oh, thank you. I was trying to do four; I'm off by one up here,
but I didn't... Or I need nine... So, right. So in the MIDI file over there, things happen here. This is called the bar, or the measure. This is the beat, right?
And this is the eighth note. Sorry, there's not a better name for that. Sorry, my writing is so scrambled. So this is the beginning of a bar, which we call the downbeat. Things happen there: notes start there, keys change there.
All the change happens right there, right? And then there's less change; there's almost never any change there. That's an insignificant place, right? Sometimes change happens here; that's pretty interesting. And then up here I have another layer; stuff happens here. And the hierarchy goes on up to sections and movements,
and it goes down to smaller units, which we call 16ths, 32nds, 64ths. Right. Okay. So that's the basic layout of Western music, and most other music. There are ways that we trick this system,
like running two meters at once, but none of that's important right now. That's really prevalent in African music; we have syncopation and other things in our music. But anyway, let's just look at it this way. Okay, so if you look at the number of things: there are four beats per bar, right? Beats per bar.
And there are two eighths per beat, right? And there are almost always two. That's the meter: the ratios in this hierarchy, how many of each you get. Yeah. Is a 64th
not the smallest resolution? Right, well, I'm getting to that. The thing here is called a tick, right? And it's the lowest, fastest resolution in the MIDI file. And like I say, it's typically 480 ticks
per beat. On the composition side, it's usually 24 ticks per beat; if you're generating music, use 24. Do you need a tempo, actually? A tempo is more than that. Right, there's tempo. Okay, so tempo. So we've got meter: that's how time is structured hierarchically. Then the other element in time is tempo.
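Given a fixed ticks-per-beat and beats-per-bar, locating a MIDI tick in the bar/beat hierarchy is two integer divisions. A minimal sketch, assuming one meter for the whole piece and that tick 0 falls on a downbeat (no pickup bar); the function name is hypothetical.

```python
def tick_position(tick: int, ticks_per_beat: int = 480,
                  beats_per_bar: int = 4) -> tuple[int, int, int]:
    """Split an absolute tick into (bar, beat within bar, tick within beat), all 0-based.

    Assumes a constant meter and that tick 0 is a downbeat.
    """
    beat_index, tick_in_beat = divmod(tick, ticks_per_beat)
    bar, beat_in_bar = divmod(beat_index, beats_per_bar)
    return bar, beat_in_bar, tick_in_beat
```

At 480 ticks per beat in 4/4, tick 2640 is bar 1, beat 1, 240 ticks in (the eighth-note offbeat), all counted from zero.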
It's basically a function from beats to time. So if you have a tempo, with a little multiplication, a very obvious multiplication, you can find out what time this occurs, right? So you've got this. Right down here at the bottom we've got clock ticks.
Clock ticks: tick, tick, tick, tick... So using the tempo, you can take this tick and say what time that tick is going to occur, relative to the beginning of the piece, and you can get it in milliseconds or nanoseconds. Yeah, so tempo is basically a function from ticks to time, and you can also take a time and, with a little multiplication and a floor
or a round, get back to a tick. Right. Okay, so the tempo and the meter are also hidden variables, right? We get the clock ticks in MIDI. So for all the MIDI files,
you don't have to worry about tempo, because the MIDI file encodes the tempo. There are MIDI files out there that don't encode the tempo; they encode SMPTE time, if you've ever heard of that, a certain way of looking at time that's used in film.
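The tick-to-time function and its floor/round inverse are a single multiplication each way. In Standard MIDI Files the tempo is stored as microseconds per quarter note, and 500,000 (120 bpm) is the default when no tempo event is present. The sketch below assumes one constant tempo for the whole piece; real files with tempo changes need a piecewise version, and SMPTE-timed files use a different time division altogether. The names are hypothetical.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def ticks_to_seconds(tick: int, ticks_per_beat: int = 480,
                     us_per_beat: int = 500_000) -> float:
    """Time of a tick from the start of the piece, assuming one constant tempo.

    us_per_beat is microseconds per beat (quarter note); 500,000 is 120 bpm.
    """
    return tick * us_per_beat / (ticks_per_beat * MICROSECONDS_PER_SECOND)


def seconds_to_tick(seconds: float, ticks_per_beat: int = 480,
                    us_per_beat: int = 500_000) -> int:
    """Inverse mapping, rounded to the nearest tick (use math.floor to truncate instead)."""
    return round(seconds * ticks_per_beat * MICROSECONDS_PER_SECOND / us_per_beat)


def bpm_to_us_per_beat(bpm: float) -> int:
    """Convert beats per minute to the microseconds-per-beat form MIDI uses."""
    return round(60 * MICROSECONDS_PER_SECOND / bpm)
```

At the default tempo and 480 ticks per beat, tick 960 (two beats) lands at exactly 1.0 second, and converting that time back gives tick 960 again.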
I don't think you should depend on having a tempo. I think you have to. You do? Well, deducing the tempo is very hard work. It's getting better. Maybe I'm not clear about the timing. You mean the meter? Yeah. We've got all these ticks, so we're going to want to be predicting when the next beat
is going to come, based on the music we've seen. Right, so you're looking for ticks per beat. Yeah, something like that. And that's recorded in the MIDI file? Yeah, but isn't that an invisible variable? No, this you've got. You've got ticks per beat.
Every MIDI file has ticks per beat. Up on GitHub, I wrote a little thing to take a MIDI file and convert it to 24 ticks per beat, which saves you a lot of hassle, and it also gets rid of cases where two events happen on the same tick, and anything else that's weird.
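The speaker's GitHub tool is not shown here, so the following is only a hypothetical stand-in for the core step: rescaling absolute tick times from a file's ticks-per-beat to 24. It does not attempt the same-tick cleanup described above; it only shows that rounding onto a coarser grid can make nearby events coincide.

```python
def rescale_ticks(events, old_tpb: int, new_tpb: int = 24):
    """Move (absolute_tick, payload) events onto a different ticks-per-beat grid.

    Rounding quantizes timing, so events a few old ticks apart can land on the
    same new tick; list order is preserved, so their relative order is kept.
    Note that Python's round() sends exact .5 cases to the nearest even integer.
    """
    return [(round(tick * new_tpb / old_tpb), payload) for tick, payload in events]
```

Going from 480 to 24 ticks per beat divides every time by 20, so events at old ticks 0 and 5 both land on new tick 0, which is exactly the kind of collision a cleanup pass would then have to resolve.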
So ticks per beat is sort of the first level. I'm thinking there are a lot of interesting applications for live music, for playing live music, and then you're not going to have that, because of the human element involved. Yeah, well, that's true.
I mean, that's kind of the real problem I want everybody to solve: audio to MIDI. You've got to deduce the tempo; that's the first job. As soon as you deduce the tempo, everything else becomes a lot easier. Because if you want to figure out what the key is, or when the key changes... I call this the beat strength, right? The strength of a moment in time, which goes kind of like this, right?
You know what I mean? There's this point in time, and things happen there. Note changes happen here, right? Changes happen on the beat, and certain kinds of changes happen here, certain kinds of changes happen here, you know what I mean?
And the relationship of beat strength to time is really important. So to deduce the key, you want to find the key, or a key change, given that key changes happen on the beat. And it helps a lot; it helps a lot with everything.
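One simple way to turn the downbeat/beat/offbeat hierarchy into a beat-strength number is to count how many metrical levels have a boundary at a given tick. This is a common scoring scheme, not necessarily the one the speaker uses; the half-bar level assumes a duple meter such as 4/4, and the names are hypothetical.

```python
def metrical_periods(ticks_per_beat: int = 24, beats_per_bar: int = 4) -> list[int]:
    """Tick periods of the metrical levels: bar, half bar, beat, eighth note.

    The half-bar level assumes a duple meter such as 4/4.
    """
    bar = ticks_per_beat * beats_per_bar
    return [bar, bar // 2, ticks_per_beat, ticks_per_beat // 2]


def beat_strength(tick: int, periods: list[int]) -> int:
    """Number of metrical levels with a boundary at this tick (0 = weakest)."""
    return sum(1 for period in periods if tick % period == 0)
```

At 24 ticks per beat in 4/4, the downbeat (tick 0) scores 4, beat three (tick 48) scores 3, beats two and four score 2, eighth-note offbeats score 1, and anything finer scores 0, which is the shape of the beat-strength curve drawn on the board.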
If you're doing audio to MIDI, it helps you deduce whether the note is changing. If there's a lot of slurry sound in there, you want to say, well, it was probably intended to change on the beat right there. Do you know where the beat is? Yeah, yeah, and it's tractable.
There are too many possible choices for key, too many possible choices for meter, that all might kind of work. I don't want to rush you, but I know you have to leave. I do have to leave. It's not very easy, but it's fun, you know. I mean, that's sort of an introduction to some of the problems.
Fantastic. What is in the repo? Oh, like I said, the thing that opens a MIDI file; you can get the pitch class. You get that, but it's not NuPIC? Yeah, it's not NuPIC. I mean, you could modify that code to stream it.
Also, MIDI is really icky, you know. It's ancient, and it's been built onto over the years, you know what I mean? You have to split things into nibbles and variable-length numbers, and it's pretty ugly.
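For a sense of what "nibbles and variable-length numbers" means in practice, here is a minimal sketch of two standard pieces of Standard MIDI File parsing: variable-length quantities (7 data bits per byte, high bit set on every byte except the last) and splitting a channel-voice status byte into its message-type and channel nibbles. The function names are hypothetical, and the SMF limit of four bytes per quantity is not enforced.

```python
def read_vlq(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a variable-length quantity starting at data[pos].

    Returns (value, position just after the last byte read).
    """
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)  # append the 7 data bits
        if not byte & 0x80:                   # high bit clear: this was the last byte
            return value, pos


def write_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a variable-length quantity."""
    if value < 0:
        raise ValueError("VLQ values are non-negative")
    out = [value & 0x7F]                      # last byte: high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)     # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(out))


def split_status(status: int) -> tuple[int, int]:
    """Channel-voice status byte -> (message-type nibble, channel 0-15)."""
    return status >> 4, status & 0x0F
```

For example, 128 encodes as the two bytes 0x81 0x00, and status byte 0x93 splits into type 0x9 (note on) on channel 3, counting channels from zero.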
Yeah, I think that's all I've got. Oh, thank you a lot.