
Having fun with music and Keras


Formal Metadata

Title
Having fun with music and Keras
Alternative Title
Change music in two epochs
Title of Series
Number of Parts
132
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
This talk is about applying deep learning to music. We will look at the raw music data and discover the following:
- How to detect instruments from a piece of music
- How to detect what is being played by what instrument
- How to isolate instruments in multi-instrument (polyphonic) music
Instead of applying it to existing music, we will generate our own music using some simple musical rules. The benefit of this is that we are in control of the complexity and we know exactly what is being played. We start out simple and then start adding more instruments, different timbres, etc. As we go up in complexity, we shall see how to adapt our models to be able to deal with it. This gives interesting insights into what structures in deep nets work well. I will show:
- How to build a simple synthesizer using numpy
- How to create an unlimited data set of improvisation that sounds musical
- How to use this data set for detecting instruments using deep learning
- How to filter out one instrument when multiple synthesizers are playing at once
Transcript: English (auto-generated)
I actually changed the title of my talk to "Having fun with music and Keras", because I think that's what it's all about. So, what's my personal motivation for getting into music and doing stuff with it? Well, I play guitar, and when I was learning to play guitar 20 years ago, I would listen to a CD trying to figure out what notes were being played: push the rewind button, listen again, try it, keep repeating. It took so much effort that I thought: isn't there a computer program that can do this for me? Why is there no computer program that can do this for me? Apparently it's very, very difficult, and only now, with deep learning, are we actually getting close to tackling these kinds of problems. So that's my personal motivation for looking into it. So what are we going to do today? First, we're going to
build a simple synthesizer using just numpy and some Jupyter notebook enhancements. Then I'll explain a little bit about music theory and come up with a very simple way of generating something that, well, sounds musical. It's not the biggest new summer hit of 2018, let's be fair about it; it's just some music. Then we use this generated data for deep learning. There are two benefits to generating our own data. First of all, we are in control of our own complexity, so we can make the sounds more distinct or more similar. Second, we have an unlimited data set: we can just keep on iterating, and we know what the ground truth is, so we don't have the effort of labeling our data; that's all taken care of. That's why we take this approach, and also, it's way more fun this way, because we get to build synthesizers in numpy.

So what are we going to do on the deep learning side? On the synthesized music, I will first single out a single instrument when multiple synthesizers are playing together, so just filtering it out. The second thing I'll cover is detecting which notes are being played: first for a single instrument, then when instruments are playing together, and then when multiple notes are being played by the same instrument, extending the model to multiple notes. And finally, I couldn't resist doing one real-world example, where I've taken a backing track, played some guitar over it, and built a detector for when I'm playing and when I'm not. Okay, so let's first make some music.
Is this readable? Should I make it a bit bigger? So first, let's have a look at what music actually is. I've loaded some data here, and if you look at it, the raw data just consists of integers. What do these integers actually represent? They represent how far the speaker cone moves left or right. The integers have a specific resolution; here they are 16-bit integers, and I'm going to scale them to between -1 and 1, with -1 being the speaker all the way at one end and 1 all the way at the other end. If we look at the shape, it's just one big one-dimensional array, because the recording is mono; if it were stereo, you would have multiple channels here. We also get another quantity back, called the sample rate, which tells you how many samples there are per second. With normal CD-quality music we have 44 kilohertz, meaning 44,000 samples per second: 44,000 of these numbers give us one second of music.

We can plot them, and it looks like this. This is a whole piece of music, with time in seconds on the x-axis: there's an intro over here, the main bunch of music here, and finally an intermediate part and the outro. Now let's zoom in a bit. There's this very nice IPython display widget where you can put in any array and just listen to it. It automatically scales the signal to between -1 and 1, so you don't necessarily have to do that yourself. Let's hope this all works. So this is what you see below. If we zoom in a bit, I can show you how difficult music is: all of that is encoded in this small part of the signal, and if we zoom in slightly more, this is the mess you would have to reconstruct everything from. It's a really difficult signal, basically.
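As a rough sketch, the loading and scaling step described here might look like this in a notebook (the file name is hypothetical; the talk's own code isn't shown verbatim):

```python
# Minimal sketch: load a mono 16-bit WAV, scale to [-1, 1], and listen to it.
import numpy as np
from scipy.io import wavfile
from IPython.display import Audio

sample_rate, data = wavfile.read("song.wav")  # e.g. 44100 Hz, dtype int16
print(data.shape, data.dtype)                 # one-dimensional array for mono

signal = data / np.iinfo(np.int16).max        # 16-bit integers -> floats in [-1, 1]

# The display widget rescales automatically, but the floats work directly too.
Audio(signal, rate=sample_rate)
```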
Now we're going to make a simple synthesizer; we're going to make some sounds. Let's start with a sine generator. The sine is the simplest periodic signal we can think of. We put the time in a data frame, sample it at the sample rate, and compute the sine at every time step. When we do that, we get one big blob, but if we zoom in, we see the sine wave back. There's also a nice method called the Fourier transform, and what the Fourier transform does is take the signal and determine which frequencies are present in it. In this case I've made a sine of 440 Hertz, and if we make a spectral representation of it, we see one big line at 440 Hertz. We can also listen to it. Not the most pleasant sound, but that's the sine. It's not very musical at this point, so let's put an envelope on it: an exponential decay of the sine wave. How does that sound?
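A minimal sketch of the sine generator, the spectral check, and the exponential envelope (plain numpy; the decay constant is just an illustrative choice):

```python
import numpy as np

sample_rate = 44100
t = np.arange(2 * sample_rate) / sample_rate   # two seconds of time steps

sine = np.sin(2 * np.pi * 440.0 * t)           # 440 Hz sine wave

# Fourier transform: one sharp peak should show up at 440 Hz.
spectrum = np.abs(np.fft.rfft(sine))
freqs = np.fft.rfftfreq(len(sine), d=1.0 / sample_rate)
print(freqs[np.argmax(spectrum)])              # ~440.0

# Exponential decay envelope, so the tone dies out like a plucked string.
tone = sine * np.exp(-3.0 * t)
```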
Already slightly better. Now let's add multiple sines together. A nice thing about tones is that if I have a sine of some frequency, I can add another sine at twice or three times that frequency, and it will still sound like the original tone: the same pitch is perceived, but it sounds different. It has a different sort of timbre, a different feel; it's more bright, or less. It doesn't change the perception of the pitch. So let's try that out. Here I'm adding a couple of sines together, 440 Hertz plus two, three and four times that frequency, with some random amplitudes I've put in. This is the signal we generate; in the spectrum you can see the extra lines coming back. And how does it sound? Still the same pitch, right, although we've added a lot of extra frequencies to it.

Another thing we can do is give the sines random phases. What I've done over here is add the sines together with all phases zero, and then again with random phases, so the shape is completely different. Do you think it will still sound the same? ... Does anyone hear a difference? It's exactly the same: your ear simply cannot hear the difference between this one and that one. So when you're doing deep learning and looking at RMSEs and the like, you have to take into account that RMSE might not be a very good quantity for comparing signals with each other. Now let's add some decay to the additive sound. Let's do it.
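The additive experiment might be sketched like this (amplitude ranges are illustrative; the point is that random phases change the waveform's shape but not its sound):

```python
import numpy as np

def additive_tone(freq, t, n_harmonics=4, random_phase=False):
    """Sum harmonics at 1x, 2x, 3x, ... the fundamental frequency."""
    signal = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        amplitude = np.random.uniform(0.2, 1.0)
        phase = np.random.uniform(0, 2 * np.pi) if random_phase else 0.0
        signal += amplitude * np.sin(2 * np.pi * k * freq * t + phase)
    return signal / np.abs(signal).max()       # normalise to [-1, 1]

t = np.arange(2 * 44100) / 44100
zero_phases = additive_tone(440, t)
random_phases = additive_tone(440, t, random_phase=True)
# Different waveform shapes, same perceived sound (and a large RMSE between them).
```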
Now we have something like a very simple electric piano, or maybe a harp. It's not very beautiful, but I can play something with it. Now, as a little sidetrack, let's do some drums. Here I'm letting the frequency fall away quickly, so it gets much lower as we go on, and I'm also adding an exponential decay. Let's listen to that: we have a drum kick. Then for the snare, I'm going to build a snare from a drum kick plus a bit of noise. First, let's listen to the noise on its own... that's noise. And if I add the kick and a very short bit of the noise together, we get something like a snare drum. So you can do a lot of cool things just by combining simple building blocks.
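A sketch of the kick and snare idea (parameter values are illustrative): the kick is a sine whose frequency sweeps downward, and the snare adds a short burst of noise on top.

```python
import numpy as np

sample_rate = 44100
t = np.arange(int(0.4 * sample_rate)) / sample_rate

# Kick: the frequency drops over time; integrate it to get the sine's phase.
freq_sweep = 150.0 / (1.0 + 30.0 * t)
phase = 2 * np.pi * np.cumsum(freq_sweep) / sample_rate
kick = np.sin(phase) * np.exp(-8.0 * t)

# Snare: the kick plus a very short bit of noise.
noise = np.random.uniform(-1, 1, len(t))
snare = kick + noise * np.exp(-40.0 * t)
snare /= np.abs(snare).max()
```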
There's another technique called subtractive synthesis. There you start by building a very rich waveform; a square wave, for example, has a lot of frequencies in it. It's very frequency-rich, and then you cut some frequencies out in order to make it sound nice again. This one sounds like a Mario computer game: that's without the filtering, and with the filtering it sounds like this. So there are a lot of different ways of generating sound. There's also FM synthesis, which is a different approach again.
You can also try to model a complete piano with all the resonances and so on; it goes on and on, but this is just a very simple way of building it. Okay, let's start making some music. Who here knows a bit about music? I'm going to explain it anyway. Okay, luckily that's quite a lot. So everyone knows this picture of a piano, right? The nice thing about the white keys is that they are all in the same scale, but I'll talk about that a little later. If you look at the notes: every note in modern Western tuning is very easily calculated. You just take the 12th root of 2 to the power of n, where n is the number of half steps away from some root note, and a half step is going from one key to the directly adjacent one, like from this one to that one. So if I want to know the frequency of this note, I just count how many half steps I am away from, say, the A (that's this note: 1, 2, 3, 4), put that into the formula, and I can compute the frequencies of all the keys on the piano.
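In code, the tuning formula f(n) = f_root * 2^(n/12) is a one-liner (using A4 = 440 Hz as the root):

```python
def note_freq(n, root=440.0):
    """Frequency of the note n half steps away from the root note."""
    return root * 2 ** (n / 12)

print(note_freq(0))    # 440.0  (A4)
print(note_freq(3))    # ~523.3 (C5)
print(note_freq(-9))   # ~261.6 (C4, middle C)
```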
There are different ways of tuning, but this is the simplest one. Well, I've created a small instrument here from the things we did in the previous notebook, and now I can just generate a note. That's a note. But now let's have a look at scales. A scale is basically a group of notes that fit well together: do re mi fa sol la ti do, everyone knows it, right? And for every note in a scale, you can also make a chord out of it, and you can do that mathematically too. So that's a chord.
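One way to write the scale-and-chord rules down mathematically (a sketch, not the talk's exact code): a scale is a list of half-step offsets, and a triad stacks every other scale degree.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]            # do re mi fa sol la ti

def chord_from_scale(scale, degree):
    """Triad: scale degrees 0, 2 and 4 above the given degree, in half steps."""
    return [scale[(degree + i) % len(scale)] + 12 * ((degree + i) // len(scale))
            for i in (0, 2, 4)]

print(chord_from_scale(MAJOR_SCALE, 0))         # [0, 4, 7]   -> major triad
print(chord_from_scale(MAJOR_SCALE, 5))         # [9, 12, 16] -> minor triad
```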
So now we have these rules that govern which notes sound musical together, and we can just make a random walk within this scale, mix it together, and get something that fits well together. But first I'll put a chord progression underneath it, just to listen to. I took this particular chord progression because it's about the most common chord progression in pop music, and I just decided to go with it. You probably all know a song that fits with it, I suppose.

Now I'm going to generate some notes on top of this, and that's just going to be a random walk. Of course, I don't want my notes to sound too high or too low, so it's a bounded random walk: whenever I draw a new note that is above the limit, I bounce back and go the other way again. There are also some probabilities steering the random walk: I will never stay on the same note, there's a fairly big chance of going to a note close to the previous one, and a smaller chance of making a big step, and so on. I tuned this a little by hand until I thought: okay, this is sort of acceptable. If I take the random walk and index the scale with it, I get a stream of notes, and then I can just render those notes and mix them together, and I get something like this.
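A sketch of the bounded random walk over scale degrees, with the hand-tuned flavour just described: never stay on the same note, prefer small steps, bounce at the boundaries (the step probabilities here are illustrative):

```python
import numpy as np

def bounded_walk(n_notes, low=0, high=13, rng=np.random.default_rng()):
    steps = np.array([-3, -2, -1, 1, 2, 3])      # never 0: no repeated notes
    probs = np.array([0.05, 0.15, 0.30, 0.30, 0.15, 0.05])
    walk = [int(rng.integers(low, high))]
    for _ in range(n_notes - 1):
        nxt = walk[-1] + rng.choice(steps, p=probs)
        if nxt < low or nxt > high:              # bounce back off the boundary
            nxt = 2 * walk[-1] - nxt
        walk.append(int(nxt))
    return walk

melody = bounded_walk(32)                        # indices into the scale
```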
This can go on for an infinite amount of time; it even sounds better as it goes on. So, now that we have this infinite data set, let's move on to the deep learning applications. The first thing I will do is filter out the lead instrument. In the previous sound fragment I had the chords and the melody; now I want the algorithm to learn to extract this melody and regenerate only the melody.
So we go from chords plus melody to only the melody: the model's input is the waveform of three instruments, and its output is the waveform of just one instrument. One way of doing this is with an autoencoder. Who knows what autoencoders are? Okay, I'll go over it quickly. An autoencoder is a neural network that goes from an input, in the classic example a picture, through a set of layers, typically down to a layer that has fewer dimensions than the original, and then scales back up to the original input size. You train it by giving it this picture as input and the same picture as the result it has to produce. It then learns weights that compress the input in such a way that it is also easy to generate it back again. So an autoencoder is an architecture with the same number of input nodes as output nodes, and you train it with the input equal to the output.

We can do something similar, but instead of making the input the same as the output, we say: I'll take the complete wave with the combined instruments as input, and as the target I only give it the melody. So this is the input,
and this is the output. This is how we build it. We start with a Keras model, then I add a dense layer. I should have mentioned this before: music is a continuous signal, it just goes on and on, so we have to cut it up in some way. What I did is cut the music up into small fragments of a thousand samples (1024, actually), and on each such fragment I filter out the lead instrument. So 1024 samples go in, then there's the first dense layer (I tried different things; this worked best), then another dense layer at half that size, and then we scale up again to the original output shape, and train it on those inputs and targets. How does it sound before training? Like this... maybe I should have used a bit shorter fragments here; I see people sitting like this.
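The filter model just described might look roughly like this in Keras (activations and optimizer are my assumptions; the layer sizes follow the talk: 1024 in, half that in the middle, 1024 out):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(1024, activation="relu", input_shape=(1024,)),
    layers.Dense(512, activation="relu"),        # bottleneck: half the size
    layers.Dense(1024, activation="tanh"),       # back to the fragment shape
])
model.compile(optimizer="adam", loss="mse")
# model.fit(mixed_fragments, melody_fragments, epochs=2)   # "in two epochs"
```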
But okay, the network still has to learn a lot, right? I must say that when generating this part of the music, I'm still putting in the original wave data, but chopped up into batches of 1024 samples. What I'm actually doing is running the batched data through the model, shifted by half a window each time, and then adding the results together with a cross-fade. And, well, what was the original alternative title of this talk? You can actually fit a model in two epochs.
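The half-shifted, cross-faded reconstruction is essentially overlap-add; a sketch (assuming a predict function that maps one 1024-sample window to its filtered version):

```python
import numpy as np

def overlap_add(predict, signal, size=1024):
    """Run the model on half-overlapping windows and cross-fade the results."""
    hop = size // 2
    out = np.zeros(len(signal))
    fade = np.hanning(size)                      # cross-fade window
    for start in range(0, len(signal) - size + 1, hop):
        out[start:start + size] += predict(signal[start:start + size]) * fade
    return out
```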
So that's it, that's what it looked like; let's hear how it sounds. ... Is it overfitted? If you just train on one fixed data set, it's very likely to be overfitted, right? So what I did here is generate the data set again, a totally new data set, and also put in a different type of scale. I trained it on a minor scale, and this is... not major: Phrygian dominant. So it's a different scale. Let's listen to it. There's still a bit of noise left in there that still needs work; if you train for longer than two epochs, you keep getting better results. And of course I can also make the epochs really, really long: there's an infinite amount of data.
Let's go to the next part, and that is note detection. Now we're going to detect which notes are being played, so here the model takes a wave in and produces a sequence of detected notes out. The method is: generate some random music; chop the wave data again into small batches of one tenth of a second; use the Fourier transform (the way of getting at the frequencies that we saw earlier) on each batch; and then, for each batch, predict which note is being played. I'm framing this as a multi-class classification problem (I'll show it in more detail later) where each class means one particular note is being played, and I will train gated recurrent units as the classifier.

For generating and labelling the data, I get a data frame from the random walk that looks like this: for each note I have an onset and an offset, and between those two I have the played notes encoded, basically one-hot. Well, not necessarily one-hot: I've listed all the notes being played in that stretch. Finally, I match each of my batches to one of these onset/offset intervals, and then I know which notes are being played in it.
If you look at the Fourier analysis of a waveform, it looks like this. You can actually already see the individual notes here. For the single instrument you can see the pattern, and (I don't know how visible it is) the same pattern is repeated higher up, at three times the frequency. That has to do with the extra harmonics I showed at the beginning, when we added the additional sines to the original note. So, setting up the GRU:
Who has ever used a gated recurrent unit before? Who knows a bit about recurrent neural nets? I'll give a quick introduction, and I'll keep it to plain recurrent neural nets. What is a recurrent neural net? You start with some input x, that goes into a hidden state h, and that produces an output o. If you have a sequence of x's, the hidden state gets updated at every step: this one updates it here, the next one there, and so on. It's a very good way of accumulating evidence for a specific feature, and you can run it on arbitrarily sized sequences and so on.
Since I know I've generated data with only 14 different notes, I'm using a 14-dimensional hidden state, the same number as the notes that can be played. I've set the number of time steps to one, and I've made the RNN stateful, which means it keeps remembering what it has seen before. And finally, I apply the Fourier transform before the data goes in. Here is the model building, and these are the dimensions. For a gated recurrent unit you basically set it up with a number of steps per batch: you chop your original signal into batches, within a batch you have a number of time steps, and then you have a number of channels. For raw wave files that's typically just one channel, but since we're now giving it a Fourier input, it's actually a lot of frequencies per time step, so we have slightly more complicated inputs here. Then come the gated recurrent units, and at the end we put a dense layer, just to make it a bit easier to fit. So the input here is: 32 is the batch size, one step (one Fourier spectrum per time step), and 2049 frequency dimensions. Then our gated recurrent units, and finally the network predicts which of the 14 notes is being played, using categorical cross-entropy for fitting.
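Put together, the classifier might look like this in Keras-2-era code (exact arguments are my reconstruction from the dimensions mentioned: batch size 32, one 2049-bin spectrum per step, 14 hidden units, 14 note classes):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.GRU(14, stateful=True,                # keeps state across batches
               batch_input_shape=(32, 1, 2049)), # (batch, time steps, FFT bins)
    layers.Dense(14, activation="softmax"),      # one class per possible note
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```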
This is the fitting, the typical Keras output. If we then look at the prediction: on the right is what we put in as a function of time; you see which note is being played as a yellow blob, basically a punch card, and this is what the model predicts. And this is after only a little bit of fitting. So this was actually the easy one; by the way, if we test it on fresh data, it still works. I must say that if you train this a lot longer it starts to look a lot better, and I will actually do that for the next one. Now we're going to make it a bit harder: instead of looking at a single instrument, we're now going to look at the combined music, the whole arrangement I played earlier.
So we're going to add the chord and bass instruments (this is how it sounds), and we're also going to add harmony, so the lead instrument plays two notes at the same time. That's the music, and this is what its Fourier spectrum now looks like. This is also what I'm going to offer my network to fit on, and you can see it's messy, right? It's really messy; it's hard even for me to make out what's being played. Although, if I look a little higher up, I can still find the dots; the information is in a way still there. But it would be really hard to hand-write business rules that capture this problem well.
In this case I'm still using the same network topology. I think, if I look here, I halved the time step, so the frequency dimension is twice as big. Then: let's fit it, get some coffee, and it looks like this. This is after a lot of iterations, and as you can see, it finds the notes back quite well. Every now and then it's still only close to the notes, not completely confident; you see that here and there, especially near the parts where it's really difficult. If you remember this part: it corresponds to this part here, and it has actually done quite well on it. Well, okay: if we get a bit of coffee and it already looks this good, what happens if we train it for one night? Then it does it almost perfectly, I would say. So that's that.

Now for the final thing: let's do it on real music. The question is: can we detect when an instrument is playing?
We're going to load a bit of data. I played some music over a backing track, so there's a full mix, and because I played my part separately, I can also mix down only the guitar; I have the two separately. Now, first of all, there's a big issue: detecting when the guitar is actually playing. Here is only the guitar part. I can't just say "if the signal is below some value, I'm not playing", because the waveform goes through zero all the time, so a simple threshold doesn't work. Instead, I calculate a rolling RMS, and when the rolling RMS drops below some level, I gate the signal: I stop saying the guitar is playing. If you look at this graph: the guitar is in blue, the original mixed waveform in grey, the rolling RMS in orange, and the green line is one or zero depending on whether I'm playing or not. So I can basically mask the signal with the zeros and ones: I multiply the signal by zero when I'm not playing and by one when I am. And then we can actually hear the effect of the gate: it detects when I'm playing and when I'm not. Sometimes it jitters a little, but that happens.
Now we're going to build a model, and we again chop the signal up into fragments of 0.1 seconds. That's going to be the time resolution: within that resolution I can know whether the instrument is playing or not. For each fragment, we detect whether the guitar is playing. So we have a short fragment of 0.1 seconds, then the model, and we predict playing or not playing, a boolean. This is how it looks. I take an input with the fragment length, 0.1 seconds of samples. One very nice thing about TensorFlow and Keras is that you can apply certain specific functions inside the model. So instead of doing the Fourier transform by hand in pre-processing, I can also apply a Fourier transform somewhere in the middle of my network, and TensorFlow can even calculate the gradients of that operation. Have a look at what kinds of transformations are available in TensorFlow; you can do some nice things inside your model instead of in the pre-processing. So here I'm doing the Fourier transform inside the model with a Lambda layer, and the output will still have the fragment's spectral size. Finally, I put a dense layer with a sigmoid activation on top, so the output is between 0 and 1, and I predict whether I'm playing or not. In the model you see the time domain (the waveform), then the frequency domain (the stripes), and finally yes or no.
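The in-graph Fourier trick might be sketched like this (the fragment length assumes 0.1 s at 44.1 kHz; tf.signal.rfft is differentiable, which is the point being made):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

fragment_len = 4410                              # 0.1 s at 44.1 kHz

model = keras.Sequential([
    layers.Lambda(lambda x: tf.abs(tf.signal.rfft(x)),
                  input_shape=(fragment_len,)),  # Fourier inside the network
    layers.Dense(1, activation="sigmoid"),       # playing / not playing
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```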
This is the frequency domain like the stripes and finally yes or no So actually the nice thing about it is we can actually listen to it and watch at the same time So I made does anyone know movie pie? I So definitely a package that you want to check out because there's a very nice function where you can just
You can just make a make a frame given some time that you Some some value some X and then you can render it to a movie very easily. So I use this for for These movies what we're going to see is we have the waveform and we are going to have a sliding window
going across the waveform and As the the waveform progresses, I'm going to light up blue whenever I detected the the the notice playing Then we have also the green line with the gating whether I'm playing or not as well as the prediction in orange
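The MoviePy pattern mentioned here is its make-frame interface; a sketch (render_window is a hypothetical helper that draws the plot for time t as an RGB array):

```python
from moviepy.editor import VideoClip

def make_frame(t):
    # Return an RGB numpy array (height x width x 3) for time t,
    # e.g. the waveform window with the gate and prediction drawn on it.
    return render_window(t)                      # hypothetical drawing helper

clip = VideoClip(make_frame, duration=30)
clip.write_videofile("prediction.mp4", fps=25)
```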
I'll open it differently, because otherwise you'll hear me fucking up. ... That's it. Okay, I think we've got a few minutes for questions. I saw you first.
Hey, thanks for your talk. I'm really curious: how fast can a network figure this stuff out? Would it be applicable for music visualization? With electronic music it's fairly easy to do things like beat detection, but for classical or acoustic music it's kind of hard. Would this be useful for that case?

Well, people have a very good internal clock, a very good internal time resolution, when it comes to music perception, so you would need a really, really short latency. Already for the rendering you'd need to stay within on the order of 16 milliseconds. A lot of people probably saw a little bit of desync in the video just now; humans are very perceptive to small latencies. So in principle it would be possible, but it's not going to be easy. For what we saw right now, I don't actually know. But I do know that latencies of 20 milliseconds can already become noticeable, especially when you're making the music: if you're playing an instrument with a latency of 20 milliseconds, you'll start noticing it. That's when you're playing it, though; if you're just listening and watching, it's different. And with electronic music you can probably pre-process it, pre-render it, I suppose. But for real live music, where people are playing live on stage with no backing track or whatever, then yes, it's going to be hard, I suppose.
Okay, any other questions? ... Key changes, yes; let me repeat that: how do you deal with more complex music, like jazz, where there are key changes? In principle, I've really tuned my model to the complexity of what I have right now: my hidden state has only 14 neurons, which is exactly the number of notes I can draw from in the random walk. In principle you can enlarge it to the complete spectrum of all the different possible notes on a piano, for instance. But jazz music also has a lot of other things: notes that get bent, where the pitch slides up slowly so it's in between notes, or vibrato, and things like that. There are a lot of additional things that color the music that you would still need to be able to write down, or maybe not; vibrato could perhaps even be transcribed as higher-lower-higher-lower, and so on. So it's going to be difficult. But what I tried here was deliberately to make it simple first and then build it up, and I'm not intending to stop at this point, so maybe, slowly, we'll get some jazz happening. I think we have time for one more question.
So, are they published? Yes, they are published: if you look on GitHub under my name, Marcel, you'll find them. The whole synthesizer part is also there. It could be cleaned up a bit, but it's there. Okay, are there any other questions? If not, I'd like to thank myself.