Music transcription with Python

Automated media analysis (beta)

Speech transcript
Session chair: [unintelligible] The talk is about music transcription. [unintelligible] Enjoy the talk.
Hi everyone. [unintelligible] I work as a software developer in the music industry, at a company called Ableton. We have three main products. The main one is Live, a digital audio workstation that allows musicians to record and edit music, and that was actually designed as an instrument you can take on stage and perform live with. The next one is Link, a technology that allows people to play together on electronic devices by syncing them in time over a network. And last but not least there is Push, which is actually what I work on day to day. Push is an instrument that allows you to create your musical ideas without looking at a computer, although it is connected to Live.
But let's get back to the topic. What does it mean to transcribe music? Transcribing music means transforming an audio recording into notation, which means you write down the notes you hear so that they can be interpreted by other people. There are some people who have the superpower of doing it by ear. I'm not one of them; my ears haven't been properly trained to do it, so I preferred to figure out [unintelligible].
So let's have a look at what we need to do. First of all, we have to figure out how to read the audio: think about what it is we are going to read, and how to read and store it so that we can process it later. Then we have to figure out when a note actually occurred and what note it was [unintelligible]. And then we transcribe it, that is, write it down in some standardized way that can be interpreted by software and hardware instruments.
OK, so let's address the first question: how can we store it? First of all we need to know what our data is. Our audio is basically a continuous wave. To represent it with a computer, it first needs to be digitized. Digitization comes in two steps: sampling and quantization.
What does that mean? Sampling changes the continuous signal into a sequence of samples, so we end up with a finite set of samples. How many samples there are is determined by the sampling rate we use: the sampling rate defines how many samples are taken per second. There is also an important thing to remember while sampling, one of the most basic and important things in the whole of digital signal processing: the so-called Shannon-Nyquist theorem, or simply the sampling theorem. It says that, to be able to reconstruct the continuous signal later, we have to make sure that the signal we digitize doesn't contain frequency components above half of the sampling rate.
What happens if there are higher frequencies in there? Such a frequency comes in and wants to represent itself, but there are not enough samples for it, so it shows up as a different frequency instead. This is called aliasing [unintelligible]. Aliasing is bad, of course, because first of all we lose the high frequencies, but we also corrupt the low frequencies: we no longer know whether a frequency we see there actually belonged to the original signal or is only there because of aliasing. So we have to make sure we pick the right sampling rate. The usual one is 44.1 kilohertz, which allows us to capture everything that is audible to human beings, because we hear sounds in the range from about 20 hertz to 20 kilohertz.
OK, so now our independent variable takes a finite number of values, the number of samples. But our dependent variable is not finite yet, because each value can be [unintelligible]. Let's say we want to store the information in a set number of bits, so that we have only, say, 100 numbers available for encoding the data. That means we have to quantize the data. The simplest thing to do is to find the nearest of the quantization levels, the closest possible value, and assign it to the sample as its amplitude. So both sampling and quantization end up restricting how much information we have in our digital audio.
So now we know what we have. When we read it, it's probably going to be a rather large amount of raw data, so we have to figure out which features to look at and what data to use [unintelligible].
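The digitization steps described above can be illustrated with a minimal sketch. It uses only the standard library; the function names, the unit amplitude range of [-1, 1] and the choice of evenly spaced quantization levels are assumptions made for illustration, not something taken from the talk.

```python
import math

def sample_sine(freq_hz, sample_rate, n_samples):
    """Sample a unit-amplitude sine wave of freq_hz at the given sample rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def alias_frequency(freq_hz, sample_rate):
    """Frequency at which a sampled tone of freq_hz actually shows up.

    Tones above half the sample rate fold back into the range
    [0, sample_rate / 2], which is the aliasing the sampling theorem warns about.
    """
    folded = freq_hz % sample_rate
    return min(folded, sample_rate - folded)

def quantize(samples, n_levels):
    """Round each sample in [-1, 1] to the nearest of n_levels evenly spaced values."""
    step = 2.0 / (n_levels - 1)
    return [round((s + 1.0) / step) * step - 1.0 for s in samples]

SAMPLE_RATE = 44100  # the usual rate mentioned in the talk

samples = sample_sine(440.0, SAMPLE_RATE, 8)
coarse = quantize(samples, n_levels=100)

# The quantization error can never exceed half the distance between two levels.
max_error = max(abs(a - b) for a, b in zip(samples, coarse))

# A 30 kHz tone is above half the sample rate, so it aliases to 14.1 kHz.
aliased = alias_frequency(30000.0, SAMPLE_RATE)
```

In this sketch a 1 kHz tone is left alone by `alias_frequency`, whereas the 30 kHz tone folds back to 14100 Hz and would be indistinguishable from a genuine 14.1 kHz component.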
First of all, here we have a plotted recording [unintelligible]. In the waveform it is quite easy to see where each note starts, so we will have to grasp how to calculate that [unintelligible]. Then, in a spectrum graph, we can see that these were different notes, because their energy sits at significantly different frequencies. And the last thing is to think about a standard with which to encode the notes, so that whatever receives them later is able to reconstruct them [unintelligible].
Before we go any further to my suggested implementation of that, let's see what we are aiming for. [unintelligible] I'm going to play a beautiful instrument that has probably been with me ever since I was in elementary school [unintelligible]. I have this microphone here that I'm going to play into, and I'm not going to play any fancy melodies [unintelligible].
[demonstration]
It went better than I expected [unintelligible]. So what happened here is that we read chunks of data over time and process them, trying to figure out when a note occurred and what note it was. Then we create a message in a standardized format and send it to a synthesizer that produces the sound. So how do we do that?
We read the data using the highly recommended PyAudio, which is basically Python bindings around PortAudio, a cross-platform library enabling you to play and record audio in real time. It supports a blocking and a non-blocking mode; the non-blocking mode is based on calling a callback in a separate thread. What we do here is basically instantiate a PyAudio object and open a stream, in our case for reading the data. We tell it what data format we want, the number of channels, the sampling rate and what the callback is, and most importantly we start the stream and make sure that the main thread doesn't die. We need to keep it alive, for example by putting some sleep there.
So let's see what the callback looks like. It receives the data being read, the frame count, the time info and a status and, most importantly, it needs to return the frames and a flag. The flag tells our stream whether it should continue reading the data, which happens when we pass paContinue, or whether it should finish [unintelligible].
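The non-blocking PyAudio setup described above might look roughly like the sketch below. The helper names `make_callback` and `record`, the 16-bit mono format, the buffer size and the fixed recording duration are assumptions for illustration; the speaker's actual code is not shown in the transcript. PyAudio is imported inside `record` so that the callback logic can be read and exercised without audio hardware.

```python
import time

def make_callback(chunks, continue_flag):
    """Build a PyAudio-style stream callback that collects incoming chunks.

    continue_flag is the value the stream expects in order to keep going
    (pyaudio.paContinue when used with a real stream).
    """
    def callback(in_data, frame_count, time_info, status):
        chunks.append(in_data)        # raw bytes for this chunk
        return (None, continue_flag)  # no output data for an input-only stream
    return callback

def record(seconds=5.0, rate=44100, frames_per_buffer=2048):
    """Record from the default input device for a fixed time (needs PyAudio)."""
    import pyaudio  # third-party; imported lazily on purpose

    chunks = []
    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16,
                        channels=1,
                        rate=rate,
                        input=True,
                        frames_per_buffer=frames_per_buffer,
                        stream_callback=make_callback(chunks, pyaudio.paContinue))
    stream.start_stream()
    try:
        # The callback runs in a separate thread, so the main thread
        # only has to stay alive while data is coming in.
        time.sleep(seconds)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
    return chunks
```

Calling `record(seconds=2.0)` on a machine with a microphone would return a list of byte strings, one per chunk. A real-time transcriber would process each chunk inside or right after the callback instead of collecting them all.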
What's next? Next we have to figure out how to store the data. Here we are getting strings, which is not the best format for numerical manipulation [unintelligible]. That's why we convert the data to a NumPy array. NumPy arrays are a more efficient implementation than Python lists: Python lists allow you to put objects of different types in the same list, so they have to store type information for every element, whereas NumPy arrays can use vectorized implementations and optimized operations [unintelligible]. NumPy also offers a lot of rather complicated but very popular and common operations on matrices, like [unintelligible] or transposing matrices, or calculating the Fourier transform.
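The conversion from the raw chunk to a NumPy array mentioned above can be sketched as follows. The helper name `chunk_to_array` and the assumption of 16-bit integer samples (matching a `paInt16` stream) are illustrative; in Python 3 the callback delivers `bytes` rather than a string.

```python
import numpy as np

def chunk_to_array(in_data):
    """Interpret raw 16-bit PCM bytes as floating-point samples in [-1, 1)."""
    ints = np.frombuffer(in_data, dtype=np.int16)
    return ints.astype(np.float64) / 32768.0

# Three hand-made samples stand in for a chunk delivered by the audio stream.
raw = np.array([0, 16384, -32768], dtype=np.int16).tobytes()
samples = chunk_to_array(raw)

# Vectorized operations work on the whole array at once, with no Python loop.
energy = float(np.sum(samples ** 2))
```

Scaling to floats is a design choice rather than a requirement; it keeps later steps such as taking logarithms of spectra independent of the integer sample width.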
Perfect, so we have read the data. Now we want to find the points in the data where something changed. Here I plotted the short-time Fourier transform of our recording from the previous example. "Short-time" means that the recording was divided into chunks, and for each chunk we calculated the power spectrum, to see how the energy of the spectrum changes over time. This helps in finding where there was a significant change in our spectrum, which we assume to be the onset of a note. We don't have to do the splitting ourselves in our real-time implementation, because we already analyze the data chunk by chunk.
Basically, we calculate the power spectrum of a chunk and compare it with the previous one. As we want to measure how the power of the spectrum changes with time, we do it using the so-called spectral flux, which is basically the difference between the current spectrum and the previous one. I plotted it with the green line here, over the short-time Fourier transform.
So we find the peaks, and as you can already see there are some minor peaks that we might end up finding, and we don't want this kind of noise. So we apply a simple thresholding function: we choose a number of chunks that we average, and multiply the average by a given constant. We are basically only interested in peaks that are above the given threshold and that are bigger than the previous values. Here we can see that the thresholding function could be better, because it still leaves behind some things we don't want. In my implementation I chose a thresholding function which is far higher, because I wanted to make sure it's not too sensitive in an environment like this [unintelligible]. These two, the number of chunks to average and the multiplier, are basically the parameters that you can change to adapt your application.
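The onset detection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: the Hann window, keeping only positive spectral differences, and the default values of `n_average` and `multiplier` are choices made here for the example.

```python
import numpy as np

def power_spectrum(chunk):
    """Power spectrum of one Hann-windowed chunk of samples."""
    windowed = chunk * np.hanning(len(chunk))
    return np.abs(np.fft.rfft(windowed)) ** 2

def spectral_flux(previous, current):
    """Sum of the increases in power between two consecutive spectra.

    Keeping only positive differences is an assumption of this sketch;
    it makes the measure react to energy appearing, not disappearing.
    """
    diff = current - previous
    return float(np.sum(diff[diff > 0]))

def detect_onsets(chunks, n_average=10, multiplier=1.5):
    """Return indices of chunks whose flux is a peak above a moving threshold."""
    flux = [0.0]
    previous = power_spectrum(chunks[0])
    for chunk in chunks[1:]:
        current = power_spectrum(chunk)
        flux.append(spectral_flux(previous, current))
        previous = current

    onsets = []
    for i in range(1, len(flux)):
        # Threshold: average flux of the last n_average chunks times a constant.
        history = flux[max(0, i - n_average):i]
        threshold = multiplier * sum(history) / len(history)
        if flux[i] > threshold and flux[i] > flux[i - 1]:
            onsets.append(i)
    return onsets
```

With a signal that is silent for a few chunks, then holds one tone, then switches to another, `detect_onsets` should report the chunk where each tone begins. Raising `multiplier` makes the detector less sensitive, which is the trade-off the talk describes for noisy rooms.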
Now that we know where a note starts, we want to find out what the pitch of the note is. We do it by calculating the so-called cepstrum of the signal. The cepstrum is the inverse Fourier transform of the logarithm of the calculated spectrum. It is, as the wordplay suggests, like the spectrum of the spectrum, so that's a way to think about it. You can treat it as information about the rate of change [unintelligible], so it's kind of a measure of time, but you shouldn't think about it as the signal in the time domain [unintelligible].
We all know that frequency equals one over the period of a cycle. Knowing that, the domain of the cepstrum is called, again playing with words, quefrency, and we can think of it as representing the period of a cycle. So the high frequencies have shorter periods and are represented at the beginning of the quefrency domain, and the lower ones further along.
Before we start finding the fundamental frequency, we want to narrow the cepstrum to the frequencies we're interested in, the ones corresponding to the notes [unintelligible], which is a fairly narrow range. We can think of it in terms of frequencies [figures unintelligible], or equivalently as narrowing it to a range of periods [figures unintelligible]. Knowing this, we take the maximum value, which in our example lies between 25 and 30 [unintelligible]. To get the fundamental frequency, because that's what we're interested in, we simply divide the sample rate by the quefrency of that maximum, which follows from what I've just tried to explain.
So in our example [unintelligible] we narrow the cepstrum and then find the maximum of the narrowed cepstrum. When we read off the index of the peak, we have to remember that the index should refer to the original array, not the narrowed one [unintelligible], the value being 689.
It's also nice to mention that pitch detection applies a kind of correction to our onset detection, because you can then simply ignore the onsets that fall outside the frequency range of our instrument.
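The cepstrum-based pitch estimate described above can be sketched like this. The function name, the Hann window, the small constant added before taking the logarithm and the frequency bounds are assumptions for illustration; the bounds used in the talk are not recoverable from the transcript.

```python
import numpy as np

def detect_pitch(chunk, sample_rate, min_freq, max_freq):
    """Estimate the fundamental frequency of a chunk from its real cepstrum."""
    windowed = chunk * np.hanning(len(chunk))
    spectrum = np.abs(np.fft.fft(windowed))
    # Cepstrum: inverse Fourier transform of the log magnitude spectrum.
    # The tiny constant avoids log(0) for empty frequency bins.
    cepstrum = np.fft.ifft(np.log(spectrum + 1e-12)).real

    # Quefrency index q stands for a period of q samples,
    # so the matching frequency is sample_rate / q.
    start = int(np.floor(sample_rate / max_freq))
    end = int(np.ceil(sample_rate / min_freq))
    narrowed = cepstrum[start:end + 1]

    # argmax is relative to the narrowed slice; add the offset back so the
    # index refers to the original cepstrum array.
    peak = start + int(np.argmax(narrowed))
    return sample_rate / peak
```

The cepstrum relies on the regular spacing of harmonics in the spectrum, so this sketch is best tried on harmonically rich tones. For a tone whose period is exactly 100 samples at 44.1 kHz, the peak is expected at quefrency index 100, giving 441 Hz.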
Up to now we have found the note. Now we want to encode it in a form that can later be understood by a synthesizer. What we do is choose something ready to use, something widely used [unintelligible]: MIDI. MIDI stands for Musical Instrument Digital Interface, and basically it can encode things like the pitch and velocity of our note.
The MIDI messages we're interested in here are note-on and note-off [unintelligible]. A message consists of three bytes that we have to consider here. In the first one we say what kind of message it is and which channel we will be using. In the second byte the pitch is encoded, as we can see on 7 bits, which gives 128 values. The last one is velocity, meaning how strongly the note is played [unintelligible].
So how do we transform a frequency into a MIDI note number? As we can see, we only have 7 bits to encode the pitch, meaning that the maximum value is 127, but our frequency is 689 [unintelligible]. This is the [unintelligible] part of the lecture: here you can see which note has which MIDI number and which frequency. As you can see, our frequency of 698 hertz translates to the note [unintelligible] and has the MIDI number [unintelligible].
Now that we know what our note is, we can encode the message and send it to a MIDI instrument. I chose a library called [unintelligible], which is also a wrapper around something called [unintelligible, possibly FluidSynth], which is basically software that allows you to play SoundFont instruments.
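The mapping from frequency to MIDI note number and the layout of a note-on message described above can be sketched as follows. The transcript does not show the formula used in the talk; this sketch assumes the standard equal-temperament convention in which A4 = 440 Hz is MIDI note 69 and each semitone adds one. The function names and default velocity are illustrative.

```python
import math

def frequency_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    note = round(69 + 12 * math.log2(freq_hz / 440.0))
    # A data byte carries 7 bits, so valid note numbers are 0..127.
    return max(0, min(127, note))

def note_on(note, velocity=64, channel=0):
    """Three bytes of a note-on message: status, pitch, velocity."""
    # Status byte: high nibble 0x9 means note-on, low nibble is the channel.
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(note, channel=0):
    """Three bytes of a note-off message: status, pitch, velocity 0."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])

# The frequency mentioned in the talk, mapped to its nearest note number.
detected = frequency_to_midi(689.0)
message = note_on(detected, velocity=100)
```

The resulting bytes could be handed to any MIDI output or software synthesizer; how they are sent depends on the library used and is left out here.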
And that would be it. Some conclusions: Python is amazing for rapid prototyping. I wouldn't really use it for production code here, but it is great to just check whether a solution works and to try different detection algorithms. The whole thing is [unintelligible] lines of code; you can have a look at it [unintelligible]. The code is available online [unintelligible].
So why is it so quick to prototype with Python? First of all, thanks to the amazing numerical libraries: as I said before, NumPy makes things much easier. And of course I/O operations in Python are really easy and there is not much messing around, which is really useful. And the APIs of the libraries I used are really good and make the code very clean and very readable. So yeah, that's all I have for today. Thank you.
Session chair: Thank you. We have time for questions.
Question: [unintelligible; about needing a very good microphone]
Answer: [unintelligible]
Question: [unintelligible; about instrument identification, for example knowing that the instrument is a piano]
Answer: That is a different thing. To distinguish between different instruments: they would have different spectral features, and we would need to analyze those. We probably wouldn't care about how the energy distribution changes over time, but rather about the energy distribution as a whole, because different instruments probably have different characteristics. So yeah, it would probably be completely different things to analyze.
Question: I suppose this works for monophonic music, meaning one melody and only one note playing at a time. What are the challenges [unintelligible] for chords?
Answer: Yeah, so the problem is that it's hard to distinguish which notes are being played, because the frequencies overlap [unintelligible]. The solutions available by now are no more accurate than about 70 per cent, so it's a rather complicated thing. But I still think [unintelligible]. A very interesting concept is the CQT (constant-Q transform), and this is what people tend to work with [unintelligible].
Question: There is a story about Mozart: when he was a child he heard a piece of music once and was able to transcribe all of it from memory. Is your system able to listen to a piece of music and write down all the notes, like Mozart?
Answer: This version works only for real-time input. [unintelligible] I initially implemented it for analyzing pre-recorded files, so it actually can recognize notes and pitches in time and create MIDI notes with the right time and pitch and so on. But unfortunately the problem, as discussed when answering the previous question, would be [unintelligible] polyphonic music [unintelligible].
Session chair: We have time for one more question.
Question: [unintelligible] Given that you can recognize monophonic music, do you know of any implementations that also let the user teach the system, for example "this is noise, don't interpret it" or "this is my kind of instrument" [unintelligible]?
Answer: [unintelligible] You would first have to find some spectral features that are able to characterize the instrument, and that is not trivial to implement [unintelligible], but it is certainly an interesting thing. First of all we would have to try different spectral features to see what works best for distinguishing between different instruments, maybe the timbre or something, and then you would have a feature vector. That would be my idea of how to approach it, but I don't know of any implementation like this. There are people who are trying to separate instruments [unintelligible], but I don't know of anything that would perform well.
Session chair: We have no more time [unintelligible], so I'm sure you will join me in thanking [unintelligible].

Metadata

Formal metadata

Title Music transcription with Python
Series title EuroPython 2016
Part 82
Number of parts 169
Author Wszeborowska, Anna
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use and modify the work or its content for any legal and non-commercial purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or this content, including in modified form, only under the terms of this license
DOI 10.5446/21107
Publisher EuroPython
Release year 2016
Language English

Content metadata

Subject area Computer science
Abstract Anna Wszeborowska - Music transcription with Python. Music transcription makes it possible to convert an audio recording to musical notation through mathematical analysis. It is a very complex problem, especially for polyphonic music: currently existing solutions yield results with approx. 70% or less accuracy. In the talk we will focus on transcribing a monophonic audio input and see how we can modify it on the fly. To achieve that, we need to determine the pitch and duration of each note, and then use these parameters to create a sequence of MIDI events. MIDI stands for Musical Instrument Digital Interface and it encodes commands used to generate sounds by musical hardware or software. Let's see how to play around with sounds using Python and a handful of its powerful libraries. And let's do it in real-time!
