Q&A panel for data science newbies
Formal Metadata
Number of Parts | 141 | |
License | CC Attribution - NonCommercial - ShareAlike 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/68720 (DOI) | |
EuroPython 2023 (53 / 141)
Transcript: English(auto-generated)
00:04
Good, welcome everyone. I'm very glad to see all of you here. I'm Valerio. I am now working for Anaconda as a DevRel, but before that I was actually working in academia, so I was a researcher doing
00:21
my research in machine learning, specifically applied lately to healthcare domain, so working a lot with really different types of data, but most of all working with different types of colleagues, so I'd like to talk about that as well. And so my path to data science is
00:42
one of the most boring ones. I have a background in computer science, I have a PhD in machine learning, so this is how it gets boring. And I've been working in data science since then, and it's been a very interesting path, especially because I've learned a lot through this, and I'm
01:01
very glad I have the opportunity to share a little bit of this, and also the most important part is not about the technical things you learn, but mostly what you learn from other people working with you, and this is one of the things I personally like about data science. Right, passing it on to Cheuk now.
01:21
Hello, hey, so yeah, so my background is of course I'm not as smart, I didn't get a PhD, but I study, my study is physics, so I was, I like math and you know, complex things, so I studied math and physics but then decided that I
01:42
won't get a PhD because there's no job for me in physics back when, where I came from, I grew up in Hong Kong, so there's no physics job there, like all my friends, they all work for the government, which I don't want to work for the government, so I decided to go out and try different things, I've been a mascot in a theme park.
02:01
Anyway, jokes aside, I moved to the UK and then wanted to start something new, and my friend suggested to me, oh, you can do math, you can code, how about, you know, but my coding skill was bare minimum, right, so then my friend kind of encouraged me to start studying data science, that kind of thing, try to get
02:25
into the field, so I was very lucky, I got a job first as a data analyst, then I was promoted to data scientist, I worked as a data scientist for a few years, and then decided that I love the community, so I switched to more of a community role, I used to work with Valerio, so yeah, and now I'm also kind of in a community role right now, so.
02:47
All right, so I've had like a bit of a weirdish sort of path, but more or less traditional, so like my name is VB, I work as a machine learning developer advocate engineer, that's the
03:02
official role, it's more boring than it sounds, and I work at Hugging Face, and so I studied computer science, and then I did some web development for a bit, you know, Flask used to be cool back then, there was no FastAPI, and
03:25
then after that I did some consulting, so I did like business consulting, so I literally made PowerPoint presentations for a living for three years, and then after that I decided to go back to academia, and then I, you know, did my master's in computational linguistics, which is a wordy
03:47
way of saying that I studied English but with a bit more maths, and so I finished that actually last Friday, and so I submitted my thesis, and I barely scraped through, so I did that, and now I specialize in sort of speech, and like text to speech, and like
04:10
audio sort of speech recognition, and so on, so it's been like kind of like an up and down journey, and like I've taken a bunch of different roles, and you know, I did well in some, and horribly in some, but you know, it's been all right, so yeah.
04:25
And hi, I'm Jodie, I work as a data science developer advocate at JetBrains, so I probably have the weirdest background maybe of all three, I'm an academic as well, ex-academic, but my background was clinical psychology, so I
04:42
was a psychologist, I was licensed to practice, I saw lots of patients with anxiety disorders and things like that, but actually what I fell most in love with during psychology, other than the people, was statistics, I loved it so much, psychometrics, measurement, things like that, so I did a post-doc in public health and biostatistics, because I
05:05
just wanted to do so much more stats, and then I realized I hated academia, and I didn't know what to do, and my now husband suggested I go into data science, but the thing was, I didn't know anything about programming, I'd done like an introduction to Python course, I'd used some R, the first time I used R it took me
05:25
two hours to read in a file, because the slashes were the wrong way, and I started crying at one point, and I was like I hate this so much, and I was scared of the command line, I'm like I can't do this, luckily I had my husband to mentor me, and we'll maybe talk more about mentoring if anyone wants to know about that, but yeah, I just persisted, my first role was much more analytics as well, because that was what I knew,
05:45
and just over time I've kind of explored how much I want to go into the engineering side, I've realized I don't like it that much, I'm a scientist at heart, and I've stuck more with the research side of things, and prototyping, and communication, and teaching, so I want to say if you feel like you can't do the engineering side, think of me crying about the slashes going the wrong way.
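The "slashes going the wrong way" problem from this story came up in R, but the same pitfall exists in Python, so here is a minimal sketch of it. The file location is hypothetical and nothing is read from disk; the point is only to show why a backslash path typed into an ordinary string can break, and two common ways around it.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical file location, typed the way Windows displays it.
broken = "data\new\table.csv"   # "\n" and "\t" are read as newline and tab
raw = r"data\new\table.csv"     # a raw string keeps the backslashes literal

print("\n" in broken)  # the path silently contains a newline character
print("\n" in raw)     # the raw string does not

# PureWindowsPath understands backslash separators on any operating system,
# and as_posix() rewrites the path with forward slashes.
portable = PureWindowsPath(raw).as_posix()
print(portable)

# Building a path from its parts avoids typing separators at all.
built = PurePosixPath("data") / "new" / "table.csv"
print(built)
```

Forward slashes are generally accepted when reading files in both R and Python, including on Windows, which is why converting to them is usually the simplest fix.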
06:09
So over to you guys now, who wants to go first? Does anyone have a question? So jump up to
06:23
the mic there, and if you feel more comfortable you can grab the mic and sit down, this is pretty chill. I don't really have a huge, it's a fairly open question, I guess that's what this is all about. I'm sitting in the lunchroom talking to colleagues, and some of them are sort of half developers, and half data science, and then something in between.
06:47
So we tried to define what a data scientist is, and of course you already introduced that, and I see several different backgrounds here, I guess from what you're saying, I have a data science background,
07:01
and especially in research, but I am a software developer. Try to make that into a question I guess somehow. This is such a good starting question though. Does anyone want to start? So when it comes to data science in general, every firm on this planet has a
07:23
different sort of expectation from a data scientist, and that's the general standard in the field. The same goes for machine learning, the same goes for, actually if you think about it from a software development standpoint as well, when you look at a job description of a software developer, it can range from someone who is dealing with the entire front
07:47
end stack versus someone who is dealing with a back end stack, someone who is dealing with data engineering pipelines and so on. So in my opinion, the best way to explain data science is any and everyone who works with data.
08:02
And it could be something as simple as just analyzing a bunch of text files, or doing something more involved like connecting to big databases and then building dashboards on top of it, to building some predictive modeling algorithms.
08:27
So that's how I understand the field. So, I think VB has a very good point here, but for me, I've heard my friend who
08:43
is like very, I respect him a lot, he is a very experienced data scientist, and he said like, you know, he would say like, oh, I used to be called statistician, now data scientist sounds cooler. I would say that the field is very broad, like VB said, but if you are using data
09:02
to tell a story scientifically, which means that you use the scientific method, you're not trying to mislead people, you're using data as evidence to tell a story, that is data science, that's what you're doing. The field is very broad, of course, every company expects its data science team to do different things, there may be some new job titles, like I think before, you know, like ten years
09:23
ago, maybe nobody talked about data science, and then now, like every company will have a data science team. And then now, there's like even roles that I've never seen before, there's like machine learning engineer, which like, you know, when I had just started as a data scientist, I never imagined, what is a machine learning engineer, right?
09:40
You know, job titles and job roles keep changing, but as long as you use data to tell a story scientifically, even if it's something, you know, with minimal coding, or something like machine learning, you're doing like a large language model, like today's keynote, you know, later today, we'll talk
10:01
about large language models, all those are all data science, because you're using data and you're telling a story. So, yeah. Well, I completely agree with everything you said, totally on the same page on this, and it's also my experience. And in fact, I feel like the only thing I should add on this point is,
10:23
well, when you're working with data science, first off, you don't have to do machine learning. So machine learning is just one bit of it. In reality it is always the last thing you do, whenever you're doing data processing or data science, whatever you want to call it.
10:42
And I guess the most important thing to highlight, and I totally sympathize with everything they said, is this: the takeaway message is that data science is such a broad term, such an umbrella term, that it can cover every niche, detailed role we want to come up with, and so there's no precondition to get started with data science.
11:09
You can tackle the data science domain from the angle that suits you best. Meaning, if you have a background in computer science, well, data science
11:23
involves a lot of programming, and we've been talking about the ML engineer. This is just the sublimation of, is that a word? Oh, no. It's, my God. It's like the ultimate step: yes, I have something working, I need to put it in production.
11:43
This is what we're doing. And this is really a computer science job. This is what it is. But somehow you have to come up with a model, you have to do data analysis, you have to come up with a method, you have to prepare the data. And there's lots of experience and expertise in play in doing this.
12:04
And so you can really contribute to this project. It's not just like a label, and I am a data scientist, I can only do this and that, but I cannot do that. I mean, you can contribute to the whole scheme of a project with your expertise no matter what.
12:21
That's my takeaway message. And just a brief thing to add. I think what really overwhelms people when they're first starting is because of the broadness, or the breadth. I swear I speak English. It can be really overwhelming and you feel like you need to know everything. So you need to know math really deeply, you need to know how to program like an engineer, you need to know this and that and that.
12:45
My code is embarrassing, but I'm not embarrassed anymore. I used to be. Because that's a specialization, I don't want to learn it. Over time, you will gain the knowledge you need. There's sort of a core set of things that I think you do need to get started.
13:00
And again, you can ask us about that if you're interested. But what I would say is when you're starting, think about what interests you and start projects in that. And over time it will come. I don't know everything. I've been doing this for seven years, I don't know everything.
13:33
The tools and the skills are secondary to the problem solving, but I also think that's true of software engineering.
13:41
You don't write software, you create a product when you are doing software engineering. Maybe you can disagree. But with data science, you're doing a scientific method, you're solving a problem. Can I add one other thing? I think first I would like to clarify that data scientists normally don't work alone.
14:06
There's a lot of people involved in the project. You never work alone. It means that you're not responsible for everything. You're responsible for the thing you're good at. And there's no shame in not being able to do the rest of it, because it's not your expertise.
14:21
And it's not possible. Sorry, I don't think it's possible either, except in a teeny tiny startup where you have three people. I totally agree. And actually the thing I wanted to say is this breadth of the field and this diversification of expertise and experience in the team can be overwhelming in the beginning for beginners.
14:43
But at the end, I promise you, it's a feature, it's not a bug. Meaning that first you don't have to worry about everything. And talking about the code, it is very likely that if you're not coming from a computer science background, you don't care about the code.
15:01
Because it's something you're not as opinionated about as a computer scientist. So it's like, yes, it works, fine. For me, it's fine. And it's very fair. And in fact, if you want to make it work in production or whatever, you want to prepare the pipeline to be maintained
15:20
and to make sure the experiment is numerically sound and whatever, it's not your job to do that. Your job is most probably to come up with the analysis. So once it works, someone else is going to take over. Yeah, I will add to that. I do agree to some extent that you should focus, if you're a data scientist,
15:44
you should focus on how to use the data to solve the problem, like Jodie said. But I do suggest that if you want to broaden your experience, learn some new skills. Like learning what tool would help you solve the problem, maybe learn some.
16:02
So learn programming as a tool to help you do your job, you know. There's always a debate of like, oh, data scientists, they don't write, quote-unquote, "good code". I think it's not a very good statement, but you can try to understand how to write code that would work better with your colleagues.
16:27
For example, if you have a data engineer, what kind of code will make their job easier? You both work together, so you can learn on the job how to make the whole team successful.
16:44
I can tell my story of working with mathematicians, for example, in this team of data scientists. And I was the one thinking about the code, of course, as you might imagine. So I was like, what is this code about? What are you doing here? I'd say, you can do this better, you can do this more efficiently. And he was like, yeah, right, okay, it works.
17:04
And jokes aside, that was a learning process for both of us, meaning that whenever I had some doubt about modeling or like brainstorming some ideas or things we might want to do or something, I was working with him and with other people in the team.
17:23
But in return, when he had doubt about should I do this, should I do that through my code and blah, blah, blah, he came to me and all the colleagues and like sharing ideas. So it's just like a learning experience. And that's why I personally love data science. It's a never-ending learning process. Even when you're working on something new, it's always a new experience.
17:45
One small thing. So that just reminds me of how exactly I got started in data analysis. I would just like go on like any open data set that was available that I could find an API for or like just a CSV.
18:02
And there was like this open sort of data sort of portal within India that anyone has access to. So what I would do is I would just like download a CSV and import pandas. And then like start making like plots. And what I used to do is there's this really nice subreddit.
18:20
It's called Data is Beautiful. So I used to go there and I used to just like create a tagline and upload a graph on it. Right. And my goal for a week would be that I would try and depict one problem with this graph. Thank you, thank you, thank you. And you know, as soon as and so week over week, I would like I would get quite a lot of criticism as well.
18:45
Like people would be like, what are you showing? Like you can't like plot absolute numbers. I mean. And so, week over week, I would learn. One thing that I would really, really strongly encourage is to not get into the perfectionism loop, you know. What Valerio was saying is to just really try and get into the
19:03
habit of just pushing out one thing every week or two. Just like push out something, get some feedback, be it critical, and like try and do something different the next week and so on. And that's like a good way to sort of. Yeah. You know. Yeah. Get in. Yes.
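The workflow described here (download a CSV, import pandas, make a plot) and the feedback about not plotting absolute numbers can be sketched roughly as follows. The regions and figures are made up for illustration, and the data is read from an in-memory string rather than from the open data portal mentioned in the talk, so the column names are assumptions too.

```python
import io

import pandas as pd

# Made-up numbers for illustration only; a real run would pass the path of
# a downloaded CSV file to pd.read_csv instead of this in-memory text.
csv_text = """region,cases,population
North,1200,400000
South,900,150000
East,300,200000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Absolute counts favour large regions, so compare a rate instead.
df["cases_per_100k"] = df["cases"] / df["population"] * 100_000

ranked = df.sort_values("cases_per_100k", ascending=False)
print(ranked[["region", "cases", "cases_per_100k"]])

# With matplotlib installed, a bar chart of the rate is one more line:
# ranked.plot.bar(x="region", y="cases_per_100k")
```

In this toy data the region with the most cases in absolute terms is not the one with the highest rate, which is the kind of distortion the commenters were pointing at.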
19:20
There's just one very pressing thing I want to say in the interest of time. And this is something we all discussed before. And by the way, you just reminded me that I should add a slide in my talk with a joke about graphs and things, because I'm not talking about the graphs you're going to. Anyway. Let's put this out crystal clear. You don't need a Ph.D. to do data science.
19:42
So I have a Ph.D., but I'm not recommending you do that. Do that only if you really want to. She might. She might agree with this. But let's settle this now and forever. You don't need a Ph.D. to do machine learning nowadays.
20:00
It's probably never been the case. So if you don't have if you don't feel entitled enough, that's to say politely, that's something not very true. It could be something. I had different words in mind. Yes. Yes. Okay. I think we can go for another question.
20:21
Yeah. I think we can go for another question. Sorry, we went for so long. Would you mind coming up to the mic? Oh, sorry. And then, yeah. And we'll just maybe try and keep it a little tighter. Yeah, because I think we've got around five, ten minutes. So, yeah. Sorry. Hi. Nice to meet you. Just a quick question. You mentioned about the breadth of different facets of data science.
20:42
Just what do you guys do on a day-to-day basis? What are you working on right now? I'd like to see if you guys are all working on different things. Let's do a lightning round. Right now, my job is a little weird because I'm a developer advocate. But at the moment, I'm working on a series of videos on how to use database tooling better in PyCharm.
21:03
Same for me, actually. I'm a developer advocate as well now. So, it's content generation. And what I'm actually doing is like working a lot in this new PyScript project. So, trying to put data science in the browser. That's what I'm doing. I think I've made the career switch. So, now I'm working with the open source community.
21:23
So, I would say that I'm an open source advocate. Right. I help sort of make complex machine learning models, specifically for audio, accessible for developers. Hugging Face. On Hugging Face. So, yes.
21:40
That's why it's not working. So, essentially, I help bring models for speech recognition and text-to-speech into Transformers, which is a library from Hugging Face. So, that.
22:03
So, hi. I think stepping back a bit, I think you've covered it anyway. I'm going into my final year of doing a math degree. I've got basic coding skills. I can do Python. I can do a bit of that. And I am potentially looking into going into data science or whatever you want to call it.
22:24
What do you think the best way is of going in? Because you mentioned going into data analysis and then going into data science. What path do you think is the best kind of route to go for someone who's got quite basic levels of coding and is willing to learn? Can I maybe just chime in?
22:40
Okay. So, I would say that the best path would be to just like start with analyzing any sort of data set that you get. So, like get a feel for data first. And when I say like get a feel for data, I mean like just try and understand the nuances that exist when you deal with data of different kinds.
23:05
So, data is just not like a tidy CSV or like a tidy Excel sheet right around. It could be in like different formats and so on. So, just like try and get to sort of look at the breadth of available things. Like once you get a feel for it, you know, then you can sort of build your way through.
23:23
So, like so you start with understanding the data, then you start with like building certain visualizations on top of it. Then like you, once you understand data, then you can like start looking into like predictive stuff and so on. But I would say like just take it step by step and start with the data and then go all the way from visualizing it to like predictive side and to, you know, modeling side and so on.
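As a rough illustration of "getting a feel for data" before any visualization or modeling, the sketch below checks a small table for the kinds of untidiness mentioned above. The rows, column names, and the specific checks are invented for this example; it uses only the Python standard library so it runs without extra installs.

```python
import csv
import io
from collections import Counter

# Made-up, deliberately messy rows standing in for a downloaded CSV file.
csv_text = """city,temperature_c,measured_on
Prague,21.5,2023-07-17
Brno,,2023-07-17
Ostrava,n/a,17/07/2023
Plzen,19,2023-07-18
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Count, per column, how many cells are empty.
missing = Counter(col for row in rows for col, val in row.items() if val == "")

# Cells that are filled in but not numeric, in a column expected to be numeric.
bad_numbers = [
    row["temperature_c"]
    for row in rows
    if row["temperature_c"] and not is_number(row["temperature_c"])
]

# Dates that do not follow the YYYY-MM-DD layout used by the other rows.
odd_dates = [row["measured_on"] for row in rows if row["measured_on"][4:5] != "-"]

print("rows:", len(rows))
print("missing cells per column:", dict(missing))
print("non-numeric temperatures:", bad_numbers)
print("dates in another format:", odd_dates)
```

Checks like these are usually worth running before plotting anything, since a missing value or a mixed date format can quietly distort a chart or a model later on.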
23:44
Yeah, again, quite briefly, perfect point. I actually started a blog when I was first starting. My posts are so embarrassing, but they're still up there.
24:00
I still have the blog. And then my first role was data analytics because it was a lot easier to get into. So, you don't need to limit yourself to that. But maybe a role where you're going to be doing less machine learning even and more focus on maybe analysis and stuff like that. And then on the job, you can learn the engineering stuff.
24:22
There's always chances to have people look over your code and see what other people are doing. Yeah. There's only one very tiny thing I want to add, and thank you very much for the question, and it is about the technology, the tooling you choose. Choose whatever you like. Yes.
24:40
There's no right or wrong choice. And I tell you my story. I was the only one in my lab as well. To put everything in context was my Ph.D. time. So, I was sort of free to choose whatever I liked. But it doesn't really mean that you have to choose whatever you also use at work. You can choose your own path to learn independently.
25:01
But what I'm trying to say is I was the only one using Python back then. It was like no one was using Python. We have to use Java. And I said it. It's Java. So, yes. That was essentially my reaction all the time. I didn't want to because I was feeling more comfortable with Python.
25:22
It's also a case of not just the tool, it's also the community and also the support you have. Because if you're using a very tiny little niche language, yes, it's fine. It could be fun for you, but it's limited. So, it reflects to the next steps.
25:43
But nonetheless, to get started, start with whatever you like. And this is a really broad recommendation, from tools, to language, to technology, even the laptop. I have a bonus point for that.
26:01
It's to find something fun to do. Find something fun to do. You can, you know, either like Jodie said, start a blog. If you found something interesting, for example, today you hear something interesting, you want to dive deeper into it. Study it and write a blog about it. Or if you prefer working with people, volunteer. I know there's like an organization called DataKind. They are doing a lot of like investigative data, investigative things.
26:23
So, by spending a day hacking with people, doing some like project, it's good for society. But at the same time, you learn from each other, learn from other people. So, go and meet people, go to meetups and go to the community and you'll find lots of fun stuff to do. And for me, I think that's the best way to learn.
26:42
Yeah. Just quickly, go to Reddit. Data is Beautiful. I cannot recommend it more. I don't get paid for that. Oh, you don't? Yeah. All right. Okay. I was thinking of. Well, I totally agree with what you've said. And actually, I'm even stricter about that.
27:02
It's not one way, it's the only way to learn. Because you can take a book, learn everything about it. You don't understand anything. You're just like, yes, you can understand all the time. But then it's like, it's very difficult to put in practice. Whereas if you have really a project to work on, however complicated you think it is, it's always the right one.
27:24
How are we doing for time? Do we have time? All right. So, we got through three questions.
27:40
I would say we can stay in here if you want to stay for another question. I think we can do maybe one more. You may need to run, Cheuk. Yeah, yeah. I think we can. So, maybe let's end the formal session.
28:02
Let's stick around here. You guys can go out and get your coffee. And then we'll sort of slowly make our way out of here as we get kicked out. So, thank you.