Scraping Art - Mapping the Neuland
Formal Metadata
Title: Scraping Art - Mapping the Neuland
Number of Parts: 234
License: CC Attribution - ShareAlike 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/33053 (DOI)
Transcript: English (auto-generated)
00:16
Hi, everybody.
00:21
Yeah, so we still have to set it up. But I think I might just start already while it's still being set up in the background. So what we're going to do today is, ah, now it's working. So we're going to basically talk about this installation we've been working on for the past four months.
00:41
And it started off as a project for me at uni. And then he sort of jumped on board of it, helped me a little bit with the programming. And basically, what we would like to do today is to introduce it, talk a bit about scraping, what scraping actually is, and talk a bit about the content, what we can actually see inside here.
01:01
So to start, first of all, could you maybe explain to us what scraping is in the first place? Because I think many people in the audience might actually not know what scraping is. All right, so first thing first, I have a question for all of you. How many of you know what scraping is? Just raise your hand up. All right, all the people in the back, but nobody in the front.
01:24
So because we're going to be throwing this word around a lot today, we want to actually explain it so everybody can sort of keep track of what we're going to be saying. When you think of the word scraping, what would be the first hand movement you would make? It would probably be like this: scraping something off the ground,
01:43
scraping something off a desk, scraping something just to like, I don't know, scraping a stain off. And that's basically quite similar to what web scraping is, except that instead of scraping a desk or scraping, I don't know, a stain from a t-shirt, you actually scrape content from websites and web services.
02:04
And the way you usually do this is you use something called a scraper, and a scraper is just a piece of code or software that goes to these websites or goes to these places online, and it just takes the data you need. Now obviously to like get data from websites online
02:23
and to get data, if you think about it, you can already think probably of a lot of useful cases. And that's why it's used quite often in business. I have one good example. Let's say, I mean the weather in Berlin here
02:41
is not exactly the best, it can be quite cloudy. And then you wanna go somewhere where it's a bit warm, it's a bit hot, so you decide to go to Barcelona. And when you're booking the tickets, the best thing or like the most efficient thing to do would be to compare the prices of those tickets. So you go to a website like Skyscanner, and then you write the information about the trip
03:04
you wanna make and when, around what time you wanna go there. And this website shows you basically what the best prices are, what the best times are for you. And the way it does this is by going to the websites of those airlines,
03:21
if they don't provide the content already, by going to those websites and telling them when you wanna fly, taking the prices from those websites, then comparing them and giving you the final result. And that's how, usually, I'm not sure that Skyscanner does exactly that, but that would be one example
03:41
of how you would use this in business. So yeah, now we heard how scraping is used in business, but also it's been used in the arts before, so we're not the first ones who are inventing this new methodology for the arts. So for instance, there's this new sort of discipline called the digital humanities, where artists are scraping big data sets
04:02
and then using them for artistic analysis. Most prominently, that's probably Lev Manovich; he's a Russian artist and researcher working in New York, and his work has been focusing on research done on Instagram and Flickr. And basically what he's doing
04:21
is he downloads big data sets of images from those websites and then analyzes them according to, sort of, the colors you can see in the image. So if you take a picture with blue sky in the background, or with night sky in the background, he orders all those images according to the color inside them, and then takes those images,
04:42
puts them together in a collage, and then you can see it in a museum or in an exhibition. But what this kind of work with scraping in the arts has done so far is to just download one big data set, and that's it.
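A toy version of that kind of color ordering, as we understand it, can be as simple as sorting a folder of scraped images by their average hue. This is our own sketch, not Manovich's actual pipeline, and the folder name is a placeholder:

```python
# Sort a folder of scraped images by average hue (illustrative sketch only).
import colorsys
from pathlib import Path
from PIL import Image, ImageStat

def average_hue(path: Path) -> float:
    """Average the pixels of one image and return the hue of that mean color."""
    r, g, b = ImageStat.Stat(Image.open(path).convert("RGB")).mean
    hue, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return hue

# "scraped_images" is a hypothetical folder of downloaded pictures.
images = sorted(Path("scraped_images").glob("*.jpg"), key=average_hue)
for i, path in enumerate(images):
    print(i, path.name)   # the order in which they would appear in a collage
```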
05:01
So what we actually find way more interesting about scraping is that it can be used as a continuous process: you don't download something once and that's it, but you actually go to a website, or rather you write a program which goes to a website, and then gets the content from there continuously, all the time.
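In its simplest form, that continuous idea is just a scrape inside a loop; the URL and the interval here are placeholders:

```python
# Re-scrape on a fixed interval instead of downloading one data set once.
import time
import requests

while True:
    page = requests.get("https://example.org/live-page", timeout=30).text
    with open("latest_snapshot.html", "w", encoding="utf-8") as f:
        f.write(page)          # keep only the newest state of the page
    time.sleep(60)             # repeat every minute, indefinitely
```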
05:21
And the reason why we decided to do that, or why we thought that would be interesting, is because we think that nowadays we're living in a time where things change really fast, where lots of things are happening around us, and also where automation is having quite a huge impact on our lives. And if you look at, for instance, trading algorithms,
05:41
which then change food prices, for instance, it is quite weird that those concepts of automation still haven't reached the arts yet, and therefore we thought it would be interesting and worthwhile to explore those methods actually and use them as an artistic method.
06:00
And we also think that nowadays, since everything is so fast, like if you consider, for instance, like you do an oil painting and then it takes you three weeks for it to dry and then you can exhibit it. But in the meantime, the world has already changed so much that you can actually, well, that it's not contemporary anymore. And obviously we're not saying
06:20
that everyone should do scraping now, but that scraping is a way to sort of capture what is going on right now. And this is basically the sort of nowness which we wanted to explore. And so scraping is like the method also which we use for this work, which we can see here in the background.
06:41
So today we thought instead of like bringing a slideshow and like do the usual talk stuff, we figured we'd bring like the thing that we wanted to show. And I guess like hopefully if you've been paying attention, you would have noticed that it's a bit like, I don't know, I wouldn't have known what's going on if we didn't do it ourselves.
07:03
So let me ask you something, Melinda. What is it actually that we're doing here or what are all these pictures that we're showing? Yeah, so basically, as I already said, our method is scraping and here the content of this sort of digital collage is actually internet infrastructures. And what we wanted to do is using scraping as a method
07:23
to sort of display and to map internet infrastructure and to also like stress the character of their liveness because those internet infrastructures bring the internet into being and it is because of them that we can actually also use scraping again as a method. Alrighty, so air, water, stack.
07:42
We have three categories and we decided to divide them in a way where we can sort of represent all of them in the work. And I'm gonna start now by explaining where the air is. Air starts in the background. And it starts in the background, even though this is just a picture of planet Earth,
08:02
it's actually taken by NASA satellites. And this is the latest picture that you would find if you accessed what NASA offers and got the latest picture of Earth. And this picture we actually had to build ourselves, because what NASA offers is just a tile,
08:23
or it offers different tiles from different positions on Earth. And that's why you would see these cuts here because these are the parts where the satellites can reach and these are on the bottom as well. And what we do basically is because the satellites have to go a whole trip around Earth,
08:42
we can't scrape this every second. We can actually only do this once a day, and this is the picture from today. And this one we update on a daily basis, so that we have the full Earth represented here.
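As a sketch, the daily mosaic is just "download every tile and paste it into one big image"; the tile URL and the grid size below are illustrative placeholders, not NASA's actual endpoint:

```python
# Stitch per-day tiles into one Earth mosaic (placeholder endpoint and layout).
from io import BytesIO
import requests
from PIL import Image

TILE_URL = "https://example.org/earth/{date}/tile_{col}_{row}.jpg"  # placeholder
COLS, ROWS, TILE = 8, 4, 512                                        # assumed grid

def build_mosaic(date: str) -> Image.Image:
    """Download every tile for one day and paste it into a single image."""
    mosaic = Image.new("RGB", (COLS * TILE, ROWS * TILE))
    for row in range(ROWS):
        for col in range(COLS):
            resp = requests.get(TILE_URL.format(date=date, col=col, row=row), timeout=30)
            resp.raise_for_status()
            mosaic.paste(Image.open(BytesIO(resp.content)), (col * TILE, row * TILE))
    return mosaic

if __name__ == "__main__":
    build_mosaic("2017-05-08").save("earth_today.jpg")   # re-run once a day
```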
09:02
The second thing is the satellites themselves. And I think they're not on our side today, the satellites, because I've seen before that their images are quite glitchy today. Yeah, I think that's because we actually made it for a different resolution, and now that we've changed the resolution... This is actually supposed to be the picture of the satellites, so just imagine the Earth here
09:21
and a couple of satellites around it. And this picture should update, it updates on a five-second basis, and it just shows the satellites, the same satellites actually that take the pictures, and all the other satellites in orbit, at least the official ones which we know of, because obviously there are probably some
09:41
that we don't know exist. And we also update this on a five-second basis, and then you can see exactly where the satellites are. The third thing in air is this live stream here, and this is actually live right now. What we're seeing right now is what the crew of the ISS are seeing from outer space.
10:02
Now that we're done with air, we have another thing, which is water. And for water here, we have the deep-sea cables. And these deep-sea cables are the cables that connect our internet today and connect our world today. And what we did here, we created three different lanes basically.
10:22
One to represent the north, one to represent the center, and one to represent the south. In the north, we have North America and Europe and some parts of Asia. In the middle, we have some parts of Asia, we have Africa and some parts of South America. And in the bottom, we have Australia, we have New Zealand and all basically
10:42
the islands around it. We have some parts of South America and some little parts of Africa. And if you've been paying attention, you would have noticed that the speed of the cables changes. And that's because we actually have 400 cables
11:00
represented here. And each cable represents a certain region. So this one from the north is either North America or Europe, and that's why it's one of the fastest, because we decided to determine the speed based on internet traffic rates in those regions. And we're going to explain a bit later how we actually get these internet traffic rates.
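Concretely, the mapping from a regional rate to an animation speed can be as small as a helper like this; the rate ceiling and the pixel scale are our illustrative assumptions, not the installation's exact numbers:

```python
# Scale a region's traffic rate into pixels moved per rendered frame (sketch).
def cable_speed(region_rate_mbps: float, max_rate_mbps: float = 50.0,
                max_pixels_per_frame: float = 12.0) -> float:
    share = min(region_rate_mbps / max_rate_mbps, 1.0)
    return max(1.0, share * max_pixels_per_frame)   # never fully stop a cable

print(cable_speed(45.0))   # a fast northern region: ~10.8 px/frame
print(cable_speed(4.0))    # a slower region: clamped to 1.0 px/frame
```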
11:23
But if you keep watching, you'll notice that the further down you go, the slower it is. And that's because the internet in those regions is obviously not that fast, like in Africa, for instance. And since the installation is called Air Water Stack, I would like to speak a little bit
11:40
about the idea of the stack now. And maybe some of you know the word stack from computation. But it's actually also a geographical concept, which was introduced by a philosopher called Benjamin Bratton. He's written a full book on this, called The Stack, as well; it's 500 pages. And there he basically argues that the stack
12:00
is this new infrastructure. So it's not only the internet itself which is creating all those links between us, but it's also the satellites which are creating those linkages, but also the end users themselves using their phones, and it's also the wifi networks and the platforms like Facebook, VKontakte,
12:21
and all those things together. This new mega-infrastructure is creating this thing which he termed the stack. And he's sort of proposing the stack as a new geographical layer, which is sort of spanning around the earth as well. And what we were trying to do here in this installation
12:42
was basically to map out the stack. Because it actually is sort of a new territory. And this is also what he argues, but also what has been said in other places. So for instance in 2013 when the Snowden leaks got out and it got out that like the NSA,
13:03
the American National Security Agency had been spying on many of, or had been intercepting German phones, Angela Merkel first of all didn't say anything. And then later on it got out that actually the NSA had been spying on her phone as well. And then she was like, well but the internet is all like really new to us.
13:21
And she said it's "Neuland" to us, so like a sort of virgin soil or a new territory. And therefore she declared the internet, which is now celebrating its 26th birthday, as something which is completely new to her. And therefore we thought, okay, it would actually be nice to sort of map this new territory
13:40
and to look at this territory and, in order for it to be explored, to also visualize it. Because I think the problem with these deep-sea cables, and also with the satellites, is often that, well, a really common example is: if you ask people how the mail works, how you send a letter, everyone can tell you how it works.
14:00
But if you ask someone how email actually works, most people can't answer that question. And I think most people are also not aware that there are those deep-sea cables. I think most people would still, hopefully, know there are satellites flying around us. But the deep-sea cables are rather something which only gets mentioned in expert contexts,
14:23
or then there are like great artists like Trevor Paglen who's like organizing actually diving trips to visit those deep sea cables which I think is really cool. And yeah so our attempt here was to like visualize the stack and visualize this big infrastructure by trying to map it out.
14:41
And obviously we're not the NSA, so we don't have all the data which we would need in order to map out this whole infrastructure and to account for all the user data which we would need. But it's more about this idea of showing what this new territory could potentially look like. And also to show sort of how the power struggles
15:04
which unfold in this territory look like, or in order to think about it. Because even though this is quite a peaceful installation, if you think of the internet as a place where cyber war is taking place, where censorship is taking place, where people are getting spied on, where surveillance is a common practice,
15:23
then actually thinking about it as this new territory, or this new world we're actually inhabiting, is quite a dystopian vision, to also stick with what the presenter before has been talking about. And yeah, so therefore we thought it would be interesting for us as artists to explore it as well,
15:42
using methods which are normally used in business, and using those business methods of scraping to look more into how we can map out those territories. And now we would also like to speak about where we actually got the numbers from which we're using for the scraper,
16:00
so maybe you can tell us how we did that. Thank you Malia. So I said earlier that we were getting these internet traffic rates and if I was sitting in the audience I'd think what is he talking about? What kind of internet traffic rates? So this installation here is actually quite interactive.
16:20
Millions of users every day literally influence how all these pictures move. And the way that happens is, like I said, we're taking these internet traffic rates, and we're taking them from a company called Akamai. And no worries if you don't know what this company is, because we actually didn't know that it existed
16:40
a couple of months ago. It's something that's called a content delivery network or a content delivery network provider. And what content delivery networks do is basically they solve a very important problem. If somebody was accessing a website that is based in a different country
17:02
or a different continent they would have to go through all those cables all the way to that different continent. So if I was going to Alibaba I would need to go all the way to China to just order something. And if that was the way that the internet worked then it would take us always 15 seconds
17:21
or 20 seconds to even have a website load. But what happened is that these content delivery networks came. And what they do is they provide customers or websites with servers all around the world in different continents and in different countries
17:40
so that when I go to a website that's based in a different country I don't access their content in that country. I access the content they have cached or saved on the servers here in Germany for example. And Akamai is one of the biggest content delivery network providers. Some of their biggest customers are Airbnb, Facebook, Apple.
18:04
And what Akamai does is it offers the numbers of the internet traffic rates that go through their servers. And it's reported that on a given day 15% to 30% of the internet traffic in the whole world
18:21
goes through Akamai servers. So we figured, if that is true, then this installation is not just user-interactive, it is being dictated by the millions of internet users around the world.
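As a rough sketch, pulling such a regional figure looks like any other scrape; the URL and the CSS selector below are placeholders, since the real dashboard we read from is JavaScript-driven and its markup changes:

```python
# Fetch a per-region traffic figure from a monitoring page (placeholder markup).
import requests
from bs4 import BeautifulSoup

def fetch_region_rate(region: str) -> float:
    url = f"https://example.org/net-traffic/{region}"      # placeholder endpoint
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    cell = soup.select_one("span.traffic-rate")             # assumed selector
    return float(cell.get_text(strip=True).replace(",", ""))

rates = {r: fetch_region_rate(r) for r in ("europe", "africa", "asia")}
```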
18:42
Another thing about Akamai is that in 2013, in the Snowden leaks, it showed up that Akamai was a target of a joint effort by the FBI and the NSA to steal, or to intercept, Facebook photos that were hosted on some of Akamai's CDNs. But Akamai said that it's not their fault, and that anybody who could have used their CDNs
19:02
would have to be responsible for the security they provide. Another thing about Akamai is that a while ago they used to block Tor users from accessing some of their servers, but they don't do that anymore, at least not in our trials. Yes, and now we would like to also speak a little bit
19:22
at the end of our presentation about actually the methods we've used. So it's gonna get a bit more technical and basically just speak about how we actually made this because I think to most people it probably looks a bit more like a video installation or something like that but it's actually something which we coded
19:41
and it's all done in Python, which is a programming language. And so basically, what we've done is, in order to get the data in the first place, there are some really simple methods to do it, which work with some websites, and some other methods which are a bit more complex, which we also had to use
20:00
in order to get the trickier parts into our installation as well. So all the methods we used are open-source libraries, and basically, if you know some programming, you can use them as well, experiment around, and try some scraping for yourself. And so one basic method is called requests,
20:22
and basically what you do is you just write four lines of code which say: okay, go to this and this website, get the HTML file of the website, then analyze it, take the content you want to have, and download it.
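In a hedged sketch, those four lines look roughly like this; BeautifulSoup is our choice of parser here, and the example URL is a placeholder:

```python
# Fetch a page, parse the HTML, pick out what you need.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.org/some-page", timeout=30).text
soup = BeautifulSoup(html, "html.parser")
for img in soup.find_all("img"):      # take the content you want...
    print(img.get("src"))             # ...or download it from these URLs
```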
20:41
But the problem today is that the internet is way more dynamic, because not everything is written in HTML but also in JavaScript, and that's why we had to do something else as well. So what we then had to do was basically fake a user: pretend we're a browser, a user going to a website, clicking on things, and then getting the JavaScript-rendered content. And to do that you use something called Selenium,
21:01
and Selenium is just a web driver. So a web driver opens a browser, and you pick which one: if you like Chrome you can do this with Chrome, if you like Firefox you can do this with Firefox. And you just program it, you tell it: open Firefox, pretend that you're the user, go to this website, go to this tab
21:22
and get me that data. And that's basically how Selenium works, and it's pretty simple. Sometimes it can be a bit hard to get it to work properly, because a lot of websites store your cookie data, and when you enter as a new user they either require you to do something or they suspect that you're a bot,
21:42
and so they sometimes try to prevent this from happening. But you can always find sort of a bypass to go around that and pretend you're just a normal user who happens to be very fast at clicking tabs.
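A minimal Selenium sketch of that idea, assuming Firefox and its driver are installed; the URL and the element lookup are purely illustrative:

```python
# Let a real browser render the JavaScript, then read the result.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()                      # or webdriver.Chrome()
driver.get("https://example.org/dynamic-page")    # the browser runs the scripts
driver.implicitly_wait(10)                        # give the page time to render
element = driver.find_element(By.ID, "content")   # grab the rendered content
print(element.text)
driver.quit()
```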
22:01
Yeah, one of the other problems we ran into concerned our lovely satellites, which cannot be displayed here. The problem we had with them was that, since they're constantly updating as well, they're not normal images but are generated by JavaScript running on the respective websites, so we had to sort of hack our way around that in order to actually get those images,
22:21
and then my idea was to just go to those websites and constantly take screenshots, and this is basically what it's doing right now in order to generate some of the content here.
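In sketch form, that screenshot trick is just a browser left open on the page and photographed every few seconds; the URL is a placeholder:

```python
# Keep a browser on the satellite page and grab the rendered view repeatedly.
import time
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://example.org/satellite-tracker")   # JS keeps redrawing the map
try:
    while True:
        driver.save_screenshot("satellites_latest.png")   # overwritten each cycle
        time.sleep(5)                                      # the five-second update
finally:
    driver.quit()
```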
22:41
Now Malena didn't stop at taking screenshots of pages; she also wanted to take screenshots of videos. At some point we experimented with taking screenshots of videos, but if we had continued with that, you wouldn't have seen this sort of video where you see the fishes move: it took about five seconds for each frame to render. So we had to do something else, and that was using a Python library called Livestreamer. What it does is it just gives you
23:00
the live stream in binary data, or rather the live stream itself without all the other content. It works with YouTube, it works with Ustream: you just give it a link and it gives you just the content you need. And once you have that content, which is the video, you still need to extract the separate frames from it, and to do that what we used
23:21
was something else called OpenCV, to just extract the separate frames and then display them. And that was basically all we needed to get the data.
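Put together, the live-stream path looks roughly like this; it assumes OpenCV was built with FFmpeg support and that the chosen stream exposes a direct URL, which HLS and HTTP streams do:

```python
# Livestreamer resolves the actual stream behind a page, OpenCV pulls frames.
import cv2
import livestreamer

streams = livestreamer.streams("https://www.youtube.com/watch?v=XXXXXXXXXXX")
capture = cv2.VideoCapture(streams["best"].url)   # hand the raw stream to OpenCV

ok, frame = capture.read()                        # one frame as a numpy array
if ok:
    cv2.imwrite("latest_frame.png", frame)        # or push it into the display
capture.release()
```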
23:42
The second part was to actually display it, because we could have had so much data, we could have had so many video streams, but we still wouldn't have been able to display it. And to display it we used something called OpenGL. And I know there's been a lot of "open" in this talk, a lot of OpenGL, OpenCV, but that's because a lot of this stuff is literally available online for free, for people to experiment with and to make better when they find problems. OpenGL is a famous 3D rendering library,
24:02
you can use it for anything that is related to graphics: Instagram uses it for its filters, Photoshop uses it for a lot of features, a lot of games are actually programmed in OpenGL. It's just a great library if you want to do anything graphical but still want to code it yourself.
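For completeness, a very small sketch of the display side: one scraped image uploaded as an OpenGL texture and drawn on a quad, using pygame for the window. Our actual installation code is more involved than this, and the file name is a placeholder:

```python
# Upload an image as an OpenGL texture and draw it on a full-screen quad.
import pygame
from pygame.locals import DOUBLEBUF, OPENGL
from OpenGL.GL import (glEnable, glGenTextures, glBindTexture, glTexImage2D,
                       glTexParameteri, glClear, glBegin, glEnd, glTexCoord2f,
                       glVertex2f, GL_TEXTURE_2D, GL_RGB, GL_UNSIGNED_BYTE,
                       GL_TEXTURE_MIN_FILTER, GL_TEXTURE_MAG_FILTER, GL_LINEAR,
                       GL_COLOR_BUFFER_BIT, GL_QUADS)

pygame.init()
pygame.display.set_mode((800, 600), DOUBLEBUF | OPENGL)

surface = pygame.image.load("latest_frame.png")          # any scraped image
pixels = pygame.image.tostring(surface, "RGB", True)
width, height = surface.get_size()

glEnable(GL_TEXTURE_2D)
texture = glGenTextures(1)
glBindTexture(GL_TEXTURE_2D, texture)
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, pixels)

glClear(GL_COLOR_BUFFER_BIT)
glBegin(GL_QUADS)                                         # textured quad covering the view
for (u, v), (x, y) in zip([(0, 0), (1, 0), (1, 1), (0, 1)],
                          [(-1, -1), (1, -1), (1, 1), (-1, 1)]):
    glTexCoord2f(u, v)
    glVertex2f(x, y)
glEnd()
pygame.display.flip()
pygame.time.wait(3000)                                    # keep the window up briefly
pygame.quit()
```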
24:22
Yeah, and our idea for the future, if we continue with this project, is to maybe experiment a little bit with AI and neural networks. Because, as we already said, some of the content, like the tempo with which it is moving, is already sort of determined by internet users,
24:44
which is then measured by this company called Akamai. But it would also be interesting if the images themselves were curated by something, whether it's an AI, or maybe also using user data to determine the content. And yeah, so we're planning to maybe go a bit more into this with our project.
25:02
Thank you very much. Thank you.