
Empowering social scientists with web mining tools

Formal Metadata

Title
Empowering social scientists with web mining tools
Subtitle
Why and how to enable researchers to perform complex web mining tasks
Title of Series
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Web mining, mostly in the form of scraping and crawling, is not a straightforward task and requires a variety of skills related to web technologies. Yet web mining can be incredibly useful to the social sciences, since it enables researchers to tap into a formidable source of information about society. Researchers may not be able to invest copious amounts of time in learning web technologies inside and out, so they usually rely on engineers to collect data from the web. This talk explains how Sciences Po's médialab designed and developed tools that empower researchers to perform web mining tasks themselves to answer their research questions.

Here are examples of the issues we will tackle during this talk:
- How life in a social sciences laboratory can be a very fruitful context for tool R&D regarding web mining
- How to create performant and effective web mining tools that anyone can use (multithreading, parallelism, JS execution, complex spiders, etc.)
- How to re-localize data collection: researchers should be able to conduct their own collections without depending on external servers or resources
- How to teach researchers the necessary skills: HTML, the DOM, CSS selection, etc.

Examples will be taken mainly from the minet CLI tool and the artoo.js bookmarklet.

Speaker: Guillaume Plique is a research engineer working for Sciences Po's médialab. He assists social sciences researchers daily with their methods and maintains a variety of FOSS tools geared toward the social sciences community and also developers.
Transcript: English (auto-generated)