Empowering social scientists with web mining tools
Formal Metadata
Title: Empowering social scientists with web mining tools
Series: FOSDEM 2020 (talk 136 of 490)
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/46919 (DOI)
Transcript (English, auto-generated)
00:06
Okay, hello everyone. So I'm really glad today to be in this new room, which is all about open science and tools and technologies. I'm here to speak about empowering social scientists with web mining tools. We will see together what web mining is and how we can teach researchers
00:24
how to do so, and what tools we developed to help them achieve amazing tasks. So, hello everyone. I am Guillaume Plique, aka Yomguithereal on the internet — a mistake of youth. I am a research engineer for a research laboratory in
00:44
France called Sciences Po médialab, but we will talk about that a bit more later. So, what is web mining? Who here knows about web mining? Okay, that's nice, then I can skip ahead. So what is web mining? Just a reminder for everyone.
01:03
So I will only talk about web mining as a tool to collect data from the web, and afterwards how we are going to analyze this data and produce insights from it. So basically, from a technical point of view, web mining is actually two or three things. The first thing being scraping. So what is scraping? Scraping is the act of reverse-engineering the
01:25
HTML of a web page to extract back the data that produced the HTML page. So for instance, here you have an example, a page from the EchoJS website, which is a news aggregator for JavaScript, basically. Scraping would be to open your inspector, check how the HTML has been written to
01:46
display this visual page, and try to extract from the HTML the data we are interested in. So for instance, here it would be the title of each shared article and the link to the article, and so on.
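To make the idea concrete, here is a minimal sketch of that reverse-engineering step in Python, using only the standard library. The HTML snippet below is a made-up stand-in for a news-listing page (EchoJS's real markup differs); only the technique — walking the markup to pull titles and links back out — is the point.

```python
# Sketch: extract (title, url) pairs from a hypothetical news-listing page.
from html.parser import HTMLParser

SNIPPET = """
<article class="news">
  <h2><a href="https://example.com/post-1">First shared article</a></h2>
</article>
<article class="news">
  <h2><a href="https://example.com/post-2">Second shared article</a></h2>
</article>
"""

class NewsExtractor(HTMLParser):
    """Collects (title, url) pairs from <a> tags found inside <h2> headings."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False          # are we inside a heading?
        self.in_heading_link = False  # are we inside a link in a heading?
        self.current_url = None
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            self.in_heading_link = True
            self.current_url = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False
        elif tag == "a":
            self.in_heading_link = False

    def handle_data(self, data):
        # Text nodes inside a heading link are the article titles.
        if self.in_heading_link and data.strip():
            self.items.append((data.strip(), self.current_url))

parser = NewsExtractor()
parser.feed(SNIPPET)
```

In practice a scraper would use CSS selectors or XPath over fetched HTML rather than a hand-rolled parser, but the logic is the same: the page's structure is the implicit schema you reverse-engineer.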
02:00
So this is the first thing: scraping — extracting data from web pages by reverse-engineering them. The second thing web mining is, is actually crawling. Crawling is a bit different: here we are going to design a bot, a spider, a program which is going to browse the web automatically and slowly compose a network of pages, of sites, etc.
02:26
And we are interested in two things: what the actual content on those pages is, and what network is drawn by this whole navigation of the web. So: scraping, crawling, and the third thing is collecting data from APIs.
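The two outputs just mentioned — page content and the link network — fall naturally out of a breadth-first traversal. This is a toy sketch, not any particular crawler's implementation: the `fetch_links` function is injected so the example stays offline, where a real crawler would fetch each URL over HTTP and parse its links.

```python
# Toy breadth-first crawler: records visited pages and the link network.
from collections import deque

def crawl(start_url, fetch_links, max_pages=100):
    """Breadth-first crawl; returns the visited pages and the edges traversed."""
    visited, edges = [], []
    seen = {start_url}
    frontier = deque([start_url])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)          # "content" side: the pages we saw
        for target in fetch_links(url):
            edges.append((url, target))  # "network" side: one edge of the web graph
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited, edges

# A fake three-page web standing in for real HTTP fetches.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": [],
}
pages, network = crawl("a", lambda url: FAKE_WEB.get(url, []))
```

The `edges` list is exactly the graph that tools like Gephi can later lay out to study the structure of a web corpus.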
02:44
So nowadays, for instance, Facebook or Twitter or LinkedIn share some data with you, and we can leverage their APIs to collect data and gain some insights. So this is it: web mining, for the purpose of this talk, will be scraping, crawling, and APIs.
03:05
So the question is why this is useful to social sciences. I'm putting "social" in brackets because basically it could be useful to any science, I guess — physics or chemistry and so on. But since I am working for social scientists, I will speak from the point of view of the social sciences.
03:22
So why is it useful to collect data on the web? The bad take on this goes: every social science data collection is biased. If you do questionnaires, for instance, or interviews, you get biased data, mostly due to what we call the observer's paradox — when you ask people something, their answers are biased, because you are asking them the thing and you are in the room observing them.
03:48
The thing which is really interesting with the internet is that people express themselves without being asked to. They are just going to express their opinion while nobody is observing. Well — except that I am observing right now.
04:01
And so it's less biased; web mining would be a superior source of data for social sciences because it's not biased. This is the bad take. The good take is that internet data comes with its own biases. For instance, if you collect data on Google Trends, you will of course find other biases, and you should be aware of them.
04:21
And to be able to control and manage those new biases, you have to apply meta-studies and science and technology studies, which is a large field of the social sciences studying those issues. So the conclusion, the good take, is that web mining is still another very interesting and very large data source.
04:41
So why not collect it? We should, because it's a good thing and we can. The issue here is that web mining is hard. To be able to perform web mining tasks — to scrape, to crawl — you need to know the web. And when I say the web, I mean the whole web. You need to know how
05:01
DNS works, HTTP works, HTML works, CSS works, JS, the DOM, Ajax, SSR, CSR, XPath and so on. You've got a lot of things to know and learn about the web to be able to reverse-engineer it. So how do you teach researchers — for instance, social scientists — those web technologies?
05:23
Basically the same as everyone else: you could teach them CSS and HTML and so on and try to empower them through this teaching. But while many consider the web an easy layer of technologies — there is a misconception that the web is actually really easy — it's really not.
05:43
And we are really standing on the shoulders of giants. Has anyone here already tried to teach someone who is new to web technologies how the web actually works? Did someone do this job? Okay. Usually when you do that, you notice that you are standing on a huge mountain of skills, which is actually really daunting.
06:05
So it's not really easy to teach people about web technologies. Another question is how to teach researchers how to scrape, for instance. Say they know about web technologies, they know a bit of JavaScript and Python — how can we empower them and teach them how to scrape?
06:23
And then you also have other issues which are a bit different, such as fighting the platforms and their APIs: platforms will try to prevent you from scraping and crawling. You've got legal issues in some countries. In some countries, for example Denmark, teachers avoid teaching people scraping because it's considered something like lock-picking.
06:43
It's considered a bit illegal, or gray. And you have to be evasive when you publish something using scraping, because sometimes you have to say: oh no, I did not scrape, I had a monkey army clicking on the button really fast. So you have a lot of hoops to jump through.
07:00
And what's more — and this is something I really want to stress today — "Jupyterizing" researchers is not a solution. Sometimes we say: okay, we are going to empower researchers, we are going to teach them everything they need to know; they are going to learn Python, Jupyter, web technologies, and they are going to scrape by themselves. It sounds like a good solution, but it's not really applicable to the real world.
07:22
In the social sciences especially, some researchers don't have the time nor the will to learn all those skills. And as a community, we should be okay with that. It's okay — researchers don't have to learn these skills — and the question then is how we are going to empower them all the same.
07:43
And what's more — this is the second point against the Jupyterization of researchers — web mining is actually really, really, really hard. It really is a craftsmanship. Basically, web mining is a job, not a skill. The internet, for instance, is a dirty, dirty, dirty place. You've got conventions:
08:05
you are supposed to code a website correctly, cleanly, but basically everything is really badly implemented. So browsers today are really heuristical wonders: they have a lot of routines and programs to make sure that the messy web page you send will still be read by the browser correctly.
08:24
You have to know all of those things when you want to do web mining. What's more, you need to know about things which are considered advanced in computing, such as how to multi-thread a program, how to parallelize things, how to throttle your HTTP requests.
08:45
And if you don't know how to do that, you will harm actual people. For instance, at the beginning of our journey we did not know how to throttle HTTP requests, so we basically cut off our whole university's access to Google. Which is a bit problematic. Not too much.
09:02
And you need to know about all those kinds of things, which are really complicated, if you want to actually perform web mining. You need a lot of skills. So what I mean here is that it really is a craftsmanship, it really is a job. And you can't expect people to be both researchers and web miners.
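The university-Google incident above is exactly what throttling prevents: spacing requests out so a target host never sees an uncontrolled burst. Here is a minimal per-host throttle sketch — not Minet's actual implementation, just the idea, with an artificially small delay so it runs quickly.

```python
# Minimal per-host throttle: guarantee a delay between requests to one host.
import time
from collections import defaultdict

class Throttler:
    """Ensures at least `delay` seconds elapse between calls for the same host."""
    def __init__(self, delay=0.2):
        self.delay = delay
        # -inf means "never called", so the first request is never delayed.
        self.last_call = defaultdict(lambda: float("-inf"))

    def wait(self, host):
        elapsed = time.monotonic() - self.last_call[host]
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)  # back off before hitting the host
        self.last_call[host] = time.monotonic()

throttler = Throttler(delay=0.05)
start = time.monotonic()
for _ in range(3):
    throttler.wait("google.com")  # a real crawler would issue the request here
spent = time.monotonic() - start  # first call is free, the next two each wait
```

Real crawlers combine this with retries, timeouts, and randomized delays, but even this much would have kept a whole university from being banned from Google.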
09:23
So the question then is how we are going to empower researchers all the same. And the answer is: by designing tools suited to their research questions. So we need to have designers. Who is a designer in this room? Him. Yeah. So we need more designers.
09:42
So how did we do that? I work for a laboratory called Sciences Po médialab, and the seminal idea of the lab was to gather three kinds of people: social science researchers, designers such as this guy, and engineers such as me. We mix those people, and we design tools suited to the researchers' questions and work.
10:07
This is basically it. So what I propose here is to guide you through some of the tools we designed to be able to empower, really empower social scientists to perform web mining tasks.
10:25
And the first one we made is called artoo.js — it's a bad pun on R2-D2. The idea started from the following observation: if you know about modern web technologies, you will quickly encounter something
10:42
called dynamic rendering, which means that the page is not rendered on the server; it's rendered on the client using JavaScript and so on. It's really complicated, and if you want to emulate this to be able to scrape, it's kind of difficult. So the idea was to parasite the web browser itself to perform some web mining tasks.
11:02
I know it's a bit abstract, so I'm going to try a small demo time so everything will break now. This is how it works. So for instance, let's say you have a researcher who wants to scrape this web page, get the whole list as a CSV table. So you're going to go to the page and then you are going to inject some
11:22
parasite code to help you scrape the data and provide the researcher with it. So I use a bookmarklet, called artoo, which is loaded directly into the web page's context. And artoo is going to help me do some stuff.
11:41
First, it can do some sound, which is its most interesting feature. Then I will be able to use something like old school jQuery and using CSS stuff, really basic stuff, I will be able to scrape data.
12:04
So here I'm just attempting to scrape the data from the website, but directly within the web page's JavaScript context. And when I have that, I'm also able to help the researchers by doing this thing.
12:26
Ah yes, it doesn't work because... I don't know. Sorry, it's a bad live-coding situation.
12:46
Yeah, yeah. And so now I have the data — well, that part does not work — but basically I've scraped the thing as a CSV file and I'm now able to provide it to the researchers.
13:03
The main point here is that it's still code. But you can use this same code to generate bookmarklets — custom bookmarklets — for the researchers. It means I go to this kind of interface, paste my code here, and create something which is actually a bookmark for the researcher. He just has to copy it into his web browser, and then he only has to go to the page and click on the button, and it will download the CSV for him.
13:27
For this kind of scenario, we have researchers who do really qualitative research on websites and just want to pick some lists and aggregate them. And so we use this tool, artoo.js, to provide them with these kinds of ad hoc, tailored bookmarklets.
13:45
So this is the first thing, artoo.js. The next question is: can we do something a bit more hefty?
14:06
So we created something which is now called Minet. What is the goal of Minet? The goal of Minet is to provide you with a command line tool which handles all the pesky details of web mining for you. Basically, all the things in bold are the things you are going to focus on and actually work on.
14:26
Everything around them is what I handle for you, so you don't have to: multi-threading, multi-processing and so on, I do that for you. You just focus on the task, which is: I want to get, for instance, one million pages from the web and then extract the actual content and data from them.
14:44
Do we have time for a demo? No. Maybe so. So basically, Minet is a CLI tool. It's a Unix tool which complies with the Unix philosophy: it does one thing, only one thing, well — and this thing is web mining, which is a big, hefty thing, but it does it well.
15:04
And you can pipe commands and so on, and it looks like this, for instance. Basically you are going to use the command line tool to, for instance, fetch a lot of pages from the web, then scrape massively, in a parallelized fashion, from a lot of web pages.
15:22
You can also extract raw content from articles so you can do NLP stuff on it afterwards, and so on. So it's really a Swiss Army knife for web mining, which is scalable and which is lo-fi: it doesn't run on any database, it works on CSV files, it pipes to stdout and works with Unix pipes — it's really, yeah, lo-fi.
15:44
And what's more — and this is really important for us, and true for both artoo and Minet — it relocalizes data collection onto the researchers' own computers. Sometimes you need a server, but sometimes you don't. And it's important for the researcher to be able to control his or her data by
16:02
being able to do this stuff on his or her own computer, so they are really in control. And basically, in the social sciences we rarely do what we would call big data™ — we don't do that stuff. Everything stands and fits on a single computer.
16:21
And what's more, if you really want to do some Jupyter stuff — and that's alright — you have a programmatic API which hides all the complexity for you. If you want to do something like: okay, I want to fetch one million pages from the internet and I want to do it right, you can just write a simple for loop and all this stuff is handled for you.
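What such a "just write a for loop" API hides underneath is roughly a thread pool yielding results as they complete. The sketch below shows that shape using only the standard library — Minet's real programmatic API differs, and `fetch_one` here is a stand-in for an actual throttled HTTP call.

```python
# Sketch of a multithreaded fetch loop: submit urls to a pool, yield results
# as they finish, so the caller can consume them with a plain for loop.
from concurrent.futures import ThreadPoolExecutor, as_completed

def multithreaded_fetch(urls, fetch_one, threads=25):
    """Yield (url, result) pairs, fetching up to `threads` urls concurrently."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            yield futures[future], future.result()

# Stand-in for downloading a page body over HTTP.
results = dict(multithreaded_fetch(
    ["u1", "u2", "u3"],
    lambda url: f"<html>{url}</html>",
    threads=2,
))
```

A production version adds per-host throttling, retries, and streaming results to disk as CSV — which is exactly the bookkeeping the researcher should never have to see.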
16:41
So we have seen how to enable researchers to scrape and to collect data from APIs. The question is: what's the next step? Can we design something a bit more ambitious, something more like a GUI? We actually can. For instance, in the lab we are developing a tool called Hyphe, which is a web
17:02
crawler with a dedicated interface that enables researchers to crawl the web — crawl a subset of the web — and make sense of it without any kind of technical knowledge. So how do we enable researchers to crawl the web? Using this tool, Hyphe. It looks like this, for instance: you have an interface, everything is push-button, you input
17:25
things using the keyboard, you don't have to know how to code, you don't have to know how to program, and you are still able to like crawl millions of web pages and be able to like construct, build a corpus which is a subset of the web on which you are actually working on.
17:43
And so, finally, we used design — actual designers — to serve a robust methodology which has been proven to work in many sociological works. And we designed the tool to embody this methodology.
18:02
And I just want to emphasize one last time that this is really non-trivial. Has anyone here already tried to build a crawler and crawl the web — to build and program the spider? Is it really easy? No. You have to build some things yourself; for instance, for this particular tool we
18:23
had to build our own indexing database to be able to index multi-layered graphs. There is a talk which was given here two years ago, if you want to check it out — it's called the Traph, basically. That's the job. So, as a conclusion, my main point here is that researchers should not
18:46
be expected to learn all the ins and outs of web mining and programming. And when you design tools suited to their needs, there is always a tradeoff between scalability — how much data they can handle and fetch — and usability — how easy the tool is to use.
19:05
So we need to design a user path, and to do so, we need to take a step back on what we are doing and take the time to abstract our design path. I hope that's what we are doing right now. So what's next? We would like to build a GUI for Minet, so researchers are able to use it without needing to learn the Unix command line.
19:29
And so if anyone is up for it, we need people. We are recruiting also. And thank you for listening.
19:48
Questions? Questions. Yes. With what?
20:00
Whether we respect robots.txt? I will tell you officially that I do — but I do not. No, basically for us it's not an ethical question, it's more of a technical question, because it's really heavy for us to fetch it and we don't handle that very well, so we don't. But we could. We could do it.
20:22
Do we use Scrapy? Yes, we use Scrapy in Hyphe, for instance. But I think the version we use does not respect robots.txt. Yes.
20:53
We try to use those tools with researchers a lot. So those are basically tools that scrape automatically by learning what you are trying to extract.
21:02
It did not work very well. We also try to design our own tools but failed miserably basically. And it's something we are still interested in, but we haven't found the thing which really works yet with our researchers.
21:26
Are there researchers actually working with these tools? Yeah, sure. For instance, right now we are working... Repeat the question? Yeah, so the question is: is there an example of an actual research project or question we are using web mining for?
21:43
And currently we are working on a project which aims at studying how people in France share and read the media. We are using web mining in multiple fashions: for instance, we collect the text of all articles from 400 media outlets, and we collect all the tweets mentioning those URLs.
22:03
We collect all the Facebook posts mentioning those URLs, and much more — YouTube videos from those media outlets and so on. So we collect a really large amount of data to be able to see whether the media polarize around political questions or not, etc.
22:20
Yes. To avoid what? Okay. So for Facebook there is a trick which is actually interesting, which is that
22:44
they still need to serve a mobile version which is heavily used in India, and they can't rate-limit it too aggressively, because otherwise they would block those users. So you can hit it like a madman and it still works. It's a good option, but the fact is they won't serve you the actual data.
23:03
Sometimes you get relative dates, and you need to parse those relative dates to be able to date things. But it's usually a good solution for scraping Facebook massively, for instance. So we found workarounds. And when we don't find workarounds, we use proxy meshes, which let us hit from multiple addresses quickly.
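The relative-date problem mentioned here — pages serving "3 hrs ago" instead of a timestamp — means scraped dates must be resolved against the collection time. A minimal sketch, handling only a few hypothetical English formats (real pages vary by locale and wording):

```python
# Sketch: turn a relative date string like "3 hrs ago" into an absolute
# datetime, anchored at the moment of collection.
import re
from datetime import datetime, timedelta

# Hypothetical unit spellings; a real scraper needs many more, per locale.
UNITS = {"min": "minutes", "mins": "minutes",
         "hr": "hours", "hrs": "hours",
         "day": "days", "days": "days"}

def resolve_relative_date(text, now):
    """Resolve '<n> <unit> ago' against `now`; return None if unparseable."""
    match = re.match(r"(\d+)\s+(\w+)\s+ago", text.strip())
    if not match:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    if unit not in UNITS:
        return None
    return now - timedelta(**{UNITS[unit]: amount})

# `now` would normally be the scrape timestamp recorded with each page.
now = datetime(2020, 2, 1, 12, 0)
resolved = resolve_relative_date("3 hrs ago", now)
```

The key design point is storing the scrape timestamp alongside each record: without it, relative dates collected yesterday are unrecoverable today.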
23:27
Not yet. It will soon because we need to. But if you want you can help us. Any other question? Thanks.