Web Scraping Best Practices
Formal Metadata

Title: Web Scraping Best Practices
Part Number: 79
Number of Parts: 173
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers: 10.5446/20204 (DOI)
Production Place: Bilbao, Euskadi, Spain
EuroPython 2015, 79 / 173
Transcript: English (auto-generated)
00:05
Okay, hi everyone. Thanks for coming. I'm very sorry about the technical difficulties; we clearly should have had a bit more time to set up and prepare. And please try not to look ahead too far in the slides. I know it's going to be difficult, but there you go.
00:21
Okay, so I'm going to talk about web scraping best practices. I originally called this "advanced web scraping", because we're going to touch on a lot of advanced topics, but it's not advanced in the sense that you need to be past the beginner level to understand it. So I changed it to best practices, and I hope that
00:40
everybody can follow this talk and understand what's going on; if you can't, please just shout or let me know. So, a bit about me: about eight years ago I started scraping in anger, and that was around the time when we did Scrapy, the web scraping framework. Since then we've been involved in a couple of other projects,
01:04
including Portia and Frontera, and if you don't know what they are, don't worry, I'll get to them later. So why would you want to scrape? Well, there are lots of good sources of data on the internet, and we come across a lot of companies, universities and research labs of all different sizes who are using web scraping.
01:25
But getting data from the web is difficult. You can't rely on APIs, you can't rely on semantic markup, and that's where web scraping comes in. These are some stats; you probably can't read them very well because they're small,
01:42
but basically web scraping has been on the increase recently. We've seen that ourselves, and it's also something other companies report. These stats are from a company called Incapsula, which provides anti-bot technology, and they cover a sample of its customers, so they're probably not completely representative of the internet as a whole.
02:02
But it's still very interesting to see, and another thing I can see from this is that smaller websites have a larger percentage of bot traffic, probably because they have fewer users. That's something to keep in mind, especially if you write bad bots: they cause more trouble for smaller websites. Smaller websites might have bandwidth limits, for example,
02:24
and many HTTP libraries don't compress content, so you can easily go over their bandwidth limits. Also, of course, doing a bad job means your web scrapers are very hard to maintain. That's a notorious problem, because websites change.
02:42
So when I think about web scraping, I like to think of it in two parts. The first is actually getting the content: finding good sources of content and downloading it. The second is extraction: actually extracting structured data from that downloaded content. I've structured this talk in two parts as well, following that split.
03:05
As an example of web scraping: I just said that Scrapinghub gets scraped all the time, and it's not just people testing out Scrapy or our other tools. A couple of weeks ago we posted a job ad on our website, and the next day it was up on a job listing board.
03:24
None of us posted it there, so we thought, well, how did that happen? We think we were probably scraped. So a question for the audience is to think about how you would write that scraper. I would break it down into: how do I find good sources of content, and how do I extract that data? It turns out we tweeted about the job,
03:45
with the hashtag #remoteworking, so maybe somebody picked it up from Twitter after it got retweeted; that would be an easy source of content. And we did use semantic markup, so perhaps they extracted it from that. Writing such a scraper is a relatively easy task; you could do it in a day, maybe.
04:04
But if you wanted to handle cases where people didn't use semantic markup, or you wanted to find people who didn't tweet about a job or post it to some other website, then it becomes a much bigger and much more complex task. I think that highlights the scope of web scraping, from the very easy,
04:22
cool, fun hacks that don't take very long, to the very ambitious and very difficult projects. So, moving on to downloading. I'm going to mention the Python requests library; many people probably know it. It's a great library for HTTP, and
04:43
it makes doing simple things as simple as it should be. But when you start scraping at a bit more scale, you really want to worry about a few other things, like retrying requests that fail. Certainly when we started out, you'd run a web scrape and it might take days to finish,
05:03
and then about three-quarters of the way through you'd get a network error, or the website you're scraping would suddenly return 500 Internal Server Error for 10 minutes. If you don't have some policy to handle this, it's a huge pain, so you want to think about that.
05:21
Also, in this example you can see I'm using a session. I don't know if you can see it or not because it's small, but consider using sessions with Python requests. Sessions handle cookies, and they also use connection keep-alive, so you don't end up repeatedly opening and closing connections to the sites you scrape.
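The snippet from the slide isn't reproduced in the transcript, but a minimal sketch of the session-plus-retry idea with requests might look like this (the URL is just a placeholder):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry failed requests a few times, backing off between attempts,
# and retry transient 5xx responses as well as connection errors.
retries = Retry(total=5, backoff_factor=1, status_forcelist=[500, 502, 503, 504])

session = requests.Session()  # keeps cookies and reuses connections (keep-alive)
session.mount("http://", HTTPAdapter(max_retries=retries))
session.mount("https://", HTTPAdapter(max_retries=retries))

response = session.get("https://example.com/jobs", timeout=30)
response.raise_for_status()
```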
05:41
But I would say, as soon as you start crawling, you really want to think about using Scrapy right away. This little example here is not much code. It uses Scrapy's CrawlSpider, which is a common pattern for crawling: you just define one rule and a start URL, and that's enough to go from the EuroPython website for this conference to actually
06:05
follow all the links to speakers; you just need to fill in some code to parse the speaker details. So it's really not much code, and it solves all the problems I highlighted, like retrying and so on, and you can cache the data locally, which is good if you're going to live-demo stuff.
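The CrawlSpider example itself isn't visible in the transcript; a minimal sketch of the pattern being described might look like this (the start URL and link pattern are assumptions, not the actual site layout):

```python
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class SpeakerSpider(CrawlSpider):
    name = "speakers"
    # Hypothetical conference site and speaker-page URL pattern.
    start_urls = ["https://ep2015.europython.eu/"]
    rules = [
        Rule(LinkExtractor(allow=r"/speaker/"), callback="parse_speaker"),
    ]

    def parse_speaker(self, response):
        # Fill in whatever speaker details you need here.
        yield {
            "name": response.css("h1::text").get(),
            "url": response.url,
        }
```

Scrapy's retry middleware covers the retrying, and enabling the HTTP cache (HTTPCACHE_ENABLED = True in the settings) gives you the local caching mentioned above.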
06:25
A single crawl like that often turns into crawling multiple websites. At PyCon US in 2014 we did a demo, and it's up on Scrapinghub's GitHub account; it's called PyCon speakers. We actually scraped
06:43
data from a whole lot of tech conferences. This is a really good example to look at, because it shows a way to manage a scraping project when you've got a lot of spiders, and Scrapy provides a lot of facilities for managing that; for example, you can list all the spiders that are there. A
07:01
spider is a bit of logic that we write for a given website. The project also shows best practices in terms of how easy it is with Scrapy to put common logic in common places and share it across multiple websites when they're crawling the same type of thing; there's a lot of scope for code reuse. So Scrapy is definitely good for scraping multiple websites.
07:21
So, some tips for crawling and finding good sources of things. Some people might not think about using sitemaps, and Scrapy actually has a SitemapSpider that makes this very easy and transparent. It can often be a much more efficient way to get to the content.
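A minimal sketch of that sitemap approach (the domain and URL pattern are made up):

```python
from scrapy.spiders import SitemapSpider


class ProductSitemapSpider(SitemapSpider):
    name = "products_from_sitemap"
    # Hypothetical sitemap location; only product URLs get parsed.
    sitemap_urls = ["https://example.com/sitemap.xml"]
    sitemap_rules = [("/product/", "parse_product")]

    def parse_product(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```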
07:42
That also means, of course: don't follow unnecessary links. You can waste an awful lot of resources, for everybody, following things that don't need to be followed. Also consider crawl order: if you're discovering links on a website, it might make sense
08:00
to crawl breadth-first and to limit the depth you go to. This can help you avoid crawler traps, where maybe you're repeatedly scraping a calendar and just walking through the dates; that's a common example.
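In Scrapy, that breadth-first, depth-limited crawl order can be expressed with a few settings; a sketch with illustrative values:

```python
# settings.py (illustrative values)
DEPTH_LIMIT = 5      # don't follow links deeper than this
DEPTH_PRIORITY = 1   # prefer shallower requests ...
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"  # ... with FIFO queues, giving breadth-first order
```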
08:23
every permutation and combination of search And this generated huge load, of course So as you decide to scale up So I was talking here about maybe single website scrapes, which is I guess the most common use case at least especially for scraping
08:43
And single-website scrapes can be big, right? We frequently do maybe hundreds of millions of pages. But at scale, say you're writing a vertical search engine or a focused crawler, then we're talking about maybe tens of billions or even hundreds of billions of discovered URLs.
09:03
So you might crawl a certain set of pages, but the number of URLs you discover on those pages, and so the entire state you need to keep in your URL frontier, can be much, much larger. Maintaining all of that is a bit of a headache; it's a lot of data. One common way to do it is to just write all that data somewhere
09:23
and then perform a big batch computation to figure out the next set of unique URLs to crawl, typically using Hadoop or MapReduce; it's a very common approach. Incremental crawling, or really continuous crawling, is where you're continuously feeding URLs to your
09:46
crawlers. That has the advantage that you can respond much more quickly to changes; you don't need to stop the crawl and resume it. Also, nowadays you might want to repeatedly hit some websites; maybe you're following social media or something like that as a good source of links. So it's much more useful,
10:04
but it's much more complex at the same time, and it's a harder problem to solve. Maintaining politeness is a little point at the bottom, but it's something you really want to consider when you're doing this at any scale. Almost anybody can fire up a lot of instances nowadays on EC2
10:20
or their favourite cloud platform and download loads of pages really quickly, without putting much thought into what those pages are, or particularly into the impact it's going to have on the websites being crawled. In a larger crawl, where you're crawling from multiple servers, you would typically
10:42
only crawl a single website from a single server, and that server can then maintain politeness, so you can ensure that whatever your crawling policies are, you don't break them. So, Frontera: I thought I'd briefly mention it. Alexander Sibiryakov gave a talk on it yesterday.
11:02
It's a Python project that we worked on, and are still working on, that implements this crawl frontier. It maintains all the state about visited URLs and tells you what you should crawl next, and there are a few different configurable backends for it. You can use it embedded in your Scrapy crawl,
11:21
or you can use it via an API with your own thing. It implements some more sophisticated revisit policies, so if you want to go back to some pages more often than others and keep content fresh, it can do that. And I think Alexander particularly talked about doing it at scale:
11:41
he did a crawl of the Spanish internet, and he'll also be talking about that in the poster session, so please come visit. So, just to summarize quickly what we talked about for downloading: requests is an awesome library for simple cases, but once you start crawling it's better to move to Scrapy quickly;
12:01
maybe you even want to start there. And if you need to do anything really complicated or sophisticated, or at scale, consider Frontera. So, moving on to extraction, the second part I wanted to talk about. Of course, Python is a great language for extracting
12:20
content, for messing with strings and data. There are probably a lot of talks at this conference about managing data with Python, and even just the built-in features of the language and the standard library make it very easy to play with text content. Regular expressions, of course, are one thing that's built into the standard library, and
12:45
we should probably say something about them. Regular expressions are brilliant for textual content; they work great with things like telephone numbers or postcodes.
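As a tiny illustration of the kind of purely textual pattern where a regular expression shines (the pattern is a rough UK-style postcode matcher, just for the example):

```python
import re

text = "Ship to EC1V 9BX, or to M1 1AE if the first office is closed."
# Rough UK-style postcode pattern; good enough for an illustration.
postcode_re = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")
print(postcode_re.findall(text))  # ['EC1V 9BX', 'M1 1AE']
```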
13:02
But if you ever find yourself matching against HTML tags or HTML content, you've probably made a mistake, and there's probably a better way to do it. I see this kind of regular-expression code all the time, and yeah, sometimes it works, but it's hard to understand and modify, and often it doesn't actually work. So, other techniques: use HTML parsers.
13:22
There we have some great options. This is for when you want to extract based on the structure of HTML pages: often you'll say, OK, this area here, surrounded by this, underneath that table. For that, HTML parsers are absolutely the way to go.
13:45
Just as a brief example: on the right-hand side I have some examples of HTML parsers: lxml, html5lib, Beautiful Soup, Gumbo, and of course Python has its own built-in HTML parser.
14:04
I'll talk about them a bit more in a minute, so don't worry if you can't see that. As a brief example of what they do: they take some raw HTML, which looks like text, and create a parse tree, and then you use some technique to navigate that parse tree and extract the bits you're interested in; these parsers usually provide some method for doing that.
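A minimal sketch of that parse-then-navigate flow with lxml, one of the parsers just listed, using XPath (which comes up next) as the navigation step; the HTML snippet is made up:

```python
from lxml import html

raw = "<html><body><div><b>first</b></div><div>second <b>bold</b></div></body></html>"
tree = html.fromstring(raw)          # build the parse tree from raw HTML

print(tree.xpath("//b/text()"))      # text of all <b> tags: ['first', 'bold']
print(tree.xpath("//div/b"))         # <b> elements that sit under a <div>
print(tree.xpath("//div[2]/text()")) # text of the second <div>: ['second ']
```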
14:22
I don't know if you can see this, so I'll skip over it quickly, but I quite like XPath as a way to do it. It's very powerful: in this case you can select all the bold tags, or a bold tag under a div,
14:41
or the text from the second div tag. It lets you specify rules, and it's really worth learning if you're going to be doing a lot of this. Here's an example from Scrapy; you don't really need to read it, but basically Scrapy provides a nice way for you to call XPath or CSS selectors
15:03
directly on responses. This is probably the most common way to scrape content from a small set of known websites.
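The Scrapy snippet on the slide isn't readable in the transcript; a small stand-in showing the same idea, calling XPath and CSS selectors the way a spider callback would on a response (the HTML is invented):

```python
from scrapy.selector import Selector

body = "<ul><li class='talk'>Web Scraping</li><li class='talk'>Frontera</li></ul>"
sel = Selector(text=body)  # inside a spider you would call response.xpath / response.css directly

print(sel.xpath("//li[@class='talk']/text()").getall())  # ['Web Scraping', 'Frontera']
print(sel.css("li.talk::text").getall())                 # same result via a CSS selector
```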
15:20
I definitely want to mention Beautiful Soup as well; it's a very popular Python library. Maybe in the early days it was a bit slow, but with more recent versions you can use different parser backends, so you can even run Beautiful Soup on top of lxml. The main difference from the example I showed previously is that Beautiful Soup offers a pure Python API, so you navigate content using Python constructs and objects rather than XPath expressions.
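A minimal Beautiful Soup sketch of the same kind of extraction, navigating with Python objects instead of XPath (it uses the lxml backend if that is installed):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<div><b>first</b></div><div>second <b>bold</b></div>", "lxml"
)
print([b.get_text() for b in soup.find_all("b")])  # ['first', 'bold']
print(soup.find_all("div")[1].contents[0])         # 'second ', the text inside the second <div>
```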
15:46
The other thing is, of course, that you might not need to do any of this at all; maybe somebody has already written something to extract what you're looking for, maybe even things you wouldn't think of. Some examples of things we've done:
16:02
we wrote a login form module for Scrapy that automatically fills in forms and logs in to websites. We have a date parser module that takes textual strings and builds date objects from them.
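For the date-parsing part, a library like dateparser (Scrapinghub's open-source date parsing library, assuming that is the module being referred to) does roughly this:

```python
import dateparser

print(dateparser.parse("12 July 2015"))           # datetime(2015, 7, 12, 0, 0)
print(dateparser.parse("2 weeks ago"))            # relative dates work too
print(dateparser.parse("21 de octubre de 2014"))  # as do other languages
```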
16:22
And webpager is another project we wrote: it looks at an HTML page and pulls out the links that perform pagination, which is often useful. I was going to live-demo this, but I think we're short on time, and maybe it's not worth tempting fate; we've had enough technical problems already.
16:41
Portia is a visual way to build web scrapers. It's applicable in many of the cases I mentioned previously where we would use XPath or Beautiful Soup, but its advantage is a nice UI where you can visually say: I want to select this element, this is the title, this is the
17:03
image, this is the text. I was going to demo this by scraping the EuroPython website; maybe if somebody wants to drop by our booth later I can show you. It's really good and it can save you a lot of time. However, it's not as applicable if you have some kind of complex rules,
17:25
complex extraction logic; it might not always work for that. And of course, if you want to use any of the previously mentioned things, like automatically extracting dates, they might not be built into Portia yet.
17:41
So, scaling up extraction. Portia is great; it's much quicker for writing extraction for websites, but at some point it becomes impractical. You might be scraping 20 websites, that's fine; a hundred; people have used it to scrape thousands. But what about tens of thousands, or maybe even hundreds of thousands? At that point you want to look at different techniques.
18:03
There are some libraries that can extract articles from any page, and they're easy to use. But I want to focus quickly on a library called webstruct that we worked on, which helps with automatically extracting data from HTML pages, and the example I'm going to use is named entity recognition.
18:23
In this case we want to find elements in the text and assign them to categories. We start by annotating web pages: we label web pages with examples of what we want to extract. We're going to use a tool called WebAnnotator, but there are others.
18:44
Here's an example of labelling. In this case we want to find organization names, so "The Old Tea Cafe" is an organization, and we would label it within a sentence, within a page. That format is not so useful for machine learning and the kind of tools we want to use, so we convert it.
19:06
The text is split into tokens; each token in this case is a word, and we label every single token in the whole page as being either outside what we're looking for, at the beginning of an organization, or inside an organization.
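A tiny illustration of that token-level labelling, using the usual outside/begin/inside (BIO) scheme; the sentence and the organization name are hypothetical:

```python
# O = outside, B-ORG = beginning of an organization, I-ORG = inside one.
tokens = ["Meet", "us", "at", "the", "Old",   "Tea",   "Cafe",  "tomorrow"]
labels = ["O",    "O",  "O",  "O",   "B-ORG", "I-ORG", "I-ORG", "O"]
```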
19:24
Given that encoding, we can then apply more standard machine learning algorithms. In our case we found conditional random fields to be a good way to go about it, but an important point is that the model needs to take into account the sequencing of the tokens, the sequencing of the information.
19:43
As for features: we feed it not just the tokens themselves but actual features, and the features can be things about the token itself, but they can also take into account the surrounding context. This is a very important point: we can take into account the surrounding text, or even the HTML elements it's embedded in.
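A sketch of the kind of per-token feature function such a tagger consumes; the function name and the specific features are illustrative, not webstruct's actual API:

```python
def token_features(tokens, i):
    """Features for the i-th token, including a little surrounding context."""
    token = tokens[i]
    return {
        "lower": token.lower(),
        "is_title": token.istitle(),
        "is_digit": token.isdigit(),
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }
```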
20:03
So it can be quite powerful. One way to do it, and this is what we've been doing recently, is to use our webstruct project. It helps you load the annotations that were done previously in WebAnnotator, and it calls back to Python modules that you write yourself to do the feature extraction,
20:25
and then it interfaces with something like CRFsuite, via python-crfsuite, to actually perform the extraction. So, just to briefly summarize: we use slightly different technologies depending on the scale of the extraction.
20:44
HTML parsing and Portia are very good for a single page or a single website, or for multiple websites if we don't have too many. The machine learning approaches are very good if we have a lot of data; we compromise a bit on accuracy, but that's the nature of it. I
21:01
just wanted to briefly mention a sample project we've done recently; actually, we're still working on it. You might know the Saatchi Art Gallery, a gallery of contemporary art in London. We did a project with them to create content for their Global Gallery Guide. This is an ambitious project to
21:20
showcase artworks, artists and exhibitions from around the world. It's a fun project, and it's nice to look at artworks all day. Of course we use Scrapy for the crawling; we deployed it to Scrapy Cloud, which is a Scrapinghub service for running Scrapy crawls; and we use webpager, one of the tools I mentioned earlier, to
21:44
actually paginate. For the crawl, we prioritize the links to follow, and we do so using machine learning, so we don't waste too many resources on each website we scrape; once we hit the target web pages, we use webpager to paginate. So that's the crawling side. On the extraction side,
22:02
we use webstruct very much as I previously described. One interesting thing that came up was that when we were extracting images for artists, we often got them wrong, and we had to add a classification step: we actually classified the images
22:21
based on the image content, using face recognition, to see which ones were artists versus artworks. It's working pretty well. This is now scraping 11,000 websites, and hopefully that will continue to increase. One important thing, of course, is to measure accuracy, to test everything, and to improve incrementally.
22:42
It's also good not to treat these things too much like a black box; try to understand what's going on, and don't make random changes, because that tends not to work so well. So, briefly, we've covered downloading content and we've covered extracting it. It seems like we have everything we need to go and scrape at large scale, but there are still plenty of problems, and I'm just going to touch on a few in the last five minutes.
23:08
Of course, web pages have irregular structure, and this can break your crawl pretty badly; it happens all the time. Superficially, some websites look like they're structured, but it turns out somebody was using a template in a word processor or something, and there are just loads of variations that kill you.
23:25
Other times, maybe the developers have too much time on their hands and they write a million different kinds of templates. You can discover halfway through that the website is doing multivariate testing, and it looks different the next time you run your crawl. I wish there were a silver bullet or some solution I could offer you for these, but there isn't.
23:42
Another problem that will come up is sites requiring JavaScript and browser rendering. We have a service called Splash, which is a scriptable browser that presents an HTTP API, so it's very useful to integrate with Scrapy and other services.
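Splash's HTTP API can be used from plain Python as well; a minimal sketch, assuming a Splash instance is listening on localhost:8050:

```python
import requests

resp = requests.get(
    "http://localhost:8050/render.html",                  # Splash's render endpoint
    params={"url": "https://example.com", "wait": 2.0},    # wait a bit for JavaScript to run
    timeout=60,
)
rendered_html = resp.text  # the page after the browser has executed its JavaScript
```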
24:00
You can write your scrapers in Python and just have Splash take care of the browser rendering, and you can script extensions for it in Lua. Selenium is another project: if you start thinking "follow this link, type this here", Selenium is a great way to go. And finally, of course, you can open the browser's web inspector and see what's happening.
24:24
That is maybe the most common approach for Scrapy programmers, because it's quite efficient: often there's an API behind the scenes that you can use instead. Proxy management is another thing you might want to consider, because some websites will give you different content depending on where you are.
24:41
We crawled one website that did currency conversion, so I thought I was being very clever by selecting the currency at the start, but it turned out the website did a double conversion on some products. Websites also often ban hosting centres where they've had one or two abusive bots; it could have been somebody else,
25:02
but that's just part of the nature of scraping from the cloud. So, for reliability, and sometimes for speed, you might want to consider proxies. Please don't use open proxies: they sometimes modify content, and it's just not a good idea.
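Routing requests through a private proxy is straightforward; a sketch with requests (the proxy address and credentials are placeholders):

```python
import requests

proxies = {
    "http": "http://user:password@proxy.example.com:8080",
    "https": "http://user:password@proxy.example.com:8080",
}
resp = requests.get("https://example.com/", proxies=proxies, timeout=30)
```

In Scrapy, the equivalent is usually a downloader middleware, or setting request.meta["proxy"] per request.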
25:21
Tor: I generally don't like it for large-scale content scraping; it's not really what it's intended for. But we've done some things with government agencies, or in the security area, where we really don't want any blowback from the scraping and it needs to be truly anonymous. Otherwise, there are plenty of private providers, but they vary in quality.
25:42
Finally, the last slide: I just want to briefly mention the ethics of web scraping. I think the most important question to ask yourself is: what harm is your web scraping doing, either on the technical side or with the content that you scrape? Are you misusing it? Are you hurting the sites you're getting it from?
26:01
On the technical side, crawl at a reasonable rate, and it's best practice to identify yourself via the user agent and to respect robots.txt, especially on broad crawls, that is, when you visit lots of websites. And that's it. Do we have some questions?
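That politeness advice maps onto a handful of Scrapy settings; a minimal sketch, with placeholder values and a placeholder contact URL:

```python
# settings.py (illustrative)
USER_AGENT = "example-crawler (+https://example.com/bot-info)"  # identify yourself, with a contact point
ROBOTSTXT_OBEY = True               # respect robots.txt
DOWNLOAD_DELAY = 1.0                # crawl at a reasonable rate
CONCURRENT_REQUESTS_PER_DOMAIN = 2  # don't hammer any single site
```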
26:33
Thank you. Imagine you have to log in to some websites, and you use a tool that generates some fake
26:41
credentials and such, for example a profile of a programmer, or a profile of a farmer, or a rock star, and so on. Thanks. Okay, so, about logging in to websites: the tool I mentioned just finds the login form and lets you configure the user ID that you want to use,
27:00
so it doesn't handle managing multiple accounts. I have seen people do that, but it's not something I've done myself, so sorry, that's all I can say about it, really. Any other questions?
27:27
Hi. First of all, thanks for the Scrapy library; it's an awesome thing and we're using it. That's great to hear. Actually, these are the guys you should be thanking, here in the audience. I may have gotten the ball rolling, but stand up, guys, stand up.
27:43
But these are the contributors. I think there are more of them up there, but I don't know why they're being shy. Sorry, yeah, I probably have a few questions, but I'll only ask a couple, I guess.
28:02
First, I'd like to mention PyQuery. That was an awesome change for us, coming from XPath. Have you maybe tried that? It's one thing we use regularly, and it has proved itself. Yeah, I've heard of it, but I haven't really tried it properly,
28:23
so yeah, we'll check it out, and I think there might be scope for including other approaches to extraction. Okay. And another one: did you maybe think about master spiders? You said that APIs are brittle, and
28:42
yeah, but you could still think of web frameworks: some behave in similar ways, and maybe you could find a way to extract certain information from certain kinds of websites. Yeah, absolutely. For example, we have a collection of spiders for all the forum engines;
29:02
it's not individual websites, it's the underlying engine powering them, and that works really well. We're building collections of those kinds of things. About my API example: I didn't really mean to diss APIs in general, and they're often quite useful, but in some cases they don't have the content you're after, and in some cases the content maybe lags behind what's on the website.
29:23
That's been my experience. But definitely, if there is an API available, you should check it out; it works fine with Scrapy too. Okay, and just one last question, a little bit more technical: do you have plans for anything to
29:45
handle throttling, or to handle robots.txt, or to reschedule 500 errors, or something like that? Yeah, I know there's the AutoThrottle plugin, but it slows you down significantly on a good website,
30:03
though it does work for slow websites. Thanks. You're welcome. Yeah, throttling is an interesting one. Often, internally, what we do is deploy with AutoThrottle by default and then override it when we know the website can handle more. So, especially when you're crawling a single website or a small set of websites, it's worth tuning that yourself.
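The kind of per-project tuning being described, including the retry-code override that comes up just below, might look like this in settings.py (values are illustrative):

```python
# settings.py (illustrative)
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 10.0

RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]  # which responses count as retryable errors
```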
30:24
It's hard to find good heuristics, and it's definitely something we do all the time when we write individual scrapers. I'd be interested in your thoughts about how we could come up with better heuristics by default; it's definitely a very interesting topic.
30:41
And retrying: again, Scrapy does retry things by default, but you can configure, for example, the HTTP error codes that signify an error you want retried, because they're not always consistent across websites. Thank you. A slight follow-up to the retry thing; you mentioned this briefly during the talk: do you actually do things like
31:05
backoffs and jitter and so on? Because at my job we have very interesting situations with synchronized clients and other fun things that are good to avoid. Yeah, definitely, definitely. Actually, I glossed over a lot of details.
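The backoff-with-jitter idea the question refers to is, in general form, something like this (a generic sketch, not Scrapy's built-in retry behaviour):

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff with full jitter: random delay in [0, min(cap, base * 2**attempt))."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


for attempt in range(5):
    time.sleep(backoff_delay(attempt))  # then retry the request
```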
31:23
I said we run in Scrapy Cloud, and that takes care of a lot of the infrastructure we typically need. Alexander gave a talk on the crawl frontier, which is about crawling at scale, and there's a lot more that goes into that which happens outside of Scrapy itself. The first thing we noticed as soon as we started crawling from EC2, for example, was DNS errors all over the place,
31:46
and there are several technical hurdles you need to overcome, I think, to do a larger crawl at any scale. Okay, thank you, Shane, thanks very much.