
Web Scraping in Python 101

Formal metadata

Title
Web Scraping in Python 101
Part
103
Number of parts
119
Author
M.Yasoob Khalid
License
CC Attribution 3.0 Unported:
You may use and modify the work or content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner specified by them.
Place of production
Berlin

Content metadata

Abstract
M.Yasoob Khalid - Web Scraping in Python 101

This talk is about web scraping in Python: why web scraping is useful and which Python libraries are available to help you. I will also look into proprietary alternatives, discuss how they work, and explain why they are not useful. I will show you the different libraries used in web scraping, with some example code, so that you can choose your own personal favourite. I will also explain why writing your own scraper in Scrapy gives you more control over the scraping process.

-----

Who am I?
=========

* a programmer
* a high school student
* a blogger
* Pythonista
* and tea lover

- Creator of freepythontips.wordpress.com
- I made soundcloud-dl.appspot.com
- I am a main contributor to youtube-dl.
- I teach programming to my friends at my school.
- It's my first programming-related conference.
- The life of a Python programmer in Pakistan

What is this talk about?
========================

- What web scraping is and why it is useful
- Which libraries are available for the job
- Open source vs. proprietary alternatives
- Which library is best for which job
- When and when not to use Scrapy

What is web scraping?
=====================

"Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites." (Wikipedia)

### In simple words:

It is a method of extracting data from a website that does not have an API, or of extracting a LOT of data that we cannot get through an API because of rate limiting. Any data that we can see while browsing the web can be extracted through web scraping.

Real-life uses of web scraping
==============================

- to extract product information
- to extract job postings and internships
- to extract offers and discounts from deal-of-the-day websites
- to crawl forums and social websites
- to extract data for building a search engine
- to gather weather data, etc.

Advantages of web scraping over using an API
============================================

- Web scraping is not bound by an API's rate limits
- You can access the website and gather data anonymously
- Some websites do not have an API
- Some data is not accessible through an API, etc.

Which libraries are available for the job?
==========================================

There are numerous libraries available for web scraping in Python, each with its own strengths and weaknesses. Some of the most widely known are:

- BeautifulSoup
- html5lib
- lxml
- re (not really meant for web scraping; I will explain later)
- Scrapy (a complete framework)

A comparison between these libraries
====================================

- speed
- ease of use
- which one I prefer
- which library is best for which purpose

Proprietary alternatives
========================

- a list of proprietary scrapers
- their prices
- are they really useful for you?

How proprietary alternatives work
=================================

- how they work (rendering JavaScript)
- why they are not suitable for you
- how custom scrapers beat proprietary alternatives

Scrapy
======

- what it is
- why it is useful
- asynchronous support
- an example scraper
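A minimal sketch of the idea behind the definition of web scraping given above: pull data out of a page's HTML without any API. It uses only the standard library's `html.parser`, so it runs without installing BeautifulSoup, lxml, or Scrapy. The HTML snippet, the `<span class="product">` markup, and the names `ProductParser` and `extract_products` are hypothetical. A real scraper would first download the page, for example with `urllib.request.urlopen`.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects the text of every <span class="product"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; this checks for an exact class match.
        if tag == "span" and ("class", "product") in attrs:
            self.in_product = True
            self.products.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_product = False

    def handle_data(self, data):
        # Only text that appears inside a product span is kept.
        if self.in_product:
            self.products[-1] += data


def extract_products(html: str) -> list[str]:
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.products]


PAGE = """
<ul>
  <li><span class="product">Green tea</span> <b>$4</b></li>
  <li><span class="product">Black tea</span> <b>$3</b></li>
</ul>
"""

print(extract_products(PAGE))  # ['Green tea', 'Black tea']
```

This is deliberately simple: it ignores elements with several classes (such as `class="product sale"`) and nested spans, which is exactly the kind of bookkeeping that libraries like BeautifulSoup and lxml take off your hands.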
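The library list above says that `re` is "not really meant for web scraping" and defers the explanation. One common reason, shown here as an illustration rather than as the speaker's own argument, is that regular expressions match text, not HTML structure: a pattern written for one spelling of a tag silently misses equivalent markup. The HTML and the `NAIVE_LINK` pattern below are hypothetical. `LinkParser` uses the standard library's `html.parser`, which lowercases tag and attribute names and strips the quotes around attribute values.

```python
import re
from html.parser import HTMLParser

# Three links that are equivalent as HTML but spelled differently.
HTML = """
<a href="/jobs">Jobs</a>
<a class="nav" href='/deals'>Deals</a>
<A HREF="/forum">Forum</A>
"""

# A naive pattern that assumes lowercase tags, double quotes, and href as the only attribute.
NAIVE_LINK = re.compile(r'<a href="([^"]*)">')


def links_with_regex(html):
    return NAIVE_LINK.findall(html)


class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive already lowercased; values arrive unquoted.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def links_with_parser(html):
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


print(links_with_regex(HTML))   # ['/jobs']
print(links_with_parser(HTML))  # ['/jobs', '/deals', '/forum']
```

The regex finds only the first link, while the parser finds all three.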
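The Scrapy section lists asynchronous support. Scrapy is built on the Twisted networking engine, so it can keep many requests in flight instead of waiting for each response in turn. The sketch below does not use Scrapy or Twisted. It illustrates the same idea with the standard library's `asyncio`. `fake_fetch` is a hypothetical stand-in that sleeps instead of performing real network I/O, and `max_concurrency` plays a role similar to Scrapy's `CONCURRENT_REQUESTS` setting.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.2) -> str:
    # Hypothetical stand-in for an HTTP request: wait, then return fake HTML.
    await asyncio.sleep(delay)
    return f"<html><title>{url}</title></html>"


async def crawl(urls: list[str], max_concurrency: int = 5) -> list[str]:
    # The semaphore caps how many "requests" are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:
            return await fake_fetch(url)

    # gather() runs all fetches concurrently and returns results in input order.
    return list(await asyncio.gather(*(bounded_fetch(u) for u in urls)))


urls = [f"https://example.com/page/{i}" for i in range(10)]
start = time.perf_counter()
pages = asyncio.run(crawl(urls))
elapsed = time.perf_counter() - start
# 10 pages, 5 at a time, 0.2 s each: roughly 0.4 s in total,
# versus about 2 s if the pages were fetched one after another.
print(f"{len(pages)} pages in {elapsed:.2f} s")
```

Because the waits overlap, total time grows with the number of batches rather than with the number of pages.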