• Python 3
Reading time
  • Approximately 30 days
What you will learn
  • Web Scraping
  • Database
  • Ryan Mitchell
  • 3 years, 7 months ago
Packages you will be introduced to
  • beautifulsoup4
  • scrapy
  • requests
  • nltk
  • pillow
  • tesseract
  • numpy
  • pysocks

Official description

If programming is magic, then web scraping is surely a form of wizardry. By writing a simple automated program, you can query web servers, request data, and parse it to extract the information you need. The expanded edition of this practical book not only introduces you to web scraping, but also serves as a comprehensive guide to scraping almost every type of data from the modern web.

Part I focuses on web scraping mechanics: using Python to request information from a web server, performing basic handling of the server’s response, and interacting with sites in an automated fashion. Part II explores a variety of more specific tools and applications to fit any web scraping scenario you’re likely to encounter.
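
As a rough illustration of the request-and-parse mechanics described for Part I, here is a minimal sketch. It is not taken from the book, and it deliberately uses only Python's standard library (`urllib.request` and `html.parser`) rather than the packages listed above. The names `LinkCollector`, `parse_page`, `fetch`, and `SAMPLE_HTML`, as well as the `User-Agent` string, are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag and the text of the <title> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html):
    """Return (title, links) extracted from an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


def fetch(url, timeout=10):
    """Request a page and decode the response body to text.

    Defined for completeness but not called here, so the example
    runs without network access.
    """
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical sample page standing in for a real server response.
SAMPLE_HTML = """<html><head><title> Example page </title></head>
<body><a href="/about">About</a> <a>no href</a>
<a href="https://example.com/">Home</a></body></html>"""

title, links = parse_page(SAMPLE_HTML)
print(title)
print(links)
```

To run the same parsing against a live page, one would pass the result of `fetch(url)` to `parse_page`, for a site whose terms permit automated access.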

  • Parse complicated HTML pages
  • Develop crawlers with the Scrapy framework
  • Learn methods to store data you scrape
  • Read and extract data from documents
  • Clean and normalize badly formatted data
  • Read and write natural languages
  • Crawl through forms and logins
  • Scrape JavaScript and crawl through APIs
  • Use and write image-to-text software
  • Avoid scraping traps and bot blockers
  • Use scrapers to test your website


There is 1 review for this book on GitHub.
Hendeca left a review on GitHub 4 years ago.

A very practical, well-written, enjoyable, and entertaining book; the most enjoyable programming book I've read to date


I just finished Python Web Scraping and I absolutely loved it. I am somewhat new to Python, though I have been a programmer for about 9 years. I got this book because I've always had an interest in the internet and in web scraping/data collecting. I started the book with a decent grasp of the basics of Python, and would recommend familiarizing oneself with the basics before reading it. That said, if you can write Python scripts, install Python libraries, and execute the scripts, I don't see why you couldn't run the examples in this book with little trouble.

What I loved about this book was its practical nature and concise, clear explanations. Right away the book teaches you how to gather HTML from a remote URL and parse it with BeautifulSoup. The code examples are easy to understand and they work! The structure of the book is also fantastic. The list of topics and how the book progresses feel very natural. I found myself turning the page to start a new chapter, seeing the chapter title, and thinking "I was just wondering about that!"

I am somewhat fascinated by web scraping, so things like the appendix section on the legality and ethics of web scraping were just icing on the cake for me and made this book all the more interesting. For me, this book led directly to the creation of a few scraping projects I've been thinking about for some time. It's very rare for me to find, as I go through a book, information that is immediately applicable.

Who's it for?

In my view, this book is great for anyone in the following categories:

  • Someone who has begun coding in Python and wants to find some practical projects to help expand their knowledge and get their feet wet
  • An experienced coder (Python or not) who has a particular interest in web scraping and wants to explore the possibilities of web scraping/data collecting
