Simone Magnaschi
Senior Full Stack Web Dev
Bookmarks tagged with #scraping.

Crawlee · Build reliable crawlers. Fast. | Crawlee

Crawlee won't fix broken selectors for you (yet), but it helps you build and maintain your crawlers faster. When a website adds JavaScript rendering, you don't have to rewrite everything, only switch to one of the browser crawlers.
Saved on: 2022-08-23

Reversing private APIs, Safeway, and not-so-extreme couponing

The browser fires an OPTIONS request to […]. Since we’re doing this programmatically, we don’t need to worry about this (it’s a browser safety feature for cross-origin requests).
Saved on: 2019-10-15

Headless Chrome support in Cloud Functions and App Engine | Hacker News

Using this, in conjunction with AWS Step Functions, Lambda, and ECS, it became merely cents a month to run a headless scraper task in the cloud. What does your workflow look like?
Saved on: 2018-08-20

emadehsan/thal: Getting started with Puppeteer and Chrome Headless for Web

Puppeteer is the official tool for Chrome Headless by the Google Chrome team. Since the official announcement of Chrome Headless, many of the industry-standard libraries for automated testing have been discontinued by their maintainers, including PhantomJS.
Saved on: 2017-08-29

Advanced Web Scraping: Bypassing "403 Forbidden," captchas, and more | sang

The full code for the completed scraper can be found in the companion repository on GitHub. I wouldn’t really consider web scraping one of my hobbies or anything, but I guess I sort of do a lot of it.
Saved on: 2017-03-16