Hey guys, today we’re showing HN a new open-source library that we have been working on for almost a year. It incorporates lessons learned from scraping thousands of websites over the last 4 years. We figured there was no such universal library for JavaScript, while Python, for example, has one (https://scrapy.org/). That wasn’t fair, because JavaScript is THE language of the web :)
Anyway, we hope you’ll give it a shot, and we’re really looking forward to hearing what you think about it. All feedback welcome!
The SDK runs anywhere Node runs. And if you can run headless Chrome with Puppeteer there too, then you can use it in the SDK as well. This might require several additional libraries and configuration settings. If I’m not mistaken, Google Cloud Functions supports Puppeteer by default, while AWS Lambda does not. With any Docker-based serverless platform such as Zeit Now or Apify Cloud, you just need to use the right Docker image.
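To give a rough idea of the "configuration settings" part, here is a minimal sketch of launching headless Chrome with plain Puppeteer inside a container-like environment. It is not the SDK's own API. The helper name `buildLaunchOptions` and the `CHROME_PATH` environment variable are made up for illustration, and whether you need these Chromium flags at all depends on your platform and Docker image.

```javascript
// Hypothetical helper (not part of the SDK): builds Puppeteer launch options
// for a container-like environment.
function buildLaunchOptions(env = process.env) {
  const options = {
    headless: true,
    args: [
      '--no-sandbox',            // often needed when Chrome runs as root in a container
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage', // avoids relying on a small /dev/shm
    ],
  };
  // CHROME_PATH is an assumed variable name for a preinstalled Chrome binary.
  if (env.CHROME_PATH) {
    options.executablePath = env.CHROME_PATH;
  }
  return options;
}

async function main() {
  // Loaded lazily so the helper above works even without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(buildLaunchOptions());
  try {
    const page = await browser.newPage();
    await page.goto('https://example.com');
    console.log(await page.title());
  } finally {
    // Always close the browser so the process can exit cleanly.
    await browser.close();
  }
}

// To try it out, uncomment the next line:
// main().catch(console.error);
```

Keep in mind that disabling the sandbox weakens Chrome's isolation, so only do it where the platform leaves you no other option.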
Thank you so much for making and sharing this!
Does it have to run on an instance, or can we also use a serverless environment?