How to Scrape a Web Page With Node.js

Even though web APIs are becoming more common, you will still run into services whose data you want but cannot easily access. Web scraping is a last-resort technique that involves programmatically extracting the required data from a web page's raw HTML. This tutorial covers how to download and traverse a web page server-side using Node.js and the Zepto.js library.

What We’re Building

Currently, Google does not offer an API to search and browse Android applications. To get around this, we are going to build a command-line tool that takes an app ID and prints a bunch of info about that app. Let’s get started by defining our package.json file.

Requirements

The first thing we need to do is install the modules our tool depends on. Create a package.json file in the root of your project directory that declares them as dependencies.

We need three modules to perform our task:

  • zepto-node, a jQuery-like library for traversing the DOM
  • domino, to simulate the browser DOM in Node (the name stands for "DOM in Node")
  • request, to make HTTP calls
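
A package.json that declares these three modules might look like the minimal sketch below. The project name and version are hypothetical placeholders, and the "*" ranges are an assumption that simply accepts whatever version npm resolves; pin them to specific versions once you know which ones work for you.

```json
{
  "name": "play-scraper",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "domino": "*",
    "request": "*",
    "zepto-node": "*"
  }
}
```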

Once your package.json file is ready to go, run npm install in your terminal to download and install the dependencies. When it completes, you should see a new directory called node_modules/ in the root of your project. This means you are ready to get started. Let’s write some code.

The Code

The code for this app needs little explanation. Here is a high-level overview of the steps required:

  • include required modules
  • define the Google Play URL to fetch
  • fetch the HTML page
  • extract the desired data

Here is a sample output from running the command on the ForkJoy app:


Is That It?

Well, that was pretty easy, wasn’t it? In 30 lines of code we have a command-line tool that scrapes information from a web page. Now that you know how easy scraping web pages is with Node.js, what will you build?
