Node Cheerio Scrape Data into JSON File using Axios Tutorial

In this tutorial, we will learn how to scrape data from a website in a Node.js app using the cheerio and axios packages.

Most of us know what web scraping is; for those who don't, it is an automated process for gathering large amounts of data or information from websites.

The data collected by web scraping arrives as unstructured HTML.

In this Node website data scraping example, we will show you how to use the Cheerio module for web scraping in Node.

To make Cheerio work in Node, we first need the URL of the website from which we will extract the data.

For that, we are going to use a simple demo site built for web scraping practice.

We will also use the axios module to send an HTTP request to the server and receive its response.

Mainly, we will load the unstructured HTML into the cheerio module and access its child DOM elements using jQuery-style methods.

Finally, we will create a JSON file and write the scraped data into it.
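As a sketch of that last step, the helper below writes an array to a JSON file with Node's promise-based `fs/promises` API (the tutorial's app.js uses the callback form of `fs.writeFile` instead) and reads it back to confirm the file parses. The `saveAsJson` name, the temp-folder path, and the sample record are hypothetical:

```javascript
const fs = require('fs/promises')
const os = require('os')
const path = require('path')

// Hypothetical helper: write records as JSON, then read the file back
// and parse it to confirm it contains valid JSON.
async function saveAsJson(filePath, records) {
  // Indent with 2 spaces so the file is easy to read
  await fs.writeFile(filePath, JSON.stringify(records, null, 2), 'utf8')
  const text = await fs.readFile(filePath, 'utf8')
  return JSON.parse(text)
}

// Placeholder data written to the system temp folder
const target = path.join(os.tmpdir(), 'books-sample.json')
saveAsJson(target, [{ title: 'Example', price: '£1.00' }])
  .then((saved) => console.log(`Saved ${saved.length} record(s) to ${target}`))
  .catch(console.error)
```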

How to Scrape Data from a Website in Node.js using Cheerio

  • Step 1: Generate New Directory
  • Step 2: Build Package JSON
  • Step 3: Set Up Main App.js
  • Step 4: Install Cheerio and Axios
  • Step 5: Scrape Website Data in Node
  • Step 6: Start Cheerio Scraping

Generate New Directory

To create a Node project, you first need an empty folder.

The following command creates a new folder on your system.

We named it node-world, but you can give it any name you want.

mkdir node-world

Move into the application’s root folder.

cd node-world

Build Package JSON

This is a key step: we initialize the Node project with the npm init command.

Open a terminal in the directory you created in the previous step.

Then run the following single command to generate a new package.json file.

npm init

The package.json file keeps a record of the project's installed packages and scripts. When npm init asks for the entry point, you can enter app.js so that "main" matches the file below.

{
  "name": "node-world",
  "version": "1.0.0",
  "description": "",
  "main": "app.js",
  "dependencies": {},
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC"
}
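To see what package.json is recording, a small script can parse it and list the dependency and script names. This is a hypothetical helper (`summarizePackage` is not part of npm), sketched under the assumption that it is run from the project root:

```javascript
const fs = require('fs')

// Hypothetical helper: summarize what package.json records.
// Missing sections are treated as empty objects.
function summarizePackage(jsonText) {
  const pkg = JSON.parse(jsonText)
  return {
    name: pkg.name,
    dependencies: Object.keys(pkg.dependencies || {}),
    scripts: Object.keys(pkg.scripts || {}),
  }
}

// When run from the project root, inspect the real package.json
if (fs.existsSync('package.json')) {
  console.log(summarizePackage(fs.readFileSync('package.json', 'utf8')))
}
```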

Set Up Main App.js

Next, create an app.js file in your Node app directory.

Then open the package.json file.

Find the scripts object and add a new start property to it; this property defines the command that runs the app from the command line.

  "scripts": {
    "start": "node app.js"
  },

With the start script registered, you can run the project with npm start, which executes node app.js.

Install Cheerio and Axios

The basic structure of the Node project is now in place.

Next, we need to install the modules for pulling data from another site.

Axios sends the request to the server, and cheerio extracts data from the returned HTML. The pretty package is an HTML beautifier; the app.js file in this tutorial does not actually use it, because the JSON output is already formatted with JSON.stringify, so installing it is optional.
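For JSON output specifically, Node's built-in `JSON.stringify` already handles the formatting: its third argument sets the indentation, which is what app.js relies on. A quick comparison using a record from the sample output:

```javascript
// A record in the same shape as the scraped data
const book = { title: 'Olio', price: '£23.88' }

// Compact form: everything on one line
const compact = JSON.stringify(book)

// Indented form: the third argument adds 2 spaces per nesting level
const indented = JSON.stringify(book, null, 2)

console.log(compact) // {"title":"Olio","price":"£23.88"}
console.log(indented)
```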

npm install cheerio
npm install pretty
npm install axios
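After installing, a quick way to confirm the modules are available is to ask Node to resolve them. This hypothetical `missingModules` helper is a sketch, assuming the file is saved in the project root, because `require.resolve` looks up modules relative to the file that calls it:

```javascript
// Hypothetical check: return the names that cannot be resolved
// from this file's location (save it in the project root).
function missingModules(names) {
  return names.filter((name) => {
    try {
      require.resolve(name)
      return false
    } catch {
      return true
    }
  })
}

const missing = missingModules(['cheerio', 'axios', 'pretty'])
console.log(missing.length === 0 ? 'All modules installed.' : `Missing: ${missing.join(', ')}`)
```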

Scrape Website Data in Node

In this step, we combine these modules with a few functions to scrape data from a web page.

We will grab the data from books.toscrape.com, a site made specifically for practicing web scraping.

Next, open the app.js file and add the following code to it.

const fs = require('fs')
const cheerio = require('cheerio')
const axios = require('axios')

// First page of the demo site's book catalogue
const API = 'http://books.toscrape.com/catalogue/category/books_1/index.html'

const scrapeSite = async () => {
  try {
    // Download the raw HTML of the page
    const { data } = await axios.get(API)

    // Parse the HTML so it can be queried with jQuery-style selectors
    const $ = cheerio.load(data)

    // Every book in the listing is an element with the class product_pod
    const bookCollection = $('.row li .product_pod')

    const bookItem = []

    bookCollection.each((index, el) => {
      const BOOK = { title: '', price: '' }

      // Visible title text inside the <h3> (long titles are shortened with "...")
      BOOK.title = $(el).children('h3').text()

      // Price text from the <p class="price_color"> inside .product_price
      BOOK.price = $(el)
        .children('.product_price')
        .children('p.price_color')
        .text()

      bookItem.push(BOOK)
    })

    // Print the scraped records to the console
    console.dir(bookItem)

    // Save the records as JSON indented with 2 spaces
    fs.writeFile('books.json', JSON.stringify(bookItem, null, 2), (error) => {
      if (error) {
        console.error(error)
        return
      }
      console.log('Website data has been scraped.')
    })
  } catch (e) {
    // Network failures and non-2xx HTTP responses are reported here
    console.error(e)
  }
}

scrapeSite()

Start Cheerio Scraping

In this final step, we run the script and pull the data from the website.

Run the following command (npm start works too, since it calls the same script):

node app.js

The console should list the scraped books as shown below, and a new books.json file containing the same data as JSON will appear in the project folder.

[
  { title: 'A Light in the ...', price: '£51.77' },
  { title: 'Tipping the Velvet', price: '£53.74' },
  { title: 'Soumission', price: '£50.10' },
  { title: 'Sharp Objects', price: '£47.82' },
  { title: 'Sapiens: A Brief History ...', price: '£54.23' },
  { title: 'The Requiem Red', price: '£22.65' },
  { title: 'The Dirty Little Secrets ...', price: '£33.34' },
  { title: 'The Coming Woman: A ...', price: '£17.93' },
  { title: 'The Boys in the ...', price: '£22.60' },
  { title: 'The Black Maria', price: '£52.15' },
  { title: 'Starving Hearts (Triangular Trade ...', price: '£13.99' },
  { title: "Shakespeare's Sonnets", price: '£20.66' },
  { title: 'Set Me Free', price: '£17.46' },
  { title: "Scott Pilgrim's Precious Little ...", price: '£52.29' },
  { title: 'Rip it Up and ...', price: '£35.02' },
  { title: 'Our Band Could Be ...', price: '£57.25' },
  { title: 'Olio', price: '£23.88' },
  { title: 'Mesaerion: The Best Science ...', price: '£37.59' },
  { title: 'Libertarianism for Beginners', price: '£51.33' },
  { title: "It's Only the Himalayas", price: '£45.17' }
]

Website data has been scraped.
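The prices are saved as strings such as '£51.77'. If you want to work with them as numbers, a small post-processing step can convert them. This is a hypothetical sketch using two records copied from the output above; in the project, the array would instead come from `JSON.parse(fs.readFileSync('books.json', 'utf8'))`:

```javascript
// Two records copied from the sample output (stand-in for books.json)
const scraped = [
  { title: 'A Light in the ...', price: '£51.77' },
  { title: 'Tipping the Velvet', price: '£53.74' },
]

// Hypothetical helper: strip the currency symbol and parse the number
function toNumber(price) {
  // Keep only digits and the decimal point, then parse
  return Number.parseFloat(price.replace(/[^0-9.]/g, ''))
}

// Add a numeric `amount` field next to the original price string
const withNumbers = scraped.map((book) => ({ ...book, amount: toNumber(book.price) }))
const total = withNumbers.reduce((sum, book) => sum + book.amount, 0)

console.log(withNumbers)
console.log(total.toFixed(2)) // 105.51
```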

Summary


In this short guide, we learned how to use the cheerio web scraping library in a Node.js application, with axios fetching the page and cheerio parsing its HTML.

You saw how to make an HTTP GET request to the site you want to scrape, extract the data you need from its HTML, and save that data to a JSON file in a Node app.

We hope you loved how we explained everything; keep learning and winning.