Flipkart Web Scraping

Flipkart Web Scraping : Laptop

Suppose Big Bazaar wants to move into the e-commerce industry and wants to collect data on multiple products from Flipkart. Let us start by collecting laptop data from Flipkart.

Data to collect : model, brand, price, product link, specifications and rating.

Step 1 : Get the HTML document of the web page.

Introduction to requests:

The requests library lets you send HTTP requests very easily. An HTTP (Hypertext Transfer Protocol) request is sent to a server to access information.

When a browser, search engine crawler or other client makes a request to a web server, the server returns a three-digit HTTP response status code. This code indicates how the request was handled. A response code of 200 means "OK, here is the content you were asking for", that is, the request succeeded.

To understand more about response codes :

If the status_code is 200, the request has succeeded; 200 is the most common status code. If the request succeeded, we can get the HTML document of the web page and start scraping.
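As a small illustration of status codes, the sketch below uses Python's standard http module to look up the meaning of a code. The helper names describe_status and request_succeeded are made up for this example and are not part of requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description such as '200 OK' for an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes the module does not know
    return f"{status.value} {status.phrase}"


def request_succeeded(code):
    """Treat 200 as success, as this tutorial does (hypothetical helper)."""
    return code == HTTPStatus.OK


print(describe_status(200))
print(describe_status(404))
print(request_succeeded(200))
```

With requests, the same check would be made against the status_code attribute of the response object.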

Let the Scraping Begin :

1. Go to flipkart.com, search for laptop, and copy the URL of the results page.

So the first thing we need to do is import all the necessary libraries and send an HTTP request to the website. If the request succeeds, we can access the HTML text of that web page for scraping.
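A minimal sketch of this step could look like the following. The search URL is an assumption (copy the real one from your browser), and the get parameter is a hypothetical hook added only so the function can be tried without network access.

```python
import requests

# Assumed search URL; replace it with the one copied from the browser.
SEARCH_URL = "https://www.flipkart.com/search?q=laptop"


def fetch_html(url, get=requests.get):
    """Return the page's HTML text, or None if the status code is not 200.

    `get` defaults to requests.get; passing a stand-in is only for offline tests.
    """
    response = get(url, timeout=10)
    if response.status_code == 200:
        return response.text
    return None
```

Calling fetch_html(SEARCH_URL) would then return the HTML text of the search results page, provided the site answers with status code 200.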

2. Converting HTML text into BeautifulSoup
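The conversion might look like the sketch below. The html_text string is a tiny stand-in for the text returned by the request, not real Flipkart markup, and html.parser is Python's built-in parser.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML text of the response (not real Flipkart markup).
html_text = (
    "<html><head><title>Laptop search</title></head>"
    "<body><a class='_1fQZEK' href='/p/itm1'>Laptop</a></body></html>"
)

# Parse the text into a BeautifulSoup object that can be searched by tag and class.
soup = BeautifulSoup(html_text, "html.parser")

print(type(soup).__name__)
print(soup.title.text)
```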

Now what are we going to scrape?

Scraping the first laptop post :

To scrape laptop details, we first need to scrape the first laptop instance, or first post. Once we can scrape the first post, we can use it as an anchor and scrape the same information for all the laptops.

As we learned previously, the same kind of data lies in the same class. So let us find the class for our laptop post.

As you can see, all the contents of our laptop post are in an <a> tag with the class name _1fQZEK, so let's scrape this.
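A sketch of grabbing the first post with find is shown here. The sample HTML and the product names in it are invented for illustration; only the tag and the class name _1fQZEK come from the text above.

```python
from bs4 import BeautifulSoup

# Invented sample markup that mimics two laptop posts.
SAMPLE_HTML = """
<a class="_1fQZEK" href="/acme-book-15/p/itm1"><div class="_4rR01T">Acme Book 15 Core i5</div></a>
<a class="_1fQZEK" href="/globex-air-14/p/itm2"><div class="_4rR01T">Globex Air 14 Ryzen 5</div></a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find returns only the first matching tag (or None if nothing matches).
laptop = soup.find("a", class_="_1fQZEK")

print(laptop["href"])
```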

Scraping Laptop Details using <a> tag :

Now that we have scraped our Laptop post, let's scrape other details.

Scraping Laptop Model :

The tag name is <div> and the class name for the model is _4rR01T.
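Under the same assumptions (invented sample markup, class names taken from the text), the model could be read like this:

```python
from bs4 import BeautifulSoup

# Invented sample markup for a single laptop post.
SAMPLE_HTML = (
    '<a class="_1fQZEK" href="/acme-book-15/p/itm1">'
    '<div class="_4rR01T">Acme Book 15 Core i5</div></a>'
)

laptop = BeautifulSoup(SAMPLE_HTML, "html.parser").find("a", class_="_1fQZEK")

# Search inside the post for the <div> that holds the model name.
model = laptop.find("div", class_="_4rR01T").text.strip()

print(model)
```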

Scraping Laptop Brand :

We can get the laptop brand from the model itself: split the model string into words and use the index number to pick the word that holds the brand name.
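One way to sketch this, assuming the brand is always the first word of the model string (the helper name brand_from_model and the model value are made up):

```python
def brand_from_model(model):
    """Return the first word of the model string, assumed to be the brand."""
    words = model.split()
    return words[0] if words else ""


print(brand_from_model("Acme Book 15 Core i5"))
```

This assumption breaks for brands whose names span more than one word, so the result is worth spot-checking on real data.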

Scraping Price of the Product :

You can scrape the price from the <div> tag with the class name _30jeq3 _1_WHN1.

But the price is a string instead of an integer, so we need to convert it to an integer, and for that we first need to remove the unnecessary characters.
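The sketch below reads the price and converts it. It assumes the unnecessary characters are a rupee sign and thousands commas; the sample markup and the price value are invented. The element carries two classes, and matching on _30jeq3 alone is enough here.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the price text format is an assumption.
SAMPLE_HTML = (
    '<a class="_1fQZEK" href="/acme-book-15/p/itm1">'
    '<div class="_30jeq3 _1_WHN1">₹54,990</div></a>'
)

laptop = BeautifulSoup(SAMPLE_HTML, "html.parser").find("a", class_="_1fQZEK")

# class_ with a single name matches any tag whose class list contains it.
price_text = laptop.find("div", class_="_30jeq3").text


def price_to_int(text):
    """Strip the currency sign and commas, then convert to int."""
    return int(text.replace("₹", "").replace(",", "").strip())


price = price_to_int(price_text)
print(price)
```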

We can get the product link from the href attribute of the <a> tag. We have already scraped this tag and assigned it to the variable laptop.
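Reading the attribute could look like this. The sketch assumes the href is a relative path, so it is joined onto a base URL with urljoin; BASE_URL and the sample markup are assumptions for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.flipkart.com"  # assumed base for relative links

# Invented sample markup for a single laptop post.
SAMPLE_HTML = '<a class="_1fQZEK" href="/acme-book-15/p/itm1">Acme Book 15</a>'

laptop = BeautifulSoup(SAMPLE_HTML, "html.parser").find("a", class_="_1fQZEK")

# Tag attributes are read with dictionary-style indexing.
relative_link = laptop["href"]
link = urljoin(BASE_URL, relative_link)

print(link)
```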

Before moving ahead, let us combine everything that we have done so far.
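Putting the pieces so far together, a combined sketch might look like the following. The function name parse_laptop, the sample markup and the price format are assumptions; the class names are the ones given in the text.

```python
from bs4 import BeautifulSoup

# Invented sample markup for a single laptop post.
SAMPLE_HTML = """
<a class="_1fQZEK" href="/acme-book-15/p/itm1">
  <div class="_4rR01T">Acme Book 15 Core i5</div>
  <div class="_30jeq3 _1_WHN1">₹54,990</div>
</a>
"""


def parse_laptop(laptop):
    """Collect model, brand, price and link from one laptop post (hypothetical helper)."""
    model = laptop.find("div", class_="_4rR01T").text.strip()
    price_text = laptop.find("div", class_="_30jeq3").text
    return {
        "model": model,
        "brand": model.split()[0],  # assumes the brand is the first word
        "price": int(price_text.replace("₹", "").replace(",", "").strip()),
        "link": laptop["href"],
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
details = parse_laptop(soup.find("a", class_="_1fQZEK"))
print(details)
```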

Specifications of the laptop :

To scrape this we need to follow these steps :

Scraping Rating from the product :

Now that we have scraped everything, let us combine everything and see the result.

Section 2
  • We will scrape details of all the laptops

  • Create a Dictionary to store all the details.

Scrape details of all the laptops on the current web page :

We have scraped a single laptop's details. Now, to scrape details of all the laptops, all we need to do is find all the occurrences of the <a> tag with the class name _1fQZEK.

Doing this is pretty simple. All you need to do is use find_all.
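A sketch of this loop, storing the results in a dictionary of lists, is given below. The sample markup, product names and prices are invented, and the dictionary keys are one possible choice rather than the original author's.

```python
from bs4 import BeautifulSoup

# Invented sample markup that mimics two laptop posts.
SAMPLE_HTML = """
<a class="_1fQZEK" href="/acme-book-15/p/itm1">
  <div class="_4rR01T">Acme Book 15 Core i5</div>
  <div class="_30jeq3 _1_WHN1">₹54,990</div>
</a>
<a class="_1fQZEK" href="/globex-air-14/p/itm2">
  <div class="_4rR01T">Globex Air 14 Ryzen 5</div>
  <div class="_30jeq3 _1_WHN1">₹61,490</div>
</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every matching tag, not just the first one.
laptops = soup.find_all("a", class_="_1fQZEK")

# One list per field; each laptop appends one value to every list.
data = {"model": [], "brand": [], "price": [], "link": []}

for laptop in laptops:
    model = laptop.find("div", class_="_4rR01T").text.strip()
    price_text = laptop.find("div", class_="_30jeq3").text
    data["model"].append(model)
    data["brand"].append(model.split()[0])  # assumes brand is the first word
    data["price"].append(int(price_text.replace("₹", "").replace(",", "").strip()))
    data["link"].append(laptop["href"])

print(data)
```

On a real page some posts may lack a field, in which case find returns None, so a guard around each lookup would be needed.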
