
Flipkart Web Scraping



Flipkart Web Scraping : Laptop

Suppose Big Bazaar wants to move into the e-commerce industry and wants to collect data on multiple products from Flipkart. Let us start by collecting laptop data from Flipkart.

Data to collect :

For each laptop we will collect the brand, model, price, product link, image link, specifications, and rating.

Step 1 : Get the HTML document of the webpage.

Introduction to requests:

The requests library lets you send HTTP requests extremely easily. An HTTP (Hypertext Transfer Protocol) request is used to send a request to a server and access information.

When a search engine or a website visitor makes a request to a web server, a three-digit HTTP response status code is returned. This code indicates how the server handled the request. A response code of 200 means "OK, here is the content you asked for"; in other words, the request was accepted.

To understand more about response codes, see Moz's guide "Response Codes Explained with Pictures". Let's check the status code for our search page:

from bs4 import BeautifulSoup
import requests

url = 'https://www.flipkart.com/search?q=laptop&sid=6bo%2Cb5g&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_2_2_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_2_2_na_na_na&as-pos=2&as-type=RECENT&suggestionId=laptop%7CLaptops&requestId=483bcb46-7458-4c48-869c-773583f9e774&as-searchtext=laptop'

response = requests.get(url)

print(response.status_code)

Output : 
200

If the status_code is 200, the request succeeded; 200 is the most common status code. Once the request succeeds, we can get the HTML document of the web page and start scraping.
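
As an aside (a minimal sketch, not part of the original steps, using a shortened search URL purely for illustration), requests can also raise an exception for failed responses instead of checking the code manually:

import requests

url = 'https://www.flipkart.com/search?q=laptop'  # shortened search URL, for illustration only

response = requests.get(url)
response.raise_for_status()  # raises requests.exceptions.HTTPError for 4xx/5xx responses

# if no exception was raised, the request succeeded
print(response.status_code)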

Let the Scraping Begin :

1. Go to flipkart.com, search for "laptop", and copy the web page URL.

So the first thing we need to do is import all the necessary libraries and send an HTTP request to the website. If the request succeeds, we can access the HTML text of that web page for scraping.

# Importing necessary Libraries 

from bs4 import BeautifulSoup
import requests

# url of the webpage

url = "https://www.flipkart.com/search?q=laptop&sid=6bo%2Cb5g&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_2_2_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_2_2_na_na_na&as-pos=2&as-type=RECENT&suggestionId=laptop%7CLaptops&requestId=483bcb46-7458-4c48-869c-773583f9e774&as-searchtext=laptop"

# sending HTTP Request to the website and get the response 

response = requests.get(url) 

# if response code is 200, get content of Webpage using content property.

if(response.status_code == 200):
    html_text = response.content 
    print(html_text)
    

Output : 

<!doctype html><html lang="en"><head>
<link href="https://rukminim1.flixcart.com" rel="preconnect"/>
<link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.905c37.css"/>
<link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.104e9a.css"/>
<meta http-equiv="Content.........
.................</html>
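
A quick aside (not from the original walkthrough): response.content returns the raw bytes of the page, while response.text returns the decoded string; BeautifulSoup accepts either one.

# response.content -> bytes, response.text -> str (assumes the response object created above)
print(type(response.content), type(response.text))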

2. Converting HTML text into BeautifulSoup

soup = BeautifulSoup(html_text , "lxml")
print(soup.prettify())
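
Note that "lxml" is a third-party parser and has to be installed separately (for example with pip install lxml). If it is not available, Python's built-in "html.parser" can be used instead; this is an alternative sketch, not what the tutorial itself uses:

# fallback using the parser that ships with Python's standard library
soup = BeautifulSoup(html_text, "html.parser")
print(soup.prettify()[:500])  # show only the first 500 characters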

Now, what are we going to scrape?

Scraping first post of the laptop :

If we want to scrape laptop details, we need to start with the first instance of a laptop, i.e. the first post. Once we can scrape the first post, we can use it as an anchor and scrape the same information for all the laptops.

As we have learned previously, the same kind of data lives in the same class. So let us find the class for our laptop post.

As you can see, all of the contents of our laptop post are inside an <a> tag with class name _1fQZEK , so let's scrape that.

Scraping Laptop Details using <a> tag :

laptop = soup.find('a',"_1fQZEK")
print(laptop)

Now that we have scraped our laptop post, let's scrape the other details.

Scraping Laptop Model :

The tag name is <div> and the class name for the model is _4rR01T .

model = laptop.find('div','_4rR01T').get_text()
print(model)

Output: 

acer Aspire 3 Dual Core 3020e - (4 GB/256 GB SSD/Windows 11 Home) A314-22 Laptop

Scraping Laptop Brand :

We can get the laptop brand from the model string itself by splitting it on spaces and taking the first word:

brand = model.split(' ')[0].title()
print(brand)

Output : 

Acer

Scraping Price of the Product :

You can scrape the price from the <div> tag with class name _30jeq3 _1_WHN1 .

price = laptop.find('div', "_30jeq3 _1_WHN1")
print(price)

Output : 

₹24,990

But the price is a string rather than an integer, so we need to convert it to an integer; for that we first replace the unnecessary characters.

price = int(laptop.find('div', "_30jeq3 _1_WHN1").get_text().replace('₹','').replace(',',''))
print(price)

Output : 

24990
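
A slightly more defensive variant (my own addition, not part of the original steps) strips every non-digit character with a regular expression, so it keeps working even if the currency symbol or thousands separator changes:

import re

price_text = laptop.find('div', "_30jeq3 _1_WHN1").get_text()
price = int(re.sub(r'\D', '', price_text))  # drop everything that is not a digit: '₹24,990' -> 24990
print(price)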

Scraping Link of the laptop :

We can get the href attribute from the <a> tag. We have already scraped that tag and assigned it to the variable laptop, so we can read the attribute directly and prepend the Flipkart domain.

link =  "https://www.flipkart.com" + phone['href']
print(link)

Output:

https://www.flipkart.com/acer-aspire-3-dual-core-3020e-4-gb-256-gb-ssd-windows-11-home-a314-22-laptop/p/itm07e718a865608?pid=COMGBNUMF8CDJZFH&lid=LSTCOMGBNUMF8CDJZFHVNFSHE&marketplace=FLIPKART&q=acer&store=6bo%2Fb5g&srno=s_1_6&otracker=search&otracker1=search&fm=organic&iid=ad76d4cc-ba93-4913-b04e-889d6cfb5f9c.COMGBNUMF8CDJZFH.SEARCH&ppt=hp&ppn=homepage&ssid=a2vg4ctgsg0000001664870452054&qH=e4721cdedd884fee
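
Instead of concatenating strings, the standard library's urljoin can also build the full URL; this is an alternative sketch, not what the tutorial itself uses:

from urllib.parse import urljoin

# urljoin handles relative and absolute hrefs consistently
link = urljoin("https://www.flipkart.com", laptop['href'])
print(link)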

Before moving ahead, let us combine everything that we have done so far.

from bs4 import BeautifulSoup
import requests

url = "https://www.flipkart.com/search?q=laptop&sid=6bo%2Cb5g&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&as-pos=1&as-type=HISTORY&suggestionId=laptop%7CLaptops&requestId=483bcb46-7458-4c48-869c-773583f9e774&as-backfill=on"

response = requests.get(url)

if(response.status_code==200):
    html_text = response.content
    soup = BeautifulSoup(html_text,"lxml")
    laptop = soup.find('a',"_1fQZEK")
    model = laptop.find('div','_4rR01T').get_text()
    brand = model.split(' ')[0].title()
    price = int(laptop.find('div','_30jeq3').get_text().replace('₹','').replace(',',''))
    link =  "https://www.flipkart.com" + laptop['href']
    print(f'''Brand : {brand}
Model : {model}
Price : {price}
Link : {link}''')


Output : 

Brand : Acer
Model : acer Aspire 3 Dual Core 3020e - (4 GB/256 GB SSD/Windows 11 Home) A314-22 Laptop
Price : 24990
Link : https://www.flipkart.com/acer-aspire-3-dual-core-3020e-4-gb-256-gb-ssd-windows-11-home-a314-22-laptop/p/itm07e718a865608?pid=COMGBNUMF8CDJZFH&lid=LSTCOMGBNUMF8CDJZFHVNFSHE&marketplace=FLIPKART&q=acer&store=6bo%2Fb5g&srno=s_1_6&otracker=search&otracker1=search&fm=organic&iid=ad76d4cc-ba93-4913-b04e-889d6cfb5f9c.COMGBNUMF8CDJZFH.SEARCH&ppt=hp&ppn=homepage&ssid=a2vg4ctgsg0000001664870452054&qH=e4721cdedd884fee

Scraping Link of the image :

image = laptop.find('img')['src']
print(image)

Output : 

https://rukminim1.flixcart.com/image/312/312/kp2y2kw0/computer/y/0/c/na-thin-and-light-laptop-asus-original-imag3ebnzawky4kn.jpeg?q=70

Specifications of the laptop :

To scrape the specifications, we grab the <ul> inside the laptop post, collect all of its <li> items, and extract the text of each one:

specification_list = []
lists = laptop.find('ul')
specifications = lists.find_all('li')
for specification in specifications:
    specification_list.append(specification.get_text())
print(specification_list)

Output : 

['AMD Dual Core Processor', '4 GB DDR4 RAM', '64 bit Windows 11 Operating System', '256 GB SSD', '35.56 cm (14 Inch) Display', 'Acer Care Center, Quick Access, Acer Product Registration', '1 Year International Travelers Warranty (ITW)']
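
The same loop can also be written as a single list comprehension; a small equivalent sketch:

# equivalent one-liner: the text of every <li> inside the laptop post's <ul>
specification_list = [li.get_text() for li in laptop.find('ul').find_all('li')]
print(specification_list)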

Scraping Rating from the product :

rating = float(laptop.find('div','_3LWZlK').get_text())
print(rating)

Output : 

4.0
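
Not every listing carries a rating; in that case find() returns None and the float() call above would raise an AttributeError. A defensive variant (my addition, not in the original tutorial):

rating_tag = laptop.find('div', '_3LWZlK')
rating = float(rating_tag.get_text()) if rating_tag is not None else None  # None when the product has no rating
print(rating)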

Now that we have scraped everything, let us combine it all and see the result.

from bs4 import BeautifulSoup
import requests

url = "https://www.flipkart.com/search?q=laptop&sid=6bo%2Cb5g&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&as-pos=1&as-type=HISTORY&suggestionId=laptop%7CLaptops&requestId=483bcb46-7458-4c48-869c-773583f9e774&as-backfill=on"

response = requests.get(url)

if(response.status_code==200):
    html_text = response.content
    soup = BeautifulSoup(html_text,"lxml")
    laptop = soup.find('a',"_1fQZEK")
    model = laptop.find('div','_4rR01T').get_text()
    brand = model.split(' ')[0].title()
    price = int(laptop.find('div','_30jeq3').get_text().replace('₹','').replace(',',''))
    link =  "https://www.flipkart.com" + laptop['href']
    image = laptop.find('img')['src']
    specification_list = []
    lists = laptop.find('ul')
    specifications = lists.find_all('li')
    for specification in specifications:
        specification_list.append(specification.get_text())
    rating = float(laptop.find('div', '_3LWZlK').get_text())
    print(f'''Brand : {brand}
Price : {price}
Link : {link}
Image Link : {image}
Rating : {rating}
Specifications : {specification_list}''')


Output : 

Brand : Acer
Price : 21890
Link : https://www.flipkart.com/acer-aspire-3-dual-core-3020e-4-gb-256-gb-ssd-windows-11-home-a314-22-laptop/p/itm07e718a865608?pid=COMGBNUMF8CDJZFH&lid=LSTCOMGBNUMF8CDJZFHVNFSHE&marketplace=FLIPKART&q=laptop&store=6bo%2Fb5g&srno=s_1_1&otracker=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&fm=organic&iid=en_Gbo3777pGH7PAcF2H26oAIXyFRd0RPuu3u9xDpwameJZkT0HUiVYAfbkaioxeA9TFXuZ9FCqywavLb%2B8qN9sag%3D%3D&ppt=None&ppn=None&ssid=7yr62k7ir40000001664969271008&qH=312f91285e048e09
Image Link : https://rukminim1.flixcart.com/image/312/312/xif0q/computer/h/u/b/-original-imagg2vnsrgrfkmz.jpeg?q=70
Rating : 4.0
Specifications : ['AMD Dual Core Processor', '4 GB DDR4 RAM', '64 bit Windows 11 Operating System', '256 GB SSD', '35.56 cm (14 Inch) Display', 'Acer Care Center, Quick Access, Acer Product Registration', '1 Year International Travelers Warranty (ITW)']
Section 2
  • We will scrape details of all the laptops

  • Create a Dictionary to store all the details.

Scrape details of all the Laptops in our current Web page :

We have scraped a single laptop's details. Now, if we want to scrape the details of all the laptops, all we need to do is find all occurrences of the <a> tag with class name _1fQZEK .

Doing this is pretty simple. All you need to do is use find_all() , like this :

laptops = soup.find_all('a','_1fQZEK')
for laptop in laptops :
    ...  # reuse the single-laptop scraping script written above inside this loop

Like this :

from bs4 import BeautifulSoup
import requests

url = "https://www.flipkart.com/search?q=laptop&sid=6bo%2Cb5g&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&as-pos=1&as-type=HISTORY&suggestionId=laptop%7CLaptops&requestId=483bcb46-7458-4c48-869c-773583f9e774&as-backfill=on"

response = requests.get(url)

if(response.status_code==200):
    html_text = response.content
    soup = BeautifulSoup(html_text,"lxml")
    laptops = soup.find_all('a',"_1fQZEK")
    for laptop in laptops:
        model = laptop.find('div','_4rR01T').get_text()
        brand = model.split(' ')[0].title()
        price = int(laptop.find('div','_30jeq3').get_text().replace('₹','').replace(',',''))
        link =  "https://www.flipkart.com" + laptop['href']
        image = laptop.find('img')['src']
        specification_list = []
        lists = laptop.find('ul')
        specifications = lists.find_all('li')
        for specification in specifications:
            specification_list.append(specification.get_text())
        rating = float(laptop.find('div', '_3LWZlK').get_text())
        print(f'''Brand : {brand}
    Price : {price}
    Link : {link}
    Image Link : {image}
    Rating : {rating}
    Specifications : {specification_list}''')
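
Section 2 also mentions storing all the details in a dictionary. As a minimal sketch of that idea (the exact field names and the shortened search URL are my assumptions, not fixed by the tutorial), each laptop's fields can be collected into a dict and appended to a list:

from bs4 import BeautifulSoup
import requests

url = "https://www.flipkart.com/search?q=laptop"  # shortened search URL, for illustration

response = requests.get(url)

laptops_data = []
if response.status_code == 200:
    soup = BeautifulSoup(response.content, "lxml")
    for laptop in soup.find_all('a', '_1fQZEK'):
        model = laptop.find('div', '_4rR01T').get_text()
        rating_tag = laptop.find('div', '_3LWZlK')
        laptops_data.append({
            'brand': model.split(' ')[0].title(),
            'model': model,
            'price': int(laptop.find('div', '_30jeq3').get_text().replace('₹', '').replace(',', '')),
            'link': "https://www.flipkart.com" + laptop['href'],
            'image': laptop.find('img')['src'],
            'specifications': [li.get_text() for li in laptop.find('ul').find_all('li')],
            'rating': float(rating_tag.get_text()) if rating_tag is not None else None,
        })

# laptops_data is now a list of dictionaries, one per laptop on the page
print(len(laptops_data))

From here, pandas.DataFrame(laptops_data) would turn the collected records into a table for further analysis.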