Introduction To BeautifulSoup
Problem Statement:
In our previous session, we built a simple webpage. Scraping the data from our e-commerce website with plain Python alone would be complicated, so we will use the BeautifulSoup library to extract data from the HTML document.
Beautiful Soup is a Python library for getting data out of HTML, XML, and other markup languages. Say you’ve found some webpages that display data relevant to your research, such as date or address information, but that do not provide any way of downloading the data directly. Beautiful Soup helps you pull particular content from a webpage, remove the HTML markup, and save the information. It is a tool for web scraping that helps you clean up and parse the documents you have pulled down from the web.

So let's try to scrape the HTML document that we created in the previous session.
Import BeautifulSoup:
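A minimal sketch of the import, assuming the library is already installed (it is usually published as the `beautifulsoup4` package, for example via `pip install beautifulsoup4`):

```python
# BeautifulSoup is provided by the bs4 package.
from bs4 import BeautifulSoup
```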
Store HTML Document in a variable:
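The page from the previous session is not reproduced here, so the markup below is only a hypothetical stand-in with the same kind of structure (a div with class "Phone", a name, an image, a list of specifications and a link). The variable name `html_doc`, the product name, the image path and the URL are all placeholders.

```python
# Hypothetical stand-in for the page built in the previous session.
# All names, paths and URLs are placeholders.
html_doc = """
<html>
<head><title>My Store</title></head>
<body>
<div class="Phone">
<h1>Phone A</h1>
<img src="images/phone_a.jpg">
<ul>
<li>4 GB RAM</li>
<li>64 GB Storage</li>
</ul>
<a href="https://example.com/phone-a">Know more</a>
</div>
</body>
</html>
"""
```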
Convert the HTML document into a BeautifulSoup object:
To scrape data from the HTML document efficiently, you first need to convert it into a BeautifulSoup object. We will also need a parser, "lxml", which turns the document into a tree structure that is easy to search. Note that lxml is a separate package and must be installed alongside BeautifulSoup.
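The conversion could look roughly like this. The short `html_doc` string is a hypothetical placeholder for the real page, and the fallback to Python's built-in "html.parser" is an addition of this sketch so that it still runs when lxml is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical placeholder for the real page.
html_doc = (
    "<html><head><title>My Store</title></head>"
    "<body><div class='Phone'><h1>Phone A</h1></div></body></html>"
)

try:
    # Parse with lxml, as described in the text.
    soup = BeautifulSoup(html_doc, "lxml")
except FeatureNotFound:
    # Assumption of this sketch: fall back to the parser bundled with Python.
    soup = BeautifulSoup(html_doc, "html.parser")

print(type(soup).__name__)
```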
To read the HTML document:
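One way to view the parsed document is `prettify()`, which returns the markup as an indented string. This is a sketch on a hypothetical one-line page. It uses Python's built-in "html.parser" so it runs without extra installs, and "lxml" can be used instead if it is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical placeholder for the real page.
html_doc = "<html><head><title>My Store</title></head><body><h1>Phone A</h1></body></html>"

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")

# prettify() returns the document as a string with one tag per line.
pretty = soup.prettify()
print(pretty)
```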

To navigate through the structure:
To get the <title> tag from the HTML document:
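A tag can be reached as an attribute of the soup object. The sketch below assumes a hypothetical page whose title is "My Store".

```python
from bs4 import BeautifulSoup

# Hypothetical placeholder for the real page.
html_doc = "<html><head><title>My Store</title></head><body></body></html>"

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")

# soup.title gives the first <title> tag, markup included.
title_tag = soup.title
print(title_tag)
```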
If you only want the text inside the <title> tag, use the get_text() method. It returns a string.
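A short sketch of `get_text()` on the title, again on a hypothetical page titled "My Store":

```python
from bs4 import BeautifulSoup

# Hypothetical placeholder for the real page.
html_doc = "<html><head><title>My Store</title></head><body></body></html>"

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")

# get_text() strips the markup and returns only the text.
title_text = soup.title.get_text()
print(title_text)
```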
find() method: to get the first div tag with class "Phone":
To find something specific, we may need to use attributes. If you look at the HTML code carefully, the first div tag has the attribute class="Phone".
We will use the find() method to get the desired result.
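A sketch of the lookup, assuming a hypothetical page with two phone blocks. The keyword is spelled `class_` because `class` is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical placeholder for the real page; names and paths are made up.
html_doc = """
<html><body>
<div class="Phone"><h1>Phone A</h1><img src="images/phone_a.jpg"></div>
<div class="Phone"><h1>Phone B</h1><img src="images/phone_b.jpg"></div>
</body></html>
"""

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
first_phone = soup.find("div", class_="Phone")
print(first_phone)
```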
So we got the details of our first phone inside the <div> tag.
contents: to get the contents of any tag:
contents returns a list of all the direct children of a tag, here the parent <div>. The list holds the child tags and also any text between them, such as line breaks.
Because contents is a list, we can pick out a single child by indexing.
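The sketch below shows `contents` and indexing on a hypothetical phone block. The markup is written on one line on purpose, so the list holds only tags.

```python
from bs4 import BeautifulSoup

# Hypothetical phone block, written without line breaks between tags.
# With line breaks, each newline would show up as an extra string item.
html_doc = (
    '<div class="Phone"><h1>Phone A</h1>'
    '<img src="images/phone_a.jpg">'
    '<a href="https://example.com/phone-a">Know more</a></div>'
)

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")
phone = soup.find("div", class_="Phone")

# contents is a plain list of the direct children.
children = phone.contents
print(len(children))
print(children[0])  # first child, the <h1> tag
```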

To get the name of the phone using the find() method:
As you can see, the name of the phone is in an <h1> tag, so we can get it with the find() method.
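A sketch under the assumption that each phone block holds its name in a single <h1> tag. The sample markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical phone block.
html_doc = '<div class="Phone"><h1>Phone A</h1><img src="images/phone_a.jpg"></div>'

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")
phone = soup.find("div", class_="Phone")

# Search inside the phone block only, then keep just the text.
phone_name = phone.find("h1").get_text()
print(phone_name)
```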
To get the link of the image:
In the first div tag, the <img> tag holds the link in its src attribute. A tag's attributes can be read like dictionary keys, so indexing the <img> tag with 'src' gives the link of the image.
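Reading the attribute could look like this. The image path is a hypothetical placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical phone block.
html_doc = '<div class="Phone"><h1>Phone A</h1><img src="images/phone_a.jpg"></div>'

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")
phone = soup.find("div", class_="Phone")

# Attributes are read with dictionary-style indexing on the tag.
image_link = phone.find("img")["src"]
print(image_link)
```

Indexing raises a KeyError when the attribute is missing; `tag.get("src")` returns None instead.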
find_all(): to get all the specifications of our phone:
The find() method returns only the first occurrence of a tag, whereas find_all() returns a list of all occurrences.
Let us first see what happens if we use the find() method on <li> to get the specifications.
Using the find() method to get a specification:
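A sketch with two made-up specifications, to show that `find()` stops at the first match:

```python
from bs4 import BeautifulSoup

# Hypothetical phone block with two specifications.
html_doc = """
<div class="Phone">
<ul>
<li>4 GB RAM</li>
<li>64 GB Storage</li>
</ul>
</div>
"""

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")
phone = soup.find("div", class_="Phone")

# Only the first <li> is returned.
first_spec = phone.find("li")
print(first_spec)
```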
As you can see, there are multiple specifications, but we got only the first one.
Using the find_all() method to get all specifications:
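The same hypothetical phone block, this time searched with `find_all()`:

```python
from bs4 import BeautifulSoup

# Hypothetical phone block with two specifications.
html_doc = """
<div class="Phone">
<ul>
<li>4 GB RAM</li>
<li>64 GB Storage</li>
</ul>
</div>
"""

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")
phone = soup.find("div", class_="Phone")

# find_all() returns a list-like collection of every matching tag.
specs = phone.find_all("li")
print(specs)
```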
Using a loop to get the text of every list item:
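A sketch of the loop, collecting the text of each <li> into a list. The specifications and the name `spec_texts` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical phone block with two specifications.
html_doc = """
<div class="Phone">
<ul>
<li>4 GB RAM</li>
<li>64 GB Storage</li>
</ul>
</div>
"""

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")
phone = soup.find("div", class_="Phone")

spec_texts = []
for spec in phone.find_all("li"):
    # get_text() drops the <li> markup and keeps the text.
    spec_texts.append(spec.get_text())
    print(spec.get_text())
```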
To get the link to know more about the phone:
To get the link, we need to find the <a> tag and read its href attribute.
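A sketch of reading the link, assuming the phone block contains a single <a> tag. The URL is a placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical phone block.
html_doc = (
    '<div class="Phone"><h1>Phone A</h1>'
    '<a href="https://example.com/phone-a">Know more</a></div>'
)

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")
phone = soup.find("div", class_="Phone")

# href is an attribute of the <a> tag, read like a dictionary key.
more_link = phone.find("a")["href"]
print(more_link)
```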
Assignment 1: Try to combine everything and get all the info about our phone.
The output should look something like this:
Try to get the details of all phones:
Now that we have the details of a single phone, we can use the find_all() method to find all the phones and then scrape the data of each one in a loop.
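One possible shape for that loop is sketched below. The two phone blocks, the dictionary keys and the variable names are hypothetical. The real page may be laid out differently, so treat this as a starting point rather than the expected solution.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two phone blocks; all values are placeholders.
html_doc = """
<html><body>
<div class="Phone">
<h1>Phone A</h1>
<img src="images/phone_a.jpg">
<ul><li>4 GB RAM</li><li>64 GB Storage</li></ul>
<a href="https://example.com/phone-a">Know more</a>
</div>
<div class="Phone">
<h1>Phone B</h1>
<img src="images/phone_b.jpg">
<ul><li>6 GB RAM</li><li>128 GB Storage</li></ul>
<a href="https://example.com/phone-b">Know more</a>
</div>
</body></html>
"""

# "html.parser" ships with Python; "lxml" also works if installed.
soup = BeautifulSoup(html_doc, "html.parser")

phones = []
for phone in soup.find_all("div", class_="Phone"):
    # Each lookup is scoped to the current phone block.
    phones.append({
        "name": phone.find("h1").get_text(),
        "image": phone.find("img")["src"],
        "specs": [li.get_text() for li in phone.find_all("li")],
        "link": phone.find("a")["href"],
    })

for details in phones:
    print(details)
```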
Assignment 2: Try to scrape the details of all laptops: