11. Matplotlib - Data Visualization
Data visualization is the graphic representation of data. It involves producing images that communicate relationships among the represented data to viewers. Visualizing data is an essential part of data analysis and machine learning.
We'll use the Python libraries Matplotlib and Seaborn to learn and apply some popular data visualization techniques. We'll use the words chart, plot, and graph interchangeably in this tutorial.
To begin, let's install and import the libraries. We'll use the matplotlib.pyplot module for basic plots like line & bar charts. It is often imported with the alias plt.
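A minimal setup sketch, assuming a standard Python environment where pip is available. The install command is shown as a comment because it runs in a terminal, not in Python.

```python
# Install once from a terminal (assumes pip is available):
#   pip install matplotlib seaborn

import matplotlib
import matplotlib.pyplot as plt  # plt is the conventional alias for pyplot

# Print the installed version to confirm the import worked.
print(matplotlib.__version__)
```

In a Jupyter notebook, figures appear when a cell runs. In a plain script, call plt.show() after building a figure.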
Imagine a shop where items are sold each day, and the shopkeeper wants to see visually how many items are sold each day. We have data for the days and the items sold over one week.
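The shop data could be plotted roughly like this. The numbers are illustrative values chosen for this sketch, not the original data, and the variable names days and items_sold are assumptions.

```python
import matplotlib.pyplot as plt

# Illustrative values only: one entry per day of the week.
days = [1, 2, 3, 4, 5, 6, 7]
items_sold = [10, 10, 14, 12, 25, 18, 16]

plt.figure()  # start a fresh figure
# plt.plot draws a line through the (day, items) points
# and returns a list of the line objects it created.
lines = plt.plot(days, items_sold)
# In a script, call plt.show() to display the figure.
```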
Here you can see that the most items were sold on the 5th day. When we view data as a graph, it becomes clearer, and we can make decisions accordingly.
In the example above, both columns are numeric. Now suppose one column is categorical, with values such as Day1 and Day2.
The title property gives a name, or title, to your graph. Say we want to name this graph Shop - Data.
The x-label property gives a name to the x-axis. Say we want to name the x-axis Days.
The y-label property gives a name to the y-axis. Say we want to name the y-axis Items.
The color property changes the color of the line.
The grid property makes the graph easier to read. In the graph above, it is hard to read off the exact values if you do not already know the data; with a grid, the values are much easier to read.
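A sketch that puts these properties together, assuming they are set through the pyplot functions title, xlabel, ylabel, and grid, and through the color argument of plot. The item counts and the color green are illustrative choices, not the original data.

```python
import matplotlib.pyplot as plt

# Categorical x values; the item counts are illustrative.
days = ["Day1", "Day2", "Day3", "Day4", "Day5", "Day6", "Day7"]
items_sold = [10, 10, 14, 12, 25, 18, 16]

plt.figure()
plt.plot(days, items_sold, color="green")  # color of the line
plt.title("Shop - Data")                   # title of the graph
plt.xlabel("Days")                         # name of the x-axis
plt.ylabel("Items")                        # name of the y-axis
plt.grid(True)                             # draw grid lines
# In a script, call plt.show() to display the figure.
```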
The first data point is (Day1, 10), the second data point is (Day2, 10), and so on.
Let us say you have two sets of data and you simply create a plot, but the plot is confusing because the lines overlap each other.
Here our line chart is not clear, and the whole point of the graph is lost. This happens because the data is not in any order. To fix the overlapping, all we need to do is sort the data with sort_values. We can sort by Data1 so that no overlapping occurs.
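The effect of sorting can be sketched as follows, assuming the data lives in a pandas DataFrame with columns named Data1 and Data2. Both the column layout and the values are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative, deliberately unordered data.
df = pd.DataFrame({
    "Data1": [4, 1, 5, 2, 3],
    "Data2": [30, 12, 22, 40, 18],
})

# Unsorted: the line jumps back and forth along the x-axis.
plt.figure()
plt.plot(df["Data1"], df["Data2"])

# Sorted by Data1: the line runs left to right without doubling back.
df_sorted = df.sort_values("Data1")
plt.figure()
plt.plot(df_sorted["Data1"], df_sorted["Data2"])
# In a script, call plt.show() to display both figures.
```

sort_values returns a new DataFrame and keeps each Data2 value paired with its Data1 value, so only the drawing order changes.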
So far we have dealt with two kinds of data; in this example we will deal with more. Suppose a group of students took two tests, one in mathematics and one in science.
You can create a multiple line chart by calling the plot method once for each line.
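A small sketch of a multiple line chart. The student names and scores are hypothetical and serve only as an illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical students and scores.
students = ["Asha", "Ben", "Chen", "Dina", "Eli"]
maths = [78, 65, 90, 55, 82]
science = [72, 70, 85, 60, 88]

plt.figure()
# Each plt.plot call adds one more line to the same axes.
plt.plot(students, maths, label="Maths")
plt.plot(students, science, label="Science")
plt.legend()  # show which line is which
# In a script, call plt.show() to display the figure.
```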
A bar graph is a type of visualization that helps represent categorical data. It has rectangular bars (hence the name) that can be drawn horizontally or vertically. Let's take the same example of students and their scores.
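A vertical bar graph could look like this. The student names and maths scores are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical students and maths scores.
students = ["Asha", "Ben", "Chen", "Dina", "Eli"]
maths = [78, 65, 90, 55, 82]

plt.figure()
# One vertical bar per student; the bar height is the score.
bars = plt.bar(students, maths)
plt.ylabel("Maths score")
# In a script, call plt.show() to display the figure.
```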
For a horizontal bar graph, we use the barh() method:
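The same kind of data drawn with barh(), again with hypothetical names and scores:

```python
import matplotlib.pyplot as plt

# Hypothetical students and maths scores.
students = ["Asha", "Ben", "Chen", "Dina", "Eli"]
maths = [78, 65, 90, 55, 82]

plt.figure()
# One horizontal bar per student; the bar width is the score.
bars = plt.barh(students, maths)
plt.xlabel("Maths score")
# In a script, call plt.show() to display the figure.
```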
A scatter plot helps visualize the relationship between two or more variables. It also helps identify outliers (abnormal values) that may be present in a dataset, so that the exceptions in the data can be better understood and the reasons behind them found.
When we have two or more variables, we can use a scatter plot, so our example of maths and science scores fits well here.
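A scatter plot of the two test scores might be sketched like this. The scores are hypothetical; each point represents one student.

```python
import matplotlib.pyplot as plt

# Hypothetical scores; position i in both lists belongs to the same student.
maths = [78, 65, 90, 55, 82]
science = [72, 70, 85, 60, 88]

plt.figure()
# Each student becomes one point at (maths score, science score).
points = plt.scatter(maths, science)
plt.xlabel("Maths score")
plt.ylabel("Science score")
# In a script, call plt.show() to display the figure.
```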
A histogram is a graph showing frequency distributions.
It is a graph showing the number of observations within each given interval.
In simple words, let us say you have a list of weights that mark the interval boundaries, like this:
weights = [50,60,70,80,90]
and another list with the weights of individual people:
people_weight = [51,54,57,78,88,63,52,86]
As you can see:
People who fall in the range 50-60: 51, 54, 57, 52
People who fall in the range 60-70: 63
People who fall in the range 70-80: 78
People who fall in the range 80-90: 88, 86
So in a histogram we pass two parameters:
range (r): here, weights is the range. In Matplotlib's hist(), these interval edges are passed as the bins argument.
values: here, people_weight is the values.
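With the two lists above, the histogram could be drawn as in the sketch below. It assumes the interval edges are passed through the bins argument of plt.hist; the edgecolor setting is only a cosmetic choice.

```python
import matplotlib.pyplot as plt

weights = [50, 60, 70, 80, 90]                      # interval edges
people_weight = [51, 54, 57, 78, 88, 63, 52, 86]    # values to count

plt.figure()
# hist returns the count per interval, the edges, and the drawn bars.
counts, edges, patches = plt.hist(people_weight, bins=weights, edgecolor="black")
print(counts)  # [4. 1. 1. 2.]
# In a script, call plt.show() to display the figure.
```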
From this graph, it is clear that 4 people fall in the range 50 to 60, only 1 person falls in each of the ranges 60 to 70 and 70 to 80, and 2 people fall in the range 80 to 90.
Suppose we want to find out how many Pokemon fall in a particular speed range. We can use our Pokemon dataset to find out.
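The Pokemon dataset itself is not shown here, so the sketch below uses a tiny stand-in DataFrame. The column name Speed, the values, and the bin edges are all assumptions; with the real dataset you would load the file into a DataFrame instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the Pokemon dataset (hypothetical column name and values).
pokemon = pd.DataFrame({"Speed": [45, 60, 80, 65, 80, 100, 43, 58, 78, 30]})

plt.figure()
# Count how many Pokemon fall into each speed interval.
counts, edges, _ = plt.hist(pokemon["Speed"], bins=[0, 25, 50, 75, 100, 125])
plt.xlabel("Speed")
plt.ylabel("Number of Pokemon")
# In a script, call plt.show() to display the figure.
```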
A pie chart is also known as a circle chart or Emma chart. It is named for its resemblance to a pie cut into slices. Usually the data shown in a pie chart are percentages.
For example, suppose we are making a dish whose ingredients are added in fixed percentages. Say a cake is 70% chocolate, 10% milk, and so on.
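A basic pie chart for the cake could be sketched as follows. Only chocolate (70%) and milk (10%) come from the text; the remaining ingredients are made up so that the slices add up to 100.

```python
import matplotlib.pyplot as plt

# Chocolate and milk are from the example; sugar and flour are hypothetical.
ingredients = ["Chocolate", "Milk", "Sugar", "Flour"]
percentages = [70, 10, 10, 10]

plt.figure()
# One slice per ingredient, sized by its share of the total.
plt.pie(percentages, labels=ingredients)
# In a script, call plt.show() to display the figure.
```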
autopct: use this property to show the percentage on each slice.
colors: use this property to change the slice colors. You need to pass a list of colors, and you can use color codes instead of color names.
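Both properties in one sketch. The format string passed to autopct and the particular colors are illustrative choices; the list mixes color names with one hex color code.

```python
import matplotlib.pyplot as plt

# Chocolate and milk are from the example; sugar and flour are hypothetical.
ingredients = ["Chocolate", "Milk", "Sugar", "Flour"]
percentages = [70, 10, 10, 10]

plt.figure()
plt.pie(
    percentages,
    labels=ingredients,
    autopct="%1.1f%%",  # print each share with one decimal place
    colors=["saddlebrown", "#f5f5dc", "gold", "wheat"],  # one color per slice
)
# In a script, call plt.show() to display the figure.
```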
Say I want to know the percentage of each type of Pokemon in our dataset. A pie chart is the right choice.
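A sketch of this idea with a small stand-in DataFrame, since the dataset is not shown here. The column name Type 1 and the rows are assumptions. value_counts() counts how often each type occurs, and the pie chart turns those counts into shares.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the Pokemon dataset (hypothetical column name and rows).
pokemon = pd.DataFrame({
    "Type 1": ["Water", "Fire", "Grass", "Water", "Bug", "Fire", "Water", "Grass"],
})

# Number of Pokemon per type, most common first.
type_counts = pokemon["Type 1"].value_counts()

plt.figure()
plt.pie(type_counts, labels=type_counts.index, autopct="%1.1f%%")
# In a script, call plt.show() to display the figure.
```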
Stack Plots are used to visualize multiple linear plots, stacked on top of each other.
For example, suppose I have data on how many hours I studied and played over 7 days. This can be shown with a stack plot.
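A minimal stack plot sketch. The hours are illustrative values, chosen so that the first day adds up to 11 hours in total.

```python
import matplotlib.pyplot as plt

# Illustrative hours per day.
days = [1, 2, 3, 4, 5, 6, 7]
studied = [7, 8, 6, 5, 9, 4, 6]
played = [4, 3, 5, 6, 2, 7, 5]

plt.figure()
# The second series is stacked on top of the first,
# so the top edge shows the total hours per day.
layers = plt.stackplot(days, studied, played)
# In a script, call plt.show() to display the figure.
```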
As you can see, I spent a total of 11 hours playing and studying on the first day. This is how we use stackplot. It is not very clear yet, though; we can add labels and a legend to make the graph much more understandable.
Since there is more than one label, we use a list to store all the labels. For example:
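A sketch with a list of labels and a legend. The hours are illustrative, and the label texts and legend position are assumptions.

```python
import matplotlib.pyplot as plt

# Illustrative hours per day.
days = [1, 2, 3, 4, 5, 6, 7]
studied = [7, 8, 6, 5, 9, 4, 6]
played = [4, 3, 5, 6, 2, 7, 5]

# One label per stacked series, in the same order as the data.
labels = ["Studied", "Played"]

plt.figure()
plt.stackplot(days, studied, played, labels=labels)
legend = plt.legend(loc="upper left")  # the legend uses the labels above
plt.xlabel("Day")
plt.ylabel("Hours")
# In a script, call plt.show() to display the figure.
```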
Create a line chart comparing all the countries year by year.
Congrats! You have made it. Now practice all kinds of graphs and apply them to your own datasets. The next topic we are going to cover is Seaborn.
You can change the style of the lines in a line chart with the linestyle property. The Matplotlib documentation lists all the available line styles.
We can use the marker keyword argument to mark each data point with a circle when creating a line chart. The Matplotlib documentation lists all the available markers.
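A sketch that combines a line style with a marker, assuming both are passed as keyword arguments to plt.plot. The dashed style and the data are illustrative choices.

```python
import matplotlib.pyplot as plt

# Illustrative values only.
days = [1, 2, 3, 4, 5, 6, 7]
items_sold = [10, 10, 14, 12, 25, 18, 16]

plt.figure()
# "--" draws a dashed line; "o" marks each data point with a circle.
lines = plt.plot(days, items_sold, linestyle="--", marker="o")
# In a script, call plt.show() to display the figure.
```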