Reading and Loading Different Data
when working on pandas, it is necessary to load data into pandas. In this section we will learn thoroughly how to load different kinds of data.
Last updated
when working on pandas, it is necessary to load data into pandas. In this section we will learn thoroughly how to load different kinds of data.
Last updated
Download this file first:
Do not forget to install openpyxl. openpyxl is an excel reader, without it pandas will not be able to read it.
There is another library that you can install : xlrd.
The head() function is used to get the first or last n rows.This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.
In [ ]:
In [ ]:
In [ ]:
Say , We want to work on HP Column.
In [ ]:
In [ ]:
To get the data types of columns , you need to use dtype attribute.
In [ ]:
The Describe function returns the statistical summary of the dataframe or series. This includes Count, Mean, Median (or 50th percentile) standard variation, min-max, and percentile values of columns.
In [19]:
count
800.000000
800.00000
800.000000
800.000000
800.000000
800.000000
800.000000
800.000000
800.00000
mean
362.813750
435.10250
69.258750
79.001250
73.842500
72.820000
71.902500
68.277500
3.32375
std
208.343798
119.96304
25.534669
32.457366
31.183501
32.722294
27.828916
29.060474
1.66129
min
1.000000
180.00000
1.000000
5.000000
5.000000
10.000000
20.000000
5.000000
1.00000
25%
184.750000
330.00000
50.000000
55.000000
50.000000
49.750000
50.000000
45.000000
2.00000
50%
364.500000
450.00000
65.000000
75.000000
70.000000
65.000000
70.000000
65.000000
3.00000
75%
539.250000
515.00000
80.000000
100.000000
90.000000
95.000000
90.000000
90.000000
5.00000
max
721.000000
780.00000
255.000000
190.000000
230.000000
194.000000
230.000000
180.000000
6.00000
We can perform slicing on columns as well as rows on pandas Dataframe.
df.iloc[rows,columns]
Say We want to slice Top 4 rows with all columns , We will slice it in the same way as we learned it in Numpy.
0
1
Bulbasaur
Grass
Poison
318
45
49
49
65
65
45
1
False
1
2
Ivysaur
Grass
Poison
405
60
62
63
80
80
60
1
False
2
3
Venusaur
Grass
Poison
525
80
82
83
100
100
80
1
False
3
3
VenusaurMega Venusaur
Grass
Poison
625
80
100
123
122
120
80
1
False
Say We want to slice 4 Columns from start with all rows , We will do it something like this.
In [21]:
0
1
Bulbasaur
Grass
Poison
1
2
Ivysaur
Grass
Poison
2
3
Venusaur
Grass
Poison
3
3
VenusaurMega Venusaur
Grass
Poison
4
4
Charmander
Fire
NaN
...
...
...
...
...
795
719
Diancie
Rock
Fairy
796
719
DiancieMega Diancie
Rock
Fairy
797
720
HoopaHoopa Confined
Psychic
Ghost
798
720
HoopaHoopa Unbound
Psychic
Dark
799
721
Volcanion
Fire
Water
In this topic, we will learn to iterate over :
1.Rows 2.Columns
You can iterate over rows with the help of iterrows().
It gives you a tuple of index and whole row. Say i want index and only name from the row, We will write it something like this:
In [ ]:
In [25]:
Selecting all the rows from the given Dataframe in which Type1 is Grass using loc[ ].
In [26]:
#
Name
Type 1
Type 2
Total
HP
Attack
Defense
Sp. Atk
Sp. Def
Speed
Generation
Legendary
0
1
Bulbasaur
Grass
Poison
318
45
49
49
65
65
45
1
False
1
2
Ivysaur
Grass
Poison
405
60
62
63
80
80
60
1
False
2
3
Venusaur
Grass
Poison
525
80
82
83
100
100
80
1
False
3
3
VenusaurMega Venusaur
Grass
Poison
625
80
100
123
122
120
80
1
False
48
43
Oddish
Grass
Poison
320
45
50
55
75
65
30
1
False
...
...
...
...
...
...
...
...
...
...
...
...
...
...
718
650
Chespin
Grass
NaN
313
56
61
65
48
45
38
6
False
719
651
Quilladin
Grass
NaN
405
61
78
95
56
58
57
6
False
720
652
Chesnaught
Grass
Fighting
530
88
107
122
74
75
64
6
False
740
672
Skiddo
Grass
NaN
350
66
65
48
62
57
52
6
False
741
673
Gogoat
Grass
NaN
531
123
100
62
97
81
68
6
False
You can use sort.values() function to sort data in ascending or descending order.For Example, We want to sort Type1 in Ascending Order.
In [35]:
600
540
Sewaddle
Bug
Grass
310
45
53
70
40
60
42
5
False
136
127
Pinsir
Bug
NaN
500
65
125
100
55
70
85
1
False
457
412
Burmy
Bug
NaN
224
40
29
45
29
45
36
4
False
132
123
Scyther
Bug
Flying
500
70
110
80
55
80
105
1
False
656
595
Joltik
Bug
Electric
319
50
47
50
57
50
65
5
False
...
...
...
...
...
...
...
...
...
...
...
...
...
...
172
158
Totodile
Water
NaN
314
50
65
64
44
48
43
2
False
610
550
Basculin
Water
NaN
460
70
92
65
80
55
98
5
False
145
134
Vaporeon
Water
NaN
525
130
65
60
110
95
65
1
False
574
515
Panpour
Water
NaN
316
50
53
48
53
48
64
5
False
184
170
Chinchou
Water
Electric
330
75
38
38
56
56
67
2
False
By using ascending=False parameter, You can Sort Data in Descending Order.
In [36]:
371
339
Barboach
Water
Ground
288
50
48
43
46
41
60
3
False
97
90
Shellder
Water
NaN
305
30
65
100
45
25
40
1
False
240
222
Corsola
Water
Rock
380
55
55
85
65
85
35
2
False
403
368
Gorebyss
Water
NaN
485
55
84
105
114
75
52
3
False
93
86
Seel
Water
NaN
325
65
45
55
45
70
45
1
False
...
...
...
...
...
...
...
...
...
...
...
...
...
...
54
49
Venomoth
Bug
Poison
450
70
65
60
90
75
90
1
False
208
193
Yanma
Bug
Flying
390
65
65
45
75
45
95
2
False
182
168
Ariados
Bug
Poison
390
70
90
70
60
60
40
2
False
734
666
Vivillon
Bug
Flying
411
80
52
50
90
50
89
6
False
179
165
Ledyba
Bug
Flying
265
40
20
30
40
80
55
2
False
In [37]:
307
283
Surskit
Bug
Water
269
40
30
32
50
52
65
3
False
460
413
WormadamTrash Cloak
Bug
Steel
424
60
69
95
69
95
36
4
False
220
205
Forretress
Bug
Steel
465
75
90
140
60
60
40
2
False
693
632
Durant
Bug
Steel
484
58
109
112
48
48
109
5
False
650
589
Escavalier
Bug
Steel
495
70
135
105
60
105
20
5
False
...
...
...
...
...
...
...
...
...
...
...
...
...
...
264
245
Suicune
Water
NaN
580
100
75
115
90
115
85
2
True
548
490
Manaphy
Water
NaN
600
100
100
100
100
100
100
4
False
12
9
BlastoiseMega Blastoise
Water
NaN
630
79
103
120
135
115
78
1
False
421
382
Kyogre
Water
NaN
670
100
100
90
150
140
90
3
True
422
382
KyogrePrimal Kyogre
Water
NaN
770
100
150
90
180
160
90
3
True
Say , We want to create a new Column named total_sum , that is addition of HP,Attack,Defense,Sp.atk,Sp.def,Speed,Generation.
In [44]:
Out[44]:
We can also use sum() function to achieve same column , for Example :
In [55]:
Out[55]:
NOTE
If i See my dataframe, it will add both columns. Let's see.
In [56]:
0
1
Bulbasaur
Grass
Poison
273
45
49
49
65
65
45
1
False
273
273
273
1
2
Ivysaur
Grass
Poison
345
60
62
63
80
80
60
1
False
345
345
345
2
3
Venusaur
Grass
Poison
445
80
82
83
100
100
80
1
False
445
445
445
3
3
VenusaurMega Venusaur
Grass
Poison
545
80
100
123
122
120
80
1
False
545
545
545
4
4
Charmander
Fire
NaN
270
39
52
43
60
50
65
1
False
270
270
270
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
795
719
Diancie
Rock
Fairy
550
50
100
150
100
150
50
6
True
550
550
550
796
719
DiancieMega Diancie
Rock
Fairy
650
50
160
110
160
110
110
6
True
650
650
650
797
720
HoopaHoopa Confined
Psychic
Ghost
520
80
110
60
150
130
70
6
True
520
520
520
798
720
HoopaHoopa Unbound
Psychic
Dark
600
80
160
60
170
130
80
6
True
600
600
600
799
721
Volcanion
Fire
Water
520
80
110
120
130
90
70
6
True
520
520
520
In [59]:
0
1
Bulbasaur
Grass
Poison
45
49
49
65
65
45
1
False
1
2
Ivysaur
Grass
Poison
60
62
63
80
80
60
1
False
2
3
Venusaur
Grass
Poison
80
82
83
100
100
80
1
False
3
3
VenusaurMega Venusaur
Grass
Poison
80
100
123
122
120
80
1
False
4
4
Charmander
Fire
NaN
39
52
43
60
50
65
1
False
...
...
...
...
...
...
...
...
...
...
...
...
...
795
719
Diancie
Rock
Fairy
50
100
150
100
150
50
6
True
796
719
DiancieMega Diancie
Rock
Fairy
50
160
110
160
110
110
6
True
797
720
HoopaHoopa Confined
Psychic
Ghost
80
110
60
150
130
70
6
True
798
720
HoopaHoopa Unbound
Psychic
Dark
80
160
60
170
130
80
6
True
799
721
Volcanion
Fire
Water
80
110
120
130
90
70
6
True
Let's say we have a data player_score.txt and we want to convert it into learn_pandas.csv. Here is our data player_score.txt in text form.
0
1
Virat
100
25
75
1
2
Dhoni
78
93
25
2
3
Jadeja
56
78
90
3
4
Ashwin
78
47
112
4
5
Sachin
112
45
78
If you do not want index in your file, you can simply put index=False
index : Lets you insert index by changing index=True
and index=False
.
sep : lets you select sep , for example sep=',' or sep='@'.
Say, You want no index Column and separator should be ',' :
In [4]:
0
1
Virat
100
25
75
1
2
Dhoni
78
93
25
2
3
Jadeja
56
78
90
3
4
Ashwin
78
47
112
4
5
Sachin
112
45
78
In [7]:
0
1
Virat
100
25
75
1
2
Dhoni
78
93
25
2
3
Jadeja
56
78
90
3
4
Ashwin
78
47
112
4
5
Sachin
112
45
78
DataFrame has a method called drop() that removes rows or columns according to specify column(label) names and corresponding axis.
1.inplace=True
means the operation would work on the original object.
2.axis = 1
we are dropping the column, not the row.
In [12]:
0
1
Bulbasaur
Grass
Poison
45
49
49
65
65
45
1
False
1
2
Ivysaur
Grass
Poison
60
62
63
80
80
60
1
False
2
3
Venusaur
Grass
Poison
80
82
83
100
100
80
1
False
3
3
VenusaurMega Venusaur
Grass
Poison
80
100
123
122
120
80
1
False
4
4
Charmander
Fire
NaN
39
52
43
60
50
65
1
False
...
...
...
...
...
...
...
...
...
...
...
...
...
795
719
Diancie
Rock
Fairy
50
100
150
100
150
50
6
True
796
719
DiancieMega Diancie
Rock
Fairy
50
160
110
160
110
110
6
True
797
720
HoopaHoopa Confined
Psychic
Ghost
80
110
60
150
130
70
6
True
798
720
HoopaHoopa Unbound
Psychic
Dark
80
160
60
170
130
80
6
True
799
721
Volcanion
Fire
Water
80
110
120
130
90
70
6
True
If the original index are numbers, now we have indexes that are not continuous. Well, pandas has reset_index() function. So to reset the index to the default integer index beginning at 0, We can simply use the reset_index() function.
In [13]:
0
1
Bulbasaur
Grass
Poison
45
49
49
65
65
45
1
False
1
2
Ivysaur
Grass
Poison
60
62
63
80
80
60
1
False
2
3
Venusaur
Grass
Poison
80
82
83
100
100
80
1
False
3
3
VenusaurMega Venusaur
Grass
Poison
80
100
123
122
120
80
1
False
4
4
Charmander
Fire
NaN
39
52
43
60
50
65
1
False
...
...
...
...
...
...
...
...
...
...
...
...
...
795
719
Diancie
Rock
Fairy
50
100
150
100
150
50
6
True
796
719
DiancieMega Diancie
Rock
Fairy
50
160
110
160
110
110
6
True
797
720
HoopaHoopa Confined
Psychic
Ghost
80
110
60
150
130
70
6
True
798
720
HoopaHoopa Unbound
Psychic
Dark
80
160
60
170
130
80
6
True
799
721
Volcanion
Fire
Water
80
110
120
130
90
70
6
True
In [14]:
2
3
Venusaur
Grass
Poison
525
80
82
83
100
100
80
1
False
3
3
VenusaurMega Venusaur
Grass
Poison
625
80
100
123
122
120
80
1
False
50
45
Vileplume
Grass
Poison
490
75
80
85
110
90
50
1
False
77
71
Victreebel
Grass
Poison
490
80
105
65
100
70
70
1
False
79
73
Tentacruel
Water
Poison
515
80
70
65
80
120
100
1
False
111
103
Exeggutor
Grass
Psychic
520
95
95
85
125
65
55
1
False
168
154
Meganium
Grass
NaN
525
80
82
100
83
100
80
2
False
197
182
Bellossom
Grass
NaN
490
75
80
95
90
100
50
2
False
204
189
Jumpluff
Grass
Flying
460
75
55
70
55
95
110
2
False
207
192
Sunflora
Grass
NaN
425
75
75
55
105
85
30
2
False
298
275
Shiftry
Grass
Dark
480
90
100
60
90
60
80
3
False
390
357
Tropius
Grass
Flying
460
99
68
83
72
87
51
3
False
433
388
Grotle
Grass
NaN
405
75
89
85
55
65
36
4
False
434
389
Torterra
Grass
Ground
525
95
109
105
75
85
56
4
False
505
455
Carnivine
Grass
NaN
454
74
100
72
90
72
46
4
False
510
460
Abomasnow
Grass
Ice
494
90
92
75
92
85
60
4
False
511
460
AbomasnowMega Abomasnow
Grass
Ice
594
90
132
105
132
105
30
4
False
516
465
Tangrowth
Grass
NaN
535
100
100
125
110
50
50
4
False
550
492
ShayminLand Forme
Grass
NaN
600
100
100
100
100
100
100
4
True
551
492
ShayminSky Forme
Grass
Flying
600
100
103
75
120
75
127
4
True
556
497
Serperior
Grass
NaN
528
75
75
95
75
95
113
5
False
571
512
Simisage
Grass
NaN
498
75
98
63
98
63
101
5
False
617
556
Maractus
Grass
NaN
461
75
86
67
106
67
60
5
False
652
591
Amoonguss
Grass
Poison
464
114
85
70
85
80
30
5
False
659
598
Ferrothorn
Grass
Steel
489
74
94
131
54
116
20
5
False
701
640
Virizion
Grass
Fighting
580
91
90
72
90
129
108
5
True
720
652
Chesnaught
Grass
Fighting
530
88
107
122
74
75
64
6
False
741
673
Gogoat
Grass
NaN
531
123
100
62
97
81
68
6
False
In [16]:
0
1
Bulbasaur
Grass
Poison
318
45
49
49
65
65
45
1
False
1
2
Ivysaur
Grass
Poison
405
60
62
63
80
80
60
1
False
2
3
Venusaur
Grass
Poison
525
80
82
83
100
100
80
5
False
3
3
VenusaurMega Venusaur
Grass
Poison
625
80
100
123
122
120
80
5
False
4
4
Charmander
Fire
NaN
309
39
52
43
60
50
65
1
False
...
...
...
...
...
...
...
...
...
...
...
...
...
...
795
719
Diancie
Rock
Fairy
600
50
100
150
100
150
50
5
True
796
719
DiancieMega Diancie
Rock
Fairy
700
50
160
110
160
110
110
5
True
797
720
HoopaHoopa Confined
Psychic
Ghost
600
80
110
60
150
130
70
5
True
798
720
HoopaHoopa Unbound
Psychic
Dark
680
80
160
60
170
130
80
5
True
799
721
Volcanion
Fire
Water
600
80
110
120
130
90
70
5
True
Pandas GroupBy is a powerful and versatile function in Python. It allows you to split your data into separate groups to perform computations for better analysis.
Let me take an example to elaborate on this. Let’s say we are trying to analyze the weight of a person in a city. We can easily get a fair idea of their weight by determining the mean weight of all the city dwellers. But here ‘s a question – would the weight be affected by the gender of a person?
We can group the city dwellers into different gender groups and calculate their mean weight. This would give us a better insight into the weight of a person living in the city. But we can probably get an even better picture if we further separate these gender groups into different age groups and then take their mean weight (because a teenage boy’s weight could differ from that of an adult male)!
In [11]:
0
1
Bulbasaur
Grass
Poison
318
45
49
49
65
65
45
1
False
1
1
2
Ivysaur
Grass
Poison
405
60
62
63
80
80
60
1
False
1
2
3
Venusaur
Grass
Poison
525
80
82
83
100
100
80
1
False
1
3
3
VenusaurMega Venusaur
Grass
Poison
625
80
100
123
122
120
80
1
False
1
4
4
Charmander
Fire
NaN
309
39
52
43
60
50
65
1
False
1
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
795
719
Diancie
Rock
Fairy
600
50
100
150
100
150
50
6
True
1
796
719
DiancieMega Diancie
Rock
Fairy
700
50
160
110
160
110
110
6
True
1
797
720
HoopaHoopa Confined
Psychic
Ghost
600
80
110
60
150
130
70
6
True
1
798
720
HoopaHoopa Unbound
Psychic
Dark
680
80
160
60
170
130
80
6
True
1
799
721
Volcanion
Fire
Water
600
80
110
120
130
90
70
6
True
1
Say We want to initialize all values of column count as 1.
In [10]:
0
1
Bulbasaur
Grass
Poison
318
45
49
49
65
65
45
1
False
1
1
2
Ivysaur
Grass
Poison
405
60
62
63
80
80
60
1
False
1
2
3
Venusaur
Grass
Poison
525
80
82
83
100
100
80
1
False
1
3
3
VenusaurMega Venusaur
Grass
Poison
625
80
100
123
122
120
80
1
False
1
4
4
Charmander
Fire
NaN
309
39
52
43
60
50
65
1
False
1
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
795
719
Diancie
Rock
Fairy
600
50
100
150
100
150
50
6
True
1
796
719
DiancieMega Diancie
Rock
Fairy
700
50
160
110
160
110
110
6
True
1
797
720
HoopaHoopa Confined
Psychic
Ghost
600
80
110
60
150
130
70
6
True
1
798
720
HoopaHoopa Unbound
Psychic
Dark
680
80
160
60
170
130
80
6
True
1
799
721
Volcanion
Fire
Water
600
80
110
120
130
90
70
6
True
1
In [14]:
Fire
1
1
NaN
0
1
1
1
1
1
1
1
1
1
Grass
4
4
NaN
4
4
4
4
4
4
4
4
4
4
Fire
4
4
NaN
3
4
4
4
4
4
4
4
4
4
Water
1
1
NaN
0
1
1
1
1
1
1
1
1
1
Bug
2
2
NaN
0
2
2
2
2
2
2
2
2
2
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Fairy
1
1
NaN
0
1
1
1
1
1
1
1
1
1
Flying
2
2
NaN
2
2
2
2
2
2
2
2
2
2
Fire
1
1
NaN
1
1
1
1
1
1
1
1
1
1
Psychic
2
2
NaN
2
2
2
2
2
2
2
2
2
2
Rock
2
2
NaN
2
2
2
2
2
2
2
2
2
2
You can generate a column with random numbers by using random module . For Example , here total column is random number column.
In [18]:
0
1
Bulbasaur
Grass
Poison
54
45
49
49
65
65
45
1
False
1
2
Ivysaur
Grass
Poison
69
60
62
63
80
80
60
1
False
2
3
Venusaur
Grass
Poison
17
80
82
83
100
100
80
1
False
3
3
VenusaurMega Venusaur
Grass
Poison
34
80
100
123
122
120
80
1
False
4
4
Charmander
Fire
NaN
18
39
52
43
60
50
65
1
False
...
...
...
...
...
...
...
...
...
...
...
...
...
...
795
719
Diancie
Rock
Fairy
49
50
100
150
100
150
50
6
True
796
719
DiancieMega Diancie
Rock
Fairy
80
50
160
110
160
110
110
6
True
797
720
HoopaHoopa Confined
Psychic
Ghost
43
80
110
60
150
130
70
6
True
798
720
HoopaHoopa Unbound
Psychic
Dark
7
80
160
60
170
130
80
6
True
799
721
Volcanion
Fire
Water
67
80
110
120
130
90
70
6
True