Python Pandas#
Pandas is an open-source, BSD-licensed Python library. Pandas is a handy and useful data-structure tool for analyzing large and complex data.
In this tutorial we will practice:
DataFrame,
Data Selection,
Group-By,
Series,
Sorting,
Searching,
statistics.
Let us practice Data analysis using Pandas.
In this exercise, we are using automobile_data.csv for data analysis. This Dataset has different characteristics of an auto such as body-style, wheel-base, engine-type, price, mileage, horsepower, etc.
What included in this Pandas exercise?
It contains 10 questions. The solution is provided for each question. Each question includes a specific Pandas topic you need to learn. When you complete each question, you get more familiar with data analysis using pandas.
Exercise 1: From the given dataset print the first and last five rows#
import pandas as pd
#df = pd.read_csv("C:\\Users\\Milaan\\01_Learn_Python4Data\\10_Python_Pandas_Module\\automobile_data.csv")
df = pd.read_csv("automobile_data.csv") # just give the name of file only if the file is in the same folder.
df.head(5)
index | company | body-style | wheel-base | length | engine-type | num-of-cylinders | horsepower | average-mileage | price | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | alfa-romero | convertible | 88.6 | 168.8 | dohc | four | 111 | 21 | 13495.0 |
1 | 1 | alfa-romero | convertible | 88.6 | 168.8 | dohc | four | 111 | 21 | 16500.0 |
2 | 2 | alfa-romero | hatchback | 94.5 | 171.2 | ohcv | six | 154 | 19 | 16500.0 |
3 | 3 | audi | sedan | 99.8 | 176.6 | ohc | four | 102 | 24 | 13950.0 |
4 | 4 | audi | sedan | 99.4 | 176.6 | ohc | five | 115 | 18 | 17450.0 |
import pandas as pd
df = pd.read_csv("automobile_data.csv")
df.tail(5)
index | company | body-style | wheel-base | length | engine-type | num-of-cylinders | horsepower | average-mileage | price | |
---|---|---|---|---|---|---|---|---|---|---|
56 | 81 | volkswagen | sedan | 97.3 | 171.7 | ohc | four | 85 | 27 | 7975.0 |
57 | 82 | volkswagen | sedan | 97.3 | 171.7 | ohc | four | 52 | 37 | 7995.0 |
58 | 86 | volkswagen | sedan | 97.3 | 171.7 | ohc | four | 100 | 26 | 9995.0 |
59 | 87 | volvo | sedan | 104.3 | 188.8 | ohc | four | 114 | 23 | 12940.0 |
60 | 88 | volvo | wagon | 104.3 | 188.8 | ohc | four | 114 | 23 | 13415.0 |
Exercise 2: Clean the dataset and update the CSV file#
Hint: Replace all column values which contain ?
, n.a
, or NaN
.
#df = pd.read_csv("C:\\Users\\Milaan\\01_Learn_Python4Data\\10_Python_Pandas_Module\\automobile_data.csv", na_values={
df = pd.read_csv("automobile_data.csv", na_values={
'price':["?","n.a"],
'stroke':["?","n.a"],
'horsepower':["?","n.a"],
'peak-rpm':["?","n.a"],
'average-mileage':["?","n.a"]})
print (df)
#df.to_csv("C:\\Users\\Milaan\\01_Learn_Python4Data\\10_Python_Pandas_Module\\automobile_data1.csv")
df.to_csv("automobile_data1.csv")
index company body-style wheel-base length engine-type \
0 0 alfa-romero convertible 88.6 168.8 dohc
1 1 alfa-romero convertible 88.6 168.8 dohc
2 2 alfa-romero hatchback 94.5 171.2 ohcv
3 3 audi sedan 99.8 176.6 ohc
4 4 audi sedan 99.4 176.6 ohc
.. ... ... ... ... ... ...
56 81 volkswagen sedan 97.3 171.7 ohc
57 82 volkswagen sedan 97.3 171.7 ohc
58 86 volkswagen sedan 97.3 171.7 ohc
59 87 volvo sedan 104.3 188.8 ohc
60 88 volvo wagon 104.3 188.8 ohc
num-of-cylinders horsepower average-mileage price
0 four 111 21 13495.0
1 four 111 21 16500.0
2 six 154 19 16500.0
3 four 102 24 13950.0
4 five 115 18 17450.0
.. ... ... ... ...
56 four 85 27 7975.0
57 four 52 37 7995.0
58 four 100 26 9995.0
59 four 114 23 12940.0
60 four 114 23 13415.0
[61 rows x 10 columns]
Exercise 3: Find the most expensive car company name#
Print most expensive car’s company name and price.
import pandas as pd
df = pd.read_csv("automobile_data1.csv")
df = df [['company','price']][df.price==df['price'].max()]
df
#or
#df=df.groupby(["company"]).sum().sort_values("price", ascending=False)
#df
company | price | |
---|---|---|
35 | mercedes-benz | 45400.0 |
Exercise 4: Print All Toyota Cars details#
import pandas as pd
df = pd.read_csv("automobile_data1.csv")
car_Manufacturers = df.groupby('company')
toyotaDf = car_Manufacturers.get_group('toyota')
toyotaDf
#or
#df=pd.DataFrame(pd.read_csv(r'C:\\Users\\Milaan\\01_Learn_Python4Data\\10_Python_Pandas_Module\\automobile_data1.csv'))
#df[(df.company == 'toyota')]
Unnamed: 0 | index | company | body-style | wheel-base | length | engine-type | num-of-cylinders | horsepower | average-mileage | price | |
---|---|---|---|---|---|---|---|---|---|---|---|
48 | 48 | 66 | toyota | hatchback | 95.7 | 158.7 | ohc | four | 62 | 35 | 5348.0 |
49 | 49 | 67 | toyota | hatchback | 95.7 | 158.7 | ohc | four | 62 | 31 | 6338.0 |
50 | 50 | 68 | toyota | hatchback | 95.7 | 158.7 | ohc | four | 62 | 31 | 6488.0 |
51 | 51 | 69 | toyota | wagon | 95.7 | 169.7 | ohc | four | 62 | 31 | 6918.0 |
52 | 52 | 70 | toyota | wagon | 95.7 | 169.7 | ohc | four | 62 | 27 | 7898.0 |
53 | 53 | 71 | toyota | wagon | 95.7 | 169.7 | ohc | four | 62 | 27 | 8778.0 |
54 | 54 | 79 | toyota | wagon | 104.5 | 187.8 | dohc | six | 156 | 19 | 15750.0 |
Exercise 5: Count total cars per company#
import pandas as pd
df = pd.read_csv("automobile_data1.csv")
df['company'].value_counts()
toyota 7
bmw 6
nissan 5
mazda 5
volkswagen 4
audi 4
mitsubishi 4
mercedes-benz 4
isuzu 3
alfa-romero 3
honda 3
jaguar 3
chevrolet 3
porsche 3
volvo 2
dodge 2
Name: company, dtype: int64
Exercise 6: Find each company’s Higesht price car#
import pandas as pd
df = pd.read_csv("automobile_data1.csv")
car_Manufacturers = df.groupby('company')
priceDf = car_Manufacturers['company','price'].max()
priceDf
<ipython-input-7-c8f749f7a9ca>:5: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
priceDf = car_Manufacturers['company','price'].max()
company | price | |
---|---|---|
company | ||
alfa-romero | alfa-romero | 16500.0 |
audi | audi | 18920.0 |
bmw | bmw | 41315.0 |
chevrolet | chevrolet | 6575.0 |
dodge | dodge | 6377.0 |
honda | honda | 12945.0 |
isuzu | isuzu | 6785.0 |
jaguar | jaguar | 36000.0 |
mazda | mazda | 18344.0 |
mercedes-benz | mercedes-benz | 45400.0 |
mitsubishi | mitsubishi | 8189.0 |
nissan | nissan | 13499.0 |
porsche | porsche | 37028.0 |
toyota | toyota | 15750.0 |
volkswagen | volkswagen | 9995.0 |
volvo | volvo | 13415.0 |
Exercise 7: Find the average mileage of each car making company#
import pandas as pd
df = pd.read_csv("automobile_data1.csv")
car_Manufacturers = df.groupby('company')
mileageDf = car_Manufacturers['company','average-mileage'].mean()
mileageDf
#or
#company_grp=df.groupby('company')
#company_grp.agg({
# 'average-mileage':np.mean
#})
<ipython-input-8-c02acf54079b>:5: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
mileageDf = car_Manufacturers['company','average-mileage'].mean()
average-mileage | |
---|---|
company | |
alfa-romero | 20.333333 |
audi | 20.000000 |
bmw | 19.000000 |
chevrolet | 41.000000 |
dodge | 31.000000 |
honda | 26.333333 |
isuzu | 33.333333 |
jaguar | 14.333333 |
mazda | 28.000000 |
mercedes-benz | 18.000000 |
mitsubishi | 29.500000 |
nissan | 31.400000 |
porsche | 17.000000 |
toyota | 28.714286 |
volkswagen | 31.750000 |
volvo | 23.000000 |
Exercise 8: Sort all cars by Price column#
import pandas as pd
carsDf = pd.read_csv("automobile_data1.csv")
carsDf = carsDf.sort_values(by=['price', 'horsepower'], ascending=False)
carsDf.head(5)
Unnamed: 0 | index | company | body-style | wheel-base | length | engine-type | num-of-cylinders | horsepower | average-mileage | price | |
---|---|---|---|---|---|---|---|---|---|---|---|
35 | 35 | 47 | mercedes-benz | hardtop | 112.0 | 199.2 | ohcv | eight | 184 | 14 | 45400.0 |
11 | 11 | 14 | bmw | sedan | 103.5 | 193.8 | ohc | six | 182 | 16 | 41315.0 |
34 | 34 | 46 | mercedes-benz | sedan | 120.9 | 208.1 | ohcv | eight | 184 | 14 | 40960.0 |
46 | 46 | 62 | porsche | convertible | 89.5 | 168.9 | ohcf | six | 207 | 17 | 37028.0 |
12 | 12 | 15 | bmw | sedan | 110.0 | 197.0 | ohc | six | 182 | 15 | 36880.0 |
Exercise 9: Concatenate two data frames using the following conditions#
Hint: Create two data frames using the following two dictionary.
import pandas as pd
GermanCars = {'Company': ['Ford', 'Mercedes', 'BMV', 'Audi'], 'Price': [23845, 171995, 135925 , 71400]}
carsDf1 = pd.DataFrame.from_dict(GermanCars)
japaneseCars = {'Company': ['Toyota', 'Honda', 'Nissan', 'Mitsubishi '], 'Price': [29995, 23600, 61500 , 58900]}
carsDf2 = pd.DataFrame.from_dict(japaneseCars)
carsDf = pd.concat([carsDf1, carsDf2], keys=["Germany", "Japan"])
carsDf
Company | Price | ||
---|---|---|---|
Germany | 0 | Ford | 23845 |
1 | Mercedes | 171995 | |
2 | BMV | 135925 | |
3 | Audi | 71400 | |
Japan | 0 | Toyota | 29995 |
1 | Honda | 23600 | |
2 | Nissan | 61500 | |
3 | Mitsubishi | 58900 |
Exercise 10: Merge two data frames using the following condition#
Hint: Create two data frames using the following two Dicts, Merge two data frames, and append the second data frame as a new column to the first data frame.
import pandas as pd
Car_Price = {'Company': ['Toyota', 'Honda', 'BMV', 'Audi'], 'Price': [23845, 17995, 135925 , 71400]}
carPriceDf = pd.DataFrame.from_dict(Car_Price)
car_Horsepower = {'Company': ['Toyota', 'Honda', 'BMV', 'Audi'], 'horsepower': [141, 80, 182 , 160]}
carsHorsepowerDf = pd.DataFrame.from_dict(car_Horsepower)
carsDf = pd.merge(carPriceDf, carsHorsepowerDf, on="Company")
carsDf
Company | Price | horsepower | |
---|---|---|---|
0 | Toyota | 23845 | 141 |
1 | Honda | 17995 | 80 |
2 | BMV | 135925 | 182 |
3 | Audi | 71400 | 160 |