Python Pandas

Python Pandas#

Pandas is an open-source, BSD-licensed Python library. Pandas is a handy and useful data-structure tool for analyzing large and complex data.

In this tutorial we will practice:

DataFrame,
Data Selection,
Group-By,
Series,
Sorting,
Searching,
statistics.

Let us practice Data analysis using Pandas.

In this exercise, we are using automobile_data.csv for data analysis. This Dataset has different characteristics of an auto such as body-style, wheel-base, engine-type, price, mileage, horsepower, etc.

What included in this Pandas exercise?

It contains 10 questions. The solution is provided for each question. Each question includes a specific Pandas topic you need to learn. When you complete each question, you get more familiar with data analysis using pandas.

Exercise 1: From the given dataset print the first and last five rows#

import pandas as pd

#df = pd.read_csv("C:\\Users\\Milaan\\01_Learn_Python4Data\\10_Python_Pandas_Module\\automobile_data.csv")
df = pd.read_csv("automobile_data.csv")  # just give the name of file only if the file is in the same folder.
df.head(5)

	index	company	body-style	wheel-base	length	engine-type	num-of-cylinders	horsepower	average-mileage	price
0	0	alfa-romero	convertible	88.6	168.8	dohc	four	111	21	13495.0
1	1	alfa-romero	convertible	88.6	168.8	dohc	four	111	21	16500.0
2	2	alfa-romero	hatchback	94.5	171.2	ohcv	six	154	19	16500.0
3	3	audi	sedan	99.8	176.6	ohc	four	102	24	13950.0
4	4	audi	sedan	99.4	176.6	ohc	five	115	18	17450.0

import pandas as pd

df = pd.read_csv("automobile_data.csv")
df.tail(5)

	index	company	body-style	wheel-base	length	engine-type	num-of-cylinders	horsepower	average-mileage	price
56	81	volkswagen	sedan	97.3	171.7	ohc	four	85	27	7975.0
57	82	volkswagen	sedan	97.3	171.7	ohc	four	52	37	7995.0
58	86	volkswagen	sedan	97.3	171.7	ohc	four	100	26	9995.0
59	87	volvo	sedan	104.3	188.8	ohc	four	114	23	12940.0
60	88	volvo	wagon	104.3	188.8	ohc	four	114	23	13415.0

Exercise 2: Clean the dataset and update the CSV file#

Hint: Replace all column values which contain ?, n.a, or NaN.

#df = pd.read_csv("C:\\Users\\Milaan\\01_Learn_Python4Data\\10_Python_Pandas_Module\\automobile_data.csv", na_values={
df = pd.read_csv("automobile_data.csv", na_values={
'price':["?","n.a"],
'stroke':["?","n.a"],
'horsepower':["?","n.a"],
'peak-rpm':["?","n.a"],
'average-mileage':["?","n.a"]})
print (df)


#df.to_csv("C:\\Users\\Milaan\\01_Learn_Python4Data\\10_Python_Pandas_Module\\automobile_data1.csv")
df.to_csv("automobile_data1.csv")

    index      company   body-style  wheel-base  length engine-type  \
     0  alfa-romero  convertible        88.6   168.8        dohc   
     1  alfa-romero  convertible        88.6   168.8        dohc   
     2  alfa-romero    hatchback        94.5   171.2        ohcv   
     3         audi        sedan        99.8   176.6         ohc   
     4         audi        sedan        99.4   176.6         ohc   
..    ...          ...          ...         ...     ...         ...   
   81   volkswagen        sedan        97.3   171.7         ohc   
   82   volkswagen        sedan        97.3   171.7         ohc   
   86   volkswagen        sedan        97.3   171.7         ohc   
   87        volvo        sedan       104.3   188.8         ohc   
   88        volvo        wagon       104.3   188.8         ohc   

   num-of-cylinders  horsepower  average-mileage    price  
            four         111               21  13495.0  
            four         111               21  16500.0  
             six         154               19  16500.0  
            four         102               24  13950.0  
            five         115               18  17450.0  
..              ...         ...              ...      ...  
           four          85               27   7975.0  
           four          52               37   7995.0  
           four         100               26   9995.0  
           four         114               23  12940.0  
           four         114               23  13415.0  

[61 rows x 10 columns]

Exercise 3: Find the most expensive car company name#

Print most expensive car’s company name and price.

import pandas as pd
df = pd.read_csv("automobile_data1.csv")
df = df [['company','price']][df.price==df['price'].max()]
df

#or

#df=df.groupby(["company"]).sum().sort_values("price", ascending=False)
#df

	company	price
35	mercedes-benz	45400.0

Exercise 4: Print All Toyota Cars details#

import pandas as pd

df = pd.read_csv("automobile_data1.csv")
car_Manufacturers = df.groupby('company')
toyotaDf = car_Manufacturers.get_group('toyota')
toyotaDf

#or

#df=pd.DataFrame(pd.read_csv(r'C:\\Users\\Milaan\\01_Learn_Python4Data\\10_Python_Pandas_Module\\automobile_data1.csv'))
#df[(df.company == 'toyota')]

	Unnamed: 0	index	company	body-style	wheel-base	length	engine-type	num-of-cylinders	horsepower	average-mileage	price
48	48	66	toyota	hatchback	95.7	158.7	ohc	four	62	35	5348.0
49	49	67	toyota	hatchback	95.7	158.7	ohc	four	62	31	6338.0
50	50	68	toyota	hatchback	95.7	158.7	ohc	four	62	31	6488.0
51	51	69	toyota	wagon	95.7	169.7	ohc	four	62	31	6918.0
52	52	70	toyota	wagon	95.7	169.7	ohc	four	62	27	7898.0
53	53	71	toyota	wagon	95.7	169.7	ohc	four	62	27	8778.0
54	54	79	toyota	wagon	104.5	187.8	dohc	six	156	19	15750.0

Exercise 5: Count total cars per company#

import pandas as pd

df = pd.read_csv("automobile_data1.csv")
df['company'].value_counts()

toyota           7
bmw              6
nissan           5
mazda            5
volkswagen       4
audi             4
mitsubishi       4
mercedes-benz    4
isuzu            3
alfa-romero      3
honda            3
jaguar           3
chevrolet        3
porsche          3
volvo            2
dodge            2
Name: company, dtype: int64

Exercise 6: Find each company’s Higesht price car#

import pandas as pd

df = pd.read_csv("automobile_data1.csv")
car_Manufacturers = df.groupby('company')
priceDf = car_Manufacturers['company','price'].max()
priceDf

<ipython-input-7-c8f749f7a9ca>:5: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  priceDf = car_Manufacturers['company','price'].max()

	company	price
company
alfa-romero	alfa-romero	16500.0
audi	audi	18920.0
bmw	bmw	41315.0
chevrolet	chevrolet	6575.0
dodge	dodge	6377.0
honda	honda	12945.0
isuzu	isuzu	6785.0
jaguar	jaguar	36000.0
mazda	mazda	18344.0
mercedes-benz	mercedes-benz	45400.0
mitsubishi	mitsubishi	8189.0
nissan	nissan	13499.0
porsche	porsche	37028.0
toyota	toyota	15750.0
volkswagen	volkswagen	9995.0
volvo	volvo	13415.0

Exercise 7: Find the average mileage of each car making company#

import pandas as pd

df = pd.read_csv("automobile_data1.csv")
car_Manufacturers = df.groupby('company')
mileageDf = car_Manufacturers['company','average-mileage'].mean()
mileageDf


#or

#company_grp=df.groupby('company')
#company_grp.agg({
#    'average-mileage':np.mean
#})

<ipython-input-8-c02acf54079b>:5: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  mileageDf = car_Manufacturers['company','average-mileage'].mean()

	average-mileage
company
alfa-romero	20.333333
audi	20.000000
bmw	19.000000
chevrolet	41.000000
dodge	31.000000
honda	26.333333
isuzu	33.333333
jaguar	14.333333
mazda	28.000000
mercedes-benz	18.000000
mitsubishi	29.500000
nissan	31.400000
porsche	17.000000
toyota	28.714286
volkswagen	31.750000
volvo	23.000000

Exercise 8: Sort all cars by Price column#

import pandas as pd

carsDf = pd.read_csv("automobile_data1.csv")
carsDf = carsDf.sort_values(by=['price', 'horsepower'], ascending=False)
carsDf.head(5)

	Unnamed: 0	index	company	body-style	wheel-base	length	engine-type	num-of-cylinders	horsepower	average-mileage	price
35	35	47	mercedes-benz	hardtop	112.0	199.2	ohcv	eight	184	14	45400.0
11	11	14	bmw	sedan	103.5	193.8	ohc	six	182	16	41315.0
34	34	46	mercedes-benz	sedan	120.9	208.1	ohcv	eight	184	14	40960.0
46	46	62	porsche	convertible	89.5	168.9	ohcf	six	207	17	37028.0
12	12	15	bmw	sedan	110.0	197.0	ohc	six	182	15	36880.0

Exercise 9: Concatenate two data frames using the following conditions#

Hint: Create two data frames using the following two dictionary.

import pandas as pd

GermanCars = {'Company': ['Ford', 'Mercedes', 'BMV', 'Audi'], 'Price': [23845, 171995, 135925 , 71400]}
carsDf1 = pd.DataFrame.from_dict(GermanCars)

japaneseCars = {'Company': ['Toyota', 'Honda', 'Nissan', 'Mitsubishi '], 'Price': [29995, 23600, 61500 , 58900]}
carsDf2 = pd.DataFrame.from_dict(japaneseCars)

carsDf = pd.concat([carsDf1, carsDf2], keys=["Germany", "Japan"])
carsDf

		Company	Price
Germany	0	Ford	23845
	1	Mercedes	171995
	2	BMV	135925
	3	Audi	71400
Japan	0	Toyota	29995
	1	Honda	23600
	2	Nissan	61500
	3	Mitsubishi	58900

Exercise 10: Merge two data frames using the following condition#

Hint: Create two data frames using the following two Dicts, Merge two data frames, and append the second data frame as a new column to the first data frame.

import pandas as pd

Car_Price = {'Company': ['Toyota', 'Honda', 'BMV', 'Audi'], 'Price': [23845, 17995, 135925 , 71400]}
carPriceDf = pd.DataFrame.from_dict(Car_Price)

car_Horsepower = {'Company': ['Toyota', 'Honda', 'BMV', 'Audi'], 'horsepower': [141, 80, 182 , 160]}
carsHorsepowerDf = pd.DataFrame.from_dict(car_Horsepower)

carsDf = pd.merge(carPriceDf, carsHorsepowerDf, on="Company")
carsDf

	Company	Price	horsepower
0	Toyota	23845	141
1	Honda	17995	80
2	BMV	135925	182
3	Audi	71400	160