Convert Pandas DataFrame to Python dictionary
In this article, you will learn how to convert a pandas DataFrame into a Python dictionary. It explains how to create different kinds of dictionaries from a pandas DataFrame.
A data analyst often needs to collect data from heterogeneous sources such as CSV files, SQL tables, or Python data structures like dictionaries and lists. Such data is converted into a pandas DataFrame.
After analyzing the data, we often need to convert the resulting DataFrame back to its original format, such as a CSV file or a dictionary, or sometimes into some other form.
The DataFrame.to_dict() function
Pandas has a DataFrame.to_dict() function that creates a Python dict object from a DataFrame.
Syntax:
DataFrame.to_dict(orient='dict', into=<class 'dict'>)
Parameters:
into: It defines the type of the resultant dict. We can pass an actual class or an empty instance of it.
orient: It defines the structure of the key-value pairs in the resultant dict. The table below shows each orient value, the format of the dict it creates, and the keys and values of the resultant dict.
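Note: Older pandas versions also accepted abbreviations such as s for series, sp for split, and r for records. These abbreviations were deprecated in pandas 1.x and removed in pandas 2.0, so always use the full names.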
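Parameter | Dict format | Key | Value |
---|---|---|---|
'dict' (default) | {column_label : {row_index : data}} | column label | dict of row index and data |
'list' | {column_label : [data]} | column label | list of data |
'series' | {column_label : Series(data)} | column label | Series of data |
'split' | {'index' : [index], 'columns' : [columns], 'data' : [data]} | 'index', 'columns', 'data' | list of row index, list of column labels, list of data |
'records' | [{column_label : data}, ...] | column label | data |
'index' | {row_index : {column_label : data}} | row index | dict of column label and data |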
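To see the table in action, the sketch below calls to_dict() once for every orient value and prints the type of each result. It builds the student data inline (mirroring the rows printed later in this article) rather than reading student_data.csv, so it runs on its own.

```python
import pandas as pd

# Same student data as the CSV examples, built inline so no file is needed
df = pd.DataFrame({"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.90, 91.45]})

# Convert once per orient value from the table and keep every result
results = {}
for orient in ["dict", "list", "series", "split", "records", "index"]:
    results[orient] = df.to_dict(orient=orient)
    print(f"{orient:>8} -> {type(results[orient]).__name__}")
```

Only 'records' returns a list; every other orient returns a dict whose keys follow the Key column of the table.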
Example to convert pandas DataFrame to dict
In the example below, we read the input from the student_data.csv file and create a DataFrame object, which is then converted into a Python dictionary.
The input CSV file contains a simple dataset of student data with two columns, Name and Marks.
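If you don't have student_data.csv, the sketch below recreates it in the working directory. The three rows are copied from the DataFrame printed in the examples, so this assumes the original file held exactly those values.

```python
import pandas as pd

# Rows copied from the DataFrame printed in this article's examples
students = pd.DataFrame({"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.90, 91.45]})

# index=False keeps the row index out of the file; read_csv() then adds a fresh 0..n-1 index
students.to_csv("student_data.csv", index=False)

with open("student_data.csv") as f:
    print(f.read())
```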
The DataFrame is converted into a dict using the default 'dict' orientation.
import pandas as pd
# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)
# create dict from dataframe
studentDict = studentDf.to_dict()
print("\nResult dict: \n", studentDict)
Name Marks
0 Nat 70.88
1 Harry 85.90
2 Joe 91.45
Result dict:
{'Name': {0: 'Nat', 1: 'Harry', 2: 'Joe'}, 'Marks': {0: 70.88, 1: 85.9, 2: 91.45}}
DataFrame to dict with a list of values
Sometimes we need to convert a DataFrame into a dictionary in which the column labels are the keys and each column's data is stored as a list of values against its key.
In that case, we can pass orient='list' to the DataFrame.to_dict() function.
Syntax:
{column_label : [data]}
Example:
Let’s see how we can use orient='list' to create a dict with a list of values.
import pandas as pd
# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)
# create dict from dataframe
studentDict = studentDf.to_dict('list')
print("\nResult dict: \n", studentDict)
Name Marks
0 Nat 70.88
1 Harry 85.90
2 Joe 91.45
Result dict:
{'Name': ['Nat', 'Harry', 'Joe'], 'Marks': [70.88, 85.9, 91.45]}
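A dict in the 'list' format can be passed straight back to the pandas.DataFrame constructor: each key becomes a column and each list becomes that column's values. A minimal round-trip sketch, building the student data inline instead of reading the CSV (note the row index is not stored in this format, so the rebuilt frame gets a default 0..n-1 index):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.90, 91.45]})
as_lists = df.to_dict("list")

# Keys become column labels; lists become column values
rebuilt = pd.DataFrame(as_lists)
print(rebuilt)
```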
DataFrame to dict with pandas series of values
Sometimes we need to convert the DataFrame into a dict where each column name is a key and the value is a pandas Series holding that column's row index and data.
Syntax:
{column_label : Series(data, index=row_index)}
In that case, we can pass orient='series' to the DataFrame.to_dict() function.
Example:
In the example below, a dict is created with two entries, one for the Name column and the other for the Marks column of the DataFrame.
import pandas as pd
# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)
# create dict from dataframe
studentDict = studentDf.to_dict('series')
print("\nResult dict: \n", studentDict)
Name Marks
0 Nat 70.88
1 Harry 85.90
2 Joe 91.45
Result dict:
{'Name': 0 Nat
1 Harry
2 Joe
Name: Name, dtype: object, 'Marks': 0 70.88
1 85.90
2 91.45
Name: Marks, dtype: float64}
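Because each value in the 'series' format is a real pandas Series, the usual Series methods work on it directly. A small sketch, again building the student data inline instead of reading the CSV:

```python
import math
import pandas as pd

df = pd.DataFrame({"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.90, 91.45]})
series_dict = df.to_dict("series")

marks = series_dict["Marks"]  # a pandas Series, not a plain list
print(type(marks).__name__)
print(marks.mean())

# The row index is kept, so it can be used to look up the matching name
best = marks.idxmax()
print(best, series_dict["Name"][best])
```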
DataFrame to dict without header and index
When we want to collect the data from a DataFrame without the column headers, or we need to separate the row index and headers from the data, we can pass orient='split' to the DataFrame.to_dict() function. It splits the input DataFrame into three parts: the row index, the column labels, and the actual data.
Syntax:
{'index' : [index], 'columns' : [columns], 'data' : [data]}
Example:
We can get the data without the index or header from the resulting dict using the key 'data', as shown below.
import pandas as pd
# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)
studentDict = studentDf.to_dict('split')
print("\n", studentDict)
# print only data
print("\nList of values from DF without index and header: \n", studentDict['data'])
Name Marks
0 Nat 70.88
1 Harry 85.90
2 Joe 91.45
{'index': [0, 1, 2], 'columns': ['Name', 'Marks'], 'data': [['Nat', 70.88], ['Harry', 85.9], ['Joe', 91.45]]}
List of values from DF without index and header:
[['Nat', 70.88], ['Harry', 85.9], ['Joe', 91.45]]
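The 'index', 'columns', and 'data' keys of the 'split' format happen to match the keyword arguments of the pandas.DataFrame constructor, so the dict can be unpacked back into a DataFrame. A minimal sketch with the student data built inline:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.90, 91.45]})
split_dict = df.to_dict("split")

# Unpack the three parts as DataFrame(data=..., index=..., columns=...)
rebuilt = pd.DataFrame(**split_dict)
print(rebuilt)
```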
DataFrame to dict by row
When each row of a DataFrame needs to be stored in a separate dictionary object, i.e., we need the data row-wise, we can pass orient='records' to the DataFrame.to_dict() function.
It returns a list of dictionary objects, one dict for each row, where the key is a column label and the value is the data in that column.
Syntax:
[{column_label : data}, {column_label : data}, ...]
Example:
In the example below, we create a list of dictionaries, one for each student.
# import pandas library
import pandas as pd
# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)
# create dict from dataframe
studentDict = studentDf.to_dict('records')
print(studentDict)
Name Marks
0 Nat 70.88
1 Harry 85.90
2 Joe 91.45
[{'Name': 'Nat', 'Marks': 70.88}, {'Name': 'Harry', 'Marks': 85.9}, {'Name': 'Joe', 'Marks': 91.45}]
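A list of row dicts maps naturally onto a JSON array of objects, and pandas can rebuild the DataFrame from the same list. The sketch below shows both directions using the standard json module, with the student data built inline instead of read from the CSV:

```python
import json
import pandas as pd

df = pd.DataFrame({"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.90, 91.45]})
records = df.to_dict("records")

# Serialize the list of row dicts as a JSON array of objects
payload = json.dumps(records)
print(payload)

# The DataFrame constructor accepts a list of dicts (one dict per row)
rebuilt = pd.DataFrame(records)
```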
DataFrame to dict by row index
When we need to convert each row of a DataFrame into a dict keyed by its row index, we can pass orient='index' to the DataFrame.to_dict() function.
It returns a dict of dictionary objects, one entry per row, where the key is the row index and the value is a dict of column labels and data for that row.
Syntax:
{row_index : {column_label : data}}
Example:
In the example below, a dict object is created for each row of student data.
import pandas as pd
# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)
# create dict from dataframe
studentDict = studentDf.to_dict('index')
print(studentDict)
Name Marks
0 Nat 70.88
1 Harry 85.90
2 Joe 91.45
{0: {'Name': 'Nat', 'Marks': 70.88}, 1: {'Name': 'Harry', 'Marks': 85.9}, 2: {'Name': 'Joe', 'Marks': 91.45}}
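The 'index' format makes single-row lookups easy, and DataFrame.from_dict() with orient='index' reverses the conversion by turning the outer keys back into the row index. Keep in mind that pandas raises a ValueError for orient='index' when the DataFrame's index is not unique, since duplicate keys could not all be kept. A sketch with the student data built inline:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.90, 91.45]})
by_index = df.to_dict("index")

# Look up one row by its index label
print(by_index[1]["Name"])  # Harry

# Outer keys become the row index, inner keys become the columns
rebuilt = pd.DataFrame.from_dict(by_index, orient="index")
print(rebuilt)
```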
DataFrame to dict with one column as the key
In this section, we target the use case where we need to create a dict from a DataFrame with one column as the key and the other columns as the values of the dict.
Suppose we have a student DataFrame with two columns, the student’s Name and the student’s Marks, and we need to store each student’s data in a dict where the student name is the key and their marks are the value.
We can do it in various ways, as shown below:
- Using df.set_index('Col1').to_dict()['Col2']
- Using dict(zip(df.Col1, df.Col2))
- Using df.set_index('Col1').T.to_dict('list')
Example:
The example below uses df.set_index('Col1').to_dict()['Col2'] to get the expected output.
import pandas as pd
# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)
# create dict with Name as key and marks as value
studentDict = studentDf.set_index('Name').to_dict()['Marks']
print(studentDict)
Name Marks
0 Nat 70.88
1 Harry 85.90
2 Joe 91.45
{'Nat': 70.88, 'Harry': 85.9, 'Joe': 91.45}
We can also achieve the same result using the zip() function.
# create dict with Name as key and marks as value
studentDict = dict(zip(studentDf.Name, studentDf.Marks))
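The sketch below checks that the approaches agree on the student data (built inline instead of read from the CSV). It also shows a third variant, selecting the Marks column before converting, which avoids turning every column into a dict only to keep one of them.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.90, 91.45]})

via_to_dict = df.set_index("Name").to_dict()["Marks"]  # converts all columns, keeps one
via_zip = dict(zip(df.Name, df.Marks))                 # pairs the two columns directly
via_series = df.set_index("Name")["Marks"].to_dict()   # selects the column first

print(via_to_dict == via_zip == via_series)  # True
```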
If we want to collect each student's column data into a list, we can transpose the DataFrame and then convert it into a dict with orient='list'.
import pandas as pd
# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)
# create dict with Name as key and marks as value
studentDict = studentDf.set_index('Name').T.to_dict('list')
print(studentDict)
Name Marks
0 Nat 70.88
1 Harry 85.90
2 Joe 91.45
{'Nat': [70.88], 'Harry': [85.9], 'Joe': [91.45]}
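With more than one value column, the transpose approach has a catch worth knowing: transposing a DataFrame with mixed numeric dtypes stores the result in one common dtype, so integers can come back as floats. The sketch below illustrates this with a hypothetical Age column (not part of student_data.csv) and contrasts it with orient='index', which reads the rows without transposing.

```python
import pandas as pd

# "Age" is a hypothetical extra column, added only to show a multi-column case
df = pd.DataFrame({
    "Name": ["Nat", "Harry", "Joe"],
    "Marks": [70.88, 85.90, 91.45],
    "Age": [20, 21, 22],
})

# Transposing merges the float and int columns into one float64 block,
# so each list holds [marks, age] with the age upcast to a float
transposed = df.set_index("Name").T.to_dict("list")
print(transposed)

# orient='index' keeps each column's own type in the per-row dicts
by_name = df.set_index("Name").to_dict("index")
print(by_name)
```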
DataFrame to dict using into parameter
While converting a DataFrame to a dict, if we need the output to be of a particular mapping type, we can use the into parameter of the DataFrame.to_dict() function. We can pass either the class or an empty instance of the class for the resultant dict.
Example:
In the example below, we convert the DataFrame to a dict of type OrderedDict.
# import pandas library
from collections import OrderedDict
import pandas as pd
# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)
# convert dataframe to ordered dict
studentDict = studentDf.to_dict(into=OrderedDict)
print(studentDict)
Name Marks
0 Nat 70.88
1 Harry 85.90
2 Joe 91.45
OrderedDict([('Name', OrderedDict([(0, 'Nat'), (1, 'Harry'), (2, 'Joe')])), ('Marks', OrderedDict([(0, 70.88), (1, 85.9), (2, 91.45)]))])
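Since Python 3.7, a plain dict already keeps insertion order, so OrderedDict is mostly useful for its extra methods. A more distinctive use of into is collections.defaultdict: because it needs a default factory, pandas expects an initialized instance rather than the bare class. A sketch with the student data built inline; the "Grade" key is hypothetical and only used to show the fallback behavior.

```python
from collections import defaultdict
import pandas as pd

df = pd.DataFrame({"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.90, 91.45]})

# Pass an initialized defaultdict; every row dict is created with the same factory
records = df.to_dict("records", into=defaultdict(list))
print(records[0])

# A missing (hypothetical) key falls back to the factory instead of raising KeyError
print(records[1]["Grade"])  # []
```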