Convert Pandas DataFrame to Python dictionary#

In this lesson, you will learn how to convert a pandas DataFrame into a Python dictionary. It explains how to create different kinds of dictionaries from a pandas DataFrame.

A data analyst often needs to collect data from heterogeneous sources such as CSV files, SQL tables, or Python data structures like dictionaries and lists. Such data is typically loaded into a pandas DataFrame.

After analyzing the data, we may need to convert the resulting DataFrame back to its original format, such as a CSV file or a dictionary, or into some other form.

The DataFrame.to_dict() function#

Pandas provides the DataFrame.to_dict() method to create a Python dict object from a DataFrame.

Syntax:

DataFrame.to_dict(orient='dict', into=<class 'dict'>)

Parameters:

  1. into: It is used to define the type of resultant dict. We can give an actual class or an empty instance.

  2. orient: It defines the structure of key-value pairs in the resultant dict. The table below lists each accepted value, the format of the dict it creates, and what the keys and values of that dict hold.

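Note: Older pandas releases accepted abbreviations such as 's' for 'series', 'sp' for 'split', and 'r' for 'records'. These short names are deprecated (pandas emits a FutureWarning when one is used) and are rejected by recent releases (pandas 2.0 and later), so always pass the full name.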

| Parameter | Dict format | Key | Value |
|---|---|---|---|
| 'dict' (default) | {column_label : {row_index : data}} | column label | dict of row index and data |
| 'list' | {column_label : [data]} | column label | list of data |
| 'series' | {column_label : Series(data)} | column label | Series of data |
| 'split' | {'index' : [index], 'columns' : [columns], 'data' : [data]} | 'index', 'columns', 'data' | list of row index, list of column labels, list of data |
| 'records' | [{column_label : data}, ..., {column_label : data}] | column label | data |
| 'index' | {row_index : {column_label : data}} | row index | dict of column label and data |
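To compare the orientations side by side, the following minimal sketch calls to_dict() once per orient value and prints each result. The inline two-row DataFrame is an assumption standing in for the student_data.csv file used later in this article, and the names student_df and results are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for student_data.csv, built inline so the sketch is self-contained
student_df = pd.DataFrame({"Name": ["Nat", "Harry"], "Marks": [70.88, 85.9]})

# Call to_dict() once per orientation and collect the results
# ('series' is left out here only because its printed form spans several lines)
results = {
    orient: student_df.to_dict(orient=orient)
    for orient in ("dict", "list", "split", "records", "index")
}

for orient, value in results.items():
    print(f"{orient:>8}: {value}")
```

Passing orient as a keyword argument keeps the call readable and avoids relying on positional argument order.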

Example to convert pandas DataFrame to dict#

In the example below, we read the input from the student_data.csv file and create a DataFrame object, which is then converted into a Python dictionary object.

The input CSV file contains a simple dataset of student data with two columns, Name and Marks.

The DataFrame is converted into a dict using the default orient value, 'dict'.

import pandas as pd

# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)

# create dict from dataframe
studentDict = studentDf.to_dict()
print("\nResult dict: \n", studentDict)

Output:

    Name  Marks
0    Nat  70.88
1  Harry  85.90
2    Joe  91.45

Result dict: 
 {'Name': {0: 'Nat', 1: 'Harry', 2: 'Joe'}, 'Marks': {0: 70.88, 1: 85.9, 2: 91.45}}
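Because the default orientation nests a dict of row index and data under each column label, a single value is reached with two lookups. The sketch below illustrates this, assuming an inline DataFrame that mirrors the contents of student_data.csv shown above; the variable names are illustrative.

```python
import pandas as pd

# Inline copy of the student data (assumed to match student_data.csv)
student_df = pd.DataFrame(
    {"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.9, 91.45]}
)
student_dict = student_df.to_dict()

# The outer key is the column label, the inner key is the row index
harry_marks = student_dict["Marks"][1]
print(harry_marks)

# The column labels of the DataFrame become the top-level keys
print(list(student_dict))
```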

DataFrame to dict with a list of values#

Sometimes a DataFrame needs to be converted into a dictionary in which each column label is a key and all of that column's data is stored as a list of values against the key.

In that case, we can pass 'list' as the orient argument of the DataFrame.to_dict() function.

Syntax:

{column_label : [data]}

Example:

Let’s see how we can use the 'list' value to create a dict with a list of values for each column.

import pandas as pd

# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)

# create dict from dataframe
studentDict = studentDf.to_dict('list')
print("\nResult dict: \n", studentDict)

Output:

    Name  Marks
0    Nat  70.88
1  Harry  85.90
2    Joe  91.45

Result dict: 
 {'Name': ['Nat', 'Harry', 'Joe'], 'Marks': [70.88, 85.9, 91.45]}
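A dict in the 'list' format can be passed straight back to the DataFrame constructor, which makes it convenient for round trips. The following is a minimal sketch under the assumption that the DataFrame uses the default integer row index (the 'list' format does not store the index); the inline data stands in for student_data.csv.

```python
import pandas as pd

# Inline copy of the student data (assumed to match student_data.csv)
student_df = pd.DataFrame(
    {"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.9, 91.45]}
)

# Column label -> list of values
student_dict = student_df.to_dict(orient="list")

# The constructor accepts the same shape, so the DataFrame can be rebuilt
restored_df = pd.DataFrame(student_dict)
print(restored_df.equals(student_df))
```

If the original DataFrame had a custom row index, that index would be lost in this round trip, since only the column data is kept.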

DataFrame to dict with pandas series of values#

Sometimes we need to convert the DataFrame into a dict in which each column name is a key and the value is a pandas Series holding that column's row index and data.

Syntax:

{column_label : Series(row_index, data)}

In that case, we can pass 'series' as the orient argument of the DataFrame.to_dict() function.

Example:

In the example below, a dict is created with two entries, one for the Name column and the other for the Marks column of the DataFrame.

import pandas as pd

# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)

# create dict from dataframe
studentDict = studentDf.to_dict('series')
print("\nResult dict: \n", studentDict)

Output:

    Name  Marks
0    Nat  70.88
1  Harry  85.90
2    Joe  91.45

Result dict: 
 {'Name': 0      Nat
1    Harry
2      Joe
Name: Name, dtype: object, 'Marks': 0    70.88
1    85.90
2    91.45
Name: Marks, dtype: float64}
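Since each value in the 'series' format is still a pandas Series, the usual Series operations remain available on it. The sketch below shows this with an inline DataFrame assumed to match student_data.csv; the names series_dict and marks are illustrative.

```python
import pandas as pd

# Inline copy of the student data (assumed to match student_data.csv)
student_df = pd.DataFrame(
    {"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.9, 91.45]}
)

series_dict = student_df.to_dict(orient="series")
marks = series_dict["Marks"]

# Each value is a Series, so it keeps its row index and supports Series methods
print(type(marks).__name__)
print(marks.mean())

# Boolean filtering works as usual; the row index identifies the matching rows
high_scorers = marks[marks > 80].index.tolist()
print(high_scorers)
```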

DataFrame to dict without header and index#

When we want to collect the data from DataFrame without the column headers or we need to separate the row index and header from the data, we can use the 'split' parameter of DataFrame.to_dict() function. It splits the input DataFrame into three parts, i.e., row index, column labels, and actual data.

Syntax:

{'index' : [index], 'columns' : [columns], 'data' : [data]}

Example:

We can get the data without the index or header from the resultant dict using the 'data' key, as shown below.

import pandas as pd

# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)

studentDict = studentDf.to_dict('split')
print("\n", studentDict)

# print only data
print("\nList of values from DF without index and header: \n", studentDict['data'])

Output:

    Name  Marks
0    Nat  70.88
1  Harry  85.90
2    Joe  91.45

 {'index': [0, 1, 2], 'columns': ['Name', 'Marks'], 'data': [['Nat', 70.88], ['Harry', 85.9], ['Joe', 91.45]]}

List of values from DF without index and header: 
 [['Nat', 70.88], ['Harry', 85.9], ['Joe', 91.45]]
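The three parts of a 'split' dict correspond to the data, index, and columns arguments of the DataFrame constructor, so the DataFrame can be reassembled from them. This is a minimal sketch, assuming an inline DataFrame that mirrors student_data.csv; the variable names are illustrative.

```python
import pandas as pd

# Inline copy of the student data (assumed to match student_data.csv)
student_df = pd.DataFrame(
    {"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.9, 91.45]}
)

split_dict = student_df.to_dict(orient="split")

# Feed each part back to the matching constructor argument
restored_df = pd.DataFrame(
    data=split_dict["data"],
    index=split_dict["index"],
    columns=split_dict["columns"],
)
print(restored_df)
```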

DataFrame to dict by row#

When each row of a DataFrame needs to be stored in a separate dictionary object, i.e., we need the data row-wise, we can pass 'records' as the orient argument of the DataFrame.to_dict() function.

It returns a list of dictionary objects, one dict for each row, where the key is a column label and the value is that row's data for the column.

Syntax:

[{column_label : data}, ..., {column_label : data}]

Example:

In the example below, we create a list of dictionaries, one for each student.

# import pandas library
import pandas as pd

# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)

# create dict from dataframe
studentDict = studentDf.to_dict('records')
print(studentDict)

Output:

    Name  Marks
0    Nat  70.88
1  Harry  85.90
2    Joe  91.45
[{'Name': 'Nat', 'Marks': 70.88}, {'Name': 'Harry', 'Marks': 85.9}, {'Name': 'Joe', 'Marks': 91.45}]
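A list of row dicts maps naturally onto a JSON array of objects. The sketch below serializes the 'records' result with the standard-library json module; it assumes an inline DataFrame mirroring student_data.csv and columns that hold only strings and floats, which json can encode directly.

```python
import json

import pandas as pd

# Inline copy of the student data (assumed to match student_data.csv)
student_df = pd.DataFrame(
    {"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.9, 91.45]}
)

# One dict per row, keyed by column label
records = student_df.to_dict(orient="records")

# Serialize the list of row dicts as a JSON array of objects
payload = json.dumps(records, indent=2)
print(payload)
```

Columns with other types, such as timestamps, may need converting before they can be serialized this way.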

DataFrame to dict by row index#

When we have a DataFrame with row indexes and need to look up the data of each row by its index, we can pass 'index' as the orient argument of the DataFrame.to_dict() function.

It returns a dict of dictionary objects with one entry for each row, where the key is a row index and the value is a dict of column label and data.

Syntax:

{row_index : {column_label : data}}

Example:

In the example below, a dict entry is created for each row of student data.

import pandas as pd

# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)

# create dict from dataframe
studentDict = studentDf.to_dict('index')
print(studentDict)

Output:

    Name  Marks
0    Nat  70.88
1  Harry  85.90
2    Joe  91.45
{0: {'Name': 'Nat', 'Marks': 70.88}, 1: {'Name': 'Harry', 'Marks': 85.9}, 2: {'Name': 'Joe', 'Marks': 91.45}}
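The 'index' format allows a row to be looked up by its index, and pandas can read the same shape back with DataFrame.from_dict(..., orient='index'). The following sketch assumes an inline DataFrame that mirrors student_data.csv; the variable names are illustrative.

```python
import pandas as pd

# Inline copy of the student data (assumed to match student_data.csv)
student_df = pd.DataFrame(
    {"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.9, 91.45]}
)

index_dict = student_df.to_dict(orient="index")

# Look up a whole row by its index, then a field by its column label
joe_name = index_dict[2]["Name"]
print(joe_name)

# from_dict with orient='index' treats the outer keys as row labels
restored_df = pd.DataFrame.from_dict(index_dict, orient="index")
print(restored_df)
```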

DataFrame to dict with one column as the key#

In this section, we cover the use case where we need to create a dict from a DataFrame with one column as the keys of the dict and the other columns as the values.

Suppose we have a student DataFrame with two columns, the student’s Name and the student’s Marks, and we need to store each student’s data in a dict where the student name is the key and their marks are the value.

We can do it in various ways, as shown below:

  • Using df.set_index('Col1').to_dict()['Col2']

  • Using zip(df.Col1, df.Col2)

  • Using df.set_index('Col1').T.to_dict('list')

Example:

The example below uses df.set_index('Col1').to_dict()['Col2'] to get the expected output.

import pandas as pd

# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)

# create dict with Name as key and marks as value
studentDict = studentDf.set_index('Name').to_dict()['Marks']

print(studentDict)

Output:

    Name  Marks
0    Nat  70.88
1  Harry  85.90
2    Joe  91.45
{'Nat': 70.88, 'Harry': 85.9, 'Joe': 91.45}

We can also achieve the same result using the zip() function.

# create dict with Name as key and marks as value
studentDict = dict(zip(studentDf.Name, studentDf.Marks))

If we want to collect the remaining column data for each key into a list, we can set the key column as the index, transpose the DataFrame, and then convert it into a dict with the 'list' orientation.

import pandas as pd

# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)

# create dict with Name as key and a list of the remaining column values as value
studentDict = studentDf.set_index('Name').T.to_dict('list')
print(studentDict)

Output:

    Name  Marks
0    Nat  70.88
1  Harry  85.90
2    Joe  91.45
{'Nat': [70.88], 'Harry': [85.9], 'Joe': [91.45]}
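When more than one value column is involved, combining set_index() with the 'index' orientation keeps the column labels in the result, and Series.to_dict() offers a compact way to get a flat mapping for a single column. The sketch below is an illustration only: the Grade column is hypothetical and not part of student_data.csv, and it assumes the names in the key column are unique.

```python
import pandas as pd

# Inline student data; the Grade column is a hypothetical addition for illustration
student_df = pd.DataFrame(
    {
        "Name": ["Nat", "Harry", "Joe"],
        "Marks": [70.88, 85.9, 91.45],
        "Grade": ["B", "A", "A"],
    }
)

# Name -> dict of the remaining columns (assumes Name values are unique)
by_name = student_df.set_index("Name").to_dict(orient="index")
print(by_name["Harry"])

# Name -> Marks, selecting one column first and calling Series.to_dict()
marks_by_name = student_df.set_index("Name")["Marks"].to_dict()
print(marks_by_name)
```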

DataFrame to dict using into parameter#

If the output dict needs to be of a particular mapping type, we can use the into parameter of the DataFrame.to_dict() function. We can specify either the class or an empty instance of the class for the resultant dict.

Example:

In the example below, we convert the DataFrame to a dict of type OrderedDict.

# import OrderedDict and the pandas library
from collections import OrderedDict
import pandas as pd

# create dataframe from csv
studentDf = pd.read_csv("student_data.csv")
print(studentDf)

# convert dataframe to ordered dict
studentDict = studentDf.to_dict(into=OrderedDict)
print(studentDict)

Output:

    Name  Marks
0    Nat  70.88
1  Harry  85.90
2    Joe  91.45
OrderedDict([('Name', OrderedDict([(0, 'Nat'), (1, 'Harry'), (2, 'Joe')])), ('Marks', OrderedDict([(0, 70.88), (1, 85.9), (2, 91.45)]))])
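The into parameter also accepts an instance rather than a class. The pandas documentation notes that a collections.defaultdict must be passed already initialized, so the sketch below creates one with a list factory and combines it with the 'records' orientation. The inline DataFrame is assumed to match student_data.csv, and the names dd and records are illustrative.

```python
from collections import defaultdict

import pandas as pd

# Inline copy of the student data (assumed to match student_data.csv)
student_df = pd.DataFrame(
    {"Name": ["Nat", "Harry", "Joe"], "Marks": [70.88, 85.9, 91.45]}
)

# A defaultdict has to be passed as an initialized instance, not as the bare class
dd = defaultdict(list)
records = student_df.to_dict(orient="records", into=dd)

first = records[0]
print(type(first).__name__)
print(first["Name"])

# Missing keys fall back to the default factory instead of raising KeyError
print(first["Attendance"])
```

Each row becomes its own defaultdict that uses the same default factory as the instance passed in.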