How to Use SQL in Pandas using pandasql Queries?

Dec-06-2023
Techsolutionstuff
MySQL

Data manipulation might sound complicated, but it is not actually! If you're familiar with SQL and want to work with Pandas, there's a fantastic thing for you: pandasql. We are here to help you connect the dots between SQL and Pandas in a straightforward way.

Throughout this blog, we'll guide you through pandasql, showing you how to apply SQL-style commands to filter, merge, and manage your data in Pandas. It doesn't matter if you're a data enthusiast, analyst, or just someone keen to learn – we'll make using SQL in Pandas a breeze.

Get ready to explore the best of both worlds and supercharge your data skills!

How to Use pandasql?

The pandasql Python library helps you query pandas dataframes using SQL commands, without connecting to an SQL server. It operates using SQLite syntax, automatically identifying any pandas dataframe and treating it as a standard SQL table.

Set Up the Environment

First, install pandasql:

pip install pandasql

Then, import the required packages:

from pandasql import sqldf
import pandas as pd

We imported the 'sqldf' function directly from 'pandasql', and it's essentially the library's most meaningful function. As the name suggests, it allows you to query dataframes using SQL syntax. Besides this function, 'pandasql' includes two straightforward built-in datasets that can be loaded using the self-explanatory functions: 'load_births()' and 'load_meat'.

pandasql Syntax

sqldf(query, env=None)

Here, the "query" parameter is important, and it takes a SQL query as a string. Additionally, there is an optional parameter "env," which is rarely used and can take on the values of either "locals()" or "globals()." This parameter allows sqldf to access the corresponding set of variables in your Python environment.

The sqldf function responds to the query as a pandas dataframe.

Examples of Using pandasql

Now, let’s take a more detailed look at running SQL queries on pandas dataframes using the sqldf function. For practice, let's use a built-in dataset of the seaborn library—penguins:

import seaborn as sns
penguins = sns.load_dataset('penguins')
print(penguins.head())

Output:

No.	Species	Island	bill_length_mm	bill_depth_mm	flipper_length_mm
0	Adelie	Torgersen	39.1	18.7	181.0
1	Adelie	Torgersen	39.5	17.4	186.0
2	Adelie	Torgersen	40.3	18.0	195.0
3	Adelie	Torgersen	NA	NA	NA
4	Adelie	Torgersen	36.7	19.3	193.0

No.	body_mass_g	sex
0	3750.0	Male
1	3800.0	Female
2	3250.0	Female
3	NA	NA
4	3450.0	Female

Extracting Data With pandasql

print(sqldf('''SELECT species, island 
FROM penguins 
LIMIT 5'''))

Output:

No.	Species	Island
0	Adelie	Torgersen
1	Adelie	Torgersen
2	Adelie	Torgersen
3	Adelie	Torgersen
4	Adelie	Torgersen

Here, we have extracted information about the species and location of the first five penguins from the penguins dataframe. Note that running the sqldf function returns a pandas dataframe:

print(type(sqldf('''SELECT species, island 
                      FROM penguins 
                     LIMIT 5''')))

Output:

<class 'pandas.core.frame.DataFrame'>

In pure pandas, it would be:

print(penguins[['species', 'island']].head())

Sorting Data With pandasql

print(sqldf('''SELECT body_mass_g 
                 FROM penguins 
                ORDER BY body_mass_g DESC 
                LIMIT 5''')

Output:

No.	body_mass_g
0	6300.0
1	6050.0
2	6050.0
3	6000.0
4	5950.0

We sorted our penguins by body mass in descending order and displayed the top five values of body mass.

In pandas, it would be:

print(penguins['body_mass_g'].sort_values(ascending=False, 
ignore_index=True).head())

Output:

0	6300.0
1	6050.0
2	6000.0
3	6000.0
4	5950.0

Name: body_mass_g, dtype: float64

Filtering Data With pandasql

Let’s say we have a group of penguins who are male and have flippers longer than 210 mm:

print(sqldf('''SELECT DISTINCT species
                 FROM penguins 
                WHERE sex = 'Male' 
                  AND flipper_length_mm > 210'''))

Output:
species
0 Chinstrap
1 Gentoo

Here, we filtered the data with these two conditions: sex = 'Male' and flipper_length_mm > 210.

The same code in pandas would look a bit more overwhelming:

print(penguins[(penguins['sex'] == 'Male') & (penguins['flipper_length_mm'] > 210)]['species'].unique())

Output:

['Chinstrap' 'Gentoo']

Grouping and Aggregating Data With pandasql

Now, let's try data grouping and aggregation to find each species’ longest bill in the dataframe:

print(sqldf('''SELECT species, MAX(bill_length_mm)
                 FROM penguins 
                GROUP BY species'''))

Output:

No.	species	MAX(bill_length_mm)
0	Adelie	46.0
1	Chinstrap	58.0
2	Gentoo	59.6

Same code in pandas:

print(penguins[['species', 'bill_length_mm']].groupby('species', as_index=False).max())

Output:

No.	species	bill_length_mm
0	Adelie	46.0
1	Chinstrap	58.0
2	Gentoo	59.6

Conclusion

Using SQL within Pandas via the pandasql library is an efficient way to handle data manipulation. The seamless integration of SQL queries with Pandas dataframes empowers users to employ their SQL expertise within the Python environment.

This approach simplifies complex data operations, creating succinct, readable code that enhances data analysis workflows. With the power of SQL and pandasql, you can perform diverse operations with ease, from filtering and aggregating data to handling complex joins.

Ultimately, SQL and Pandas improve the analytical capabilities, creating a smoother and more powerful data processing experience.

You might also like:

Techsolutionstuff | The Complete Guide

I'm a software engineer and the founder of techsolutionstuff.com. Hailing from India, I craft articles, tutorials, tricks, and tips to aid developers. Explore Laravel, PHP, MySQL, jQuery, Bootstrap, Node.js, Vue.js, and AngularJS in our tech stack.

FEATURE POSTS

How to Use SQL in Pandas using pandasql Queries?

Techsolutionstuff | The Complete Guide

RECOMMENDED POSTS

How To Download File On The FTP Server Using PHP

Socialite Login with Facebook Account In Laravel 9

How To Create Dependent Dropdown In Laravel

Next Previous Link Button Pagination Laravel

How to Insert Data in MongoDB using Node.js

How To Install Vue 3 In Laravel 9 Step By Step

How to Create Custom Middleware in Laravel 11

How To Solve The Page Expired 419 Error In Laravel

Change Date Format Using Carbon In Laravel

Exploring Top 10 NPM Packages for Node.js

FEATURE POSTS

Laravel 9 Create Multi Language Website

How to Create Custom Class in Laravel 11

Host is not allowed to connect to this MariaDB server XAMPP

How to Add Primary Key in Laravel 10 Migration

How To Solve The Page Expired 419 Error In Laravel

Converting JSON to CSV in Python using Pandas

CATEGORIES

TOOLS

Online Json Formatter and Viewer

Online Password Generator

Online Character Counter

Online CSS Minifier / Compressor

Online CSS Beautifier / Formatter

POPULAR TAGS

RECENT POSTS

How to Generate QR Codes in Larave...

Integrate Stripe Payment Gateway i...

Laravel 11 REST API With Passport...

Laravel 11 Cron Job Task Schedulin...

Laravel 11 Send Email with Attachm...

Laravel 11 REST API Authentication...

Laravel 11 Ajax Form Submit With V...

Laravel 11 Create, Run and Rollbac...

How to Upload Image using AJAX in...

How to Send Email using Queue in L...

How to Use SQL in Pandas using pandasql Queries?

Techsolutionstuff | The Complete Guide

RECOMMENDED POSTS

FEATURE POSTS

CATEGORIES

TOOLS

POPULAR TAGS

RECENT POSTS

SUBSCRIBE YOUR EMAIL ADDRESS