Learn Pandas in One Article

What is Pandas?

Pandas is one of the most widely used Python packages for data science. It provides simple but powerful tools for loading, manipulating, and analyzing tabular data.

Importing Pandas

import pandas as pd

Pandas Data Structure

There are two types of data structures in Pandas, Series and DataFrame.

Series

Series is a one-dimensional labeled array that can hold any data type.

Example:

pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
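As a minimal sketch of what the labels are for, the Series from the example above can be read either by label or by integer position. The variable names `s`, `by_label`, and `by_position` are only illustrative.

```python
import pandas as pd

# Same Series as in the example above; `s` is an illustrative name.
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

by_label = s['b']        # look up a value by its index label
by_position = s.iloc[1]  # look up the same value by integer position

print(by_label, by_position)
```

Both lookups reach the same element, because label 'b' sits at position 1.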

DataFrame

A DataFrame is a two-dimensional, labeled table whose columns can hold different data types.

Each column of a DataFrame is a Series.

Example:

data_mobile = {'Mobile': ['iPhone', 'Samsung', 'Redmi'], 'Color': ['Red', 'White', 'Black'], 'Price': ['High', 'Medium', 'Low']}

df = pd.DataFrame(data_mobile, columns=['Mobile', 'Color', 'Price'])
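To make the link between the two structures concrete, here is a small sketch that rebuilds the DataFrame above and pulls out one column. The name `price` is an illustrative choice, not part of the original example.

```python
import pandas as pd

data_mobile = {
    'Mobile': ['iPhone', 'Samsung', 'Redmi'],
    'Color': ['Red', 'White', 'Black'],
    'Price': ['High', 'Medium', 'Low'],
}
df = pd.DataFrame(data_mobile, columns=['Mobile', 'Color', 'Price'])

# Selecting a single column by name returns a Series.
price = df['Price']

print(type(price).__name__)
print(df.shape)  # (number of rows, number of columns)
```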

Reading Files in Pandas:

The Pandas library offers a set of functions that can read a wide range of file formats.

pd.read_csv("filename")

pd.read_table("filename")

pd.read_excel("filename")

pd.read_sql(query, connection_object)

pd.read_json(json_string)
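The readers above all return a DataFrame. As a sketch that runs without any file on disk, the example below feeds `pd.read_csv` an in-memory text buffer; the CSV content and the name `csv_text` are made up for illustration.

```python
import io

import pandas as pd

# A small in-memory CSV stands in for a real file on disk (made-up data).
csv_text = "Mobile,Color,Price\niPhone,Red,High\nSamsung,White,Medium\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```

With a real file, the call would simply take the path instead of the buffer.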

Writing Files in Pandas:

Similarly, Pandas library offers many functions which are useful for writing data into a file.

df.to_csv("filename")

df.to_excel("filename")

df.to_sql(table_name, connection_object)

df.to_json("filename")
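A quick way to check what a writer produces is a round trip: write the data out, read it back, and compare. The sketch below does this with CSV text held in memory; the sample data and the names `csv_text` and `restored` are illustrative assumptions.

```python
import io

import pandas as pd

df = pd.DataFrame({'Mobile': ['iPhone', 'Samsung'], 'Color': ['Red', 'White']})

# With no path argument, to_csv returns the CSV text instead of writing a file.
# index=False leaves the row labels out of the output.
csv_text = df.to_csv(index=False)

# Read the text back to confirm that nothing was lost.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
```

Passing `index=False` is a common choice when the row labels are just the default 0, 1, 2, ... and carry no information.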

Creating Test Data

Pandas also makes it easy to create test data for trying out code segments. The example below relies on NumPy, so it needs an extra import.

import numpy as np

pd.DataFrame(np.random.rand(4, 3)) # 4 rows and 3 columns of random floats
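Random test data is easier to work with when it is reproducible. The following sketch is one way to get that, assuming NumPy's `default_rng` generator with a fixed seed; the seed value and the column names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A fixed seed makes the "random" data identical on every run.
rng = np.random.default_rng(0)

# 4 rows and 3 columns of floats in the range [0, 1), with named columns.
df = pd.DataFrame(rng.random((4, 3)), columns=['a', 'b', 'c'])

print(df)
```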

Pandas Operations

Viewing

df.head(n): returns the first n rows of the DataFrame.

df.tail(n): returns the last n rows of the DataFrame.

df.shape: gives the number of rows and columns as a tuple (it is an attribute, so no parentheses).

df.info(): prints a summary of the index, column data types, and memory usage.

df.describe(): gives summary statistics for numerical columns.
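The viewing tools can be tried on a small made-up table. This is only a sketch; the columns `x` and `y` and their values are invented for the example.

```python
import pandas as pd

# Ten rows of made-up numbers.
df = pd.DataFrame({'x': range(10), 'y': [v * 2 for v in range(10)]})

first = df.head(3)       # first three rows
last = df.tail(2)        # last two rows
rows, cols = df.shape    # shape is an attribute, not a method
summary = df.describe()  # count, mean, std, min, quartiles, max per column

df.info()                # prints index, dtypes and memory usage
print(summary)
```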

Selecting

To select a subset of data from a DataFrame, there are two approaches: selecting by position and selecting by label.

Selecting by position with iloc:

df.iloc[0]: selects first row of data frame

df.iloc[1]: selects second row of data frame

df.iloc[-1]: selects last row of data frame

df.iloc[:,0]: selects first column of data frame

df.iloc[:,1]: selects second column of data frame
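The position-based selections above can be sketched on the mobile phone table from earlier. The variable names are illustrative, and the last line adds a row-and-column slice to show that `iloc` slices exclude their end position.

```python
import pandas as pd

df = pd.DataFrame({
    'Mobile': ['iPhone', 'Samsung', 'Redmi'],
    'Color': ['Red', 'White', 'Black'],
    'Price': ['High', 'Medium', 'Low'],
})

first_row = df.iloc[0]     # a single row comes back as a Series
last_row = df.iloc[-1]     # negative positions count from the end
first_col = df.iloc[:, 0]  # all rows, first column
block = df.iloc[0:2, 0:2]  # first two rows and first two columns

print(block)
```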

Selecting by label using loc:

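df.loc[0, 'column1']: selects a single value by row label and column label (loc uses labels, not positions; with the default index, the label 0 also happens to be the first row)

df.loc['row1':'row3', 'column1':'column3']: selects a slice of rows and columns by label; both endpoints are included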
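A short sketch of label-based selection, using a made-up table whose row and column labels follow the placeholder names above:

```python
import pandas as pd

# Made-up table with explicit row labels.
df = pd.DataFrame(
    {'column1': [1, 2, 3], 'column2': [4, 5, 6], 'column3': [7, 8, 9]},
    index=['row1', 'row2', 'row3'],
)

value = df.loc['row1', 'column2']                   # one cell, by labels
block = df.loc['row1':'row2', 'column1':'column2']  # label slices include the end label

print(value)
print(block)
```

Unlike position slices, the label slice keeps 'row2' and 'column2', so the result has two rows and two columns.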

Sorting

df.sort_index(): sorts by labels along an axis

df.sort_values(column1): sorts values by column1 in ascending order

df.sort_values(column2, ascending=False): sorts values by column2 in descending order
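The three sorting calls can be compared side by side on a tiny made-up table. The data and variable names below are assumptions for illustration; note that the column names are passed as strings.

```python
import pandas as pd

# Made-up data with a deliberately unordered index.
df = pd.DataFrame(
    {'column1': [3, 1, 2], 'column2': ['c', 'a', 'b']},
    index=[10, 30, 20],
)

by_index = df.sort_index()                          # order rows by index label
asc = df.sort_values('column1')                     # ascending by column1
desc = df.sort_values('column2', ascending=False)   # descending by column2

print(asc)
```

Each call returns a new, sorted DataFrame and leaves `df` itself unchanged.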

Groupby

groupby splits the rows into groups based on the values in one or more columns, so that a function can then be applied to each group.

df.groupby(column): returns a groupby object for values from one column

df.groupby([column1,column2]): returns a groupby object for values from multiple columns

df.groupby(column1)[column2].mean(): returns the mean of the values in column2, grouped by the values in column1

df.groupby(column1)[column2].median(): returns the median of the values in column2, grouped by the values in column1
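As a worked sketch of the grouped mean and median, the made-up table below has two groups in `column1`. The values are chosen so that the mean and median differ for one of the groups.

```python
import pandas as pd

# Made-up data: group 'a' has two rows, group 'b' has three.
df = pd.DataFrame({
    'column1': ['a', 'a', 'b', 'b', 'b'],
    'column2': [1, 3, 2, 4, 9],
})

means = df.groupby('column1')['column2'].mean()      # one mean per group
medians = df.groupby('column1')['column2'].median()  # one median per group

print(means)
print(medians)
```

The result of each call is a Series indexed by the group labels, so `means['b']` gives the mean of the 'b' rows.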

Calculations

df.mean(): mean

df.median(): median

df.std(): standard deviation

df.max(): maximum

df.min(): minimum

df.count(): number of non-null values in each column

df.describe(): summary statistics
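These calculations work column by column and skip missing values by default. The sketch below uses a made-up numeric table with one missing value to show the effect; the names `counts`, `means`, and `medians` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up numeric data; column x has one missing value.
df = pd.DataFrame({'x': [1.0, 2.0, np.nan, 4.0], 'y': [10, 20, 30, 40]})

counts = df.count()    # non-null values per column
means = df.mean()      # missing values are skipped by default
medians = df.median()
largest = df.max()

print(counts)
print(means)
```

Because the missing value is ignored, the mean of `x` is taken over three values, not four.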

Plotting

df.plot.hist(): histogram

df.plot.scatter(x='column1', y='column2'): scatter plot
