Pandas DataFrames

What is Pandas?

Pandas is a Python library for working with data sets. It provides functions for analyzing, cleaning, exploring, and manipulating data. The name “Pandas” refers to both “Panel Data” and “Python Data Analysis”; panel data is multidimensional data involving measurements over time. The library was created by Wes McKinney in 2008.

Pandas is very commonly used in data science for data analysis, data cleaning, data exploration, and data manipulation. It does not work alone: it is built on NumPy, which handles n-dimensional arrays, so both libraries are required.

Features:
Series and DataFrame objects, data alignment, slicing, indexing, subsetting, missing-data handling, and group-by functionality.
Merging and joining, hierarchical labeling of axes, time-series functionality, reshaping, and robust input/output.

Pandas vs. NumPy: as a rough rule of thumb, Pandas works well above about 500k rows and suits tabular data, arbitrary matrices, and time-series data; NumPy suits fewer than about 500k rows and is more memory efficient.
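A few of the features listed above can be sketched in a handful of lines. This is only an illustration: the values and the names steps, team, and score are made up for the example and do not come from these notes.

```python
import numpy as np
import pandas as pd

# A Series is a labeled one-dimensional array (illustrative values).
steps = pd.Series([4200, 3800, np.nan], index=["mon", "tue", "wed"], name="steps")

# Missing values are skipped by default in aggregations such as mean().
mean_steps = steps.mean()  # (4200 + 3800) / 2

# A DataFrame is a table of columns; a column can be exposed as a NumPy array.
df = pd.DataFrame({"team": ["a", "b", "a"], "score": [1, 2, 3]})
score_array = df["score"].to_numpy()

# Group-by: total score per team.
totals = df.groupby("team")["score"].sum()

print(mean_steps)
print(type(score_array).__name__)
print(totals.to_dict())
```

The to_numpy() call shows the NumPy dependency in practice: the column data comes back as a plain ndarray.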

Why Use Pandas?

Pandas allows us to analyze large data sets and draw conclusions based on statistical theory. Pandas can clean messy data sets and make them readable and relevant. Relevant data is very important in data science.

What Can Pandas Do?

Pandas gives you answers about the data, such as: Is there a correlation between two or more columns? What is the average value? The maximum value? The minimum value? Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
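As a minimal sketch of those questions in code, the example below reuses the calories and duration columns from these notes and adds one row with a missing value, which is an assumption made purely to show cleaning.

```python
import pandas as pd

# The fourth row has a missing calories value (added here for illustration).
df = pd.DataFrame({
    "calories": [420.0, 380.0, 390.0, None],
    "duration": [50, 40, 45, 60],
})

# Cleaning: drop every row that contains an empty value.
clean = df.dropna()

# Summary statistics on the cleaned data.
avg = clean["calories"].mean()
high = clean["calories"].max()
low = clean["calories"].min()

# Pearson correlation between the two columns.
corr = clean["calories"].corr(clean["duration"])

print(len(clean), avg, high, low, corr)
```

Note that dropna() returns a new DataFrame and leaves df unchanged.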

What is a DataFrame?

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

# Load the data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
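To make the "rows and columns" description concrete, the following sketch inspects the same table through the shape, columns, and index attributes, and pulls out a single column as a Series.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

print(df.shape)          # (3, 2): three rows, two columns
print(list(df.columns))  # ['calories', 'duration']
print(list(df.index))    # [0, 1, 2]: default integer row labels

# Selecting one column returns a Series.
calories = df["calories"]
print(calories.tolist())  # [420, 380, 390]
```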

Locate Row

As you can see from the result above, the DataFrame is like a table with rows and columns. Pandas uses the loc attribute to return one or more specified rows.

# Return row 0 (the result is a Series):
print(df.loc[0])

# Return rows 0 and 1 by passing a list of indexes (the result is a DataFrame):
print(df.loc[[0, 1]])
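
The difference between passing a single label and a list of labels to loc can be checked directly. The sketch below rebuilds the same table so it runs on its own, and also shows a label slice, which is not covered in these notes.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

one_row = df.loc[0]        # single label: a Series indexed by column name
two_rows = df.loc[[0, 1]]  # list of labels: a DataFrame
sliced = df.loc[0:1]       # label slice: both endpoints are included

print(type(one_row).__name__)   # Series
print(type(two_rows).__name__)  # DataFrame
print(len(sliced))              # 2
```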

Named Indexes

With the index argument, you can name your own indexes.

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

# Refer to the named index to return the "day2" row:
print(df.loc["day2"])
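
As a further sketch, loc also accepts a list of named indexes. The iloc attribute used for comparison is not mentioned in these notes; it selects by position instead of by label.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# A list of labels returns a DataFrame with just those rows.
subset = df.loc[["day1", "day3"]]
print(subset)

# loc selects by label, iloc by integer position; "day2" is at position 1.
by_label = df.loc["day2"]
by_position = df.iloc[1]
print(by_label.equals(by_position))  # True
```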
