Pandas 101: The Ultimate Data Science Companion

Pandas is a popular open-source Python library for handling and analyzing data. It offers efficient tools for working with tabular data: reading and writing many file formats, cleaning and transforming data, selecting and filtering rows and columns by various criteria, aggregating and summarizing data, and visualizing data. This blog covers a few of pandas' capabilities; to explore more, check out the official pandas documentation.

Install Pandas

pip install pandas

Load File

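Pandas can read data from many file formats, including CSV, Excel (XLSX), JSON, XML, HTML, and plain-text (TXT) files. Compressed files such as ZIP archives can also be read directly; read_csv, for example, infers the compression from the file extension by default. Some readers rely on optional dependencies, such as openpyxl for .xlsx files.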


  import pandas as pd
  # Each reader returns a new DataFrame; use the one that matches your file type
  dataframe = pd.read_csv('filename.csv') # Loading data from a CSV file
  dataframe = pd.read_excel('filename.xlsx') # Loading data from an Excel file (needs openpyxl)
  dataframe = pd.read_json('filename.json') # Loading data from a JSON file
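If you want to try a reader without a file on disk, a minimal sketch is to pass an in-memory buffer instead of a path, since read_csv accepts any file-like object. The CSV text and column names below are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO makes it behave like an open file
csv_text = """name,age,city
Alice,30,Paris
Bob,25,Berlin
Chen,35,Tokyo
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 3): three rows, three columns
print(df.dtypes)  # age is parsed as an integer column automatically

# Writing goes the other way: with no path, to_csv returns the CSV as a string
csv_out = df.to_csv(index=False)  # index=False leaves out the row labels
print(csv_out)
```

The same pattern works for read_json and the other readers that accept file-like objects.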
    

Viewing Data

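Use the head method to view the first few rows of a DataFrame and the tail method to view the last few rows. The nlargest, nsmallest, info, and describe methods give a quick overview of the values and structure of the data.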


  dataframe.head() # Displays the first 5 rows
  dataframe.head(10) # Displays the first 10 rows
  dataframe.tail() # Displays the last 5 rows
  dataframe.tail(10) # Displays the last 10 rows
  dataframe.nlargest(2, 'column_name') # The 2 rows with the largest values in column_name
  dataframe.nsmallest(2, 'column_name') # The 2 rows with the smallest values in column_name
  dataframe.info() # Prints a concise summary: column dtypes, non-null counts, memory usage
  dataframe.describe() # Descriptive statistics (count, mean, std, min, quartiles, max) for numeric columns
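To see these methods on real values, here is a small sketch using a made-up table of names and scores (the column names are hypothetical).

```python
import pandas as pd

# Small made-up dataset for illustration
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay"],
    "score": [88, 95, 75, 92, 60, 81],
})

first3 = df.head(3)              # rows for Ana, Ben, Cai
last2 = df.tail(2)               # rows for Eli, Fay
top2 = df.nlargest(2, "score")   # the two highest scores, largest first
lowest = df.nsmallest(1, "score")
stats = df.describe()            # a DataFrame indexed by count, mean, std, ...

print(top2)
print(stats.loc["mean", "score"])
df.info()                        # prints the summary; it returns None
```

Note that describe returns a DataFrame you can index into, while info only prints.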
    

Additional attributes and methods:

  • dataframe.columns - View the column names
  • dataframe.index - View the row index (row labels)
  • dataframe['column_name'].value_counts() - Count occurrences of each unique value
  • dataframe['column_name'].tolist() - Convert a column's values to a Python list
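A short sketch of these attributes and methods on a hypothetical single-column DataFrame of fruit names:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"fruit": ["apple", "banana", "apple", "cherry", "apple", "banana"]})

print(list(df.columns))              # the column labels as a list
print(df.index)                      # a default RangeIndex covering rows 0 to 5

counts = df["fruit"].value_counts()  # sorted by count, most frequent first
print(counts)

values = df["fruit"].tolist()        # plain Python list of the column values
```

value_counts returns a Series whose index holds the unique values, so counts["apple"] looks up a single count.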

Data Selection

Use square brackets to select columns, and the loc (label-based) and iloc (position-based) indexers to select specific rows and columns:


  dataframe['column_name'] # Selecting a single column
  dataframe[['column1', 'column2']] # Selecting multiple columns
  dataframe.loc[row_label, 'column_name'] # Label-based selection
  dataframe.iloc[row_position, column_position] # Position-based (integer) selection
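The difference between loc and iloc is easiest to see when the row labels are not integers. This sketch uses a made-up price table indexed by fruit name; note that label slices with loc include the end label, while position slices with iloc exclude the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [3.5, 1.2, 4.0], "qty": [10, 25, 7]},
    index=["apple", "banana", "cherry"],  # string labels make loc vs iloc visible
)

by_label = df.loc["banana", "qty"]     # 25, looked up by row and column label
by_position = df.iloc[1, 1]            # 25, looked up by row and column position

label_slice = df.loc["apple":"banana"]  # includes "banana": 2 rows
position_slice = df.iloc[0:1]           # excludes position 1: 1 row

print(label_slice)
print(position_slice)
```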
    

Data Manipulation

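Pandas provides methods for creating, dropping, renaming, and replacing data. With inplace=True, the DataFrame is modified directly and the method returns None: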


  dataframe['new_column'] = dataframe['column1'] + dataframe['column2'] # Create new column
  dataframe.drop(['column1', 'column2'], axis=1, inplace=True) # Drop columns
  dataframe.rename(columns={'old_name': 'new_name'}, inplace=True) # Rename columns
  dataframe.replace(to_replace='old_value', value='new_value', inplace=True) # Replace values
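The same operations can also be written without inplace=True by reassigning the returned DataFrame, which many pandas users prefer because it allows method chaining. A minimal sketch, with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "status": ["old", "new", "old"]})

df["total"] = df["a"] + df["b"]                    # element-wise sum becomes a new column
df = df.rename(columns={"a": "x"})                 # returns a new DataFrame
df = df.replace(to_replace="old", value="legacy")  # replaces exact matches in every column
df = df.drop(columns=["b"])                        # equivalent to drop(["b"], axis=1)

print(df)
```

Avoid writing df = df.drop(..., inplace=True): with inplace=True the method returns None, so the assignment would replace your DataFrame with None.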
    

Filtering Data


  dataframe[dataframe['column_name'] > value] # Filter by condition
  dataframe[dataframe['column_name'].isin(['value1', 'value2'])] # Filter by list of values
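Conditions can be combined with & (and), | (or), and ~ (not). Each condition must be wrapped in parentheses because these operators bind more tightly than comparisons. A sketch on made-up temperature data:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Tokyo", "Paris", "Lima"],
    "temp": [18, 12, 25, 22, 19],
})

warm = df[df["temp"] > 18]                                    # single condition
warm_paris = df[(df["city"] == "Paris") & (df["temp"] > 18)]  # both must hold
selected = df[df["city"].isin(["Berlin", "Lima"])]            # membership test
not_paris = df[~(df["city"] == "Paris")]                      # ~ negates a boolean mask

print(warm_paris)
```

Filtering keeps the original row labels, so the result's index shows which rows matched.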
    

Grouping Data


  dataframe.groupby('category')['value'].mean() # Group by category and calculate mean
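groupby splits the rows by category, applies a function to each group, and combines the results. A sketch with made-up values, also showing agg to compute several statistics at once:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "C"],
    "value": [10, 20, 30, 40, 50],
})

means = df.groupby("category")["value"].mean()  # Series indexed by category
summary = df.groupby("category")["value"].agg(["mean", "sum", "count"])  # one column per function

print(means)
print(summary)
```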
    

Sorting Data


  dataframe.sort_values('column_name', ascending=True) # Sort by column
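sort_values also accepts a list of columns, with a matching list of sort directions, and sort_index restores the original row order. A sketch on hypothetical team scores:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "points": [5, 9, 7, 3],
})

by_points = df.sort_values("points", ascending=False)  # highest first
# Sort by team alphabetically, then by points descending within each team
by_team_then_points = df.sort_values(["team", "points"], ascending=[True, False])
restored = by_points.sort_index()  # back to the original row order

print(by_team_then_points)
```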
    

Merge, Concat, and Join


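  dataframe1.merge(dataframe2, on='column_name', how='inner') # Inner join on a shared column
  pd.concat([dataframe1, dataframe2], axis=0) # Concatenate (stack the rows)
  dataframe1.join(dataframe2, how='left') # Join on the index; overlapping column names need lsuffix/rsuffix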
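The three approaches differ in what they align on: merge matches column values, concat stacks frames, and join matches against the other frame's index. A sketch using hypothetical orders and customers tables:

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 2, 4], "amount": [50, 20, 35, 10]})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})

inner = orders.merge(customers, on="customer_id", how="inner")  # only keys present in both
left = orders.merge(customers, on="customer_id", how="left")    # all orders; NaN name for id 4

# concat stacks DataFrames; ignore_index=True renumbers the rows 0..n-1
more_orders = pd.DataFrame({"customer_id": [3], "amount": [99]})
all_orders = pd.concat([orders, more_orders], axis=0, ignore_index=True)

# join matches the caller's "customer_id" column against the other frame's index
indexed = customers.set_index("customer_id")
joined = orders.join(indexed, on="customer_id")  # how="left" is the default

print(left)
```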
    

String Operations


  dataframe['column_name'].str.lower() # Convert text to lowercase
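The .str accessor exposes many vectorized string methods that can be chained, and missing values pass through as missing. A sketch on a made-up Series of names:

```python
import pandas as pd

names = pd.Series(["  Alice ", "BOB", "carol smith", None])

cleaned = names.str.strip().str.lower()      # trim whitespace, then lowercase
has_o = cleaned.str.contains("o", na=False)  # na=False treats missing values as no match
first_word = cleaned.str.split(" ").str[0]   # .str[0] takes the first item of each list

print(cleaned)
print(names[has_o])  # boolean mask from contains works for filtering
```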
    

Reshaping Data


  dataframe.pivot_table(values='value', index='index_column', columns='column_name') # Pivot table
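pivot_table averages by default; aggfunc changes the aggregation and fill_value fills combinations with no data. melt performs roughly the reverse reshape, from wide back to long. A sketch on hypothetical sales data:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q1"],
    "revenue": [100, 150, 80, 120, 50],
})

table = sales.pivot_table(
    values="revenue",
    index="region",     # one row per region
    columns="quarter",  # one column per quarter
    aggfunc="sum",      # default would be the mean
    fill_value=0,       # South has no Q2 rows
)
print(table)

# Wide to long: one row per (region, quarter) pair
long = table.reset_index().melt(id_vars="region", var_name="quarter", value_name="revenue")
```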
    

Handling Time Series Data


  dataframe['date_column'] = pd.to_datetime(dataframe['date_column']) # Convert to datetime
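Once a column holds datetimes, the .dt accessor exposes date parts, and resample aggregates over calendar periods when the dates form the index. A sketch with made-up daily sales; the "W" frequency bins weeks ending on Sunday.

```python
import pandas as pd

df = pd.DataFrame({
    "date_column": ["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"],
    "sales": [10, 20, 30, 40],
})

df["date_column"] = pd.to_datetime(df["date_column"])  # strings to datetime64
df["weekday"] = df["date_column"].dt.day_name()        # e.g. "Monday"

# Weekly totals: resample needs a DatetimeIndex, so move the dates into the index
weekly = df.set_index("date_column")["sales"].resample("W").sum()
print(weekly)
```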
    

In conclusion, pandas is a powerful library for data processing and analysis in Python. It provides a wide range of capabilities for working with tabular data effectively.