Indexing and Selecting Data (loc & iloc) in Python

When working with data in Python, one of the most important skills is indexing and selecting data. It lets you extract specific rows, columns, or values from a dataset. In data science, this is commonly done with the pandas library, which provides two indexers for the job: loc and iloc.

Note: loc and iloc belong to pandas, not NumPy. NumPy is mainly used for numerical operations on arrays, while pandas is designed for structured data like tables.


What is Indexing in Data Analysis?

Indexing means selecting specific parts of a dataset. In real-world data, you rarely use the entire dataset at once. Instead, you extract relevant rows or columns.

For example:

  • Selecting a student’s record from a table
  • Filtering sales data for a specific month
  • Extracting a column like “Salary” or “Age”

This is where loc and iloc become very useful.


Introduction to loc and iloc

1. loc (Label-based indexing)

loc is an indexer, not a function, so it is used with square brackets rather than parentheses. It selects data using labels (the names of rows or columns).

Syntax:

df.loc[row_label, column_label]

Example:

import pandas as pd

data = {
    'Name': ['Amit', 'Riya', 'John'],
    'Age': [20, 22, 21]
}
df = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df.loc['a'])

Output:

Name    Amit
Age       20
Name: a, dtype: object

Key Points of loc:

  • Uses row/column labels
  • Includes both start and end labels when slicing
  • Supports boolean conditions

Example with condition:

df.loc[df['Age'] > 20]
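
The key points above can be sketched with the same sample data. This is a minimal illustration, not the only way to write it; the variable names first_two and older_names are chosen here for the example.

```python
import pandas as pd

# Same sample data as the loc example above.
df = pd.DataFrame(
    {'Name': ['Amit', 'Riya', 'John'], 'Age': [20, 22, 21]},
    index=['a', 'b', 'c'],
)

# Label slice: both the start label 'a' and the end label 'b' are included.
first_two = df.loc['a':'b']

# Boolean condition plus a column label: names of people older than 20.
older_names = df.loc[df['Age'] > 20, 'Name']

print(first_two)
print(older_names.tolist())
```

Passing a column label after the condition returns only that column for the matching rows, which is often handier than filtering first and selecting the column afterwards.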

2. iloc (Integer-based indexing)

iloc is also an indexer used with square brackets. It selects data by integer position (0 for the first row or column, 1 for the second, and so on), regardless of the labels.

Syntax:

df.iloc[row_index, column_index]

Example:

print(df.iloc[0])

Output:

Name    Amit
Age       20
Name: a, dtype: object

Key Points of iloc:

  • Uses integer positions (0, 1, 2, …)
  • Works like Python list indexing
  • Does NOT include the end index in slicing

Example:

print(df.iloc[0:2])

This returns the first two rows only (positions 0 and 1); position 2 is excluded.
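
A few more position-based selections, sketched on the same sample data. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Amit', 'Riya', 'John'], 'Age': [20, 22, 21]},
    index=['a', 'b', 'c'],
)

first_two = df.iloc[0:2]   # rows at positions 0 and 1; position 2 is excluded
last_row = df.iloc[-1]     # negative positions count from the end, as in lists
cell = df.iloc[1, 0]       # row position 1, column position 0

print(first_two)
print(last_row['Name'])
print(cell)
```

Note that the row labels ('a', 'b', 'c') play no part in any of these selections; only the order of the rows matters.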


Difference Between loc and iloc

Feature    loc                        iloc
Type       Label-based                Integer-based
Input      Names/labels               Index numbers
Slicing    Inclusive                  Exclusive of end
Usage      Real-world labeled data    Position-based selection

Practical Example

import pandas as pd

data = {
    'Student': ['Amit', 'Riya', 'John', 'Sara'],
    'Marks': [85, 90, 78, 88]
}
df = pd.DataFrame(data)

# Using loc
print(df.loc[1, 'Student'])

# Using iloc
print(df.iloc[1, 0])

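Because no index was given, the DataFrame uses the default integer index (0, 1, 2, 3), so label 1 and position 1 refer to the same row. Both will output: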

Riya
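
Labels and positions only coincide while the rows stay in their original order. As a rough sketch of what happens once they diverge, the example below sorts the same data by marks; the names ranked, by_label, and by_position are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Amit', 'Riya', 'John', 'Sara'],
    'Marks': [85, 90, 78, 88],
})

# Sort by marks, highest first. Each row keeps its original index label.
ranked = df.sort_values('Marks', ascending=False)

by_label = ranked.loc[1, 'Student']    # the row whose label is 1
by_position = ranked.iloc[1, 0]        # the second row after sorting

print(by_label)
print(by_position)

# Resetting the index makes labels match positions again.
renumbered = ranked.reset_index(drop=True)
print(renumbered.loc[1, 'Student'])
```

After sorting, loc still finds the row labeled 1, while iloc returns whichever row now sits in second place, so the two calls no longer agree.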

Why Are loc and iloc Important?

In data science and machine learning, datasets are often large. Efficient data selection helps in:

  • Cleaning data
  • Filtering useful information
  • Preparing training datasets
  • Performing analysis faster

Without proper indexing, handling large datasets becomes difficult and inefficient.


Common Mistakes to Avoid

  1. Confusing loc and iloc
    • loc → labels
    • iloc → positions
  2. Using string labels in iloc (not allowed)
  3. Forgetting slicing rules:
    • loc includes end value
    • iloc excludes end value
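
The mistakes listed above can be reproduced in a few lines. This is a small sketch on the sample data from earlier; the variable names are illustrative, and the exact error messages may differ between pandas versions, so only the exception types are checked.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Amit', 'Riya', 'John'], 'Age': [20, 22, 21]},
    index=['a', 'b', 'c'],
)

# Mistake: passing a string label to iloc. pandas raises a TypeError.
iloc_error = None
try:
    df.iloc['a']
except TypeError as exc:
    iloc_error = exc

# Mistake: passing a position to loc when the row labels are strings.
# There is no row labeled 0, so pandas raises a KeyError.
loc_error = None
try:
    df.loc[0]
except KeyError as exc:
    loc_error = exc

# Slicing rules: loc includes the end label, iloc excludes the end position.
rows_loc = len(df.loc['a':'b'])
rows_iloc = len(df.iloc[0:1])

print(type(iloc_error).__name__, type(loc_error).__name__)
print(rows_loc, rows_iloc)
```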

Understanding how to index and select data with loc and iloc is essential for anyone learning data analysis with pandas. While NumPy is powerful for numerical computations, pandas provides structured data handling features that make data selection simple and efficient.

Mastering these concepts will help you work confidently with datasets, perform analysis faster, and build a strong foundation for data science and machine learning.
