I have a DataFrame:
import pandas as pd
d = {'Id':[14038.0, 15053.0, 4765.0, 10783.0, 12915.0,5809.0, 11993.0, 5172.0, 10953.0, 11935.0,7917.0],
'Square':[48.0, 65.7, 44.9, 39.6, 80.4,53.4, 80.3, 64.5, 53.8, 64.7, 212.9],
'LifeSquare':[29.4, 40.0, 29.2, 23.8, 46.7,52.7,0 ,0 , 52.4, 0, 211.2]}
df = pd.DataFrame(d)
Task: adjust the LifeSquare parameter before training the model.
I wrote a function that picks the nearest similar values:
def square_correction(data):
    item = 'LifeSquare'
    valid = data.loc[~((data[item] > data['Square'] * 0.8) |
                       (data[item] < data['Square'] * 0.3) |
                       (data[item]).isna())]
    invalid = data.loc[(data[item] > data['Square'] * 0.8) |
                       (data[item] < data['Square'] * 0.3) |
                       (data[item]).isna()]
    best_feature, item_by_best_feature = best_params(valid, item)
    for i in range(0, len(invalid[item])):
        flat_id = invalid[item].index[i]
        best_feature_meaning = invalid[best_feature][flat_id]
        bigger = valid.loc[(valid[best_feature] >= best_feature_meaning)].reset_index().iloc[0]
        smoller = valid.loc[(valid[best_feature] <= best_feature_meaning)].reset_index().iloc[-1]
        difference_up = (bigger[best_feature] - data[best_feature][flat_id])
        difference_down = (data[best_feature][flat_id] - smoller[best_feature])
        text = f'flat id:{flat_id}. {item} was changed. {i+1} of {len(invalid[item])} done.'
        if difference_up == difference_down:
            print(text)
            data[item][flat_id] = item_by_best_feature[best_feature_meaning]
        elif not difference_up >= difference_down:
            print(text)
            data[item][flat_id] = bigger[item]
        else:
            print(text)
            data[item][flat_id] = smoller[item]
    print(f'best feature: {best_feature}. {len(invalid)} rows was changed.')
    return data
Run the function:
df = square_correction(df)
Everything works fine until the last row, where Jupyter Notebook raises an error:
IndexError: single positional indexer is out-of-bounds
Why does it dislike just this one observation out of all of them?
P.S. On the training DataFrame (10,000 observations) it raises the same error:
IndexError Traceback (most recent call last)
<ipython-input-17-d4ceb1216100> in <module>
----> 1 data = square_correction(data)
<ipython-input-16-c8f2bf3d18d3> in square_correction(data)
20
21
---> 22 bigger = valid.loc[(valid[best_feature] >= best_feature_meaning)].reset_index().iloc[0]
23 smoller = valid.loc[(valid[best_feature] <= best_feature_meaning)].reset_index().iloc[-1]
24
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
1498
1499 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1500 return self._getitem_axis(maybe_callable, axis=axis)
1501
1502 def _is_scalar_access(self, key):
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
2228
2229 # validate the location
-> 2230 self._validate_integer(key, axis)
2231
2232 return self._get_loc(key, axis=axis)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_integer(self, key, axis)
2137 len_axis = len(self.obj._get_axis(axis))
2138 if key >= len_axis or key < -len_axis:
-> 2139 raise IndexError("single positional indexer is out-of-bounds")
2140
2141 def _getitem_tuple(self, tup):
IndexError: single positional indexer is out-of-bounds
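A likely cause can be sketched under one assumption: that best_params (not shown in the question) returns 'Square' as best_feature. The last row has Square 212.9, which is larger than every valid row, so the >= filter selects nothing and .iloc[0] has no row to return. The helper nearest_valid below is hypothetical and is only one possible way to avoid the empty selection.

```python
import pandas as pd

d = {'Id': [14038.0, 15053.0, 4765.0, 10783.0, 12915.0, 5809.0,
            11993.0, 5172.0, 10953.0, 11935.0, 7917.0],
     'Square': [48.0, 65.7, 44.9, 39.6, 80.4, 53.4, 80.3, 64.5, 53.8, 64.7, 212.9],
     'LifeSquare': [29.4, 40.0, 29.2, 23.8, 46.7, 52.7, 0, 0, 52.4, 0, 211.2]}
df = pd.DataFrame(d)

# Same validity rule as in square_correction
bad = ((df['LifeSquare'] > df['Square'] * 0.8)
       | (df['LifeSquare'] < df['Square'] * 0.3)
       | df['LifeSquare'].isna())
valid = df.loc[~bad]

# Assumption: best_feature is 'Square'
last_square = df['Square'].iloc[-1]
bigger_candidates = valid.loc[valid['Square'] >= last_square]
print(len(bigger_candidates))  # 0 rows, so .iloc[0] on this selection raises IndexError


def nearest_valid(valid_rows, feature, value):
    """Hypothetical helper: return the valid row whose feature is closest to value."""
    if valid_rows.empty:
        return None
    closest_label = (valid_rows[feature] - value).abs().idxmin()
    return valid_rows.loc[closest_label]


row = nearest_valid(valid, 'Square', last_square)
print(row['Square'], row['LifeSquare'])
```

Picking the closest row by absolute difference avoids both empty selections (no larger row, no smaller row) and does not depend on the order of the rows in valid.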
Indexing is an essential tool for storing and handling large and complex datasets with rows and columns. In Python, we use index values within square brackets to perform the indexing. If we try to access an index beyond the dimensions of the dataset, we will raise the error: IndexError: single positional indexer is out-of-bounds.
This tutorial will go through the error in detail, and we will go through an example scenario to learn how to solve the error.
Table of contents
- IndexError: single positional indexer is out-of-bounds
- What is an IndexError?
- What is a DataFrame?
- What is iloc()?
- Example: Accessing a Column That Does Not Exist
- Solution
- Summary
IndexError: single positional indexer is out-of-bounds
What is an IndexError?
Python’s IndexError occurs when the index you specify lies outside the range of valid indices of a sequence. In Python, index numbers start from 0. Let’s look at an example of a typical Python list:
animals = ["lion", "sheep", "whale"]
This list contains three values, and the first element, lion, has an index value of 0. The second element, sheep, has an index value of 1. The third element, whale, has an index value of 2.
If we try to access an item at index position 3, we will raise an IndexError.
print(animals[3])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
      1 print(animals[3])

IndexError: list index out of range
What is a DataFrame?
A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns. The Python module Pandas works with DataFrames.
What is iloc()?
Pandas offers data analysis tools like the iloc indexer, which lets us select particular rows, columns, or individual cells of a dataset. Despite the parentheses often written after its name, iloc is used with square brackets and performs integer-based indexing for selection by position. It will raise “IndexError: single positional indexer is out-of-bounds” if a requested index is out-of-bounds. However, this error will not occur if you use a slice index, for example,
array[:slice_index]
Slice indexing allows for out-of-bounds indexing, which conforms with Python/numpy slice semantics. Let’s look at an example of the IndexError.
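As a minimal sketch of that contrast (the frame and its column names are made up for illustration), a slice that runs past the end is clipped, while a single integer past the end raises the error:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# A slice that runs past the last row is clipped to the rows that exist
clipped = df.iloc[1:10]
print(len(clipped))

# A single integer past the last row raises IndexError
try:
    df.iloc[10]
    raised = False
except IndexError as err:
    raised = True
    print(err)
```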
Example: Accessing a Column That Does Not Exist
Let’s create a DataFrame and attempt to access a particular column in the DataFrame. The dataset will contain a list of five car owners and will store each car owner’s city of residence and the brand of car they own. First, we must import Pandas and then define the columns that comprise our DataFrame. One column will store names, one will store cities, and one will store cars.
import pandas as pd
df = pd.DataFrame({'Name': ['Jim', 'Lisa', 'Paul', 'Carol', 'Biff'], 'City': ['Lisbon', 'Palermo', 'Sofia', 'Munich', 'Bangkok'], 'Car': ['Mercedes', 'Bentley', 'Ferrari', 'Rolls Royce', 'Aston Martin']})
If we print the DataFrame to the console, we will get the following arrangement of data in five rows and three columns.
print(df)
    Name     City           Car
0    Jim   Lisbon      Mercedes
1   Lisa  Palermo       Bentley
2   Paul    Sofia       Ferrari
3  Carol   Munich   Rolls Royce
4   Biff  Bangkok  Aston Martin
Let’s try to access the column at index 5 (the sixth column) of the dataset using iloc. In this example, it looks like:
print(df.iloc[:,5])
IndexError: single positional indexer is out-of-bounds
We raise the IndexError because we tried to access the column at index 5, and this dataset has only three columns, with indices 0 to 2.
Solution
To solve this error, we can start by getting the shape of the dataset:
print(df.shape)
(5, 3)
This result tells us that the dataset has five rows and three columns, which means we can only use column index up to 2. Let’s try to take the car column with index 2.
print(df.iloc[:,2])
0        Mercedes
1         Bentley
2         Ferrari
3     Rolls Royce
4    Aston Martin
Name: Car, dtype: object
The code runs, and we can extract the car column from the dataset and print it to the console.
We can also access one particular value in the dataset by passing the row position and the column position to iloc, separated by a comma. Let’s try to get the car that Jim from Lisbon owns:
# Get the value at row 0, column 2
jim_car = df.iloc[0, 2]
print(jim_car)
Mercedes
The code runs and prints the value specific to row 0 column 2.
We can take a dataset slice using a colon followed by a comma then the slice. Let’s look at an example of slicing the first two columns of the car dataset:
print(df.iloc[:, 0:2])
    Name     City
0    Jim   Lisbon
1   Lisa  Palermo
2   Paul    Sofia
3  Carol   Munich
4   Biff  Bangkok
We can also use slice indices beyond the bounds of the dataset. Let’s use slicing to request five columns of the dataset:
print(df.iloc[:, 0:5])
    Name     City           Car
0    Jim   Lisbon      Mercedes
1   Lisa  Palermo       Bentley
2   Paul    Sofia       Ferrari
3  Carol   Munich   Rolls Royce
4   Biff  Bangkok  Aston Martin
Although the dataset only has three columns, we can use a slice that asks for five because slice indexers allow out-of-bounds indexing. The slice simply returns the three columns that exist, so we do not raise the IndexError: single positional indexer is out-of-bounds.
Summary
The error “IndexError: single positional indexer is out-of-bounds” occurs when you try to access a row or column at a position outside the bounds of the pandas DataFrame. To solve this error, use index values within the dimensions of the dataset. You can get the dimensions of a dataset using shape. Once you know the valid index values, you can get specific values using iloc, which does integer-location based indexing.
Note that using an integer slice with iloc will not raise the IndexError because slice indexers allow out-of-bounds indexing.
Indexing plays a critical role in storing and handling large and complex data sets. When we work with compound data types like lists and tuples, or with data sets that have rows and columns, we frequently access elements through index values within square brackets. In this article, we will talk about the index-based error: single positional indexer is out-of-bounds.
What is this “IndexError: single positional indexer is out-of-bounds” error?
This index-based error appears when a program tries to access a position outside the valid index range. Suppose you have a list with five elements. Its indices run from 0 to 4. If you try to access, display, or change the value at index 7, it cannot work, because the valid range is 0 to 4. That range is the bound, and accessing an element beyond it is what Python reports as an out-of-bounds situation.
IndexError when accessing a dataset:
Suppose you have the statement Y = Dataset.iloc[:,18].values
If this statement raises the “single positional indexer is out-of-bounds” error, the most likely reason is that your dataset has fewer than 19 columns. Positions start at 0, so index 18 refers to the 19th column, and you are trying to access something that does not exist.
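One way to guard such a statement is to compare the requested position with shape before indexing. The sketch below uses a made-up three-column dataset; the fallback to the last column is only an assumption about where the target might live, not a general rule.

```python
import pandas as pd

# Hypothetical dataset with only 3 columns
dataset = pd.DataFrame({'x1': [1.0, 2.0], 'x2': [3.0, 4.0], 'y': [0, 1]})

col = 18
n_cols = dataset.shape[1]
if col < n_cols:
    y = dataset.iloc[:, col].values
else:
    # Assumption for this sketch: the target is the last column
    print(f'column index {col} is out of bounds for {n_cols} columns')
    y = dataset.iloc[:, -1].values
print(y)
```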
IndexError with an unknown DataFrame size:
The error also occurs when you index a row or a column with a number that exceeds the dimensions of your DataFrame. For example, the code below requests column index 8 from a DataFrame that has only three columns.
Error Code:
import pandas as pd
df = pd.DataFrame({'Name': ['Karl', 'Ray', 'Gaurav', 'Dee', 'Sue'],
'City': ['London', 'Montreal', 'Delhi', 'New York', 'Glasgow'],
'Car': ['Maruti', 'Audi', 'Ferrari', 'Rolls Royce', ' Tesla'] })
print(df)
x = df.iloc[0, 8]
print(x)
Output:
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
This program raises an error because the column position we want to fetch (index 8) does not exist: the DataFrame has only three columns, with indices 0 to 2.
This also happens when the programmer misunderstands iloc. The iloc indexer selects data by integer position: a single cell, a whole row or column, or a set of rows and columns of a DataFrame.
In iloc, the value before the comma (,) is the row position and the value after the comma is the column position. If either position lies outside the range of the data, iloc cannot fetch anything and raises this error.
Correct code:
import pandas as pd
df = pd.DataFrame({'Name': ['Karl', 'Ray', 'Gaurav', 'Dee', 'Sue'],
'City': ['London', 'Montreal', 'Delhi', 'New York', 'Glasgow'],
'Car': ['Maruti', 'Audi', 'Ferrari', 'Rolls Royce', ' Tesla'] })
print(df)
x = df.iloc[3, 0]
print("\n Fetched value using the iloc() is: ", x)
Output:
Name City Car
0 Karl London Maruti
1 Ray Montreal Audi
2 Gaurav Delhi Ferrari
3 Dee New York Rolls Royce
4 Sue Glasgow Tesla
Fetched value using the iloc() is: Dee
Explanation:
First we create the DataFrame (a 2-D dataset) with three columns and five rows and print it. Then we request row 3 and column 0, both of which exist, so no error is raised. To resolve the “IndexError: single positional indexer is out-of-bounds” error, first check how many rows and columns the dataset actually has.
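That check can be sketched as a small helper. The name in_bounds is hypothetical, and the sketch assumes plain integer positions (no slices or lists):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Karl', 'Ray', 'Gaurav', 'Dee', 'Sue'],
                   'City': ['London', 'Montreal', 'Delhi', 'New York', 'Glasgow'],
                   'Car': ['Maruti', 'Audi', 'Ferrari', 'Rolls Royce', 'Tesla']})


def in_bounds(frame, row, col):
    """Hypothetical helper: True if (row, col) is a valid position for frame.iloc."""
    n_rows, n_cols = frame.shape
    # Negative positions count from the end, so -n_rows and -n_cols are still valid
    return -n_rows <= row < n_rows and -n_cols <= col < n_cols


for row, col in [(3, 0), (0, 8)]:
    if in_bounds(df, row, col):
        print(row, col, df.iloc[row, col])
    else:
        print(row, col, 'is out of bounds for shape', df.shape)
```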
Conclusion:
To avoid running into this error repeatedly, check the number of rows and columns before retrieving values, and confirm that every index you use falls within the valid range. iloc is a convenient way to retrieve any value by position, but you must refer to index values that exist. Otherwise, the “IndexError: single positional indexer is out-of-bounds” error will appear.
In Python, an IndexError occurs when you try to access an index that is outside the valid index range of a data structure like a list, tuple, or dataframe. This error can be frustrating, especially when you are working with large datasets. In this tutorial, we will discuss how to fix the “IndexError: single positional indexer is out-of-bounds” error that occurs when you try to access an index outside the valid index range in a dataframe in Python.
Understanding the Error
Before we dive into the solution, let’s first understand the error message. The “IndexError: single positional indexer is out-of-bounds” error occurs when you try to access an index that is outside the valid index range of a dataframe. For example, if you have a dataframe with 5 rows and you try to access the 6th row, you will get this error.
Let’s reproduce this error in an example.
import pandas as pd

# create a pandas dataframe
df = pd.DataFrame({
    'Name': ['Jim', 'Dwight', 'Oscar', 'Tobi', 'Angela'],
    'Age': [26, 30, 28, 38, 31],
    'Department': ['Sales', 'Sales', 'Accounting', 'HR', 'Accounting']
})

# try to access the 6th row, row at index 5
print(df.iloc[5])
Output:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[9], line 11
      4 df = pd.DataFrame({
      5     'Name': ['Jim', 'Dwight', 'Oscar', 'Tobi', 'Angela'],
      6     'Age': [26, 30, 28, 38, 31],
      7     'Department': ['Sales', 'Sales', 'Accounting', 'HR', 'Accounting']
      8 })
     10 # try to access the 6th row, row at index 5
---> 11 print(df.iloc[5])

File ~/miniforge3/envs/dsp/lib/python3.8/site-packages/pandas/core/indexing.py:931, in _LocationIndexer.__getitem__(self, key)
    928 axis = self.axis or 0
    930 maybe_callable = com.apply_if_callable(key, self.obj)
--> 931 return self._getitem_axis(maybe_callable, axis=axis)

File ~/miniforge3/envs/dsp/lib/python3.8/site-packages/pandas/core/indexing.py:1566, in _iLocIndexer._getitem_axis(self, key, axis)
   1563 raise TypeError("Cannot index by location index with a non-integer key")
   1565 # validate the location
-> 1566 self._validate_integer(key, axis)
   1568 return self.obj._ixs(key, axis=axis)

File ~/miniforge3/envs/dsp/lib/python3.8/site-packages/pandas/core/indexing.py:1500, in _iLocIndexer._validate_integer(self, key, axis)
   1498 len_axis = len(self.obj._get_axis(axis))
   1499 if key >= len_axis or key < -len_axis:
-> 1500 raise IndexError("single positional indexer is out-of-bounds")

IndexError: single positional indexer is out-of-bounds
We get the IndexError: single positional indexer is out-of-bounds error.
Fixing the error
To fix the “IndexError: single positional indexer is out-of-bounds” error, you need to make sure that you are accessing a valid index in the dataframe. Here are some ways to do that:
1) Use an index within the index range
If the index that you’re trying to use lies within the index range (that is, it’s a valid index in the dataframe), you’ll not get this error. For example, in the above dataframe, if we use the index 4, representing the row 5, we’ll not get an error.
# try to access the 5th row, row at index 4
print(df.iloc[4])
Output:
Name              Angela
Age                   31
Department    Accounting
Name: 4, dtype: object
But we cannot always know beforehand whether an index is a valid index or not.
2) Check if the index is within the valid range using If statement
One way to avoid this error is to use conditional statements to check if the index is within the valid range before accessing it. Here’s an example:
# try to access the 6th row, row at index 5
index = 5
if index < len(df):
    print(df.iloc[index])
else:
    print("Index out of range")
Output:
Index out of range
In the above example, we first check if the row index we’re trying to access is less than the dataframe’s length. If it is, we access the row at the given index using the iloc function. If it’s not, we print a message saying that the index is out of range.
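The length check covers non-negative positions only. iloc also accepts negative positions counted from the end, and a value such as -6 passes an index < len(df) test on a five-row frame while still being out of bounds. A minimal sketch of a check that handles both signs (the helper name is_valid_position is hypothetical, and the frame is a shortened copy of the one above):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jim', 'Dwight', 'Oscar', 'Tobi', 'Angela'],
    'Age': [26, 30, 28, 38, 31],
})


def is_valid_position(frame, index):
    """Hypothetical helper: True if frame.iloc[index] would succeed for an integer index."""
    return -len(frame) <= index < len(frame)


checks = {i: is_valid_position(df, i) for i in (4, 5, -5, -6)}
print(checks)

# -6 is smaller than len(df), yet iloc still rejects it
try:
    df.iloc[-6]
    negative_raised = False
except IndexError:
    negative_raised = True
print(negative_raised)
```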
3) Using try-except
Alternatively, you can also use exception handling to handle this error.
# try to access the 6th row, row at index 5
try:
    index = 5
    print(df.iloc[index])
except IndexError:
    print("Index out of range")
Output:
Index out of range
Conclusion
The “IndexError: single positional indexer is out-of-bounds” error occurs when you try to access an index outside the valid index range in a dataframe in Python. To fix the error, you need to make sure that you are accessing a valid index in the dataframe. You can check the index range, use conditional statements, or error handling to avoid this error.
Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
You will often see IndexError: single positional indexer is out-of-bounds when code references a row that does not exist at the given index position.
When you want to look at a particular row in pandas, you can reference the row by its position and then read the values within it.
Let’s break it down to understand how and why the error occurs, and how to fix it.
How does the error occur?
The code below throws the error we are trying to fix.
Digging deeper, let’s look at the file we are importing and the values it contains. From the CSV file: