Columns must be same length as key error

To solve this error, check the shape of the object you’re trying to assign to the df columns (using np.shape). Its second (or last) dimension must match the number of columns you’re assigning to. For example, if you try to assign a 2-column numpy array to 3 columns, you’ll see this error.
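The shape check described above can be sketched as follows. This is a minimal illustration, and the names df, cols, vals, n_value_cols and shapes_match are made up for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1]})
cols = ["B", "C", "D"]             # three target columns
vals = np.array([[1, 2], [3, 4]])  # but only two value columns

# The last dimension of the value is the number of columns it provides.
n_value_cols = np.shape(vals)[-1]
shapes_match = n_value_cols == len(cols)

if shapes_match:
    df[cols] = vals
else:
    # Report the mismatch instead of letting pandas raise the ValueError.
    print(f"Cannot assign {n_value_cols} value column(s) to {len(cols)} key(s): {cols}")
```

Here the check fails, so the assignment is skipped and df is left unchanged.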

A general workaround (for case 1 and case 2 below) is to cast the object you’re trying to assign to a DataFrame and join() it to df, i.e. instead of (1), use (2).

df[cols] = vals   # (1)
df = df.join(vals) if isinstance(vals, pd.DataFrame) else df.join(pd.DataFrame(vals))  # (2)
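As a rough sketch of workaround (2), assuming the values arrive as a numpy array (the column names "B" and "C" are chosen only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1]})
vals = np.array([[2, 3], [4, 5]])

# Wrap non-DataFrame values in a DataFrame; the column names are illustrative.
if isinstance(vals, pd.DataFrame):
    vals_df = vals
else:
    vals_df = pd.DataFrame(vals, columns=["B", "C"])

# join() aligns on the index and brings in every column of vals_df,
# so there is no key whose length has to match.
joined = df.join(vals_df)
print(joined)
```

Note that join() raises its own error if the two frames share a column name and no suffix is given, so the new columns need names that are not already in df.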

If you’re trying to replace values in an existing column and got this error (case 3(a) below), convert the object to a list and assign it.

df[cols] = vals.values.tolist()

If you have duplicate columns (case 3(b) below), then there’s no easy fix. You’ll have to make the dimensions match manually.


This error occurs in 3 cases:

Case 1: When you try to assign a list-like object (e.g. lists, tuples, sets, numpy arrays, and pandas Series) to a list of DataFrame column(s) as new arrays (see note 1) but the number of columns doesn’t match the second (or last) dimension (found using np.shape) of the list-like object. So the following reproduces this error:

df = pd.DataFrame({'A': [0, 1]})
cols, vals = ['B'], [[2], [4, 5]]
df[cols] = vals # number of columns is 1 but the list has shape (2,)

Note that if the columns are not given as a list, pandas Series, numpy array or pandas Index, this error won’t occur. So the following doesn’t reproduce the error:

df[('B',)] = vals # the column is given as a tuple

One interesting edge case occurs when the list-like object is multi-dimensional (but not a numpy array). In that case, under the hood, the object is first cast to a pandas DataFrame, and pandas then checks whether its last dimension matches the number of columns. This produces the following behavior:

# the error occurs below because pd.DataFrame(vals1) has shape (2, 2) and len(['B']) != 2
vals1 = [[[2], [3]], [[4], [5]]]
df[cols] = vals1

# no error below because pd.DataFrame(vals2) has shape (2, 1) and len(['B']) == 1
vals2 = [[[[2], [3]]], [[[4], [5]]]]
df[cols] = vals2

Case 2: When you try to assign a DataFrame to a list (or pandas Series or numpy array or pandas Index) of columns but the respective numbers of columns don’t match. This case is what caused the error in the OP. The following reproduce the error:

df = pd.DataFrame({'A': [0, 1]})
df[['B']] = pd.DataFrame([[2, 3], [4]]) # a 2-column df is trying to be assigned to a single column

df[['B', 'C']] = pd.DataFrame([[2], [4]]) # a single column df is trying to be assigned to 2 columns

Case 3: When you try to replace the values of existing column(s) by a DataFrame (or a list-like object) whose number of columns doesn’t match the number of columns it’s replacing. So the following reproduce the error:

# case 3(a)
df1 = pd.DataFrame({'A': [0, 1]})
df1['A'] = pd.DataFrame([[2, 3], [4, 5]]) # df1 has a single column named 'A' but a 2-column-df is trying to be assigned

# case 3(b): duplicate column names matter too
df2 = pd.DataFrame([[0, 1], [2, 3]], columns=['A','A'])
df2['A'] = pd.DataFrame([[2], [4]]) # df2 has 2 columns named 'A' but a single column df is being assigned

Note 1: df.loc[:, cols] = vals may overwrite data in place, so this won’t produce the error but will create columns of NaN values.


Are you learning Python, have come across this error, and want to understand why it occurs?

In essence, it usually occurs when you work with more than one data frame and try to assign data from one to another, but the number of columns on the two sides does not match. The program cannot proceed until the mismatch is fixed.

Two common scenarios where this happens are joining data frames and splitting out data. Both are demonstrated below.

Scenario 1 – Joining data frames

Where we have df1[['a']] = df2, we are assigning the values on the right side of the equals sign to the columns named on the left.

When we look at the right-hand side it has three columns, the left-hand side has one.

As a result the error “ValueError: Columns must be same length as key” will appear, as per the below.

import pandas as pd

list1 = [1,2,3]
list2 = [[4,5,6],[7,8,9]]

df1 = pd.DataFrame(list1,columns=['column1'])
df2 = pd.DataFrame(list2,columns=['column2','column3','column4'])

df1[['a']] = df2

The above code throws ValueError: Columns must be same length as key.

The objective here is to have all the columns from the right-hand side placed beside the column from the left-hand side.

To do that, make both sides equal in the number of columns: keep the column from df1 and bring in the three columns from df2. The names columna, columnb and columnc below correspond to the three columns in df2 and will store their data.

The fix for this issue is: df1[['columna','columnb','columnc']] = df2

print (df1)
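Put together, the Scenario 1 fix might look like the script below. It reuses the frames defined above; nothing here is new apart from assembling the pieces:

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3], columns=["column1"])
df2 = pd.DataFrame([[4, 5, 6], [7, 8, 9]], columns=["column2", "column3", "column4"])

# Three names on the left for the three columns of df2 on the right.
df1[["columna", "columnb", "columnc"]] = df2

print(df1)
```

Because df1 has three rows and df2 only two, the assignment aligns on the index and the last row of the new columns is filled with NaN.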

Scenario 2 – Splitting out data

There may be an occasion when you have a python list, and you need to split out the values of that list into separate columns.

new_list1 = ['1 2 3']
df1_newlist = pd.DataFrame(new_list1,columns=['column1'])

In the above, we have created a list holding a single string that contains three values. What we are looking to do is split those values out into columns with the code below:

df1_newlist[["column1"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.

print(df1_newlist)

When we run the above, it throws ValueError: Columns must be same length as key.

The reason it throws the error is that the split produces three values that need three columns, but we have only named one column in df1_newlist[["column1"]].

To fix this, we run the below code:

df1_newlist[["column1","column2","column3"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.

print(df1_newlist)

This runs without the error: column1 now holds the first value, and column2 and column3 hold the other two.
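When the number of pieces is not known in advance, one option is to build the column names from the split result itself. This is only a sketch; the frame, the "raw" column and the "part" prefix are hypothetical names:

```python
import pandas as pd

df = pd.DataFrame(["1 2 3", "4 5 6"], columns=["raw"])

# expand=True returns a DataFrame with one column per split piece.
parts = df["raw"].str.split(" ", expand=True)

# Build as many names as there are pieces, so both sides always match.
new_cols = [f"part{i + 1}" for i in range(parts.shape[1])]
df[new_cols] = parts

print(df)
```

Since new_cols is derived from parts.shape[1], the key and the value cannot get out of sync.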

To fix the ValueError: Columns must be same length as key error in pandas, make sure that the number of column names you assign to matches the number of columns in the value being assigned.

pandas raises “ValueError: Columns must be same length as key” when you assign values to a list of DataFrame columns and the number of column labels (the key) does not match the number of columns in the value.

Why does this ValueError occur in pandas?

  1. When you attempt to assign a list-like object (for example, lists, tuples, sets, numpy arrays, and pandas Series) to a list of DataFrame columns as new arrays but the number of columns doesn’t match the second (or last) dimension (found using np.shape) of the list-like object.
  2. When you attempt to assign a DataFrame to a list (or pandas Series or numpy array or pandas Index) of columns but the respective numbers of columns don’t match.
  3. When you attempt to replace the values of an existing column with a DataFrame (or a list-like object) whose number of columns doesn’t match the number of columns it’s replacing.

Python code that generates the error

import pandas as pd

list1 = [11, 21, 19]
list2 = [[46, 51, 61], [71, 81, 91]]

df1 = pd.DataFrame(list1, columns=['column1'])
df2 = pd.DataFrame(list2, columns=['column2', 'column3', 'column4'])

df1[['a']] = df2

Output

ValueError: Columns must be same length as key

In the above code example, the interpreter raised ValueError: Columns must be same length as key because df2 has three columns but the key on the left-hand side, ['a'], names only one column.

Code that fixes the error

When a DataFrame is assigned to a list of columns, pandas requires the number of column names in the key to match the number of columns in the value.

import pandas as pd

list1 = [11, 21, 19]
list2 = [[46, 51, 61], [71, 81, 91]]

df1 = pd.DataFrame(list1, columns=['column1'])

# Optional: repeat df1's rows (not needed to avoid the error; the matching column names below are the fix)
df1 = pd.concat([df1] * len(list2), ignore_index=True)

df2 = pd.DataFrame(list2, columns=['column2','column3','column4'])

df1[['column2', 'column3', 'column4']] = df2

print(df1)

Output

   column1  column2  column3  column4
0    11      46.0     51.0     61.0
1    21      71.0     81.0     91.0
2    19      NaN      NaN      NaN
3    11      NaN      NaN      NaN
4    21      NaN      NaN      NaN
5    19      NaN      NaN      NaN

In this code example, df1 is first concatenated with itself len(list2) times, which gives it six rows. That step only repeats rows and is not required. What averts the ValueError is that the three column names on the left of the assignment match the three columns of df2.

Because the assignment aligns on the index, rows of df1 whose index labels do not exist in df2 (rows 2 to 5 here) are filled with NaN.

You can also check the shape of the object you’re trying to assign to the df columns using np.shape.

The second (or the last) dimension must match the number of columns you’re trying to assign to. For example, if you try to assign a 2-column numpy array to 3 columns, you’ll see the ValueError.

I hope this article helped you resolve your error.

This error happens when you try to assign a data frame as columns to another data frame, and the number of column names provided is not equal to the number of columns of the data frame being assigned. For example, given a simple data frame as follows:

df = pd.DataFrame({'x': [1,2,3]})

df
#   x
#0  1
#1  2
#2  3

And a second data frame:

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

df1
#   0  1
#0  1  2
#1  3  4
#2  5  6

It is possible to assign df1 to df as columns as follows:

df[['a', 'b']] = df1

df
#   x  a  b
#0  1  1  2
#1  2  3  4
#2  3  5  6

Notice how the content of df1 has been assigned as two separate columns to df.

Now given a third data frame with only one column, on the other hand:

df2 = pd.DataFrame([[1], [3], [5]])
df2
#   0
#0  1
#1  3
#2  5

If you do df[['a', 'b']] = df2, you will get an error:

ValueError: Columns must be same length as key

The reason being that df2 has only one column, but on the left side of the assignment, you are expecting two columns to be assigned, hence the error.

Solution:

Whenever you encounter this error, check that the number of columns being assigned to is the same as the number of columns in the data frame on the right side. In our example above, if you have ['a', 'b'] on the left side as columns, make sure the data frame on the right side also has two columns.
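The check can also be seen in action by catching the exception. The snippet below is a small sketch built on the df and df2 from this example; trimming the key list is just one possible way to make the two sides match, and whether it is the right one depends on your data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})
df2 = pd.DataFrame([[1], [3], [5]])  # one column only

keys = ["a", "b"]
try:
    df[keys] = df2  # two keys, one value column
    message = None
except ValueError as exc:
    message = str(exc)
    print(message)

# One option: use only as many keys as df2 has columns.
matching_keys = keys[: df2.shape[1]]
df[matching_keys] = df2
print(df)
```

The alternative is to keep both keys and supply a two-column data frame on the right side instead.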



metrics_df[[
    "same_intervals_between_requests"

A comma is missing here; look more closely. If that is not the cause (the mistake is obvious enough), your exception may be raised before this point, in which case please post the task, i.e. what you want to achieve. Show a small frame and what it should turn into. Otherwise, how can the code be corrected without knowing what you want it to do?

Then there is also a mistake here:

df[["time_local", "id_session"]].groupby("id_session").apply(count_metric_using_shift)

You select a subset and then group it, so where is it supposed to get the columns on which to run your (rather questionable) function?

Here I reproduced the same error as yours on a small frame. The code below raises it because there are no columns left to run the function on:

df = pd.DataFrame({
    'Cat':['A','A','B','A','B'],
    'Num1':[1,2,3,4,5],
    'Num2':[6,7,8,9,10]
})
df[['A','B']] = df[['Cat']].groupby('Cat').apply('mean')

Yes, the error can be removed simply by writing df[['A','B']] = df.groupby('Cat').apply('mean'). But naturally that makes no sense: it will return NaN. Everything needs to be rewritten, which is why I am asking what the task is.
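Since the actual task is not stated, the following is only a guess at one plausible intent: attaching the per-group mean to every row. Under that assumption, groupby(...).transform('mean') is a closer fit than apply, because its result keeps the original row index:

```python
import pandas as pd

df = pd.DataFrame({
    "Cat": ["A", "A", "B", "A", "B"],
    "Num1": [1, 2, 3, 4, 5],
    "Num2": [6, 7, 8, 9, 10],
})

# transform() returns one row per original row, indexed like df,
# so the assignment aligns and no NaN appears.
group_means = df.groupby("Cat")[["Num1", "Num2"]].transform("mean")

# Two keys on the left, two columns (Num1, Num2) on the right.
df[["A", "B"]] = group_means

print(df)
```

If the goal was something else, such as one row per group, the result should go into a separate frame rather than back into df.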
