Unknown label type: 'continuous' error

I’ve seen other posts talking about this, but none of them could help me. I am using Jupyter Notebook with Python 3.6.0 on a Windows x64 machine.
I have a large dataset but I keep only a piece of it to run my models:

This is the piece of code that I used:

df = loan_2.reindex(columns= ['term_clean','grade_clean', 'annual_inc', 'loan_amnt', 'int_rate','purpose_clean','installment','loan_status_clean'])
df.fillna(method= 'ffill').astype(int)
from sklearn.preprocessing import Imputer
from sklearn.preprocessing import StandardScaler
imp = Imputer(missing_values='NaN', strategy='median', axis=0)
array = df.values
y = df['loan_status_clean'].values
imp.fit(array)
array_imp = imp.transform(array)

y2= y.reshape(1,-1)
imp.fit(y2)
y_imp= imp.transform(y2)
X = array_imp[:,0:4]
Y = array_imp[:,4]
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)
seed = 7
scoring = 'accuracy'

from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import  BernoulliNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

# Spot Check Algorithms
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('BNB', BernoulliNB()))
models.append(('RF', RandomForestClassifier()))
models.append(('GBM', AdaBoostClassifier()))
models.append(('NN', MLPClassifier()))
models.append(('SVM', SVC()))

# evaluate each model in turn
results = []
names = []
for name, model in models:
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)

When I run the last piece of code, this error comes up:


ValueError                                Traceback (most recent call last)
<ipython-input-262-1e6860ba615b> in <module>()
      4 for name, model in models:
      5         kfold = model_selection.KFold(n_splits=10, random_state=seed)
----> 6         cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
      7         results.append(cv_results)
      8         names.append(name)

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\model_selection\_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
    138                                               train, test, verbose, None,
    139                                               fit_params)
--> 140                       for train, test in cv_iter)
    141     return np.array(scores)[:, 0]
    142 

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
    609                 return True
    610 

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py in _dispatch(self, batch)
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
    572         self._jobs.append(job)
    573 

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in apply_async(self, func, callback)
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
    110         if callback:
    111             callback(result)

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in __init__(self, batch)
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
    327 
    328     def get(self):

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py in <listcomp>(.0)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\model_selection\_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
    236             estimator.fit(X_train, **fit_params)
    237         else:
--> 238             estimator.fit(X_train, y_train, **fit_params)
    239 
    240     except Exception as e:

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\linear_model\logistic.py in fit(self, X, y, sample_weight)
   1172         X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
   1173                          order="C")
-> 1174         check_classification_targets(y)
   1175         self.classes_ = np.unique(y)
   1176         n_samples, n_features = X.shape

C:\Users\dalila\Anaconda\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
    170     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    171             'multilabel-indicator', 'multilabel-sequences']:
--> 172         raise ValueError("Unknown label type: %r" % y_type)
    173 
    174 

ValueError: Unknown label type: 'continuous'

Brief note: my data are free of NaN and missing values in general.

There are two types of supervised learning algorithms, regression and classification. Classification problems require categorical or discrete response variables (y variable). If you try to train a scikit-learn imported classification model with a continuous variable, you will encounter the error ValueError: Unknown label type: ‘continuous’.

To solve this error, you can encode the continuous y variable into categories using Scikit-learn’s preprocessing.LabelEncoder, or, if it is a regression problem, use a regression model suitable for the data.

This tutorial will go through the error in detail and how to solve it with code examples.


Table of contents

  • ValueError: Unknown label type: ‘continuous’
    • What Does Continuous Mean?
    • What is the Difference Between Regression and Classification?
  • Example #1: Evaluating the Data
    • Solution
  • Example #2: Evaluating the Model
    • Solution
  • Summary

ValueError: Unknown label type: ‘continuous’

In Python, a value is a piece of information stored within a particular object. You will encounter a ValueError in Python when you use a built-in operation or function that receives an argument with the right type but an inappropriate value. In this case, the y variable data has continuous values instead of discrete or categorical values.

What Does Continuous Mean?

There are two categories of data:

  • Discrete data: categorical data, for example, True/False, Pass/Fail, 0/1 or count data, for example, number of students in a class.
  • Continuous data: Data that we can measure on an infinite scale; it can take any value between two numbers, no matter how small. For example, the length of a string can be 1.00245 centimetres.

However, you cannot have 1.5 of a student in a class; count is a discrete measure. Measures of time, height, and temperature are all examples of continuous data.
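
Before fitting anything, a quick way to see which of these two categories scikit-learn thinks your labels fall into is the same helper the library uses internally, type_of_target. This is a small optional sketch (assuming NumPy and scikit-learn are installed), not part of the original tutorial:

import numpy as np
from sklearn.utils.multiclass import type_of_target

print(type_of_target(np.array([0, 1, 1, 0])))       # 'binary' -> accepted by classifiers
print(type_of_target(np.array([0.0, 1.02, 1.02])))  # 'continuous' -> triggers the ValueError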

What is the Difference Between Regression and Classification?

We can classify supervised learning algorithms into two types: Regression and Classification. For regression, the response variable or label is continuous, for example, weight, height, price, or time. In each case, a regression model seeks to predict a continuous quantity.

For classification, the response variable or label is categorical, for example, Pass or Fail, True or False. A classification model seeks to predict a class label.

Example #1: Evaluating the Data

Let’s look at an example of training a Logistic Regression model to perform classification on arrays of integers. First, let’s look at the data. We will import numpy to create our explanatory variable data X and our response variable data y. Note that the data used here has no real relationship and is only for explanatory purposes.

import numpy as np

# Values for Predictor and Response variables
X = np.array([[2, 4, 1, 7], [3, 5, 9, 1], [5, 7, 1, 2], [7, 4, 2, 8], [4, 2, 3, 8]])
y = np.array([0, 1.02, 1.02, 0, 0])

Next, we will import the LogisticRegression class and create an object of this class, our logistic regression model. We will then fit the model using the values for the predictor and response variables.

from sklearn.linear_model import LogisticRegression

# Attempt to fit Logistic Regression Model
cls = LogisticRegression()
cls.fit(X, y)

Let’s run the code to see what happens:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-556cca8758bd> in <module>
      3 # Attempt to fit Logistic Regression Model
      4 cls = LogisticRegression()
----> 5 cls.fit(X, y)

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py in fit(self, X, y, sample_weight)
   1514             accept_large_sparse=solver not in ["liblinear", "sag", "saga"],
   1515         )
-> 1516         check_classification_targets(y)
   1517         self.classes_ = np.unique(y)
   1518 

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    195         "multilabel-sequences",
    196     ]:
--> 197         raise ValueError("Unknown label type: %r" % y_type)
    198 
    199 

ValueError: Unknown label type: 'continuous'

The error occurs because logistic regression is a classification algorithm that requires the values of the response variable to be categorical or discrete, such as "Yes" or "No", "True" or "False", 0 or 1. In the above code, our response variable contains the continuous value 1.02.

Solution

To solve this error, we can convert the continuous values of the response variable y to categorical values using the LabelEncoder class under sklearn.preprocessing. Let’s look at the revised code:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

# Values for Predictor and Response variables
X = np.array([[2, 4, 1, 7], [3, 5, 9, 1], [5, 7, 1, 2], [7, 4, 2, 8], [4, 2, 3, 8]])

y = np.array([0, 1.02, 1.02, 0, 0])

# Create label encoder object
labels = preprocessing.LabelEncoder()

# Convert continuous y values to categorical
y_cat = labels.fit_transform(y)

print(y_cat)
[0 1 1 0 0]

We have encoded the original values as 0 or 1. Now, we can fit the logistic regression model and perform a prediction on test data:

# Attempt to fit Logistic Regression Model
cls = LogisticRegression()
cls.fit(X, y_cat)

X_pred = np.array([5, 6, 9, 1])

X_pred = X_pred.reshape(1, -1)

y_pred = cls.predict(X_pred)

print(y_pred)

Let’s run the code to get the result:

[1]

We successfully fit the model and used it to predict unseen data.
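
As a small optional follow-up, not part of the original example: because LabelEncoder stores the mapping it learned, the encoded prediction can be mapped back to the original response value with inverse_transform.

# Map the encoded class prediction back to the original y value (here 1 -> 1.02)
print(labels.inverse_transform(y_pred))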

Example #2: Evaluating the Model

Let’s look at an example where we want to train a k-Nearest Neighbours classifier on some data. The data, which we will store in a file called regression_data.csv, looks like this:

Avg.Session Length,TimeonApp,TimeonWebsite,LengthofMembership,Yearly Amount Spent
34.497268,12.655651,39.577668,4.082621,587.951054
31.926272,11.109461,37.268959,2.664034,392.204933
33.000915,11.330278,37.110597,4.104543,487.547505
34.305557,13.717514,36.721283,3.120179,581.852344
33.330673,12.795189,37.536653,4.446308,599.406092
33.871038,12.026925,34.476878,5.493507,637.102448
32.021596,11.366348,36.683776,4.685017,521.572175

Next, we will import the data into a DataFrame. We will define four columns as the explanatory variables and the last column as the response variable. Then, we will split the data into training and test data:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('regression_data.csv')

X = df[['Avg.Session Length', 'TimeonApp','TimeonWebsite', 'LengthofMembership']]

y = df['Yearly Amount Spent']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Next, we will define a KNeighborsClassifier model and fit it to the data:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(X_train,y_train)

Let’s run the code to see what happens:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-889312abc571> in <module>
----> 1 knn.fit(X_train,y_train)

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/neighbors/_classification.py in fit(self, X, y)
    196         self.weights = _check_weights(self.weights)
    197 
--> 198         return self._fit(X, y)
    199 
    200     def predict(self, X):

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/neighbors/_base.py in _fit(self, X, y)
    418                     self.outputs_2d_ = True
    419 
--> 420                 check_classification_targets(y)
    421                 self.classes_ = []
    422                 self._y = np.empty(y.shape, dtype=int)

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    195         "multilabel-sequences",
    196     ]:
--> 197         raise ValueError("Unknown label type: %r" % y_type)
    198 
    199 

ValueError: Unknown label type: 'continuous'

The error occurs because the k-nearest neighbors classifier is a classification algorithm and therefore requires categorical data for the response variable. The data we provide in the df['Yearly Amount Spent'] series is continuous.

Solution

We can interpret this problem as a regression problem, not a classification problem, because the response variable is continuous and it is not intuitive to encode “Yearly Amount Spent” into categories. We need to use the regression algorithm KNeighborsRegressor instead of KNeighborsClassifier to solve this error. Let’s look at the revised code:

from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors=1)

knn.fit(X_train,y_train)

Once we have fit the model to the data, we can get our predictions for the test data.

y_pred = knn.predict(X_test)
print(y_pred)

Let’s run the code to see the result:

[599.406092 487.547505 521.572175]

We successfully predicted three “Yearly Amount Spent” values for the test data.
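
As an optional extra step (a sketch reusing the y_test split from above, not part of the original example), a regression metric such as mean absolute error can quantify how close these predictions are, since classification accuracy does not apply to continuous targets:

from sklearn.metrics import mean_absolute_error

# Average absolute difference between predicted and actual yearly spend
print(mean_absolute_error(y_test, y_pred))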

Summary

Congratulations on reading to the end of this tutorial! The ValueError: Unknown label type: ‘continuous’ occurs when you try to use continuous values for your response variable in a classification problem. Classification requires categorical or discrete values of the response variable. To solve this error, you can re-evaluate the response variable data and encode it to categorical. Alternatively, you can re-evaluate the model and use a regression model instead of a classification model.

Although “regression” is in the name, logistic regression is a classification algorithm that attempts to classify observations from a dataset into discrete categories. Whenever you want to perform logistic regression, ensure the response variable data is categorical.

For further reading on Scikit-learn, go to the article: How to Solve Python ValueError: input contains nan, infinity or a value too large for dtype(‘float64’).

Go to the online courses page on Python to learn more about coding in Python for data science and machine learning.

Have fun and happy researching!


One common error you may encounter in Python is:

ValueError: Unknown label type: 'continuous'

This error usually occurs when you attempt to use sklearn to fit a classification model like logistic regression and the values that you use for the response variable are continuous instead of categorical.

The following example shows how to reproduce and then fix this error in practice.

How to Reproduce the Error

Suppose we attempt to use the following code to fit a logistic regression model:

import numpy as np
from sklearn.linear_model import LogisticRegression

#define values for predictor and response variables
x = np.array([[2, 2, 3], [3, 4, 3], [5, 6, 6], [7, 5, 5]])
y = np.array([0, 1.02, 1.02, 0])

#attempt to fit logistic regression model
classifier = LogisticRegression()
classifier.fit(x, y)

ValueError: Unknown label type: 'continuous'

We receive an error because currently the values for our response variable are continuous.

Recall that a logistic regression model requires the values of the response variable to be categorical such as:

  • 0 or 1
  • “Yes” or “No”
  • “Pass” or “Fail”

Currently our response variable contains continuous values such as 0 and 1.02.

How to Fix the Error

The way to resolve this error is to simply convert the continuous values of the response variable to categorical values using the LabelEncoder() function from sklearn:

from sklearn import preprocessing
from sklearn import utils

#convert y values to categorical values
lab = preprocessing.LabelEncoder()
y_transformed = lab.fit_transform(y)

#view transformed values
print(y_transformed)

[0 1 1 0]

Each of the original values is now encoded as a 0 or 1.

We can now fit the logistic regression model:

#fit logistic regression model
classifier = LogisticRegression()
classifier.fit(x, y_transformed)

This time we don’t receive any error because the response values for the model are categorical.
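
As a quick sanity check (a brief sketch, not part of the original tutorial), the fitted classifier can now make predictions, and every predicted label will come from the encoded classes 0 and 1:

#predict classes for the original predictor values
print(classifier.predict(x))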

Additional Resources

The following tutorials explain how to fix other common errors in Python:

How to Fix: ValueError: Index contains duplicate entries, cannot reshape
How to Fix: Typeerror: expected string or bytes-like object
How to Fix: TypeError: ‘numpy.float64’ object is not callable

Summary: Use SKLearn’s LogisticRegression model for classification problems only. The Y variable must be a category (e.g., binary [0,1]), not continuous (e.g., float numbers 3.4, 7.9). If the Y variable is non-categorical (i.e., continuous), the potential fixes are as follows.

  • Re-examine the data. Try to encode the continuous Y variable into categories (e.g., use SKLearn’s LabelEncoder preprocessor).
  • Re-examine the model. Try another model, such as a regressor, if that makes sense (e.g., Linear Regression).

How to Avoid Errors like “Unknown label type: ‘continuous’” in sklearn LogisticRegression

Note: All the solutions provided below have been verified using Python 3.9.0b5

Problem Formulation

When using scikit-learn’s LogisticRegression classifier, how does one fix the following error?

$ python lr1.py
Traceback (most recent call last):
  File ".../SKLearnLogicReg/lr1.py", line 14, in <module>
    clf.fit(trainingData, trainingScores)
  File ".../lib/python3.9/site-packages/sklearn/linear_model/_logistic.py", line 1347, in fit
    check_classification_targets(y)
  File ".../lib/python3.9/site-packages/sklearn/utils/multiclass.py", line 183, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'

Background

Machine Learning is one of the hottest topics of our age. Various entities use Machine Learning models to perform complex operations on data. Complex Operations such as…

  • Data Analysis
  • Data Classification
  • Data Prediction
  • Data Extrapolation.

Python’s scikit-learn library is an open-source Machine Learning library. It supports supervised and unsupervised learning. The scikit-learn library provides excellent tools for model fitting, selection, and evaluation.  It also provides many helpful utilities for data preprocessing and analysis.

One has to be careful about choosing the Machine Learning model. One also has to be careful when examining the data, asking what one is attempting to learn from it. This blog discusses Logistic Regression, but the nature of the error is more general. It urges the reader to go back to basics and answer the following…

  • What do we want to learn from the data? What are we looking for in it?
  • Is this the right machine learning model we should use?
  • Are we feeding the data to the model in a proper manner?
  • Is the data in the correct format to use with the model?
  • Are you taking enough mental breaks?
  • Are you pumping the blood in your body? That is—stretch, walk, run, exercise?
  • Are you nourishing your body? Eating vegetables, fruits, good quality coffee?

Wow!! You Talk Too Much!! Can You Just Tell Me The Darned Answer?

The straightforward way to fix the error is to take a break and go for a walk and eat a fruit. 

While this error is frustrating, it is also common among new machine learners.  It stems from the single fact that sklearn’s LogisticRegression class is a “classifier”. That is, use scikit-learn’s LogisticRegression for classification problems only. This means that while the X variables can be floats etc., the Y variable has to be a “category”. Category, meaning [0,1], or [yes, no], [true, false], [Apples, Oranges, Pears], and so on.  The Y variable cannot be a continuous value such as a float (3.5, 7.9, 89.6, etc.).

Let’s see how this works with some simple naive data.  The data we use in the example below has no meaning other than to illustrate the problem.

For this first example we use floats as target vectors (i.e., y_variables). This will cause an error in the fit() method of Logistic Regression.

$ python
Python 3.9.0b5 (default, Oct 19 2020, 11:11:59) 
>>>
>>> ## Import the needed libraries and Modules.
>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> 
>>> ## Define some training data. We will call this the X-Variable.
>>> x_variables = np.array([[5.7, 2.5, 7.7],
...                         [8.4, 0.6, 3.6],
...                         [5.3, 4.5, 2.7],
...                         [5.1, 2.4, 6.3]])
>>> 
>>> ## Define the target vector. We will call this the Y-Variable.
>>> ## Note that the values are floats. This will cause the error!!
>>> y_variables = np.array([4.2, 6.8, 3.4, 1.9])
>>> 
>>> ## Define another set of target vectors. Note how these are ints.
>>> ## They are simply rounded versions of the above float numbers.
>>> ## y_variables = np.array([4, 7, 3, 2])
>>> 
>>> ## Define some new, yet unknown data. We will call this the U-Variable.
>>> u_variables  = np.array([[4.8, 6.4, 3.2],
...                          [5.3, 2.3, 7.4]])
>>> 
>>> ## Instantiate the Logistic Regression Machine Learning Model.
>>> lr = LogisticRegression()
>>> 
>>> ## Fit the Model to the Data.  i.e. Make the Model Learn.
>>> lr.fit(x_variables, y_variables)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gsrao/.virtualenvs/Upwork25383745/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py", line 1347, in fit
    check_classification_targets(y)
  File "/Users/gsrao/.virtualenvs/Upwork25383745/lib/python3.9/site-packages/sklearn/utils/multiclass.py", line 183, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'

For this next example, we use integers as target vectors (i.e., y_variables). Just a simple change!! Everything else is the same. The code goes to completion!! 

>>> ## Import the needed libraries and Modules.
>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> 
>>> ## Define some training data. We will call this the X-Variable.
>>> x_variables = np.array([[5.7, 2.5, 7.7],
...                         [8.4, 0.6, 3.6],
...                         [5.3, 4.5, 2.7],
...                         [5.1, 2.4, 6.3]])
>>> 
>>> ## Define the target vector. We will call this the Y-Variable.
>>> ## Note that the values are floats. This will cause the error!!
>>> y_variables = np.array([4.2, 6.8, 3.4, 1.9])
>>> 
>>> ## Define another set of target vectors. Note how these are ints.
>>> ## They are simply rounded versions of the above float numbers.
>>> y_variables = np.array([4, 7, 3, 2])
>>> 
>>> ## Define some new, yet unknown data. We will call this the U-Variable.
>>> u_variables  = np.array([[4.8, 6.4, 3.2],
...                          [5.3, 2.3, 7.4]])
>>> 
>>> ## Instantiate the Logistic Regression Machine Learning Model.
>>> lr = LogisticRegression()
>>> 
>>> ## Fit the Model to the Data.  i.e. Make the Model Learn.
>>> lr.fit(x_variables, y_variables)
LogisticRegression()
>>> 
>>> ## Finally Predict the outcome for the Unknown Data!!
>>> print("This is the Prediction for the Unknown Data in u_variables!!")
This is the Prediction for the Unknown Data in u_variables!!
>>> print(lr.predict(u_variables))
[3 4]
>>> 

This illustrates the point that was made earlier, “Use LogisticRegression for classification problems *only*”!! The target vector has to be categorical, *not* continuous!!

Ah!! I Get It Now!! Anything Else?

The reader needs to re-examine the data to see if it makes sense to use classification models. It is possible that the data is better served with regression or clustering models.  One needs to always ask…

  • What is the question we are asking about the data?
  • What are we looking for in the data?
  • What are we attempting to learn from the data?

Here is a simple example taken from the “Python One-Liners” book by Dr. Chris Mayer.  The example correlates cigarette consumption with lung cancer probability. It illustrates how Logistic Regression works well with categorical data.

>>> ## Import the needed libraries and Modules.
>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> 
>>> ## Define some training data. We will call this the X-Variable.
>>> ## This array contains the number of cigarettes smoked in a day.
>>> x_variables = np.array([[0], [10], [15], [60], [90]])
>>> 
>>> ## Define the target vector. We will call this the Y-Variable.
>>> ## This array contains the outcome i.e. if patient has lung-Cancer.
>>> y_variables = np.array(["No", "No", "Yes", "Yes", "Yes"])
>>> 
>>> ## Define some new, yet unknown data. We will call this the U-Variable.
>>> ## This correlates to the number of cigarettes smoked in a day. Given
>>> ## this new data, the model will try to predict the outcome.
>>> u_variables  = np.array([[2], [12], [13], [40], [90]])
>>> 
>>> ## Instantiate the Logistic Regression Machine Learning Model.
>>> lr = LogisticRegression()
>>> ## Fit the Model to the Data.  i.e. Make the Model Learn.
>>> lr.fit(x_variables, y_variables)
LogisticRegression()
>>> 
>>> ## Finally Predict the outcome for the Unknown Data!!
>>> print("This is the Prediction for the Unknown Data in u_variables!!")
This is the Prediction for the Unknown Data in u_variables!!
>>> print(lr.predict(u_variables))
['No' 'No' 'Yes' 'Yes' 'Yes']
>>> 
>>> ## Based on the Training Data (i.e. x_variables and y_variables),
>>> ## SKLearn decided the change-over from "No" lung-cancer to "Yes"
>>> ## lung-cancer is somewhere around 12 to 13 cigarettes smoked per
>>> ## day. The predict_proba() method shows the probability values 
>>> ## for "No" v/s "Yes" (i.e. target vector Y) for various values of
>>> ## X (i.e. Number of Cigarettes smoked per day).
>>> for i in range(20):
...   print("x=" + str(i) + " --> " + str(lr.predict_proba([[i]])))
... 
x=0 --> [[9.99870972e-01 1.29027714e-04]]
x=1 --> [[9.99735913e-01 2.64086966e-04]]
x=2 --> [[9.99459557e-01 5.40442542e-04]]
x=3 --> [[0.99889433 0.00110567]]
x=4 --> [[0.99773928 0.00226072]]
x=5 --> [[0.99538318 0.00461682]]
x=6 --> [[0.99059474 0.00940526]]
x=7 --> [[0.98093496 0.01906504]]
x=8 --> [[0.96173722 0.03826278]]
x=9 --> [[0.92469221 0.07530779]]
x=10 --> [[0.85710998 0.14289002]]
x=11 --> [[0.74556647 0.25443353]]
x=12 --> [[0.58873015 0.41126985]]
x=13 --> [[0.4115242 0.5884758]]
x=14 --> [[0.25463283 0.74536717]]
x=15 --> [[0.14301871 0.85698129]]
x=16 --> [[0.07538097 0.92461903]]
x=17 --> [[0.03830145 0.96169855]]
x=18 --> [[0.01908469 0.98091531]]
x=19 --> [[0.00941505 0.99058495]]

Conclusion

So there, you have it!! To Recap…

Use SKLearn’s LogisticRegression Model for Classification problems  *only*, i.e., the Y variable is a category (e.g. binary [0,1]), *not continuous* (e.g. float numbers 3.4, 7.9).

If the Y variable is non-categorical (i.e. continuous), the potential fixes are as follows.

  • Re-examine the Data. Maybe encode the continuous Y variable into categories (e.g. use SKLearn’s LabelEncoder preprocessor).
  • Re-examine the Model. Maybe another model such as a regressor makes sense (e.g. Linear Regression), as sketched below.
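
As a minimal sketch of that second option (reusing the x_variables, y_variables and u_variables arrays from the first interactive session above; this snippet is an illustration, not part of the original write-up), a regressor accepts the float targets directly:

>>> ## Option 2: a regression model accepts continuous targets as-is.
>>> from sklearn.linear_model import LinearRegression
>>> y_variables = np.array([4.2, 6.8, 3.4, 1.9])
>>> reg = LinearRegression()
>>> reg.fit(x_variables, y_variables)
LinearRegression()
>>> ## Predictions are continuous values, and no ValueError is raised.
>>> y_predictions = reg.predict(u_variables)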

Finxter Academy

This blog was brought to you by Girish Rao, a student of Finxter Academy. You can find his Upwork profile here.

Reference

All research for this blog article was done using Python Documents, the Google Search Engine, and the shared knowledge-base of the Finxter Academy, scikit-learn, and the Stack Overflow Communities. 

The Lung-Cancer Example was adapted from “Python One-Liners” by Dr. Chris Mayer.

  1. Causes of ValueError: Unknown label type: 'continuous' in Python
  2. Use Scikit’s LabelEncoder() Function to Fix ValueError: Unknown label type: 'continuous'
  3. Evaluate the Data to Fix ValueError: Unknown label type: 'continuous'

Python ValueError: Unknown Label Type: 'continuous'

This article will tackle the causes and solutions to the ValueError: Unknown label type: 'continuous' error in Python.

Causes of ValueError: Unknown label type: 'continuous' in Python

The Python interpreter throws this error when we try to train a sklearn classifier on a continuous target variable.

Classifiers such as K Nearest Neighbor, Decision Tree, Logistic Regression, etc., predict the class of the input variables. Class labels are in discrete or categorical form, such as 0 or 1, True or False, or Pass or Fail.

If a sklearn classification algorithm, e.g., Logistic Regression, is trained on a continuous target variable, it throws ValueError: Unknown label type: 'continuous'.

Code:

import numpy as np
from sklearn.linear_model import LogisticRegression
input_var=np.array([[1.1,1.2,1.5,1.6],[0.5,0.9,0.6,0.8]])
target_var=np.array([1.4,0.4])
classifier_logistic_regression=LogisticRegression()
classifier_logistic_regression.fit(input_var,target_var)

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [6], in <module>
----> 1 lr.fit(x,y)

File c:\users\hp 840 g3\appdata\local\programs\python\python39\lib\site-packages\sklearn\linear_model\_logistic.py:1516, in LogisticRegression.fit(self, X, y, sample_weight)
   1506     _dtype = [np.float64, np.float32]
   1508 X, y = self._validate_data(
   1509     X,
   1510     y,
   (...)
   1514     accept_large_sparse=solver not in ["liblinear", "sag", "saga"],
   1515 )
-> 1516 check_classification_targets(y)
   1517 self.classes_ = np.unique(y)
   1519 multi_class = _check_multi_class(self.multi_class, solver, len(self.classes_))

File c:\users\hp 840 g3\appdata\local\programs\python\python39\lib\site-packages\sklearn\utils\multiclass.py:197, in check_classification_targets(y)
    189 y_type = type_of_target(y)
    190 if y_type not in [
    191     "binary",
    192     "multiclass",
   (...)
    195     "multilabel-sequences",
    196 ]:
--> 197     raise ValueError("Unknown label type: %r" % y_type)

ValueError: Unknown label type: 'continuous'

Float values are passed as the target label y to the logistic regression classifier, which accepts only categorical or discrete class labels. As a result, the code throws an error at the fit() call, and the model refuses to train on the given data.

Use Scikit’s LabelEncoder() Function to Fix ValueError: Unknown label type: 'continuous'

The LabelEncoder() function encodes the continuous target variables into discrete or categorical labels.

The classifier now accepts these values. The classifier trains on the given data and predicts the output class.

Code:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
from sklearn import utils
input_var=np.array([[1.1,1.2,1.5,1.6],[0.5,0.9,0.6,0.8]])
target_var=np.array([1.4,0.4])
predict_var=np.array([ [1.3, 1.7, 1.8,1.4], [0.2, 0.6, 0.3, 0.4] ])
encoded = preprocessing.LabelEncoder()
encoded_target= encoded.fit_transform(target_var)
print(encoded_target)
classifier_logistic_regression=LogisticRegression()
classifier_logistic_regression.fit(input_var,encoded_target)
predict=classifier_logistic_regression.predict(predict_var)
print(predict)

Output:

The float values of the target variable target_var are encoded into discrete, categorical values (encoded_target) using the LabelEncoder() function.

The classifier now accepts these values. The classifier is trained to predict the class of new data, denoted by predict_var.

Evaluate the Data to Fix ValueError: Unknown label type: 'continuous'

Sometimes the data must be carefully examined to determine whether the issue is one of regression or classification. Some output variables, such as house price, cannot be classified or discretized.

In such cases, the issue is one of regression. Because the regression model accepts continuous target variables, the target variable does not need to be encoded.

Code:

import numpy as np
from sklearn.linear_model import LinearRegression
input_var=np.array([[1.1,1.2,1.5,1.6],[0.5,0.9,0.6,0.8]])
target_var=np.array([1.4,0.4])
predict_var=np.array([ [1.3, 1.7, 1.8,1.4], [0.2, 0.6, 0.3, 0.4] ])
linear_Regressor_model=LinearRegression()
linear_Regressor_model.fit(input_var,target_var)
predict=linear_Regressor_model.predict(predict_var)
print(predict)

Output:

Float values in the output variable target_var show that the problem is a regression problem. The model must predict a value for the input rather than its class.

A linear regression model is trained and predicts the outcome value of new data.

In this article, we will learn about the error ValueError: unknown label type: ‘continuous’ in Python programming.

Also, we will provide examples, solutions, and answers to frequently asked questions related to this error.

If you are encountering this issue, read on to find practical solutions and gain a better understanding of how to handle this error effectively.

Why Does this Valueerror: Unknown Label Type: Continuous Error Occur?

The “ValueError: Unknown label type: continuous” error typically occurs because we are trying to use a machine learning algorithm or function that expects discrete or categorical labels, but instead, we have provided continuous labels.

What is the Valueerror Unknown Label Type ‘Continuous’ Error?

The ValueError Unknown label type: ‘continuous’ error is encountered in Python when attempting to apply a classification algorithm or method to a target variable of continuous type.

The error message indicates that the classifier cannot handle continuous labels since it expects discrete categorical labels.

Therefore, it is important to ensure that the target variable is properly categorized before using a classification algorithm.

Examples of the ValueError: Unknown label type ‘continuous’ Error

To understand more about this ValueError, let’s consider a few examples.

Let’s have a look at the first example:

from sklearn.svm import SVC
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
svm = SVC()
svm.fit(X, y)

In this example, we are using the Support Vector Classifier (SVC) from the scikit-learn library.

The make_regression function is generating a dataset with continuous labels (y).

When fitting the SVM classifier, we encounter the error:

ValueError: Unknown label type: 'continuous'

Another example:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
dt = DecisionTreeClassifier()
dt.fit(X, y)

Here, we are using the Decision Tree Classifier from scikit-learn. Again, the make_regression function generates continuous labels (y).

When trying to fit the decision tree classifier, we encounter the same “ValueError” due to the continuous label type.

Traceback (most recent call last):
  File "C:\Users\Dell\PycharmProjects\Python-Code-Example\main.py", line 6, in <module>
    dt.fit(X, y)
  File "C:\Users\Dell\PycharmProjects\Python-Code-Example\venv\lib\site-packages\sklearn\tree\_classes.py", line 889, in fit
    super().fit(
  File "C:\Users\Dell\PycharmProjects\Python-Code-Example\venv\lib\site-packages\sklearn\tree\_classes.py", line 224, in fit
    check_classification_targets(y)
  File "C:\Users\Dell\PycharmProjects\Python-Code-Example\venv\lib\site-packages\sklearn\utils\multiclass.py", line 218, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'

What are the Common Causes of the Error?

The ValueError: unknown label type: ‘continuous’ error usually occurs due to the following reasons:

  • Mismatched data types
  • Incorrect dataset
  • Missing preprocessing steps

Note: It is important to identify the cause of the error to determine the correct solution.

How to Fix the Valueerror: Unknown Label Type: Continuous?

To resolve the valueerror: unknown label type: continuous error, you can follow these solutions:

Solution 1: Check the target variable type

You can check the type of your target variable (y). Ensure that it is categorical (e.g., represented as strings or integers) and not continuous.

You can use the type() function in Python to determine the variable type.

For example:

# Assuming you have already defined your target variable 'y'

# Check the type of the target variable
target_type = type(y)

# Print the type of the target variable
print("Target variable type:", target_type)

In this code, the type() function is used to determine the type of the target variable y.

The resulting type is stored in the target_type variable, and finally, it is printed to the console using the print() function.
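
Note that type(y) only reports the container (for example numpy.ndarray or pandas.Series), not whether the labels inside it are continuous. A slightly more informative check, sketched here under the assumption that y is a NumPy array or pandas Series, is to inspect its dtype and unique values:

import numpy as np

# A float dtype with many distinct values suggests continuous labels
print(y.dtype)
print(np.unique(y)[:10])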

Solution 2: Perform label encoding

If your target variable is continuous, you need to convert it into categorical labels suitable for classification.

Label encoding can be used to assign unique integers to each category.

You can utilize the LabelEncoder class from the scikit-learn library to perform this encoding.

Example:

from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

Solution 3: Apply one-hot encoding

If your target variable represents multiple categories, one-hot encoding can be used to transform it into binary features.

This encoding creates binary columns for each category, where a value of 1 indicates membership in a specific category, and 0 indicates non-membership.

Let’s take a look at the example:

from sklearn.preprocessing import OneHotEncoder

one_hot_encoder = OneHotEncoder()
y = one_hot_encoder.fit_transform(y.reshape(-1, 1))
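
One caveat about this sketch: depending on the scikit-learn version, OneHotEncoder.fit_transform returns a SciPy sparse matrix by default, so downstream code that expects a plain NumPy array may need to densify it first:

# OneHotEncoder returns a sparse matrix by default; densify it if needed
y_dense = y.toarray()
print(y_dense[:5])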

Solution 4: Check the dataset

If the error persists, make sure that your dataset is correctly formatted.

Verify that the target variable is separate from the features and properly labeled.

Also, confirm that the features contain numeric or categorical values compatible with the chosen classification algorithm.

More Resources

Here are the following resources that will help you to understand more about VALUEERRORS:

  • Valueerror: plot_confusion_matrix only supports classifiers
  • Valueerror length of values does not match length of index
  • valueerror: invalid mode: ‘ru’ while trying to load binding.gyp
  • Valueerror: bad marshal data unknown type code

Conclusion

This valueerror unknown label type ‘continuous’ error usually occurs when we are attempting to train a classifier or perform classification tasks with labels that are not recognized or properly encoded.

By following the examples and solutions provided in this article, you can resolve this error effectively.

Remember to check the type of your target variable, perform appropriate encoding techniques, and choose suitable classification algorithms to handle continuous labels.

FAQs

How can I determine the type of my target variable?

You can use the type() function in Python to determine the type of your target variable. For example, type(y) will return the type of the variable y.

Can I convert a continuous variable into a categorical one?

Yes, you can convert a continuous variable into a categorical one. Label encoding and one-hot encoding are common techniques used for this purpose.
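
When the continuous values represent a real quantity rather than mislabelled classes, binning is often the more meaningful conversion. Here is a small sketch (assuming pandas and NumPy are available; the values are made up for illustration) using pandas.cut:

import numpy as np
import pandas as pd

y = np.array([0.2, 0.7, 1.5, 2.9, 3.1])

# Bin the continuous values into three labelled ranges
y_binned = pd.cut(y, bins=3, labels=['low', 'medium', 'high'])
print(list(y_binned))  # ['low', 'low', 'medium', 'high', 'high']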

Are there any alternative algorithms that can handle continuous labels?

Yes, there are algorithms specifically designed for regression tasks that can handle continuous labels effectively.

Some examples include linear regression, decision tree regression, and random forest regression.

Closed

KamodaP opened this issue on Oct 31, 2016 · 20 comments

@KamodaP

Description

DecisionTreeClassifier crashes with unknown label type: 'continuous-multioutput'. I’ve tried loading the csv file using csv.reader, pandas.read_csv, and some other approaches like parsing line-by-line.

Steps/Code to Reproduce

import os
import pandas as pd
from sklearn import tree
feature_df = pd.read_csv(os.path.join(_PATH, 'features.txt'))
target_df = pd.read_csv(os.path.join(_PATH, 'target.txt'))
feature_df = feature_df._get_numeric_data()
target_df = target_df._get_numeric_data()
feature_df = feature_df.fillna(0)
target_df = target_df.fillna(0)
clf = tree.DecisionTreeClassifier()
clf_o = clf.fit(feature_df, target_df)

features.txt
target.txt

Expected Results

The error thrown informs the user what REALLY is wrong, e.g. that his data set does not follow assumptions (and what those are)

Actual Results

Traceback (most recent call last):
  File "D:\Piotr\Documents\unibap\BAP\FingerprintLocalisation\main.py", line 19, in <module>
    decision_tree.treeClassification()
  File "D:\Piotr\Documents\unibap\BAP\FingerprintLocalisation\code\decision_tree.py", line 56, in treeClassification
    clf_o = clf.fit(feature_df, target_df)
  File "C:\Python35\lib\site-packages\sklearn\tree\tree.py", line 182, in fit
    check_classification_targets(y)
  File "C:\Python35\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous-multioutput'

Versions

Windows-10-10.0.14393-SP0
Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.18

Update:

I’ve changed the number of target variables to one, just to simplify things

clf_o = clf.fit(feature_df, target_df.ix[:,1])

Output: Unknown label type: 'continuous'

@jnothman

You should be using DecisionTreeRegressor


@KamodaP

Again, the documentation lacks information on how many classes classification can handle. I can see that my dataset has waaaay too many classes, but your error message mentioned something like ‘labels’, which was confusing enough to make me forget how the dataset actually looks and meddle with methods of passing datasets.
I’ve updated the issue and ask you to reopen it.

@jnothman

Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you’ll be fine.

@jnothman

Or use a DecisionTreeRegressor


@KamodaP

See ‘Expected Results’ section of my issue

@jnothman

You’re right that the error message could be more useful, but the documentation for fit does say "class labels in classification". Feel free to submit a clearer issue about needing to document the expected data type for classification ys, and another for raising appropriate error messages when float data is passed as y to a classifier.

@KamodaP

Let me cite the whole section of documentation documenting parameter y of function fit in class DecisionTreeClassifier

The target values (class labels in classification, real numbers in regression). In the regression case, use dtype=np.float64 and order='C' for maximum efficiency.

That does not say that classes have a cap. What makes a target variable labeled continuous? How many classes have to be there to be considered a regression-type target variable? If it mentions regression, then can I do regression using DecisionTreeClassifier? Why not? Etc…

As for your previous comment:

Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you’ll be fine.

Does that mean that classes can’t be represented as floats? Or as dicts? Lists? Tuples? Longs? Doubles? bytes? I know it is logical to represent classes as integers or strings, since they should not be plenty. But do they have to? What are the limitations?

And as to creating new ticket, isn’t that useless since we’ve had quite a talk in here? Creating new ticket just to explain other guy the same thing?

@jnothman

It’s not number of classes. It’s use of non-integers and non-strings.

I like the issue descriptions to be focused. Your concern as raised here
seemed to be more of a usage problem.

And please don’t hassle me about what I suggest. This isn’t the only issue
I’m dealing with.


@KamodaP

You don’t have to solve it today, I’m only trying to make the issue of bad error descriptions and bad documentation on tree classifier and regressor to become active and a task for future releases.

@amueller

For the error message would "Unsupported output type: 'continuous-multioutput'" be better? That is the real issue. Also see #7809 for the docstring.

@KamodaP

That’s better. But still I don’t understand why you won’t name it as it is. Because literature mostly calls that ‘target’ variables, and output could be mistaken with function output. The exception was thrown from the function ‘check_classification_targets’, so even you say that’s a ‘target’ variable, and still you want to call it ‘label’ or ‘output’. I’m not a member of the scikit-learn team, so you will do as you please, but I would recommend using the words ‘Target variable’ in the docstring and error message. And I ask you to describe somewhere the rules that input data (or target) should follow. A short sentence: ‘Target variable (parameter y) has to be int or str’.

@jnothman

Maybe it’s worth mentioning in/alongside the new section (45cb11d / #7519)
on multiclass and multilabel fitting in the tutorial. Or maybe this all
belongs in a section of the user guide on data representation conventions,
describing input/output formats for all standard methods…?


@KamodaP

I’ve scanned the document and it seems a good place to mention those conventions. Also, if you don’t want to obfuscate the error messages too much, then the idea of putting that information in the user guide isn’t bad either.
Well, the final solution (if any) will be as you wish it to be. I’m just saying that the idea seems ok, but you have your conventions. I won’t make you do something.

@amueller

‘Target variable (parameter y) has to be int or str’. is not right, because we support multi-label and multi-output multi-target

@amueller

Also, arbitrary objects that are not floats are supported as class labels, they don’t have to be integers or strings.

@rajeshjnv

If we pass training_data_X, training_scores_Y to the fit method as-is, it causes this error. To avoid it, we convert and encode the labels:

from sklearn import preprocessing
from sklearn import utils
lab_enc = preprocessing.LabelEncoder()
y_train = lab_enc.fit_transform(y_train)
print(y_train)
print(utils.multiclass.type_of_target(y_train))
print(utils.multiclass.type_of_target(y_train.astype('int')))
print(utils.multiclass.type_of_target(y_train))

@alexrindone

I’m having this same issue, is there a fix for it?

@satish-bot

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      1 Dec_tree_class = DecisionTreeClassifier()
----> 2 Dec_tree_class.fit(X_train,y_train)

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    814             sample_weight=sample_weight,
    815             check_input=check_input,
--> 816             X_idx_sorted=X_idx_sorted)
    817         return self
    818

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    152
    153         if is_classification:
--> 154             check_classification_targets(y)
    155             y = np.copy(y)
    156

~\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
    167     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    168             'multilabel-indicator', 'multilabel-sequences']:
--> 169         raise ValueError("Unknown label type: %r" % y_type)
    170
    171

ValueError: Unknown label type: 'continuous'

