"No columns to parse from file" error

I have a list containing the paths of three folders, each of which holds many .txt files.
I am trying to load each file into a pandas DataFrame, but I get the error shown below.

CODE-

for l in list: 
    for root, dirs, files in os.walk(l, topdown=False):
        for name in files:
            #print(os.path.join(root, name))

            df = pd.read_csv(os.path.join(root, name))

ERROR-

Traceback (most recent call last):
      File "feature_drebin.py", line 18, in <module>
        df = pd.read_csv(os.path.join(root, name))
      File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 709, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 449, in _read
        parser = TextFileReader(filepath_or_buffer, **kwds)
      File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 818, in __init__
        self._make_engine(self.engine)
      File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 1049, in _make_engine
        self._engine = CParserWrapper(self.f, **self.options)
      File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 1695, in __init__
        self._reader = parsers.TextReader(src, **kwds)
      File "pandas/_libs/parsers.pyx", line 565, in pandas._libs.parsers.TextReader.__cinit__
    pandas.errors.EmptyDataError: No columns to parse from file
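
One possible way around this, sketched under the assumption that at least one of the .txt files is simply empty: check each file's size and catch pandas.errors.EmptyDataError, so that a single bad file does not stop the whole loop. The function name load_text_files and its folders argument are hypothetical names chosen for illustration; they do not come from the original script.

```python
import os

import pandas as pd


def load_text_files(folders):
    """Read every file under the given folders, skipping files with no data.

    Returns (frames, skipped): a dict mapping path -> DataFrame and a list
    of the paths that could not be parsed because they held no columns.
    """
    frames = {}
    skipped = []
    for folder in folders:
        for root, _dirs, files in os.walk(folder, topdown=False):
            for name in files:
                path = os.path.join(root, name)
                # A zero-byte file can never yield columns, so skip it early.
                if os.path.getsize(path) == 0:
                    skipped.append(path)
                    continue
                try:
                    frames[path] = pd.read_csv(path)
                except pd.errors.EmptyDataError:
                    # Raised when pandas finds no columns to parse.
                    skipped.append(path)
    return frames, skipped
```

Printing the skipped list afterwards shows which files triggered the error, which is usually the quickest way to confirm the cause.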

.txt file


Handling data in file formats like CSV, JSON, and Excel is a common task for developers. When reading delimited text files with pandas, you might encounter an EmptyDataError. This error is raised when there are no columns to parse from the file, which usually means the file is empty or contains only whitespace. In this guide, we'll explore the reasons behind this error and provide a step-by-step solution to the 'No Columns to Parse from File' issue.

Understanding the EmptyDataError
How to Resolve the EmptyDataError
Step 1: Check the File Path
Step 2: Inspect the File Content
Step 3: Clean the File Content
Step 4: Use the skip_blank_lines Parameter
FAQ
Related Resources

Understanding the EmptyDataError

The EmptyDataError is often encountered while using the pandas library in Python. Pandas is an open-source data analysis and data manipulation library that provides data structures and functions for working with structured data. The error usually occurs when you're trying to read an empty or whitespace-only file with pd.read_csv() or pd.read_table(). Other readers typically fail differently on empty input: pd.read_json() raises a ValueError, and pd.read_excel() returns an empty DataFrame for an empty sheet.

Here’s an example of the error message you might see:

EmptyDataError: No columns to parse from file

How to Resolve the EmptyDataError

To resolve the EmptyDataError, follow these steps:

Step 1: Check the File Path

Make sure you're using the correct file path while reading the file. A path that does not exist raises FileNotFoundError rather than EmptyDataError, but a wrong path can also point pandas at a different file that happens to be empty. You can use os.path to verify the file's existence.

import os

file_path = "path/to/your/file.csv"

if os.path.exists(file_path):
    print("File exists")
else:
    print("File not found")

Step 2: Inspect the File Content

Check the file contents to ensure it contains data. Open the file using a text editor or a spreadsheet application and inspect the content. If the file is empty or contains only whitespace, it will cause the EmptyDataError.
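
When there are too many files to open by hand, the same inspection can be sketched in code. The helper below is a minimal illustration, not part of pandas: the name describe_file and its return labels are hypothetical, and it assumes the file can be decoded as UTF-8 (undecodable bytes are replaced rather than raising).

```python
import os


def describe_file(path):
    """Return 'missing', 'empty', 'whitespace-only', or 'has-data' for path."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # Stop at the first line that has any non-whitespace character.
        for line in handle:
            if line.strip():
                return "has-data"
    return "whitespace-only"
```

Calling describe_file("path/to/your/file.csv") before pd.read_csv() tells you whether the file is worth parsing at all.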

Step 3: Clean the File Content

Before reading the file using pandas, ensure that the file contains valid data. Remove any unnecessary whitespace or empty rows and columns from the file. You can use a text editor or a spreadsheet application to clean the file manually. Alternatively, you can use Python’s built-in functions to remove whitespace and empty lines programmatically.

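with open("path/to/your/file.csv", "r") as file:
    lines = file.readlines()
    # Strip surrounding whitespace, drop blank lines, and restore the newline
    # that strip() removes so the rows are not merged into a single line.
    cleaned_lines = [line.strip() + "\n" for line in lines if line.strip()]

with open("path/to/your/cleaned_file.csv", "w") as file:
    file.writelines(cleaned_lines)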

Step 4: Use the `skip_blank_lines` Parameter

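When reading a CSV file using pandas, the skip_blank_lines parameter of pd.read_csv() controls whether empty lines in the file are ignored. It is True by default, so passing it explicitly only documents the intent. Note that it does not help when the file contains nothing but blank lines: there are still no columns to parse, and the EmptyDataError is raised.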

import pandas as pd

data_frame = pd.read_csv("path/to/your/cleaned_file.csv", skip_blank_lines=True)

By following these steps, you should be able to resolve the EmptyDataError issue.
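
If empty files are expected and should not stop the program, another option is to catch the exception. The sketch below assumes that an empty DataFrame is an acceptable stand-in for a file without data; the helper name read_csv_or_empty is hypothetical, while pd.errors.EmptyDataError is the exception class shown in the error message above.

```python
import pandas as pd


def read_csv_or_empty(path, **kwargs):
    """Read a CSV file, returning an empty DataFrame if it has no columns."""
    try:
        return pd.read_csv(path, **kwargs)
    except pd.errors.EmptyDataError:
        # The file was empty; hand back a DataFrame with no rows or columns.
        return pd.DataFrame()
```

The caller can then test the result with data_frame.empty instead of wrapping every read in its own try block.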

FAQ

1. What is pandas in Python?

Pandas is an open-source data analysis and data manipulation library for Python. It provides data structures and functions needed to work with structured data seamlessly. Pandas is widely used for data cleaning, transformation, analysis, and visualization.

2. What causes the EmptyDataError in pandas?

The EmptyDataError occurs when there are no columns to parse from the file. This usually means that the file is empty or contains only whitespace.

3. How to check if a file exists in Python?

You can use the os.path.exists() function to check if a file exists in Python. Pass the file path as a parameter, and the function will return True if the file exists and False otherwise.

4. How do I skip blank lines while reading a CSV file using pandas?

pd.read_csv() skips blank lines by default because its skip_blank_lines parameter defaults to True. Set it to False only if you want blank lines to be kept and read as rows of missing values.

5. Can I use pandas with other file formats like JSON and Excel?

Yes, pandas can be used to work with various file formats like CSV, JSON, Excel, and more. You can use functions like pd.read_json() and pd.read_excel() to read JSON and Excel files, respectively.

Related Resources

Pandas Official Documentation
Working with CSV Files in Python
Python File Handling: Create, Open, Append, Read, Write


Sir, here is the reproducible code:
import pandas as pd
from glob import glob
files = glob('*.asc')
files.sort()
print(files)
data = pd.concat((pd.read_csv(file) for file in files), ignore_index=False)

This code sorts my files, but at the end I get this error:
Traceback (most recent call last):
  File "/home/user/Documents/extracted/satya/aa.py", line 6, in <module>
    data=pd.concat( (pd.read_csv(file) for file in files),ignore_index=False)
  File "/home/user/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/user/anaconda3/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 294, in concat
    op = _Concatenator(
  File "/home/user/anaconda3/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 348, in __init__
    objs = list(objs)
  File "/home/user/Documents/extracted/satya/aa.py", line 6, in <genexpr>
    data=pd.concat( (pd.read_csv(file) for file in files),ignore_index=False)
  File "/home/user/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/user/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/user/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/user/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/user/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
  File "/home/user/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
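
A possible workaround, assuming the error comes from one or more empty .asc files in the folder: read the files one at a time and leave out those that pandas cannot parse. This is only a sketch; the function name concat_non_empty is hypothetical, and it uses ignore_index=True (unlike the code above) so that the combined rows are renumbered.

```python
from glob import glob

import pandas as pd


def concat_non_empty(pattern):
    """Concatenate every file matching pattern that pandas can parse."""
    frames = []
    for path in sorted(glob(pattern)):
        try:
            frames.append(pd.read_csv(path))
        except pd.errors.EmptyDataError:
            # Report the offending file instead of aborting the whole run.
            print(f"skipping empty file: {path}")
    # pd.concat raises ValueError on an empty list, so handle that case.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With this helper, data = concat_non_empty('*.asc') would replace the pd.concat line, and the printed messages show which files are empty.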


Problem Description:

I have a string object ("textData") which contains CSV data.

I’m able to save it as CSV by:

    with open(fileName, "w") as text_file:
        print(textData, file=text_file)

but I would like to work with the data in pandas before saving the csv. So I’m trying to get the data into a pandas df.

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(textData), sep=",")

I get this error: EmptyDataError: No columns to parse from file

This is the textData string:

R$M21,2021-01-26,1.3265,1.3265,1.3265,1.3265,0,0
R$M21,2021-01-27,1.3263,1.3263,1.3263,1.3263,0,0
R$M21,2021-01-28,1.3319,1.3319,1.3319,1.3319,0,0
R$M21,2021-01-29,1.3287,1.3287,1.3287,1.3287,0,0
R$M21,2021-02-01,1.3315,1.3315,1.3315,1.3315,0,0
R$M21,2021-02-02,1.3328,1.3328,1.3328,1.3328,0,0
R$M21,2021-02-03,1.3331,1.3331,1.3331,1.3331,0,0
R$M21,2021-02-04,1.3361,1.3361,1.3361,1.3361,0,0
R$M21,2021-02-05,1.3383,1.3383,1.3383,1.3383,0,0
R$M21,2021-02-08,1.3354,1.3354,1.3354,1.3354,0,0
R$M21,2021-02-09,1.3279,1.3279,1.3279,1.3279,0,0
R$M21,2021-02-10,1.3259,1.3259,1.3259,1.3259,0,0
R$M21,2021-02-11,1.3253,1.3253,1.3253,1.3253,0,0
R$M21,2021-02-12,1.3272,1.3272,1.3272,1.3272,0,0
R$M21,2021-02-15,1.3224,1.3224,1.3224,1.3224,0,0
R$M21,2021-02-16,1.3232,1.3232,1.3232,1.3232,0,0
R$M21,2021-02-17,1.329,1.329,1.329,1.329,0,0
R$M21,2021-02-18,1.3275,1.3275,1.3275,1.3275,0,0
R$M21,2021-02-19,1.3246,1.3246,1.3246,1.3246,0,0
R$M21,2021-02-22,1.3235,1.3235,1.3235,1.3235,0,0
R$M21,2021-02-23,1.3216,1.3216,1.3216,1.3216,0,0
R$M21,2021-02-24,1.321,1.321,1.321,1.321,0,0
R$M21,2021-02-25,1.3181,1.3181,1.3181,1.3181,0,0
R$M21,2021-02-26,1.3313,1.3313,1.3313,1.3313,0,0
R$M21,2021-03-01,1.3323,1.3323,1.3323,1.3323,0,0
R$M21,2021-03-02,1.3315,1.3315,1.3315,1.3315,0,0
R$M21,2021-03-03,1.3309,1.3309,1.3309,1.3309,0,0
R$M21,2021-03-04,1.3328,1.3328,1.3328,1.3328,0,0
R$M21,2021-03-05,1.3417,1.3417,1.3417,1.3417,0,0
R$M21,2021-03-08,1.3479,1.3479,1.3479,1.3479,0,0
R$M21,2021-03-09,1.345,1.345,1.345,1.345,0,0
R$M21,2021-03-10,1.3476,1.3476,1.3476,1.3476,0,0
R$M21,2021-03-11,1.3403,1.3403,1.3403,1.3403,0,0
R$M21,2021-03-12,1.3463,1.3463,1.3463,1.3463,0,0
R$M21,2021-03-15,1.3456,1.3456,1.3456,1.3456,35,35
R$M21,2021-03-16,1.3455,1.3456,1.3452,1.3454,85,20
R$M21,2021-03-17,1.3457,1.3479,1.3451,1.3479,0,20
R$M21,2021-03-18,1.3432,1.3432,1.3432,1.3432,0,20
R$M21,2021-03-19,1.3425,1.3425,1.3425,1.3425,20,0
R$M21,2021-03-22,1.3434,1.3434,1.3405,1.3405,20,0
R$M21,2021-03-23,1.3433,1.3433,1.3433,1.3433,0,0
R$M21,2021-03-24,1.3461,1.3461,1.3461,1.3461,6,6
R$M21,2021-03-25,1.3476,1.3476,1.3472,1.3472,0,6
R$M21,2021-03-26,1.3477,1.3477,1.3477,1.3477,0,6
R$M21,2021-03-29,1.3467,1.3467,1.3467,1.3467,0,6
R$M21,2021-03-30,1.3483,1.3483,1.3483,1.3483,0,6
R$M21,2021-03-31,1.3448,1.3448,1.3448,1.3448,0,6
R$M21,2021-04-01,1.3461,1.3461,1.3461,1.3461,0,6
R$M21,2021-04-02,1.3442,1.3442,1.3442,1.3442,0,6
R$M21,2021-04-05,1.3446,1.3446,1.3446,1.3446,0,6
R$M21,2021-04-06,1.3418,1.3418,1.3418,1.3418,10,11
R$M21,2021-04-07,1.339,1.3398,1.3389,1.3389,0,11
R$M21,2021-04-08,1.3406,1.3406,1.3406,1.3406,0,11
R$M21,2021-04-09,1.3411,1.3411,1.3411,1.3411,23,28
R$M21,2021-04-12,1.3427,1.3427,1.3406,1.3406,3,31
R$M21,2021-04-13,1.3425,1.3431,1.3425,1.3431,20,51
R$M21,2021-04-14,1.3374,1.3378,1.3374,1.3375,0,51
R$M21,2021-04-15,1.335,1.335,1.335,1.335,217,222
R$M21,2021-04-16,1.3358,1.3358,1.3337,1.3337,416,407
R$M21,2021-04-19,1.3344,1.3346,1.331,1.331,370,428
R$M21,2021-04-20,1.3305,1.3316,1.3265,1.3283,5,431
R$M21,2021-04-21,1.3291,1.3302,1.3291,1.3302,100,422
R$M21,2021-04-22,1.3304,1.3304,1.3279,1.3279,10,427
R$M21,2021-04-23,1.3277,1.3277,1.3274,1.3274,16,437
R$M21,2021-04-26,1.3273,1.3273,1.3256,1.326,204,438
R$M21,2021-04-27,1.3259,1.3267,1.3255,1.3257,79,429
R$M21,2021-04-28,1.3274,1.3278,1.3262,1.3262,22,441
R$M21,2021-04-29,1.326,1.3265,1.3245,1.3255,16,457
R$M21,2021-04-30,1.3266,1.3277,1.3266,1.3277,60,457
R$M21,2021-05-03,1.328,1.3341,1.328,1.3318,8,458
R$M21,2021-05-04,1.3298,1.3366,1.3298,1.3366,110,466
R$M21,2021-05-05,1.3376,1.3387,1.3351,1.3358,0,466
R$M21,2021-05-06,1.3349,1.3349,1.3349,1.3349,1,467
R$M21,2021-05-07,1.332,1.332,1.3316,1.3316,25,466
R$M21,2021-05-10,1.3263,1.3263,1.3247,1.3247,187,480
R$M21,2021-05-11,1.3244,1.3276,1.3244,1.3251,6,486
R$M21,2021-05-12,1.329,1.329,1.3287,1.3287,119,586
R$M21,2021-05-13,1.3312,1.3366,1.3294,1.3343,270,738
R$M21,2021-05-14,1.3346,1.3371,1.3338,1.3338,392,841
R$M21,2021-05-17,1.3332,1.3361,1.3319,1.3356,99,835
R$M21,2021-05-18,1.3358,1.3358,1.3295,1.33,93,785
R$M21,2021-05-19,1.3295,1.333,1.3287,1.3328,25,784
R$M21,2021-05-20,1.335,1.3354,1.3326,1.3329,26,773
R$M21,2021-05-21,1.3309,1.3309,1.3301,1.3301,25,777
R$M21,2021-05-24,1.3298,1.3318,1.3298,1.3301,39,767
R$M21,2021-05-25,1.3293,1.3293,1.3253,1.3254,28,782
R$M21,2021-05-26,1.3249,1.3249,1.323,1.3235,48,770
R$M21,2021-05-27,1.3245,1.3247,1.3229,1.3229,51,805
R$M21,2021-05-28,1.3238,1.3247,1.323,1.3244,76,826
R$M21,2021-05-31,1.3237,1.3237,1.3223,1.3226,16,826
R$M21,2021-06-01,1.3194,1.3227,1.3194,1.3227,34,808
R$M21,2021-06-02,1.323,1.3248,1.322,1.3248,50,785
R$M21,2021-06-03,1.3235,1.3245,1.3228,1.3244,137,720
R$M21,2021-06-04,1.3276,1.3285,1.3274,1.3285,219,564
R$M21,2021-06-07,1.3251,1.3252,1.3232,1.3232,42,544
R$M21,2021-06-08,1.3236,1.3238,1.3226,1.3237,290,343
R$M21,2021-06-09,1.3232,1.3243,1.3231,1.3233,48,343
R$M21,2021-06-10,1.3239,1.3253,1.3238,1.3244,406,292
R$M21,2021-06-11,1.3249,1.3261,1.3217,1.324,107,0
R$M21,2021-06-14,1.3252,1.3271,1.3252,1.3261,107,0

What am I doing wrong?
Thanks

Solution – 1

The error is in the parts you aren’t showing us, because your code works fine. I’m guessing you don’t have newlines separating the lines.

C:\tmp>type x.py

textData="""
R$M21,2021-06-08,1.3236,1.3238,1.3226,1.3237,290,343
R$M21,2021-06-09,1.3232,1.3243,1.3231,1.3233,48,343
R$M21,2021-06-10,1.3239,1.3253,1.3238,1.3244,406,292
R$M21,2021-06-11,1.3249,1.3261,1.3217,1.324,107,0
R$M21,2021-06-14,1.3252,1.3271,1.3252,1.3261,107,0"""

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(textData), sep=",")
print(df)

C:\tmp>python x.py
   R$M21  2021-06-08  1.3236  1.3238  1.3226  1.3237  290  343
0  R$M21  2021-06-09  1.3232  1.3243  1.3231  1.3233   48  343
1  R$M21  2021-06-10  1.3239  1.3253  1.3238  1.3244  406  292
2  R$M21  2021-06-11  1.3249  1.3261  1.3217  1.3240  107    0
3  R$M21  2021-06-14  1.3252  1.3271  1.3252  1.3261  107    0

C:\tmp>
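
Note that in the output above the first data row was consumed as the header, because the data has no header line. A minimal sketch of reading it with header=None follows. The column names are hypothetical guesses at what the eight fields mean; the question does not name them.

```python
from io import StringIO

import pandas as pd

# Two rows taken from the question's sample data.
text_data = (
    "R$M21,2021-06-11,1.3249,1.3261,1.3217,1.324,107,0\n"
    "R$M21,2021-06-14,1.3252,1.3271,1.3252,1.3261,107,0\n"
)

# Hypothetical column names; the question does not say what the fields mean.
columns = [
    "symbol", "date", "open", "high", "low", "close", "volume", "open_interest",
]

# header=None tells pandas that the first line is data, not column labels.
df = pd.read_csv(StringIO(text_data), header=None, names=columns)
print(df)
```

With header=None every line of the string ends up as a row, so no data is lost to the header.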

Solution – 2

First, make sure to add a newline after each line, for example through os.linesep.

Then set the StringIO buffer's "head position" back to the start (0) before passing it to pandas:

import os
import pandas as pd
from io import StringIO

buffer = StringIO()
buffer.write('hello,23,2022,bye' + os.linesep)
buffer.write('world,43,2025,then' + os.linesep)
buffer.seek(0)
df = pd.read_csv(buffer, sep=',', header=None)

print(df)

This will yield:

       0   1     2     3
0  hello  23  2022   bye
1  world  43  2025  then

[Python-3.9]
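
To see why the seek(0) call matters, the sketch below reads the same kind of buffer once without rewinding and once after rewinding. The variable name rewind_needed is only illustrative. After the writes, the position is at the end of the buffer, so pandas receives no text at all.

```python
from io import StringIO

import pandas as pd

buffer = StringIO()
buffer.write("hello,23,2022,bye\n")
buffer.write("world,43,2025,then\n")

# After writing, the position is at the end, so pandas sees no data.
try:
    pd.read_csv(buffer, header=None)
    rewind_needed = False
except pd.errors.EmptyDataError:
    rewind_needed = True

# Rewinding makes the written content readable again.
buffer.seek(0)
df = pd.read_csv(buffer, header=None)
print(rewind_needed, df.shape)
```

Building the buffer with StringIO(text) instead of write() avoids the problem, because a StringIO created from an initial string starts at position 0.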
