Python UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 10: invalid start byte

+5 votes

I am unable to read this file; pandas raises the error above. My code was:

import pandas as pd
a = pd.read_csv("filename.csv")
Jul 11, 2019 in Python by Yadu
335,546 views

4 answers to this question.

+11 votes
Best answer

The file is not UTF-8 encoded and contains some special characters, so you have to pass encoding='latin1' when reading it. Try this:

import pandas as pd
data=pd.read_csv("C:\\Users\\akashkumar\\Downloads\\Customers.csv",encoding='latin1')
print(data.head())
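
As for why this works, here is a small sketch, assuming the file was saved in Latin-1 or Windows-1252 rather than UTF-8. The byte string is made up for illustration. 0xA0 is a non-breaking space in Latin-1, but in UTF-8 it can only appear as a continuation byte, never at the start of a character.

```python
# Hypothetical sample: "Customer" + non-breaking space (0xA0) + "ID" as Latin-1 bytes.
raw = b"Customer\xa0ID"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print(exc)  # reports byte 0xa0 as an invalid start byte

# Latin-1 maps each of the 256 byte values to one code point, so decoding never fails.
text = raw.decode("latin1")
print(repr(text))
```

Because Latin-1 never fails, it can also hide a wrong guess: a file that is really in another encoding will load without error but show garbled characters, so it is worth checking a few non-ASCII values after loading.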


Hope it helps!!

Thanks!

answered Jul 11, 2019 by Ritu

selected Dec 11, 2019 by Kalgi
Thank you. Worked well for me ....
Glad to hear it :)

Please upvote.

thanks. it worked. 

encoding='latin1' part solved my problem.
thanks worked well
Why does it work ?
Hey guys, how's the corona Quarantine going on? You guys working from home too?
Hey, @Book

It will help if you could just elaborate on your doubt.
Thank you..:) bro
>>> file = open('policydocs/policy/0_policy_Test Backoffice.pdf', encoding='latin1')
>>> result = k.set_contents_from_file(file)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "D:\treatufair\venv\lib\site-packages\boto\s3\key.py", line 1307, in set_contents_from_file
    self.send_file(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "D:\treatufair\venv\lib\site-packages\boto\s3\key.py", line 760, in send_file
    self._send_file_internal(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "D:\treatufair\venv\lib\site-packages\boto\s3\key.py", line 957, in _send_file_internal
    resp = self.bucket.connection.make_request(
  File "D:\treatufair\venv\lib\site-packages\boto\s3\connection.py", line 667, in make_request
    return super(S3Connection, self).make_request(
  File "D:\treatufair\venv\lib\site-packages\boto\connection.py", line 1070, in make_request
    return self._mexe(http_request, sender, override_num_retries,
  File "D:\treatufair\venv\lib\site-packages\boto\connection.py", line 939, in _mexe
    response = sender(connection, request.method, request.path,
  File "D:\treatufair\venv\lib\site-packages\boto\s3\key.py", line 895, in sender
    raise provider.storage_response_error(
boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>BadDigest</Code><Message>The Content-MD5 you specified did not match what we received.</Message><ExpectedDigest>4a5e3dabb8dd3747f239ddf71050f327</ExpectedDigest><
CalculatedDigest>1NIHgiuwNo7w3xjQKn7WLg==</CalculatedDigest><RequestId>485E4F3BD00D8865</RequestId><HostId>tFC75KSl6s4bTxW3WFcTPUUfxNmvhOMrWYPn7dfnRlPXlxI0X15zyvYZAEgLO/EsUBe4
0BEtMIc=</HostId></Error>
>>> file = open('policydocs/policy/0_policy_Test Backoffice.pdf', encoding="utf-8")
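
A PDF is binary data, so neither encoding applies here. This is a minimal sketch, assuming the digest mismatch comes from reading the file in text mode: text mode decodes the bytes and translates line endings, so what is read no longer matches the bytes on disk. The file contents below are made up for illustration, and the boto call itself is not shown.

```python
import hashlib
import os
import tempfile

# Hypothetical stand-in for a PDF: binary bytes containing CRLF and non-ASCII bytes.
payload = b"%PDF-1.4\r\n\xa0\xff binary body\r\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(payload)
    path = tmp.name

# Binary mode returns the bytes exactly as stored.
with open(path, "rb") as fh:
    binary_data = fh.read()

# Text mode decodes to str and converts "\r\n" to "\n" on read.
with open(path, "r", encoding="latin1") as fh:
    text_data = fh.read()

os.remove(path)

binary_md5 = hashlib.md5(binary_data).hexdigest()
text_md5 = hashlib.md5(text_data.encode("latin1")).hexdigest()
print(binary_md5 == hashlib.md5(payload).hexdigest())  # True
print(text_md5 == binary_md5)  # False: the line endings were altered
```

Opening the file with open(path, 'rb') and passing that handle to the upload call is the usual approach for binary files.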

Thank you. it worked for me too.
Thanks a million. I've been to YouTube and Google searching for a way out.
worked for me also thanks
OMG! Thank you so much. I was stuck on this for a while!! :D
Thank you!!! that additional encoding='latin1' entirely fixes this issue

Thanks a lot. I was really stuck with this problem. 

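Alternatively, encoding='unicode_escape' also avoids the error, because it decodes non-ASCII bytes as Latin-1. It is not fully equivalent, though: it additionally interprets backslash sequences such as \n or \t in the data, so latin1 is the safer choice unless you want that.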

import pandas as pd
data=pd.read_csv("C:\\Users\\akashkumar\\Downloads\\Customers.csv",encoding='unicode_escape')
print(data.head())
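
The two codecs differ once the data contains a backslash. Here is a small sketch with a made-up byte string (not taken from the file above): both decode 0xA0 the same way, but unicode_escape also rewrites backslash sequences.

```python
# Hypothetical cell value: a literal backslash followed by "t", plus byte 0xA0.
raw = b"tab\\there\xa0"

as_latin1 = raw.decode("latin1")
as_escape = raw.decode("unicode_escape")

print(repr(as_latin1))  # backslash and "t" are kept as two characters
print(repr(as_escape))  # "\t" has been turned into a real tab character
```

For data such as Windows paths, unicode_escape can even raise its own decode error on a sequence like \U, so latin1 or cp1252 is usually the safer choice.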

worked for me as well. thanks very much indeed
0 votes

tl;dr / quick fix

  1. Don't decode/encode willy-nilly.
  2. Don't assume your input is UTF-8 encoded.
  3. Decode bytes to Unicode strings (str) as early as possible in your code.
  4. Fix your locale (see "How to solve UnicodeDecodeError in Python 3.6?").
  5. Don't be tempted to use quick reload hacks (such as reload(sys) followed by sys.setdefaultencoding() in Python 2).
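
Points 2 and 3 can be sketched as a small helper that reads the raw bytes once and decodes them at the boundary, trying candidate encodings in order. The function name read_text and the candidate list are illustrative assumptions, not a standard API; latin1 is placed last because it accepts any byte sequence.

```python
import os
import tempfile


def read_text(path, encodings=("utf-8", "cp1252", "latin1")):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    with open(path, "rb") as fh:
        raw = fh.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the file")


# Demo with a made-up file containing byte 0xA0, which is invalid as UTF-8 here.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"name,city\nAna,S\xe3o\xa0Paulo\n")
    demo_path = tmp.name

text, used = read_text(demo_path)
os.remove(demo_path)
print(used)
print(text)
```

Once the encoding is known, it can be passed straight to pd.read_csv(..., encoding=used), and the rest of the program only ever handles str.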

answered Dec 11, 2020 by Roshni
• 10,480 points
+1 vote
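text = raw_bytes.decode('utf-8', errors='replace')

or

text = raw_bytes.decode('utf-8', errors='ignore')

Note: errors='ignore' strips out the undecodable bytes and returns the string without them, while errors='replace' substitutes each one with U+FFFD (the replacement character). Here raw_bytes stands for your bytes object. In Python 2 the equivalent call is unicode(s, errors='replace'); Python 3 has no unicode() built-in.

For me this is the ideal case, since I'm using it as protection against non-ASCII input, which is not allowed by my application.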

Alternatively, use the open function from the codecs module to read in the file:

import codecs
with codecs.open(file_name, 'r', encoding='utf-8',
                 errors='ignore') as fdata:
    text = fdata.read()  # undecodable bytes are silently dropped
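
In Python 3 the built-in open() accepts the same encoding and errors arguments, so codecs is optional. A minimal sketch with a throwaway temp file (the contents are made up) comparing errors='ignore' with errors='replace':

```python
import os
import tempfile

# Made-up sample: byte 0xA0 is not valid UTF-8 on its own.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"price:\xa010\n")
    path = tmp.name

with open(path, "r", encoding="utf-8", errors="ignore") as fh:
    ignored = fh.read()   # the bad byte is dropped

with open(path, "r", encoding="utf-8", errors="replace") as fh:
    replaced = fh.read()  # the bad byte becomes U+FFFD

os.remove(path)
print(repr(ignored))
print(repr(replaced))
```

Both handlers lose information, so they fit best when the affected characters do not matter; otherwise decoding with the file's real encoding keeps the data intact.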

answered Dec 11, 2020 by Gitika
• 65,770 points
0 votes

Python's bytes.decode() method converts bytes to a string object, and str.encode() does the reverse. Both methods let you specify the error handling scheme to use for encoding/decoding errors. The default is 'strict', meaning that decoding errors raise a UnicodeDecodeError and encoding errors raise a UnicodeEncodeError.

A UnicodeDecodeError normally happens when decoding a bytes object with a particular codec. Since each codec maps only certain byte sequences to Unicode characters, an illegal byte sequence causes the codec-specific decode() to fail.
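
The exception itself says where decoding failed. A small sketch, using a made-up byte string, that reads the attributes of UnicodeDecodeError to locate the offending byte:

```python
raw = b"0123456789\xa0rest"  # hypothetical data with byte 0xA0 at index 10

try:
    raw.decode("utf-8")  # errors="strict" is the default
except UnicodeDecodeError as exc:
    bad_byte = exc.object[exc.start:exc.end]
    position = exc.start
    reason = exc.reason
    print(exc.encoding, position, bad_byte, reason)
```

Looking at the bytes around that position in the original file is often enough to guess which encoding it was really saved in.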

answered Dec 11, 2020 by Rajiv
• 8,870 points
