'utf-8' codec can't decode byte 0x82 in position 16: invalid start byte

0 votes

Hello,

I'm working on a sentiment analysis project on Arabic text. I downloaded an Excel sheet that contains two columns, text and labels, and I'm getting this error: 'utf-8' codec can't decode byte 0x82 in position 16: invalid start byte. The file itself opens fine, but the error occurs when I try to tokenize the text.

Please help me as soon as possible!

This is my code:

import nltk
nltk.download('punkt')
token_data= open("data try.xlsx").read()
tokens = nltk.sent_tokenize(token_data)
sent_tokenize(token_data)
Jun 27, 2020 in Python by zena
• 140 points

edited Jun 29, 2020 by MD 15,915 views

1 answer to this question.

0 votes

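Hi @zena,

The error occurs because a .xlsx file is not plain text. It is a ZIP-compressed binary container, so open(...).read() tries to decode compressed bytes as UTF-8 and fails on the first byte that is not valid UTF-8 (here 0x82 at position 16). Passing encoding="utf8" to open() does not help, because the data is not UTF-8 text in the first place. Instead, read the sheet with an Excel-aware library such as pandas and tokenize the text column. For example, assuming the column is named "text" as in your description:

import pandas as pd
token_data = " ".join(pd.read_excel("data try.xlsx")["text"].dropna().astype(str))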
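A quick way to confirm the cause is to check whether the file is a ZIP container, which every .xlsx file is. The sketch below is only illustrative: the helper names `is_zip_container` and `load_text_column` are hypothetical, the column name "text" is a guess based on the question, and the second helper assumes pandas and openpyxl are installed.

```python
import zipfile


def is_zip_container(path):
    """Return True if the file at `path` is a ZIP archive (every .xlsx is one)."""
    return zipfile.is_zipfile(path)


def load_text_column(path, column="text"):
    """Read one column of an Excel sheet as a list of strings.

    Assumes pandas and openpyxl are installed and that the sheet has a
    column called `column` (the name "text" is a guess from the question).
    """
    import pandas as pd  # imported here so the ZIP check works without pandas

    frame = pd.read_excel(path)
    # Drop empty cells so they do not turn into the string "nan".
    return frame[column].dropna().astype(str).tolist()
```

If `is_zip_container("data try.xlsx")` returns True, the file cannot be read as text, and `load_text_column("data try.xlsx")` should return the rows as strings, each of which can then be passed to `nltk.sent_tokenize`.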
answered Jun 29, 2020 by MD
• 95,460 points
