Decoding issues when reading files of different formats from S3 using boto3

0 votes
I am trying to read text from files in different formats, such as PDF, DOCX, DOC, and RTF, from S3 using boto3.

import boto3

s3 = boto3.client('s3')

bucket = 'my-bucket'
#key = 'file2.doc'
#key =  'file3.docx'
key = 'file1.pdf'  

obj = s3.get_object(Bucket=bucket, Key=key)

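# Decodes the raw bytes as UTF-8; PDF, DOC and DOCX are binary formats, so this is where it goes wrong
file_content = obj['Body'].read().decode('utf-8')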
print(file_content)

I am not getting the actual file text. I tried converting the binary content to text and also tried different encodings, but none of them worked. Is there a way that works for all of these file types?
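
For context, here is a minimal sketch of why a single .decode() call cannot work for all of these files, and what a format-aware approach might look like. PDF, DOC and DOCX are binary container formats rather than encoded text, so the bytes have to go to a parser that matches the format. The helper names (sniff_format, docx_text, read_s3_bytes) are hypothetical and not part of boto3. Only the DOCX case is handled here, using the standard library; PDF, DOC and RTF would need a dedicated parser (for example a third-party library such as pypdf for PDF), which is an assumption left out of the sketch.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def sniff_format(data: bytes) -> str:
    """Guess the file type from its leading magic bytes (hypothetical helper)."""
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"{\\rtf"):
        return "rtf"
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "doc"   # OLE2 compound file, used by legacy Word documents
    if data.startswith(b"PK\x03\x04"):
        return "docx"  # ZIP container; could also be xlsx, pptx or a plain zip
    return "unknown"


def docx_text(data: bytes) -> str:
    """Extract paragraph text from .docx bytes using only the standard library."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Join all text runs (w:t) that belong to this paragraph (w:p)
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def read_s3_bytes(bucket: str, key: str) -> bytes:
    """Fetch the raw object bytes from S3 without decoding them."""
    import boto3  # imported lazily so the helpers above work without AWS access
    s3 = boto3.client("s3")
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
```

With these in place, the flow would be roughly: data = read_s3_bytes(bucket, key), then branch on sniff_format(data) and call docx_text(data) for DOCX, or hand the bytes to a suitable parser for the other formats. Checking the magic bytes instead of the key's extension is a design choice that guards against misnamed files.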
May 17, 2024 in Python by anonymous

edited Mar 5 4 views

No answer to this question. Be the first to respond.
