I am trying to read text from files in different formats, such as PDF, DOCX, DOC, and RTF, stored in S3, using boto3.
import boto3
s3 = boto3.client('s3')
bucket = 'my-bucket'
#key = 'file2.doc'
#key = 'file3.docx'
key = 'file1.pdf'
# Download the object from S3
obj = s3.get_object(Bucket=bucket, Key=key)
# Read the raw bytes and try to decode them as UTF-8 text
file_content = obj['Body'].read().decode('utf-8')
print(file_content)
I am not getting the actual file text. I tried converting the binary content to text and also tried different encodings, but none of them worked. Is there an approach that works for all of these file types?
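
For context on why no single encoding helps: PDF, DOCX, and DOC are binary container formats rather than encoded plain text, and RTF is text wrapped in control words, so `.decode('utf-8')` either fails or returns markup. Below is a minimal sketch of one possible direction, using only the standard library. It assumes `data` is the raw bytes from `obj['Body'].read()`. The helper names `sniff_format` and `docx_text` are hypothetical, made up for illustration. Only the DOCX case is handled here (a DOCX file is a ZIP archive containing `word/document.xml`); PDF, DOC, and RTF would each need a dedicated parser, which is not shown.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Leading "magic" bytes that identify each file format.
PDF_MAGIC = b"%PDF-"
ZIP_MAGIC = b"PK\x03\x04"  # .docx files are ZIP archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc container
RTF_MAGIC = b"{\\rtf"

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def sniff_format(data: bytes) -> str:
    """Guess the file type from its first bytes (hypothetical helper)."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(ZIP_MAGIC):
        # Simplification: any ZIP archive matches here, not only .docx.
        return "docx"
    if data.startswith(OLE2_MAGIC):
        return "doc"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def docx_text(data: bytes) -> str:
    """Extract paragraph text from .docx bytes (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Join all text runs (<w:t>) that belong to one paragraph (<w:p>).
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

With this in place, the bytes from S3 could be routed by `sniff_format(data)` instead of by file extension, with `docx_text(data)` called for the `"docx"` case and a format-specific parser for each of the others.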