How to create Athena tables for dynamic S3 paths using AWS Crawler

Question

Below are my S3 paths, under which multiple folders are present. Each folder contains a CSV file, and each file has a different schema.

The values within the curly braces {} will be dynamic.

s3://test_bucket/{val1}/data/{val2}/input/latest/

s3://test_bucket/{val1}/data/{val2}/input/archived/timestamp={val3}/

I want to create the Athena tables using an AWS Glue Crawler. We can use a separate database for input_data, for both the current and the archived data.

The resulting tables should be partitioned on val1 and val2 for both the current and the archived data. The archived table should also have an additional partition key, val3.
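To make the requested partitioning concrete, here is a small sketch that maps an S3 object path to the partition values each table would expose. It assumes every {valN} placeholder is a single path segment (no "/"); the function name `partition_values` and the sample values are hypothetical.

```python
import re
from typing import Optional

# Hypothetical patterns for the two layouts above. Each {valN} placeholder is
# assumed to be a single path segment, i.e. it contains no "/".
LATEST = re.compile(
    r"^s3://test_bucket/(?P<val1>[^/]+)/data/(?P<val2>[^/]+)/input/latest/"
)
ARCHIVED = re.compile(
    r"^s3://test_bucket/(?P<val1>[^/]+)/data/(?P<val2>[^/]+)"
    r"/input/archived/timestamp=(?P<val3>[^/]+)/"
)


def partition_values(path: str) -> Optional[dict]:
    """Return the partition keys a table should expose for an S3 path."""
    # The two layouts are mutually exclusive, so the order of checks is irrelevant.
    for pattern in (LATEST, ARCHIVED):
        match = pattern.match(path)
        if match:
            return match.groupdict()
    return None  # the path follows neither layout


print(partition_values("s3://test_bucket/a/data/b/input/latest/file.csv"))
print(partition_values(
    "s3://test_bucket/a/data/b/input/archived/timestamp=20220216/file.csv"))
```

The latest layout yields only val1 and val2, while the archived layout adds val3. The non-Hive segments ({val1}, {val2}) are positional, which is why they cannot be discovered as key=value partitions the way `timestamp=` can.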

score 0 · Answer 1 · Feb 16, 2022

The simplest and most efficient way is to use partition projection. It speeds up query processing for highly partitioned tables and automates partition management. Partition values and locations are calculated from the table configuration rather than read from a repository such as the AWS Glue Data Catalog, and in-memory operations are faster than remote operations.
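As a rough sketch of what such a configuration could look like for the archived layout, the snippet below builds an Athena `CREATE EXTERNAL TABLE` statement with partition projection enabled. It also mimics, with `string.Template`, how a partition location is filled in from `storage.location.template`. Assumptions: the data columns (`col_a`, `col_b`), the table name, and the helper names are hypothetical, and the `injected` projection type is chosen because the {valN} values are not known in advance. Check the exact properties against the Athena documentation before relying on them.

```python
from string import Template

# Location template for the archived layout; ${...} placeholders are the
# partition columns. Athena uses the same ${column} syntax in this property.
ARCHIVED_TEMPLATE = (
    "s3://test_bucket/${val1}/data/${val2}/input/archived/timestamp=${val3}/"
)


def build_archived_ddl(table: str, columns: dict) -> str:
    """Build a CREATE TABLE statement that relies on partition projection.

    `columns` is a hypothetical stand-in for the CSV schema, which differs
    per folder and is not given in the question.
    """
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns.items())
    props = {
        "projection.enabled": "true",
        # 'injected': Athena takes each value from the query's WHERE clause,
        # which suits partition values that are not known ahead of time.
        "projection.val1.type": "injected",
        "projection.val2.type": "injected",
        "projection.val3.type": "injected",
        # Tells Athena how to turn partition values into an S3 location.
        "storage.location.template": ARCHIVED_TEMPLATE,
    }
    tblprops = ",\n  ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "PARTITIONED BY (val1 string, val2 string, val3 string)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "LOCATION 's3://test_bucket/'\n"
        f"TBLPROPERTIES (\n  {tblprops}\n)"
    )


def resolve_location(values: dict) -> str:
    """Illustrate the in-memory step: fill the template from partition values."""
    return Template(ARCHIVED_TEMPLATE).substitute(values)


print(build_archived_ddl("archived_input", {"col_a": "string", "col_b": "int"}))
print(resolve_location({"val1": "a", "val2": "b", "val3": "20220216"}))
```

The table for the latest data would follow the same pattern with only val1 and val2 as partition keys and a template ending in `/input/latest/`. Because no partitions are registered in the catalog, no crawler run is needed when new {val1}/{val2}/{val3} folders appear. With the injected type, however, queries must filter on those columns.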