Incremental append in Sqoop

Question

While studying Sqoop, I came across incremental append and its --last-value option.

For example, let's say I have already imported the 'Account' table from an RDBMS to HDFS using Sqoop. Since then, the table in the RDBMS has received new records, and some old records have also been updated.

To run the command below and append to the existing data, we need to know the last value already imported from that table. How does this work in practice?

$ sqoop import --connect jdbc:mysql://localhost/dbname --username uname --password pwd --incremental append --null-non-string --table tablename --target-dir '/location' --check-column colname --last-value number

As I understand it, once we import to HDFS we can't delete records. So do we keep a flag to identify terminated, new, or modified records?

Please help me understand this.

Omkar · Answer 1 · Dec 31, 2018

You are right. Hadoop follows the WORM principle, i.e. write once, read many times. So once the data has been written to HDFS, individual records cannot be updated or deleted in place; a file can only be appended to or replaced as a whole.
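
For the rows that were updated in the RDBMS, one common approach is Sqoop's lastmodified mode, which re-imports the changed rows and merges them over the old copies by key instead of editing files in place. A minimal sketch, assuming the table has a timestamp column that is set on every update (here the hypothetical last_update_ts) and a primary key column id; both column names, the timestamp, and the connection details are placeholders. The function only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Print a lastmodified import for rows changed since the timestamp in $1.
# last_update_ts and id are assumed column names, not from the question.
lastmodified_cmd() {
    since="$1"
    printf '%s ' sqoop import \
        --connect jdbc:mysql://localhost/dbname \
        --username uname --password pwd \
        --table Account \
        --target-dir /location \
        --incremental lastmodified \
        --check-column last_update_ts \
        --last-value "'$since'" \
        --merge-key id
    echo
}

lastmodified_cmd "2018-12-30 00:00:00"
```

Rows that were deleted in the source are not picked up by this mode, so a status flag column in the source table, as you suggest, is one way to carry terminated records across.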

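Append mode is used when new rows are regularly inserted into the source table. The table should have a column whose value increases with every new row, typically a numeric primary key, and you pass that column as --check-column. If the table has no primary key, you also need a --split-by column (typically numeric) or a single mapper, so that Sqoop can divide the import. Sqoop imports only the rows whose check column is greater than --last-value, and that's how we keep track of the last value in the table. For example: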

$ sqoop import --connect jdbc:mysql://localhost/DB_name --username username --password password --table tablename --incremental append --check-column colname --last-value 100
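
In practice you rarely type the last value by hand. At the end of an incremental import, Sqoop prints the value to pass as --last-value next time, and if the import is defined as a saved job, Sqoop stores that value and updates it after each successful run. A sketch, assuming a hypothetical job name account_append and the same placeholder names as in the command above; it prints the two commands rather than running them.

```shell
#!/bin/sh
# JOB is a hypothetical job name; the other values are placeholders.
JOB=account_append

# One-time setup: everything after the bare "--" is an ordinary import.
create_job_cmd() {
    printf '%s ' sqoop job --create "$JOB" -- import \
        --connect jdbc:mysql://localhost/DB_name \
        --username username --password password \
        --table tablename \
        --incremental append \
        --check-column colname \
        --last-value 0
    echo
}

# Each later run: Sqoop reads the stored last value and saves the new one.
exec_job_cmd() {
    echo "sqoop job --exec $JOB"
}

create_job_cmd
exec_job_cmd
```

Starting with --last-value 0 assumes the check column holds positive numbers, so the first run of the job imports every existing row.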