Will re-running ETL IMPORT overwrite old data in the staging area?

A run of the ETL pipeline had errors when accessing MongoDB and Postgres in the IMPORT step, but processing continued until it failed in SUMMARIZE. I have fixed the DB access problems. Is there a way to tell whether IMPORT created bad data in the staging area? If I re-run IMPORT, will it overwrite the bad data, or will it skip processing if the output files already exist? Is there a way to remove just the files created by IMPORT, or do I need to delete staging and start over?

Brian K.

Hi Brian,

Rerunning a job should regenerate its output, overwriting the files from the previous run of that job.

If you know the INDEX job didn’t run with the correct data, you can start the ETL pipeline from that job; it will overwrite the old output and continue with the correct data.

./release.sh -j INDEX- <release_name>

Just to be specific, the first step of a job is to clean the output of any previous run.
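
To illustrate the clean-then-write behaviour described above, here is a minimal sketch. It is not the pipeline's actual code: the function name `run_job`, the one-directory-per-job layout, and the `part-0000.txt` file name are all assumptions made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch only; the real pipeline's layout and code may differ.
# run_job JOB_NAME STAGING_DIR
#   Removes JOB_NAME's previous output, then writes fresh output.
run_job() {
    # Abort if either argument is missing, so rm -rf never sees an empty path.
    job_name=${1:?job name required}
    staging_dir=${2:?staging directory required}
    out_dir="$staging_dir/$job_name"

    # Step 1: clean the output of any previous run of this job only.
    rm -rf "$out_dir"
    mkdir -p "$out_dir"

    # Step 2: produce the new output (placeholder for the real work).
    printf 'output of %s\n' "$job_name" > "$out_dir/part-0000.txt"
}

# Example call (assumed paths): run_job IMPORT /tmp/staging
```

With this pattern, stale files from a failed run cannot survive a re-run of the same job, and the output of other jobs is left alone, so there is no need to delete the whole staging area by hand.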