Best Practices for Micro-Batch Loading on Amazon Redshift (article from the AWS blog)
I work with Redshift every day now at Amazon. It's a very useful big-data warehouse tool.
Here's a blog post about loading data into it. The approach leans heavily on S3 and the COPY command.
Some quick notes:
- It's faster to drop and load big tables using staging areas.
- Split input files into pieces and load them in parallel.
- Use the COPY option STATUPDATE OFF.
- Avoid VACUUMing tables when possible.
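Putting those COPY tips together, a single micro-batch load might look something like the sketch below. The bucket, table name, and IAM role are placeholders, and COMPUPDATE OFF is an extra option along the same lines (it skips compression analysis on each small load):

```sql
-- Load gzip'd, pre-split files from an S3 prefix in parallel.
-- Redshift distributes the file parts across node slices automatically.
COPY events_staging
FROM 's3://my-bucket/events/batch-0001/'                    -- placeholder prefix with the split parts
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'  -- placeholder role
GZIP
STATUPDATE OFF    -- skip the automatic statistics update on micro-batches
COMPUPDATE OFF;   -- optionally skip compression analysis too
```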
You could just read the main points in the how-to guide.
Here's the quick and easy version: do the following in a single transaction:
1. Create a staging table "tablename_staging" like the main table
2. COPY data from S3 into the staging table
3. Delete rows in the main table that are already present in the staging table
4. Copy all rows from the staging table into the main table
5. Drop the staging table
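The five steps above can be sketched as one transaction. Table names, the S3 prefix, the IAM role, and the `event_id` join key are all placeholder assumptions here, not from the original post:

```sql
BEGIN;

-- 1. Create a staging table shaped like the main table
CREATE TEMP TABLE events_staging (LIKE events);

-- 2. Load the micro-batch from S3 into the staging table
COPY events_staging
FROM 's3://my-bucket/events/batch-0001/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
GZIP STATUPDATE OFF;

-- 3. Delete rows in the main table that the batch replaces
--    (assumes event_id is the key that identifies a row)
DELETE FROM events
USING events_staging
WHERE events.event_id = events_staging.event_id;

-- 4. Copy all rows from staging into the main table
INSERT INTO events SELECT * FROM events_staging;

-- 5. Drop the staging table (a TEMP table would also vanish
--    on its own at the end of the session)
DROP TABLE events_staging;

COMMIT;
```

Doing the delete and insert inside one transaction means readers never see a state where the old rows are gone but the new ones haven't landed yet.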