Best Practices for Micro-Batch Loading on Amazon Redshift

Notes on the AWS blog article "Best Practices for Micro-Batch Loading on Amazon Redshift"

I work with Redshift every day now at Amazon. It's a very useful big-data warehouse tool.
Here's a blog post about loading data into it. The approach leans heavily on S3 and the COPY command.

Some quick notes:
- It's faster to drop and reload big tables via staging areas than to update them in place.
- Split input files into pieces and load them in parallel (see the COPY sketch below).
- Use the COPY option STATUPDATE OFF.
- Avoid VACUUM of tables when possible.
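For the COPY-related notes above, here's a minimal sketch. The bucket, key prefix, IAM role, and table name are placeholders of my own, not from the article; the point is that a common key prefix picks up all of the split files in one parallel COPY, and STATUPDATE OFF (plus COMPUPDATE OFF) skips the automatic statistics and compression analysis during a micro-batch load.

```sql
-- Hypothetical example: load every split file sharing the prefix
-- 's3://my-bucket/incoming/orders/part-' in a single parallel COPY.
COPY orders_staging
FROM 's3://my-bucket/incoming/orders/part-'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
DELIMITER '|'
GZIP
STATUPDATE OFF      -- skip automatic statistics update for this load
COMPUPDATE OFF;     -- skip automatic compression analysis as well
```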

Or you could just read the main points in the how-to guide itself.

Here's the quick and easy version: do the following in a single transaction (sketched in SQL after the list):
1. Create a staging table "tablename_staging" like the main table
2. COPY data from S3 into the staging table
3. Delete rows in the main table that are already present in the staging table
4. Copy all rows from the staging table into the main table
5. Drop the staging table
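Here's a minimal SQL sketch of those five steps. It assumes a main table called orders with a key column order_id, and reuses the placeholder S3 prefix and IAM role from the COPY example above; adjust all of those to your own setup.

```sql
BEGIN;

-- 1. Create a staging table with the same columns as the main table.
CREATE TABLE orders_staging (LIKE orders);

-- 2. Load the new micro-batch from S3 into the staging table.
COPY orders_staging
FROM 's3://my-bucket/incoming/orders/part-'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
DELIMITER '|'
GZIP
STATUPDATE OFF
COMPUPDATE OFF;

-- 3. Delete rows in the main table that are already in the staging table.
DELETE FROM orders
USING orders_staging
WHERE orders.order_id = orders_staging.order_id;

-- 4. Copy all rows from the staging table into the main table.
INSERT INTO orders
SELECT * FROM orders_staging;

-- 5. Drop the staging table.
DROP TABLE orders_staging;

END;
```

Running everything inside one BEGIN/END means readers never see the table half-loaded, and a failure anywhere rolls the whole micro-batch back.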
