Category Archives: data wrangling

The reality of a data worker.

Taken from a Dataiku meetup slide.  This picture hit close to home.

Posted in data wrangling, Uncategorized, wheresthejoke

Amazon Redshift’s Unsupported Features of PostgreSQL

Redshift is based on a branch of PostgreSQL 8.0.2 (released in 2005). Here’s all the unsupported fancy Postgres stuff, taken directly from Amazon’s manual. The big ones are: no stored procedures, no constraint enforcement, no triggers and no … Continue reading

Posted in data wrangling, mpp databases

Best Practices for Micro-Batch Loading on Amazon Redshift

An article from the AWS blog. I work with Redshift every day now at Amazon. It’s a very useful big data warehouse tool. Here’s a blog post about loading data into it. It’s very S3 dependent … Continue reading

Posted in big data, data wrangling, etl

Basic database table creation and load from CSV using MySQL and PostgreSQL

Basic database table creation with MySQL and PostgreSQL. The starting point for most data applications is getting the data feeds and populating the tables. Here’s an example of the process: I’m loading a stock_history table from the Yahoo Finance API. … Continue reading
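
The excerpt above is cut off before the actual statements, so here is only a rough sketch of the create-then-load pattern it describes. It uses Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL. The column names and the sample rows are made-up placeholders, not the Yahoo Finance feed from the post.

```python
import csv
import io
import sqlite3

# Placeholder CSV content; the real feed and its columns are not shown in the excerpt.
SAMPLE_CSV = """symbol,trade_date,close_price,volume
AAA,2016-01-04,10.0,100
AAA,2016-01-05,10.5,200
BBB,2016-01-04,20.0,300
"""


def load_stock_history(conn, csv_file):
    """Create stock_history if it is missing, then insert every CSV row."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS stock_history (
            symbol      TEXT NOT NULL,
            trade_date  TEXT NOT NULL,
            close_price REAL,
            volume      INTEGER,
            PRIMARY KEY (symbol, trade_date)
        )
        """
    )
    reader = csv.DictReader(csv_file)
    # Convert the text fields to the types the table expects.
    rows = [
        (r["symbol"], r["trade_date"], float(r["close_price"]), int(r["volume"]))
        for r in reader
    ]
    # "with conn" wraps the inserts in one transaction and commits on success.
    with conn:
        conn.executemany("INSERT INTO stock_history VALUES (?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_stock_history(conn, io.StringIO(SAMPLE_CSV))
row_count = conn.execute("SELECT COUNT(*) FROM stock_history").fetchone()[0]
print(loaded, row_count)
```

On the real servers, bulk loads usually go through the database's own loader (LOAD DATA INFILE in MySQL, COPY in PostgreSQL) rather than row-by-row inserts. The shape of the process is the same: define the table, then load the file.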

Posted in data wrangling, relational databases

Why PostgreSQL is the better MySQL

Ever since MySQL was purchased by Oracle, it has been lagging in development in the open source space. MariaDB, Percona and Aurora are spin-offs that try to address this. MySQL is the original M of the LAMP stack. … Continue reading

Posted in data wrangling, relational databases

The Report or Dashboard is taking forever to load

A general rule with BI reports and dashboards is 10 seconds or less for a report or 30 seconds for dashboards. But quite often an analyst will run a report and it never comes back. They’ll say something like so … Continue reading

Posted in BI reporting, data wrangling

SQL tip: To get first 10 Rows from a Table and profile the columns

For people working with database tables: Most will want to check out the columns in the table and do a quick scan to get 10 rows to sample data in the table. Here’s the SQL syntax for doing that with … Continue reading
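
The excerpt stops before naming the database, so the following is only a sketch of the idea, again using Python's built-in sqlite3 module as a stand-in. The `demo` table and the helper name `sample_and_profile` are hypothetical. The column listing uses SQLite's `PRAGMA table_info`, which other databases replace with their own catalog queries.

```python
import sqlite3


def sample_and_profile(conn, table, n=10):
    """Return (column name, declared type) pairs and up to n sample rows.

    The table name is interpolated into the SQL text, so pass trusted names only.
    """
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = [
        (row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')
    ]
    sample = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (n,)).fetchall()
    return columns, sample


# Hypothetical demo table with 25 rows, so the LIMIT actually trims the result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)", [(i, f"row{i}") for i in range(25)]
)

columns, sample = sample_and_profile(conn, "demo")
print(columns)
for row in sample:
    print(row)
```

The `LIMIT` clause works in MySQL, PostgreSQL and SQLite. SQL Server uses `SELECT TOP 10 ...` instead, and Oracle uses `FETCH FIRST 10 ROWS ONLY` or a `ROWNUM` filter.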

Posted in data wrangling

Data Wranglers

"Data wranglers deserve more respect in this 'data science' obsessed world. Slide from DC data wranglers meetup." (tweet by mike xia, @xento, August 24, 2016) A lot of time is spent cleaning data before the data science can begin. Discovery … Continue reading

Posted in data wrangling