Category Archives: data wrangling
The reality of a data worker.
Taken from a Dataiku meetup slide. This picture hit close to home.
Amazon Redshift’s Unsupported PostgreSQL Features
Redshift is based on a branch of PostgreSQL 8.0.2 (released in 2005). Here are all the unsupported PostgreSQL features, taken directly from Amazon’s manual. The big ones are: no stored procedures, no constraint enforcement, no triggers and no …
Best Practices for Micro-Batch Loading on Amazon Redshift
Best Practices for Micro-Batch Loading on Amazon Redshift, an article from the AWS Blog. I work with Redshift every day now at Amazon. It’s a very useful big data warehouse tool. Here’s a blog post about loading data into it. It’s very S3-dependent …
Basic database table creation and load from CSV using MySQL and PostgreSQL
Basic database table creation with MySQL and PostgreSQL. The starting point for most data applications is getting the data feeds and populating the tables. Here’s an example of the process: I’m loading a stock_history table from the Yahoo Finance API. …
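The post itself targets MySQL and PostgreSQL; as a self-contained stand-in, here is a minimal sketch of the same create-then-load flow using Python’s built-in sqlite3 and csv modules. The column layout (Date, Open, High, Low, Close, Volume, Adj Close), the ticker "XYZ" and the sample values are assumptions made up for illustration, not the post’s actual feed.

```python
import csv
import io
import sqlite3

# Hypothetical CSV shaped like a daily price-history export.
# Column names are an assumption; the values are made up for illustration.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2016-08-22,100.00,101.50,99.50,101.00,1000000,101.00
2016-08-23,101.00,102.00,100.25,101.75,1200000,101.75
"""


def create_stock_history(conn: sqlite3.Connection) -> None:
    """Create the stock_history table (SQLite stand-in for MySQL/PostgreSQL)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS stock_history (
            symbol     TEXT NOT NULL,
            trade_date TEXT NOT NULL,
            open       REAL,
            high       REAL,
            low        REAL,
            close      REAL,
            volume     INTEGER,
            adj_close  REAL,
            PRIMARY KEY (symbol, trade_date)
        )
        """
    )


def load_csv(conn: sqlite3.Connection, symbol: str, csv_text: str) -> int:
    """Parse the CSV text and insert every row for the given symbol."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (
            symbol,
            r["Date"],
            float(r["Open"]),
            float(r["High"]),
            float(r["Low"]),
            float(r["Close"]),
            int(r["Volume"]),
            float(r["Adj Close"]),
        )
        for r in reader
    ]
    # Parameterized bulk insert; commit once for the whole batch.
    conn.executemany(
        "INSERT INTO stock_history VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
create_stock_history(conn)
loaded = load_csv(conn, "XYZ", SAMPLE_CSV)
print(loaded)  # 2
```

For large files, MySQL’s LOAD DATA INFILE and PostgreSQL’s COPY are the usual bulk-load commands rather than row-by-row inserts.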
Why PostgreSQL is the better MySQL
Ever since MySQL was acquired by Oracle, its open-source development has lagged. MariaDB, Percona and Aurora are spin-offs that try to address this. MySQL is the original M of the LAMP stack. …
The Report or Dashboard is taking forever to load
A general rule for BI is that a report should load in 10 seconds or less and a dashboard in 30 seconds or less. But quite often an analyst will run a report and it never comes back. They’ll say something like so …
SQL tip: Get the first 10 rows from a table and profile the columns
Most people working with database tables will want to check out the columns and do a quick scan of 10 rows to sample the data. Here’s the SQL syntax for doing that with …
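A minimal sketch of the sample-and-profile idea, again using Python’s sqlite3 as a stand-in engine. LIMIT is accepted by SQLite, MySQL and PostgreSQL (SQL Server uses TOP instead). The helper names sample_rows and profile_columns and the demo table are hypothetical, and the profile here (row count, non-null count, distinct count per column) is one simple choice, not necessarily what the original post shows.

```python
import sqlite3


def sample_rows(conn: sqlite3.Connection, table: str, n: int = 10):
    """Return column names plus up to n rows (no ORDER BY, so order is not guaranteed)."""
    # The table name is interpolated, so only pass trusted identifiers.
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (n,))
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def profile_columns(conn: sqlite3.Connection, table: str) -> dict:
    """Per column: total rows, non-null count and distinct non-null count."""
    # LIMIT 0 fetches no data but still exposes the column names.
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    columns = [d[0] for d in cur.description]
    profile = {}
    for col in columns:
        total, non_null, distinct = conn.execute(
            f'SELECT COUNT(*), COUNT("{col}"), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        profile[col] = {"rows": total, "non_null": non_null, "distinct": distinct}
    return profile


# Hypothetical demo table: 25 rows, every fifth category left NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(i, None if i % 5 == 0 else f"cat{i % 3}") for i in range(25)],
)
cols, rows = sample_rows(conn, "demo")
print(cols, len(rows))  # ['id', 'category'] 10
print(profile_columns(conn, "demo")["category"])  # {'rows': 25, 'non_null': 20, 'distinct': 3}
```

COUNT(column) skips NULLs while COUNT(*) does not, so the gap between the two is a quick null check for each column.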
Data Wranglers
Data wranglers deserve more respect in this 'data science'-obsessed world. Slide from the DC Data Wranglers meetup: pic.twitter.com/X3eyUjOPMq (mike xia, @xento, August 24, 2016). A lot of time is spent cleaning data before the data science can begin. Discovery …