Monthly Archives: August 2016

Why PostgreSQL is the better MySQL

Ever since MySQL was purchased by Oracle, its development in the open source space has been lagging. MariaDB, Percona, and Aurora are spin-offs that try to address this. MySQL is the original M of the LAMP stack. … Continue reading

Posted in data wrangling, relational databases

Big Data == teens talking about sex

Dan Ariely, Duke University professor. Big data may be sexy, but the companies I’ve worked with or interviewed are still dealing with small-to-medium-size data management problems. #smalldataproblems

Posted in big data, Uncategorized

Open Source big data software as how I currently understand it. Part 2

https://github.com/mxento/Big-Data-Dictionary I created a big data dictionary for myself and for anyone else who gets confused by all the big data applications out there and just wants a simple one- or two-sentence description of each for reference.

Posted in big data

The Report or Dashboard is taking forever to load

A general rule for BI reports and dashboards is a load time of 10 seconds or less for a report and 30 seconds or less for a dashboard. But quite often an analyst will run a report and it never comes back. They’ll say something like so … Continue reading

Posted in BI reporting, data wrangling

Some Google Trends Research of Big Data Vendors

[Google Trends chart: Cloudera, mapR, hortonworks, hbase, bigtable; 2008-07-27 to 2016-08-27] Cloudera’s peaking; EMR’s actually declining. [Google Trends chart: five topics, all time] The decline of the web 2.0 Relational Data Stores. Looks like RDS peaked at web 2.0, when Google IPO’ed. … Continue reading

Posted in data analysis

Open Source big data software as how I currently understand it.

Since I’m moving away from relational databases and data warehousing (so 2008), I’m teaching myself big data architecture. This is a list of open source big data software as I currently understand it. Hadoop: HDFS is the basic distributed file store; … Continue reading

Posted in big data

SQL tip: To get first 10 Rows from a Table and profile the columns

For people working with database tables: most will want to check the table’s columns and pull 10 rows as a quick sample of the data. Here’s the SQL syntax for doing that with … Continue reading
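A minimal sketch of that kind of quick look, assuming SQLite through Python's built-in sqlite3 module. The teaser is cut off before it names a database, so SQLite is purely an assumption here, and the `profile_table` helper and the `orders` table are hypothetical names used for illustration. It reads the column names from the cursor description and pulls the first 10 rows with LIMIT.

```python
import sqlite3


def profile_table(conn, table, limit=10):
    """Return (column_names, sample_rows) for a table.

    Hypothetical helper for illustration; assumes SQLite's LIMIT syntax.
    """
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cur = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,))
    # cursor.description holds one tuple per column; the name is item 0.
    columns = [col[0] for col in cur.description]
    rows = cur.fetchall()
    return columns, rows


if __name__ == "__main__":
    # Throwaway in-memory table so the sketch runs on its own.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, f"cust{i}", i * 1.5) for i in range(25)],
    )
    columns, rows = profile_table(conn, "orders")
    print(columns)
    for row in rows:
        print(row)
```

Row-limiting syntax varies by engine: PostgreSQL, MySQL and SQLite accept LIMIT 10, while SQL Server uses SELECT TOP 10, so the query string would need adjusting for other databases.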

Posted in data wrangling

Data Wranglers

Data wranglers deserve more respect in this 'data science' obsessed world. Slide from DC data wranglers meetup. pic.twitter.com/X3eyUjOPMq
- mike xia (@xento), August 24, 2016

A lot of time is spent cleaning data before the data science can begin. Discovery … Continue reading

Posted in data wrangling

Some things I learned playing the role of ‘data scientist’

In general: Most of my career

Posted in data science