Monthly Archives: August 2016
Why PostgreSQL is the better MySQL
Ever since MySQL was purchased by Oracle, its development in the open source space has been lagging. MariaDB, Percona, and Aurora are spin-offs that try to address this. MySQL is the original M of the LAMP stack. … Continue reading
Big Data == teens talking about sex
Dan Ariely. Duke University Professor. Big data may be sexy. But companies I’ve worked with or interviewed are still dealing with small-to-medium size data management problems. #smalldataproblems
Open Source big data software as how I currently understand it. Part 2
https://github.com/mxento/Big-Data-Dictionary I created a big data dictionary for myself and anyone else who gets confused by all the big data applications out there and just wants a simple one- or two-sentence description for reference.
The Report or Dashboard is taking forever to load
A general rule with BI reports and dashboards is 10 seconds or less for a report or 30 seconds for dashboards. But quite often an analyst will run a report and it never comes back. They’ll say something like so … Continue reading
Some Google Trends Research of Big Data Vendors
[Google Trends chart: Cloudera, MapR, Hortonworks, HBase, Bigtable; 2008-07-27 to 2016-08-27] Cloudera’s peaking; EMR’s actually declining. [Google Trends chart: five topics, all time] The decline of the web 2.0 relational data stores. Looks like RDS peaked at web 2.0 when Google IPO’ed. … Continue reading
Open Source big data software as how I currently understand it.
Since I’m moving away from relational databases and data warehousing (so 2008), I’m teaching myself Big Data architecture. This is the open source big data software list as I currently understand it. Hadoop: HDFS is the basic distributed file store; … Continue reading
SQL tip: To get first 10 Rows from a Table and profile the columns
For people working with database tables: Most will want to check out the columns in the table and do a quick scan to get 10 rows to sample data in the table. Here’s the SQL syntax for doing that with … Continue reading
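The excerpt above is cut off before it names a database, so here is only a rough sketch of the idea, assuming SQLite through Python's built-in `sqlite3` module. The `orders` table and the `sample_and_profile` helper are made up for illustration and are not from the original post. Other databases spell the row limit differently (for example `TOP` in SQL Server).

```python
import sqlite3


def sample_and_profile(conn, table, n=10):
    """Return (columns, rows): column name/type pairs and up to n sample rows."""
    # Table names cannot be bound as parameters, so quote the identifier instead.
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    info = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    columns = [(row[1], row[2]) for row in info]
    # LIMIT caps the number of rows returned (SQLite syntax).
    rows = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (n,)).fetchall()
    return columns, rows


# Hypothetical demo table, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"cust{i}", i * 1.5) for i in range(25)],
)

columns, rows = sample_and_profile(conn, "orders")
for name, col_type in columns:
    print(name, col_type)
for row in rows:
    print(row)
```

Without an `ORDER BY`, which ten rows come back is up to the database, which is usually fine for a quick look at the data.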
Data Wranglers
Data wranglers deserve more respect in this 'data science' obsessed world. Slide from DC data wranglers meetup. pic.twitter.com/X3eyUjOPMq — mike xia (@xento) August 24, 2016 A lot of time is spent cleaning data before the data science can begin. Discovery … Continue reading
Some things I learned playing the role of ‘data scientist’
In general: Most of my career