Category Archives: big data

Improve Amazon Redshift table performance fast & easy.

Assume your Redshift tables are uncompressed, because most people never set column encodings. First, list all the big tables by size (here’s a script). Then run analyze compression on each one:

analyze compression public.report_table_name;

Example results:

table              column       encoding  est_reduction_pct
report_table_name  rowid        raw       0
report_table_name  create_date  zstd      9.49

… Continue reading
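The workflow in this excerpt (run analyze compression per table, then read the estimated reduction per column) can be sketched in Python. This is only an illustration: the helper names `analyze_statement` and `worth_encoding` and the 5 percent cutoff are assumptions, not part of the original post, and the rows are the two example results quoted above.

```python
from typing import List, NamedTuple


class EncodingRow(NamedTuple):
    """One row of ANALYZE COMPRESSION output, as shown in the excerpt."""
    table: str
    column: str
    encoding: str
    est_reduction_pct: float


def analyze_statement(schema: str, table: str) -> str:
    # Builds the same statement the excerpt runs by hand.
    return f"analyze compression {schema}.{table};"


def worth_encoding(rows: List[EncodingRow],
                   min_reduction_pct: float = 5.0) -> List[EncodingRow]:
    # Hypothetical rule of thumb: keep columns whose suggested encoding is
    # not "raw" and whose estimated reduction clears the cutoff.
    return [
        r for r in rows
        if r.encoding != "raw" and r.est_reduction_pct >= min_reduction_pct
    ]


# The two example rows from the excerpt.
rows = [
    EncodingRow("report_table_name", "rowid", "raw", 0.0),
    EncodingRow("report_table_name", "create_date", "zstd", 9.49),
]

if __name__ == "__main__":
    print(analyze_statement("public", "report_table_name"))
    for row in worth_encoding(rows):
        print(f"{row.table}.{row.column}: {row.encoding} "
              f"(~{row.est_reduction_pct}% smaller)")
```

The cutoff is arbitrary; the point is only that the est_reduction_pct column tells you which columns are worth re-encoding first.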

Posted in big data

Best Practices for Micro-Batch Loading on Amazon Redshift

Article from the AWS blog. I work with Redshift every day now at Amazon. It’s a very useful big data warehouse tool. Here’s a blog post about loading data into it. It’s very S3-dependent … Continue reading
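The excerpt does not show the article's actual recommendations, so the following is only a rough sketch of why loading is so S3-dependent: Redshift's COPY command reads files directly from an S3 location. The table name, bucket, batch prefix, and IAM role ARN are all hypothetical placeholders, and the CSV/GZIP options are assumptions about the file format.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads one micro-batch from S3.

    Assumes gzip-compressed CSV files under the given prefix; adjust the
    format options to match the real files.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "GZIP;"
    )


# Hypothetical values for illustration only.
statement = build_copy_statement(
    table="public.report_table_name",
    s3_prefix="s3://example-bucket/batches/2020-01-01T00-05/",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-load",
)

if __name__ == "__main__":
    print(statement)
```

One COPY per batch prefix keeps each micro-batch a single load operation; the statement would still need to be executed through whatever SQL client the pipeline already uses.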

Posted in big data, data wrangling, etl

Amazon Redshift is an amazing database product

Redshift is:
-Fast like a Ferrari
-Cheap like a Ford Fiesta
-Useful like a minivan
-Self-driving auto-magic like a Tesla with Autopilot

Key features: really fancy features under the hood:
-interleaved sort keys
-columnar distributed storage
-smart parallel execution
-IO optimization (return … Continue reading

Posted in big data, Business Intelligence, Cloud, data analysis, relational databases

DevFestDC : key takeaways of the Google Cloud Products

TensorFlow: Google’s machine learning API can be super powerful. Just make JSON REST calls to the endpoint and get results based on Google’s machine learning library. Use cases: 1. identify an image (image classification) 2. parse speech into text … Continue reading

Posted in big data, machine learning, Uncategorized

Big Data == teens talking about sex

Dan Ariely, Duke University professor. Big data may be sexy, but the companies I’ve worked with or interviewed are still dealing with small-to-medium-size data management problems. #smalldataproblems

Posted in big data, Uncategorized

OPEN SOURCE BIG DATA SOFTWARE AS HOW I CURRENTLY UNDERSTAND IT. part 2

https://github.com/mxento/Big-Data-Dictionary I created a big data dictionary for myself and for anyone else who gets confused by all the big data applications out there and just wants a simple one- or two-sentence description for reference.

Posted in big data

Open Source big data software as how I currently understand it.

Since I’m moving away from relational databases and data warehousing (so 2008), I’m teaching myself big data architecture. This is the open source big data software list as I currently understand it. Hadoop: HDFS is the basic distributed file store; … Continue reading

Posted in big data