Category Archives: big data
Assume your Redshift tables are uncompressed, because most people never set column encodings. First list all the big tables by size (here’s a script), then run analyze compression on each one: analyze compression public.report_table_name; Example results:
table              column       encoding  est_reduction_pct
report_table_name  rowid        raw       0
report_table_name  create_date  zstd      9.49
… Continue reading
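The example results above can be triaged with a few lines of Python. This is only a minimal sketch: it assumes the analyze compression output has already been fetched as (table, column, encoding, est_reduction_pct) tuples, and the function name `columns_worth_encoding` and the 5 percent cutoff are illustrative choices, not anything Redshift defines.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Suggestion(NamedTuple):
    table: str
    column: str
    encoding: str
    est_reduction_pct: float


def columns_worth_encoding(
    rows: Iterable[Tuple[str, str, str, float]],
    min_reduction_pct: float = 5.0,  # hypothetical cutoff, tune to taste
) -> List[Suggestion]:
    """Keep rows whose estimated saving meets the cutoff, biggest saving first."""
    suggestions = [Suggestion(t, c, e, float(p)) for t, c, e, p in rows]
    worthwhile = [s for s in suggestions if s.est_reduction_pct >= min_reduction_pct]
    return sorted(worthwhile, key=lambda s: s.est_reduction_pct, reverse=True)


# The two rows shown in the example results.
rows = [
    ("report_table_name", "rowid", "raw", 0),
    ("report_table_name", "create_date", "zstd", 9.49),
]

picked = columns_worth_encoding(rows)
for s in picked:
    print(f"{s.table}.{s.column}: {s.encoding} (est. {s.est_reduction_pct}% reduction)")
```

With the default cutoff only create_date survives, since rowid is estimated to gain nothing from a different encoding.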
Best Practices for Micro-Batch Loading on Amazon Redshift (article from the AWS blog). I work with Redshift every day now at Amazon. It’s a very useful big data warehouse tool. Here’s a blog post about loading data into it. It’s very S3-dependent … Continue reading
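Because the loading path runs through S3, a micro-batch load usually comes down to a COPY from an S3 prefix. The sketch below only builds the SQL text and does not connect to anything. The table, bucket, and IAM role are hypothetical placeholders, and the CSV/GZIP options are an assumption about the file format rather than something taken from the AWS article.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement for gzip-compressed CSV files under one S3 prefix.

    Values are interpolated directly, so only pass trusted, internally generated names.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "GZIP;"
    )


# Hypothetical names, for illustration only.
stmt = build_copy_statement(
    table="public.report_table_name",
    s3_prefix="s3://example-bucket/micro-batches/batch-0001/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(stmt)
```

Keeping one S3 prefix per batch means each COPY picks up exactly the files written for that batch.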
Redshift is:
-Fast like a Ferrari
-Cheap like a Ford Fiesta
-Useful like a minivan
-Self-driving auto-magic like a Tesla with Autopilot
Key features: Really fancy features under the hood:
-interleaved sort keys
-columnar distributed storage
-smart parallel execution
-IO optimization (return … Continue reading
TensorFlow: Google’s machine learning API can be super powerful. Just make JSON REST calls to the endpoint and get results based on Google’s machine learning library. Use cases: 1. identify an image (image classification) 2. parse speech into text … Continue reading
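As a rough sketch of what such a JSON REST call looks like for use case 1, the snippet below builds a request body in the shape used by the `images:annotate` method of Google's Cloud Vision API. That this is the endpoint the post has in mind is an assumption. No network call is made, and the image bytes are a stand-in.

```python
import base64
import json


def build_label_request(image_bytes: bytes, max_results: int = 5) -> str:
    """Build the JSON body for a label-detection (image classification) request."""
    body = {
        "requests": [
            {
                # Image content travels as base64 text inside the JSON document.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }
    return json.dumps(body)


# Stand-in bytes; a real call would read an actual image file.
payload = build_label_request(b"fake image bytes")
print(payload)
```

The resulting string would be sent as the body of an authenticated POST; the response lists labels with confidence scores.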
Dan Ariely, Duke University professor. Big data may be sexy, but the companies I’ve worked with or interviewed are still dealing with small-to-medium-sized data management problems. #smalldataproblems
https://github.com/mxento/Big-Data-Dictionary I created a big data dictionary for myself and anyone else who gets confused by all the big data applications out there and just wants a simple one- or two-sentence description for reference.
Since I’m moving away from relational databases and data warehousing (so 2008), I’m teaching myself big data architecture. This is the open source big data software list as I currently understand it. Hadoop: HDFS is the basic distributed file store; … Continue reading