Category Archives: big data
Improve Amazon Redshift table performance fast & easy.
Assuming your Redshift tables are uncompressed (most people don't compress them):
1. List all the big tables by size (the full post includes a script for this).
2. Run ANALYZE COMPRESSION on each big table:
   analyze compression public.report_table_name;
Example results:
   table              column       encoding  est_reduction_pct
   report_table_name  rowid        raw       0
   report_table_name  create_date  zstd      9.49
   …
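The two steps above can be scripted. This is a minimal sketch, assuming you have already fetched rows from Redshift with your own client: table sizes as (schema, table, size_mb) tuples, for example from the `svv_table_info` system view, and ANALYZE COMPRESSION output as (table, column, encoding, est_reduction_pct) tuples. The helper names, the sample sizes and the 5% threshold are hypothetical. The sketch only generates SQL text. The `alter table ... alter column ... encode ...` statements assume a Redshift version that supports changing a column's encoding in place, so check the ALTER TABLE documentation for restrictions (a deep copy into a newly encoded table is the older alternative).

```python
from typing import Iterable, List, Tuple

# Hypothetical size query: svv_table_info reports "size" in 1 MB blocks.
BIG_TABLES_SQL = (
    'select "schema", "table", size as size_mb, tbl_rows '
    "from svv_table_info order by size desc limit 20;"
)


def analyze_statements(size_rows: Iterable[Tuple[str, str, int]],
                       top_n: int = 5) -> List[str]:
    """Build ANALYZE COMPRESSION statements for the top_n largest tables."""
    biggest = sorted(size_rows, key=lambda r: r[2], reverse=True)[:top_n]
    return [f"analyze compression {schema}.{table};" for schema, table, _ in biggest]


def encode_statements(schema: str,
                      results: Iterable[Tuple[str, str, str, float]],
                      min_reduction_pct: float = 5.0) -> List[str]:
    """Turn ANALYZE COMPRESSION rows into ALTER COLUMN ... ENCODE statements,
    keeping only columns whose estimated reduction clears the threshold."""
    stmts = []
    for table, column, encoding, est_reduction_pct in results:
        if est_reduction_pct >= min_reduction_pct:
            stmts.append(
                f"alter table {schema}.{table} alter column {column} encode {encoding};"
            )
    return stmts


if __name__ == "__main__":
    # Made-up sizes (MB) for illustration only.
    sizes = [("public", "report_table_name", 52000), ("public", "small_lookup", 12)]
    print(analyze_statements(sizes, top_n=1))
    # Rows shaped like the example results above.
    results = [("report_table_name", "rowid", "raw", 0.0),
               ("report_table_name", "create_date", "zstd", 9.49)]
    print(encode_statements("public", results))
```

With the example results, only `create_date` (9.49% estimated reduction) clears the threshold; `rowid` stays raw.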
Best Practices for Micro-Batch Loading on Amazon Redshift
I work with Redshift every day now at Amazon; it's a very useful big data warehouse tool. This AWS blog article, Best Practices for Micro-Batch Loading on Amazon Redshift, covers loading data into it. The approach is very S3-dependent …
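Since loading is so S3-dependent, one common pattern is to point COPY at a manifest file that lists exactly which S3 objects belong to a micro-batch. A minimal sketch, assuming the bucket, keys, table name and IAM role ARN are hypothetical placeholders: it builds the manifest JSON (the `entries` / `url` / `mandatory` layout that Redshift's COPY reads) and a matching COPY statement, but does not upload or run anything.

```python
import json
from typing import List


def build_manifest(bucket: str, keys: List[str], mandatory: bool = True) -> str:
    """Return a Redshift COPY manifest listing each S3 object in the batch."""
    entries = [{"url": f"s3://{bucket}/{key}", "mandatory": mandatory} for key in keys]
    return json.dumps({"entries": entries}, indent=2)


def build_copy(table: str, manifest_url: str, iam_role: str) -> str:
    """Return a COPY statement that loads only the files named in the manifest."""
    return (f"copy {table} from '{manifest_url}' "
            f"iam_role '{iam_role}' manifest format as csv;")


if __name__ == "__main__":
    # Hypothetical bucket, keys and role. Several similar-sized files per batch
    # let Redshift load them in parallel across slices.
    keys = [f"batches/2016-01-01T00-05/part-{i:04d}.csv" for i in range(4)]
    print(build_manifest("my-bucket", keys))
    print(build_copy("public.events",
                     "s3://my-bucket/manifests/2016-01-01T00-05.manifest",
                     "arn:aws:iam::123456789012:role/MyRedshiftRole"))
```

Setting `mandatory` to true makes COPY fail if a listed file is missing, which is usually what you want for a batch that must load completely.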
Amazon Redshift is an amazing database product
Redshift is:
- Fast like a Ferrari
- Cheap like a Ford Fiesta
- Useful like a minivan
- Self-driving and auto-magic like a Tesla with Autopilot
Key features (really fancy stuff under the hood):
- interleaved sort keys
- columnar distributed storage
- smart parallel execution
- IO optimization (return …
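Interleaved sorting is often explained with a Z-order (Morton) curve: interleave the bits of the key columns so that each column gets roughly equal weight in the sort order. This is a toy sketch of that general idea under stated assumptions, not Redshift's internal implementation. The two key columns, the 4-row block size and the `blocks_touched` helper are hypothetical; counting blocks that contain a matching row is a rough stand-in for how Redshift skips blocks using per-block min/max metadata (zone maps).

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]


def morton2(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y: x fills even bit positions, y odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def blocks_touched(ordered_rows: List[Row], predicate: Callable[[Row], bool],
                   block_size: int = 4) -> int:
    """Count fixed-size blocks holding at least one row that matches predicate."""
    blocks = [ordered_rows[i:i + block_size]
              for i in range(0, len(ordered_rows), block_size)]
    return sum(any(predicate(r) for r in block) for block in blocks)


if __name__ == "__main__":
    # 16 toy rows keyed by two hypothetical columns (x, y).
    rows = [(x, y) for x in range(4) for y in range(4)]
    compound = sorted(rows)                                   # x first, then y
    interleaved = sorted(rows, key=lambda r: morton2(*r))     # Z-order
    for name, order in (("compound", compound), ("interleaved", interleaved)):
        print(name,
              blocks_touched(order, lambda r: r[0] == 0),     # filter on x
              blocks_touched(order, lambda r: r[1] == 0))     # filter on y
```

In this toy, the compound order reads 1 block for a filter on x but all 4 for a filter on y, while the Z-order reads 2 blocks for either filter. That trade-off is why interleaved keys suit queries that filter on different columns.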
DevFestDC: key takeaways on Google Cloud products
TensorFlow and Google's machine learning APIs can be super powerful. Just make JSON REST calls to the endpoint and get results from Google's machine learning models. Use cases:
1. identify an image (image classification)
2. parse speech into text …
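As a concrete instance of the "just make JSON REST calls" pattern, here is a sketch that builds request bodies for Google's Cloud Vision API (label detection, i.e. image classification) and Cloud Speech-to-Text API, which map to the two use cases above. These two services are swapped in as examples; the endpoint URLs and field names reflect the v1 REST APIs as I understand them and should be checked against Google's current docs. Authentication and the actual HTTP POST are left out so the sketch runs offline, and the input bytes are placeholders.

```python
import base64
import json

# v1 REST endpoints (check Google's docs; an API key or OAuth token is required).
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"
SPEECH_URL = "https://speech.googleapis.com/v1/speech:recognize"


def vision_label_request(image_bytes: bytes, max_results: int = 5) -> str:
    """JSON body asking Cloud Vision to classify an image via LABEL_DETECTION."""
    body = {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
        }]
    }
    return json.dumps(body)


def speech_request(audio_bytes: bytes, sample_rate_hz: int = 16000,
                   language: str = "en-US") -> str:
    """JSON body asking Speech-to-Text to transcribe 16-bit linear PCM audio."""
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language,
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file and a real audio file.
    print(vision_label_request(b"fake-image-bytes"))
    print(speech_request(b"fake-audio-bytes"))
```

Each body would be POSTed to the matching URL; the response comes back as JSON too (labels with scores, or transcript alternatives).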
Big Data == teens talking about sex
Dan Ariely, Duke University professor. Big data may be sexy, but the companies I've worked with or interviewed with are still dealing with small-to-medium-size data management problems. #smalldataproblems
Open source big data software as I currently understand it, part 2
https://github.com/mxento/Big-Data-Dictionary I created a big data dictionary for myself and for anyone else who gets confused by all the big data applications out there and just wants simple one- or two-sentence descriptions for reference.
Open source big data software as I currently understand it
Since I'm moving away from relational databases and data warehousing (so 2008), I'm teaching myself big data architecture. Here is a list of open source big data software as I currently understand it. Hadoop: HDFS is the basic distributed file store; …
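To make "distributed file store" a bit more concrete: HDFS splits each file into large fixed-size blocks and keeps several replicas of each block on different DataNodes. A back-of-the-envelope sketch, assuming the Hadoop 2.x+ defaults of a 128 MB `dfs.blocksize` and a `dfs.replication` of 3. The round-robin placement and node names are hypothetical simplifications; real HDFS placement is rack-aware and decided by the NameNode.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128  # Hadoop 2.x+ default dfs.blocksize
REPLICATION = 3      # default dfs.replication


def block_count(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb: float, replication: int = REPLICATION) -> float:
    """Cluster disk used: every byte is stored `replication` times.
    A partial last block only uses its actual size, not a full block."""
    return file_size_mb * replication


def place_blocks(n_blocks: int, nodes: List[str],
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Naive round-robin replica placement (NOT HDFS's rack-aware policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in range(n_blocks)}


if __name__ == "__main__":
    size_mb = 1000  # a hypothetical 1000 MB file
    n = block_count(size_mb)
    print(n, raw_storage_mb(size_mb))  # 8 blocks, 3000 MB of raw disk
    print(place_blocks(n, ["dn1", "dn2", "dn3", "dn4"]))
```

Each block's replicas land on distinct nodes, so losing any single node still leaves two copies of every block.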