Author Archives: mx

SlideShare is full of useful knowledge

Best practices for PySpark programming: Programming in Spark using PySpark, from Mostafa Elzoghbi.

History of SQL and all the advanced features over the last 30 years among big vendors: Modern SQL in Open Source and Commercial Databases, from Markus Winand.

Posted in spark, Uncategorized

What I think of every time I hear Stakeholders

What I think of every time I hear Major Stakeholder

Posted in Business Intelligence, wheresthejoke

Re-Blog: 10 Risks that Beset Data Programmes

Credits to Peter James Thomas: https://www.linkedin.com/pulse/10-risks-beset-data-programmes-peter-james-thomas

Not establishing a dedicated team. The team never escapes from “the day job” or legacy / BAU issues; the past prevents the future from being built.

Staff lack skills and prior experience of data …

Posted in Business Intelligence

The reality of a data worker.

Taken from a Dataiku meetup slide.  This picture hit close to home.

Posted in data wrangling, Uncategorized, wheresthejoke

Things to note when migrating web hosts


Posted in Uncategorized

New Year, New Site

I switched to WordPress.com as my host. I will most likely switch to AWS later.

Posted in Uncategorized

Amazon Redshift’s Unsupported PostgreSQL Features

Redshift is based on a branch of PostgreSQL 8.0.2 (PostgreSQL 8.0.2 was released in 2005). Here is all the fancy Postgres stuff it does not support, taken directly from Amazon’s manual. The big ones are: no stored procedures, no constraint enforcement, no triggers, and no …
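One practical consequence of unenforced constraints is that a declared primary key will not stop duplicate rows from landing in a table, so the loader has to guard against them itself. A minimal sketch in Python, assuming rows arrive as dictionaries; the function name `dedupe_by_key` and the row shape are hypothetical and do not come from Amazon's manual.

```python
from typing import Dict, Hashable, Iterable, List


def dedupe_by_key(rows: Iterable[Dict], key: str) -> List[Dict]:
    """Keep the last row seen for each key value.

    Output order follows the first appearance of each key.
    This runs in the loader because the database will not
    reject duplicates on its own.
    """
    latest: Dict[Hashable, Dict] = {}
    for row in rows:
        # A later row with the same key overwrites the earlier one.
        latest[row[key]] = row
    return list(latest.values())
```

Calling `dedupe_by_key(rows, "id")` before a load stands in for the uniqueness check that a primary key would normally provide.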

Posted in data wrangling, mpp databases

Best Practices for Micro-Batch Loading on Amazon Redshift

Best Practices for Micro-Batch Loading on Amazon Redshift, an article from the AWS blog. I work with Redshift every day now at Amazon. It’s a very useful big data warehouse tool. Here’s a blog post about loading data into it. It’s very S3 dependent …
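To make the S3 dependence concrete, here is a rough sketch of the general shape of a micro-batch load: group staged files into small batches, then issue one Redshift `COPY` per S3 prefix. This is not taken from the AWS article. The helper names, the gzipped CSV format, and the idea of one prefix per batch are all assumptions for illustration.

```python
from typing import Iterator, List, Sequence


def micro_batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def copy_statement(table: str, s3_prefix: str, iam_role: str) -> str:
    """Build a COPY command that loads gzipped CSV files under an S3 prefix.

    All arguments are assumed to be trusted, hard-coded values;
    they are interpolated into the SQL text without escaping.
    """
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV GZIP;"
    )
```

For example, `copy_statement("events", "s3://my-bucket/batch-001/", "arn:aws:iam::123456789012:role/load")` returns a statement that a SQL client could run against the cluster; the table, bucket, and role here are placeholders.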

Posted in big data, data wrangling, etl

Amazon Redshift is an amazing database product

Redshift is:
-Fast like a Ferrari
-Cheap like a Ford Fiesta
-Useful like a minivan
-Self-driving auto-magic like a Tesla with Autopilot

Key features:

Really fancy features under the hood:
-interleaved sort keys
-columnar distributed storage
-smart parallel execution
-IO optimization (return …

Posted in big data, Business Intelligence, Cloud, data analysis, relational databases

Review of two new Cloud BI tools: Snowflake and Looker

Snowflake: a data warehouse in the cloud (specifically Amazon). Snowflake compute is basically a scalable analytics database. Data is stored / shared in AWS S3 buckets instead of in Snowflake. You spin up Snowflake to ingest and load into its proprietary …

Posted in BI reporting, Business Intelligence, Cloud