Author Archives: mx
Best practices for PySpark programming Programming in Spark using PySpark from Mostafa Elzoghbi History of SQL and all the advanced features over the last 30 years among big vendors. Modern SQL in Open Source and Commercial Databases from Markus Winand
What I think of every time I hear Major Stakeholder Continue reading
Credits to: Peter James Thomas: https://www.linkedin.com/pulse/10-risks-beset-data-programmes-peter-james-thomas Not establishing a dedicated team. The team never escapes from “the day job” or legacy / BAU issues; the past prevents the future from being built. Staff lack skills and prior experience of data … Continue reading
Taken from a Dataiku meetup slide. This picture hit close to home.
I switched to wordpress.com as my host. I will most likely switch to AWS later.
Redshift is based off branch of PostGreSQL 8.0.2 [ PostgreSQL 8.0.2 was released in 2005] here’s all the unsupported fancy PostGres Stuff: taken directly from amazon’s manual. The bigs ones are: No Store Procedures, No Constraints enforcement, No triggers and no … Continue reading
Best Practices for Micro-Batch Loading on Amazon Redshift Article by AWS blog I work with Redshift everyday now at Amazon. It’s very useful big data warehouse tool. Here’s a blog post about loading data into it. It’s very s3 dependent … Continue reading
Redshift is : Fast like Ferrari Cheap like a Ford Fiesta Useful like a Minivan Self Driving Auto-magics like Tesla with Autopilot Key features: Really fancy features under-the-hood: -interleaved sort keys -columnar distributed storage -smart parallel execution -IO optimization (return … Continue reading
Snowflake: data warehouse in the cloud (specially amazon) Snowflake compute is basically an analytics computing database that has scalability. Data is stored / shared on AWS S3 buckets instead of in snowfalek. You spin up snowflake tool injest and load into it’s proprietary … Continue reading