Author Archives: mx
The Data Engineering 2021
I consider these tools the current standard for the new data engineering:
Scheduling tools: Airflow
ETL-adjacent processes: dbt
Data quality testing: Great Expectations
Infrastructure: Terraform
Data catalog/discovery: Amundsen
Here’s a visual guideline for a modern data engineer roadmap: https://github.com/datastacktv/data-engineer-roadmap credit to … Continue reading
GCP for AWS professionals
Someone made a GCP lookup list for AWS cloud people.
Service comparisons
The following table provides a side-by-side comparison of the various services available on AWS and Google Cloud.
Service Category | Service | AWS | Google Cloud
Compute | IaaS | Amazon Elastic Compute … Continue reading
Modern Data Engineering is Complicated
Modern data engineering is complicated. There are so many things to know to be good at it.
Languages: SQL, Python, Scala
Operating systems: Linux, bash shell
Cloud: AWS, Azure, GCP
Data pipelines: Airflow, Kubeflow
DevOps: Kubernetes, … Continue reading
My other hobby is Stocks
My other hobby is investing and picking stocks. Here’s my other blog: http://hunandelightmd.com/
How do I unlock blocking queries in Amazon Redshift?
I noticed my dimension table was stuck when I queried it for an ETL job, and the TRUNCATE statement failed to return. “Locking is a protection mechanism that controls how many sessions can access a table at the same time and … Continue reading
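As a rough sketch of how a stuck table is often investigated in Redshift, the Python below only builds two SQL strings: one that reads the STV_LOCKS system table and one that calls PG_TERMINATE_BACKEND for a given process id. Whether the full post follows exactly these steps is not visible in this excerpt, so treat that as an assumption; the helper name `terminate_session_sql` is hypothetical.

```python
# Sketch only: builds SQL text; it does not connect to Redshift.
# Assumption: the lock holder is found via the STV_LOCKS system table and
# released with PG_TERMINATE_BACKEND, which may differ from the full post.

LIST_LOCKS_SQL = (
    "SELECT table_id, last_update, lock_owner, lock_owner_pid, lock_status "
    "FROM stv_locks ORDER BY last_update ASC;"
)


def terminate_session_sql(pid: int) -> str:
    """Return the statement that ends the session holding a lock (hypothetical helper)."""
    # Reject anything that is not a positive integer so the SQL text stays safe.
    if isinstance(pid, bool) or not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    return f"SELECT pg_terminate_backend({pid});"
```

Run the first statement to see which process id owns the lock, then pass that id to the helper. Ending a session rolls back its open transaction, so check what the session is doing first.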
Creating a Tableau report-level view filter (one report, with the view varying by the logged-in Tableau user)
Creating a report-level view filter in Tableau (one report, with the view varying by the logged-in Tableau user). Use case: you want a report to show an employee’s daily sales and for the employee to only see his data and … Continue reading
Deep Learning versus Machine Learning in One Picture
Found on: http://www.datasciencecentral.com/ and https://www.linkedin.com/pulse/logistic-regression-vs-deep-neural-networks-david-young/ Sometimes a simple regression model will do.
MySQL tricks: do an instant table swap to mitigate MySQL deadlock errors.
MySQL tricks: do an instant table swap to mitigate MySQL deadlock errors. Continue reading
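The excerpt does not show how the swap is done. One common reading, assumed here, is a multi-table RENAME TABLE, which MySQL applies as a single atomic operation: load the new data into a staging table, then rename it into place. The sketch below only builds that statement as text; the function name and the table names in the usage note are hypothetical.

```python
# Sketch only: builds a MySQL statement as text; it does not run it.
# Assumption: "instant table swap" means a multi-table RENAME TABLE,
# which MySQL applies as one atomic operation.
import re

# Allow only plain identifiers so the generated SQL cannot be injected into.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def swap_tables_sql(live: str, staging: str, backup: str) -> str:
    """Swap `staging` into the place of `live`, keeping the old data as `backup`."""
    for name in (live, staging, backup):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe table name: {name!r}")
    return f"RENAME TABLE `{live}` TO `{backup}`, `{staging}` TO `{live}`;"
```

For example, `swap_tables_sql("report", "report_new", "report_old")` yields a statement that moves `report` aside and promotes `report_new` in one step, so readers do not wait on a long-running reload of the live table.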
Improve Amazon Redshift table performance fast & easy.
Assume your Redshift tables are uncompressed, because most people don’t compress them.
List all the big tables by size; here’s a script
Run analyze compression:
analyze compression public.report_table_name;
Example results:
table | column | encoding | est_reduction_pct
report_table_name | rowid | raw | 0
report_table_name | create_date | zstd | 9.49
… Continue reading
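To make the example results easier to act on, here is a minimal sketch that filters already-fetched ANALYZE COMPRESSION rows down to the columns where a new encoding looks worthwhile. The two rows are the ones shown in the example results; the function name and the 5.0 percent threshold are assumptions for illustration, not part of the original post.

```python
# Sketch only: filters ANALYZE COMPRESSION output that has already been fetched.
# The two rows are the example results from the post; the 5.0 threshold is an assumption.

def worthwhile_encodings(rows, min_reduction_pct=5.0):
    """Return (column, encoding) pairs whose estimated reduction meets the threshold."""
    return [
        (column, encoding)
        for _table, column, encoding, est_reduction_pct in rows
        if float(est_reduction_pct) >= min_reduction_pct
    ]


example_rows = [
    ("report_table_name", "rowid", "raw", "0"),
    ("report_table_name", "create_date", "zstd", "9.49"),
]

print(worthwhile_encodings(example_rows))  # [('create_date', 'zstd')]
```

With these rows only `create_date` passes, since `rowid` is estimated to gain nothing from a different encoding.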