Author Archives: mx

The Data Engineering 2021

I consider these tools the current standard for the new data engineering:
Scheduling Tools: Airflow
ETL-adjacent processes: dbt
Data Quality Testing: Great Expectations
Infrastructure: Terraform
Data Catalog/Discovery: Amundsen
Here’s a visual guideline for a modern data engineer roadmap: https://github.com/datastacktv/data-engineer-roadmap credit to … Continue reading

Posted in Uncategorized | Leave a comment

How Netflix does data in AWS

https://netflixtechblog.com/byte-down-making-netflixs-data-infrastructure-cost-effective-fee7b3235032

Posted in Uncategorized | Leave a comment

GCP for AWS professionals

Someone made a GCP lookup list for AWS cloud people.
Service comparisons
The following table provides a side-by-side comparison of the various services available on AWS and Google Cloud.
Service Category | Service | AWS | Google Cloud
Compute | IaaS | Amazon Elastic Compute … Continue reading

Posted in Uncategorized | Leave a comment

Modern Data Engineering is Complicated

Modern Data Engineering is Complicated. There are so many things to know to be good.
Languages: SQL, Python, Scala
Operating Systems: Linux, bash shell
Cloud: AWS, Azure, GCP
Data Pipelines: Airflow, Kubeflow
DevOps: Kubernetes, … Continue reading

Posted in Uncategorized | Leave a comment

My other hobby is Stocks

My other hobby is investing and picking stocks. Here’s my other blog: http://hunandelightmd.com/

Posted in Uncategorized | Leave a comment

How do I unlock blocking queries in Amazon Redshift?

I noticed my dimension table was stuck when I queried it for an ETL job, and the truncate statement failed to return. “Locking is a protection mechanism that controls how many sessions can access a table at the same time and … Continue reading
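The excerpt stops before showing a fix, so what follows is only a minimal sketch of one common approach, not necessarily the one the full post takes. It assumes superuser access to Redshift's `STV_LOCKS` system table and uses `PG_TERMINATE_BACKEND` to end the session holding the lock. The helper name `terminate_session_sql` and the sample pid are hypothetical, and the code only builds the SQL text; running it needs a Redshift connection, which is not shown.

```python
# Sketch only: builds SQL strings for inspecting and clearing Redshift locks.
# Executing them requires a database connection (not shown here).

# STV_LOCKS lists the table locks currently held; lock_owner_pid is the
# process id of the session that owns each lock.
FIND_LOCKS_SQL = """
SELECT table_id, lock_owner_pid, lock_status, last_update
FROM stv_locks
ORDER BY last_update ASC;
"""


def terminate_session_sql(pid: int) -> str:
    """Return the statement that ends the session with the given pid.

    Hypothetical helper: the pid would come from the lock_owner_pid column.
    """
    if isinstance(pid, bool) or not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    return f"SELECT pg_terminate_backend({pid});"


if __name__ == "__main__":
    print(FIND_LOCKS_SQL.strip())
    print(terminate_session_sql(12345))  # 12345 is a made-up pid
```

Terminating a session rolls back its open transaction, so it is worth checking what the blocking session is doing before ending it.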

Posted in etl | Leave a comment

Creating a Tableau Report level View Filter (1 report varying view depending on Tableau user logged in)

Creating a Report level View Filter in Tableau (1 report varying view depending on Tableau user logged in). Use case: You want a report to show an employee’s daily sales and for the employee to only see his data and … Continue reading

Posted in BI reporting, Business Intelligence, tableau | Leave a comment

Deep Learning versus Machine Learning in One Picture

Found on: http://www.datasciencecentral.com/ and https://www.linkedin.com/pulse/logistic-regression-vs-deep-neural-networks-david-young/ Sometimes a simple regression model will do.

Posted in data science, machine learning | Leave a comment

MySQL tricks: do an instant table swap to mitigate MySQL deadlock errors.

MySQL tricks: do an instant table swap to mitigate MySQL deadlock errors. Continue reading
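The excerpt does not show the swap itself. A minimal sketch of one way to do it, assuming the new data has already been loaded into a staging table, is MySQL's `RENAME TABLE`, which renames several tables in one atomic statement. The helper `swap_table_sql` and the table names are hypothetical, and the code only builds the statement text.

```python
import re

# Accept only simple identifiers so the generated statement stays safe to run.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def swap_table_sql(live: str, staging: str, retired: str) -> str:
    """Build one RENAME TABLE statement that swaps staging into place.

    live    -> renamed to `retired` (kept as a backup)
    staging -> renamed to `live`
    Both renames happen in a single statement, so readers never see a
    missing table.
    """
    names = (live, staging, retired)
    for name in names:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe table name: {name!r}")
    if len(set(names)) != 3:
        raise ValueError("table names must be distinct")
    return f"RENAME TABLE `{live}` TO `{retired}`, `{staging}` TO `{live}`;"


if __name__ == "__main__":
    # Hypothetical table names for illustration.
    print(swap_table_sql("report", "report_new", "report_old"))
```

The retired table must not already exist when the statement runs, so a typical job drops the previous backup first.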

Posted in etl, relational databases, Uncategorized | Leave a comment

Improve Amazon Redshift table performance fast & easy.

Assuming your Redshift tables are uncompressed, because most people don’t compress them:
List all the big tables by size (here’s a script).
Run analyze compression:
analyze compression public.report_table_name;
Example results:
table              column       encoding  est_reduction_pct
report_table_name  rowid        raw       0
report_table_name  create_date  zstd      9.49
… Continue reading
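What to do with the `analyze compression` output is cut off above, so the following is only a sketch under stated assumptions. It filters the result rows down to the columns worth re-encoding and builds an `alter table ... alter column ... encode` statement for each. The 5 percent threshold and the helper names are hypothetical, and the sample rows are the two shown in the excerpt.

```python
from typing import List, Tuple

# One row of ANALYZE COMPRESSION output:
# (table, column, suggested encoding, estimated reduction in percent)
Row = Tuple[str, str, str, float]


def worthwhile_encodings(rows: List[Row], min_pct: float = 5.0) -> List[Tuple[str, str, str]]:
    """Keep columns whose suggested encoding saves at least min_pct percent.

    min_pct is an assumed cutoff, not a value from the post.
    """
    return [
        (table, column, encoding)
        for table, column, encoding, pct in rows
        if encoding != "raw" and pct >= min_pct
    ]


def encode_sql(table: str, column: str, encoding: str) -> str:
    """Build the statement that changes one column's encoding in place."""
    return f"alter table {table} alter column {column} encode {encoding};"


if __name__ == "__main__":
    sample = [
        ("report_table_name", "rowid", "raw", 0.0),
        ("report_table_name", "create_date", "zstd", 9.49),
    ]
    for table, column, encoding in worthwhile_encodings(sample):
        print(encode_sql(table, column, encoding))
```

Filtering before altering keeps the job from rewriting columns where the estimated saving is negligible.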

Posted in big data | Leave a comment