Category Archives: Uncategorized
I consider these tools the current standard for the new data engineering. Scheduling tools: Airflow. ETL-adjacent processes: dbt. Data quality testing: Great Expectations. Infrastructure: Terraform. Data catalog/discovery: Amundsen. Here’s a visual guideline for a modern data engineer roadmap: https://github.com/datastacktv/data-engineer-roadmap credit to … Continue reading
Someone made a GCP lookup list for AWS cloud people. Service comparisons: The following table provides a side-by-side comparison of the various services available on AWS and Google Cloud. Service Category Service AWS Google Cloud Compute IaaS Amazon Elastic Compute … Continue reading
Modern data engineering is complicated. There are so many things you need to know to be good at it. Languages: SQL, Python, Scala. Operating systems: Linux, bash shell. Cloud: AWS, Azure, GCP. Data pipelines: Airflow, Kubeflow. DevOps: Kubernetes, … Continue reading
My other hobby is investing and picking stocks. Here’s my other blog: http://hunandelightmd.com/
MySQL tricks: do an instant table swap to mitigate MySQL deadlock errors. Continue reading
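The excerpt does not show how the swap is done, so the following is only a minimal sketch of one common approach, not necessarily the one in the full post. It assumes the new data is loaded into a staging table first and then swapped in with a single multi-table `RENAME TABLE` statement, which MySQL applies atomically. The function name `build_swap_statement` and the table names `orders`, `orders_new`, and `orders_old` are hypothetical.

```python
def build_swap_statement(live: str, staging: str, retired: str) -> str:
    """Build one RENAME TABLE statement that swaps `staging` in for `live`.

    Assumption: `staging` was fully loaded beforehand and `retired` does
    not exist yet. All table names here are hypothetical.
    """
    for name in (live, staging, retired):
        # Only allow letters, digits, and underscores in identifiers,
        # since they are interpolated directly into the SQL text.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe table name: {name!r}")
    # Both renames are part of a single statement, so readers see either
    # the old table or the new one, never a missing table.
    return (
        f"RENAME TABLE `{live}` TO `{retired}`, "
        f"`{staging}` TO `{live}`"
    )


if __name__ == "__main__":
    print(build_swap_statement("orders", "orders_new", "orders_old"))
```

The resulting statement can be run through any MySQL client. Because the heavy writes go to the staging table rather than the live one, long-running loads should contend less with readers of the live table, and the retired table can be dropped once it is no longer needed.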
Best practices for PySpark programming: "Programming in Spark using PySpark" from Mostafa Elzoghbi. History of SQL and all the advanced features over the last 30 years among big vendors: "Modern SQL in Open Source and Commercial Databases" from Markus Winand.
Taken from a Dataiku meetup slide. This picture hit close to home.
I switched to wordpress.com as my host. I will most likely switch to AWS later.