My other hobby is investing and picking stocks.
Here’s my other blog: http://hunandelightmd.com/
My other hobby is investing and picking stocks.
I noticed my dimension table was stuck when I query it for etl job and the truncate statement fails to return.
“Locking is a protection mechanism that controls how many sessions can access a table at the same time and which operations can be performed in those sessions. Most relational databases use row-level locks. Amazon Redshift uses table-level locks. You might experience locking conflicts if you perform frequent DDL operations on user tables, such as ALTER, DROP or TRUNCATE, or DML changes such as UPDATE, DELETE, or INSERT. ”
I use this handy check what’s the locking blocked pids with this code:
use the blocking pid column from this output to find the jobs to kill.
select a.txn_owner, a.txn_db, a.xid, a.pid, a.txn_start, a.lock_mode, a.relation as table_id,nvl(trim(c."name"),d.relname) as tablename, a.granted,b.pid as blocking_pid ,datediff(s,a.txn_start,getdate())/86400||' days '||datediff(s,a.txn_start,getdate())%86400/3600||' hrs '||datediff(s,a.txn_start,getdate())%3600/60||' mins '||datediff(s,a.txn_start,getdate())%60||' secs' as txn_duration from svv_transactions a left join (select pid,relation,granted from pg_locks group by 1,2,3) b on a.relation=b.relation and a.granted='f' and b.granted='t' left join (select * from stv_tbl_perm where slice=0) c on a.relation=c.id left join pg_class d on a.relation=d.oid where a.relation is not null;
Remember amazon redshift locks the whole table.
Creating a Tableau Report level View Filter (1 report varying view depending on tableau user logged in)
Creating a Report level View Filter in Tableau (1 report varying view depending on tableau user logged in)
You want a report to show an employee’s daily sales and for the employee to only see his data and manager to see all his employees’ data.
1. create mapping table with user_manager along with the daily sales data
table 1: user_manager
table 2: user_sales
date, sales_amt, user_name
2018/03/04, $20, mike_s
2018/03/03, $20, mike_s
join two objects together in tableau or before it.
select a.user_name, a.manager_name, b.date, b._sales_amt from user_manager a join user_sales b on a.user_name= b.user_name
now we have: user_name, manager_name, date, sales_amt
2. create tableau to import the entire object of two tables.
3. create calculated field called permission similar. *most important*
with formula username() = manager_name
4. add and apply permissions filter to tableau workbook
source of this example:
Found on: http://www.datasciencecentral.com/ and
Sometimes a simple regression model will do.
A deadlock in MySQL happens when two or more transactions mutually hold and request for locks, creating a cycle of dependencies. … InnoDB automatically detects transaction deadlocks, rollbacks a transaction immediately and returns an error. It uses a metric to pick the easiest transaction to rollback. MyISAM engine locks the entire table from read/write requests.
For example you have a near realtime table that you need to update via etl every 10 min , and that same table is used by your clients to read the current status of say batch processes status.
Avoid the mysql deadlock error by re-loading table in near-realtime. First to check if your tables are locked run as root or db admin :
show processlist ;
How do you write to a table while someone’s reading it without deadlock? Simple: load a staging table of the same table structure and do a rename swap in mysql.
Here’s the steps in list format
- create a table with the same structure of the original one.
- have etl load that table instead.
- do some data validity checks.
- do rename table swap code: a->c, b->a, c->b (a = prod, b = stage, c=temp)
example mysql code:
create table current_status_stage like current_status_stage;
— load table from source system via mysql command line or otherwise. ex csv
LOAD DATA INFILE "/tmp/current_status.csv" INTO TABLE current_status_stage COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n';
RENAME TABLE reporting.current_status TO reporting.current_status_old, reporting.current_status_stage To reporting.current_status;RENAME TABLE reporting.current_status_old TO reporting.current_status_stage
Assuming Redshift tables are un-compressed because most people don’t do it.
- list all the big tables by size, here’s a script
- run analyze compression
analyze compression public.report_table_name;
table column encoding est_reduction_pct report_table_name rowid raw 0 report_table_name create_date zstd 9.49 report_table_name create_day delta 38.72 report_table_name create_date_key zstd 28.98 report_table_name type zstd 67.59 report_table_name type_desc zstd 58.25 report_table_name created_by zstd 35.17
- recreate table with new compression encoding and see performance gains.
DROP TABLE IF EXISTS public.report_table_name cascade;
CREATE TABLE public.report_table_name
( rowid INT IDENTITY(1,1),
create_date timestamp encode zstd,
create_day date encode zstd,
type integer encode zstd,
type_desc varchar(28) encode zstd,
created_by varchar(30) encode zstd,
data_refresh_date date encode raw
Amazon manual regarding compression command:
Best practices for PySpark programming
History of SQL and all the advanced features over the last 30 years among big vendors.