I. Loading data from S3 with Gzip
copy table_name from 's3://<bucket_name>/<object_prefix>' credentials 'aws_access_key_id=***;aws_secret_access_key=***' DELIMITER ',' GZIP REMOVEQUOTES STATUPDATE ON;
II. Unloading data to S3 with Gzip
UNLOAD ('select * from test')
to 's3://bucket' CREDENTIALS
'aws_access_key_id=**********;aws_secret_access_key=**************' GZIP DELIMITER ',' MANIFEST ALLOWOVERWRITE;
III. Show on query is running
select pid, user_name, starttime, query
#We can cancel query
IV. The way to get the size of a database in Redshift.
SELECT id table_id
, SUM(ROWS) ROW_COUNT
JOIN pg_class ON pg_class.oid = stv_tbl_perm.id
JOIN pg_namespace ON pg_namespace.oid = relnamespace
JOIN pg_database ON pg_database.oid = stv_tbl_perm.db_id
GROUP BY id, datname, nspname, relname
ORDER BY id, datname, nspname, relname
SELECT tbl table_id,ROUND(CONVERT(REAL, COUNT(*))/1024,2) size_in_gb
FROM stv_blocklist bloc
GROUP BY tbl
ON tbl_size.table_id = tbl_det.table_id;
V. Knowledge about Redshift
1. What is VACUUM?
VACUUM means reorganized data: Run the VACUUM command on the tables in the query to reclaim space and re-sort rows. Running VACUUM helps if the unsorted region is large and the query uses the sort key in a join or in the predicate.
When we use UPDATE schema , REDSHIFT we delete and insert new record.
The rows are marked for deletion, but not removed.
When you use SELECT RedShift will scan record delete and undelete therefore performance of query will reduce.
a. VACUUM to remove unnecessary records were deleted.
b. VACUUM to RE-INDEX of records
c. VACUUM to SORT recrods
All of them to increase performance when query table.
2. What is ANALYZE?
The ANALYZE command obtains a sample of rows from the table, does some calculations, and saves resulting column statistics. By default, Amazon Redshift runs a sample pass for the DISTKEY column and another sample pass for all of the other columns in the table. If you want to generate statistics for a subset of columns, you can specify a comma-separated column list.
Run the ANALYZE command on the tables in the query to make sure statistics are up to date. Running ANALYZE helps if any of the tables in the query have recently changed a lot in size. If running a full ANALYZE command will take too long, run ANALYZE on a single column to reduce processing time. This approach will still update the table size statistics; table size is a significant factor in query planning.
STATUPDATE ON : When COPY command , by default the COPY command performs an analysis after it loads data into an empty table.
Using above systax mean analyze the columns that are frequently used in the following: Sorting and grouping operations,Joins,Query predicates