Learn Architecture of SQL - Link
credit - https://towardsdatascience.com/sql-query-cheatsheet-for-postgres-96eba6e9e419
First, we’ll create a database and insert some data, then we’ll execute every SQL query on that data and investigate the output. Contents:
- Setup
- Selects and Counts
- Limit, Offset and Order By
- Joins
- Intersect, Union and Except
- Aliasing
- Aggregating Data
- Modifying Selected Values
- Where Clauses
Setup
Create a few related tables with different data types.
Constraints
The following constraints are commonly used in SQL:
• NOT NULL - Ensures that a column cannot have a NULL value
• UNIQUE - Ensures that all values in a column are different
• PRIMARY KEY - A combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table
• FOREIGN KEY - Prevents actions that would destroy links between tables
• CHECK - Ensures that the values in a column satisfies a specific condition
• DEFAULT - Sets a default value for a column if no value is specified
• CREATE INDEX - Used to create and retrieve data from the database very quickly
-- users whose information the company hascreate table users ( id serial primary key, first_name varchar (50), location varchar (50), created_at TIMESTAMP);-- users who are on the company mailing listcreate table mailing_lists ( id serial primary key, first_name varchar (50), email varchar (50), created_at TIMESTAMP);-- products the company sellscreate table products ( id serial primary key, name varchar(50), manufacturing_cost int, data jsonb, created_at TIMESTAMP)-- sales transactions of products by userscreate table sales ( id serial primary key, user_id int, product_id int, sale_price int, created_at TIMESTAMP);
-- EXAMPLE ConstraintsCREATE TABLE Orders ( OrderID int NOT NULL, OrderNumber int NOT NULL, PersonID int, PRIMARY KEY (OrderID), FOREIGN KEY (PersonID) REFERENCES Persons(PersonID));
CREATE TABLE Persons ( id SERIAL PRIMARY KEY, LastName varchar(255) NOT NULL, FirstName varchar(255), Age int, first_name VARCHAR (50), last_name VARCHAR (50), birth_date DATE CHECK (birth_date > '1900-01-01'), joined_date DATE CHECK (joined_date > birth_date), salary numeric CHECK(salary > 0), City varchar(255) DEFAULT 'Sandnes');
Insert Data
insert into users (first_name, location, created_at)values ('Liam', 'Toronto', '2010-01-01'), ('Ava', 'New York', '2011-01-01'), ('Emma', 'London', '2012-01-01'), ('Noah', 'Singapore', '2012-01-01'), ('William', 'Tokyo', '2014-01-01'), ('Oliver', 'Beijing', '2015-01-01'), ('Olivia', 'Moscow', '2014-01-01'), ('Mia', 'Toronto', '2015-01-01');
insert into mailing_lists (first_name, email, created_at)values ('Liam', 'liam@fake.com', '2010-01-01'), ('Ava', 'ava@fake.com', '2011-01-01');
insert into products (name, manufacturing_cost, data, created_at)values ('laptop', 500, '{"in_stock":1}', '2010-01-01'), ('smart phone', 200, '{"in_stock":10}', '2010-01-01'), ('TV', 1000, '{}', '2010-01-01');
insert into sales (user_id, product_id, sale_price, created_at)values (1, 1, 900, '2015-01-01'), (1, 2, 450, '2016-01-01'), (1, 3, 2500, '2017-01-01'), (2, 1, 800, '2017-01-01'), (2, 2, 600, '2017-01-01'), (3, 3, 2500, '2018-01-01'), (4, 3, 2400, '2018-01-01'), (null, 3, 2500, '2018-01-01');
Selects and Counts
Select
This is the basic query around which everything later will be based.
Get all sales data without filtering or manipulating it. Simple.
select * from sales;
select first_name, locationfrom users;
Distinct
select distinct user_id from sales;
-- cannot use distinct and order by togetherselect distinct order_name, order_id from orders order by order_id desc;-- work around group by (distinct_col) and max(order_col)
Count
select count(*) from products;
select count(*) from ( select * from products left join sales on sales.product_id = products.id) subquery;
Limit, Offset and Order By
select * from salesorder by user_id asclimit 3 offset 0;
select * from salesorder by user_id asc --desclimit 3 offset 3;
Joins
A NATURAL JOIN is a JOIN operation that creates an implicit join clause for you based on the common columns in the two tables being joined. Common columns are columns that have the same name in both tables. A NATURAL JOIN can be an INNER join, a LEFT OUTER join, or a RIGHT OUTER join. The default is INNER join.
-- cross join - cartesian productselect * from userscross join saleswhere sales.user_id = users.id;
-- self join (same table join) example - team match fixtureselect * from users as u1, users as u2where u2.user_id < u1.user_id;
-- inner - Matches bothselect * from usersinner join saleson sales.user_id = users.id;
-- left - All left and matched right records based on columnselect * from usersleft join saleson sales.user_id = users.id;
-- right - All right and matched left records based on columnselect * from usersleft join saleson sales.user_id = users.id;
-- full outer - Both table data no matter matched or notselect * from usersfull outer join saleson sales.user_id = users.id;
Intersect, Union and Except
-- intersect - Only common set items matched on both column or tableselect first_name from usersintersectselect first_name from mailing_lists;
-- Union - Set(A) U Set (B) all items and duplicates are not allowedselect first_name from usersunionselect first_name from mailing_lists;
-- Union All - Set(A) U Set (B) all items and duplicates are allowedselect first_name from usersunion allselect first_name from mailing_lists;
-- Except - Set(A) - Set (B). All Unique items in set(A) but no items common between A & Bselect first_name from usersexceptselect first_name from mailing_lists;
Aggregating Data
Grouping and aggregating data is a pretty powerful feature. Postgres provides the standard functions like: sum(), avg(), min(), max() and count().
Group By
select products.name, sum(sale_price), avg(sale_price), min(sale_price), max(sale_price), count(sales.id)from salesleft join products on products.id = sales.product_idgroup by products.namehaving count(sales.id) > 2;
String_agg
Can also use _agg functions (like string_agg) in combination with group by to build a comma delimited string of people who bought each product.
select products.name, string_agg(users.first_name, ‘, ‘)from productsleft join sales on sales.product_id = products.idleft join users on users.id = sales.user_idgroup by products.name;
Modifying Selected Values
Casting
select name, manufacturing_cost / 3 cost_int, manufacturing_cost::decimal / 3 as cost2_dec, '2020–01–01'::text, '2020–01–01'::datefrom products;
Round
select name, round( manufacturing_cost::decimal / 3, 2 )from products;
case
select id, case when user_id is null then 100 else user_id endfrom sales;
Coalesce
Coalesce allows returning the value from a different column if the first column’s value is null.
select id, coalesce(user_id, product_id)from sales;
SELECT COALESCE(NULL,NULL,'ByteScout',NULL,'Byte'); -- Output ByteScout
Concat
select concat(first_name, ‘ ‘, location)from users
Upper and lower
Changes the case of a string. If this needs to be done somewhere in your data processing pipeline, doing it at the SQL level is significantly faster than at the python/app level.
select upper(first_name), lower(first_name)from users;
Date casting
select to_char(current_date , 'YYYY')::int, to_char(current_date, 'MM')::int, to_char(current_date, 'DD')::int, to_char(current_date, 'day')::varchar, to_char(current_date + '1 day'::interval, 'YYYY-MM-DD')::date;
select to_char(current_date - interval '6 months', 'YYYY-MM-DD'):: date;select to_char(current_date - interval '30 days', 'YYYY-MM-DD'):: date;
select date::datefrom generate_series( '2022-01-24'::date, --START date '2022-07-26'::date, -- END date '15 days'::interval) date/*2022-01-262022-02-102022-02-252022-03-122022-03-272022-04-112022-04-262022-05-112022-05-262022-06-102022-06-252022-07-102022-07-25*/
Where Clauses
Operators
We can use all the equality operators you’d expect in a where clause: =, <>, !=, <, <=, >=, >
select * from users where first_name = 'Liam'select * from users where first_name != 'Liam'select * from users where id >= 5
And, Or, Not
Chain multiple where clauses together with and, or and not. But notice we only write the where word once.
select * from users where first_name = 'Liam' or first_name = 'Olivia';select * from users where first_name = 'Liam' and id = 5;select * from users where first_name = 'Liam' and not id = 5;
In
Rather than chaining clauses with or, or, or… you can find records where a value exists in a given array.
select * from users where first_name in ('Liam', 'Olivia');
Null
We can also load records where a value is (or is not) null.
select * from sales where user_id is null;select * from sales where user_id is not null;
Fuzzy matching
Sometimes we want to find values that roughly match a query. For this, we can search on partial strings or ignore capitalization.
select * from users where first_name like '%ia%';-- ignore caseselect * from users where first_name ilike ‘%IA%’;
'abc' LIKE 'abc' true'abc' LIKE 'a%' true'abc' LIKE '_b_' true'abc' LIKE 'c' false
The LIKE expression returns true if the string matches the supplied pattern.
An underscore (_
) in pattern stands for (matches) any single character;
a percent sign (%
) matches any sequence of zero or more characters.
Where in subqueries
select first_namefrom ( select * from users where id > 5 ) subquery;
With
with cte as ( select * from users where id > 5)select first_namefrom cte
Date filtering
select * from sales where created_at > '2016–01–01';
select *from saleswhere created_at between '2016–01–01' and '2017–01–01';
JSON(B)s
Postgres has some pretty awesome functionality for working with JSON.
select * from products where data -> 'in_stock' is not null;select * from products where (data -> 'in_stock')::int > 5;select name, data -> 'in_stock' as stock from products;
Lag
Get a record in the table and attach the record immediately before (above) it.
select first_name as joined_user, lag(first_name) over (order by created_at) as prev_joined_userfrom users;
Lead
The opposite of above. Load the user that joined immediately after (down, below) each other user.
select first_name as joined_user, lead(first_name) over (order by created_at) as next_joined_userfrom users;
-- After two recordsselect first_name as joined_user, lead(first_name, 2) over (order by created_at) as next_joined_userfrom users;
dense_rank() vs rank()
RANK
skips the number of positions after records with the same rank number.
RANK_DENSE
returns position numbers it doesn’t skip records with the same rank number
select sale_price,rank() over (order by sale_price) as rank_func,dense_rank() over (order by sale_price) as dense_rank_funcfrom sales;
Helpful
credit - https://learnsql.com/
credit - http://www.postgresqltutorial.com