Sql Basics Catchup | Vijay's Blog

Learn Architecture of SQL - Link

credit - https://towardsdatascience.com/sql-query-cheatsheet-for-postgres-96eba6e9e419

First, we’ll create a database and insert some data, then we’ll execute every SQL query on that data and investigate the output. Contents:

Setup
Selects and Counts
Limit, Offset and Order By
Joins
Intersect, Union and Except
Aliasing
Aggregating Data
Modifying Selected Values
Where Clauses

Setup

Create a few related tables with different data types.

Constraints

The following constraints are commonly used in SQL:

• NOT NULL - Ensures that a column cannot have a NULL value

• UNIQUE - Ensures that all values in a column are different

• PRIMARY KEY - A combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table

• FOREIGN KEY - Prevents actions that would destroy links between tables

• CHECK - Ensures that the values in a column satisfies a specific condition

• DEFAULT - Sets a default value for a column if no value is specified

• CREATE INDEX - Used to create and retrieve data from the database very quickly

-- users whose information the company has
create table users (
 id serial primary key,
 first_name varchar (50),
 location varchar (50),
 created_at TIMESTAMP
);
-- users who are on the company mailing list
create table mailing_lists (
 id serial primary key,
 first_name varchar (50),
 email varchar (50),
 created_at TIMESTAMP
);
-- products the company sells
create table products (
 id serial primary key,
 name varchar(50),
 manufacturing_cost int,
 data jsonb,
 created_at TIMESTAMP
)
-- sales transactions of products by users
create table sales (
 id serial primary key,
 user_id int,
 product_id int,
 sale_price int,
 created_at TIMESTAMP
);

-- EXAMPLE Constraints
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
);

CREATE TABLE Persons (
    id SERIAL PRIMARY KEY,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    first_name VARCHAR (50),
    last_name VARCHAR (50),
    birth_date DATE CHECK (birth_date > '1900-01-01'),
    joined_date DATE CHECK (joined_date > birth_date),
    salary numeric CHECK(salary > 0),
    City varchar(255) DEFAULT 'Sandnes'
);

Insert Data

insert into users (first_name, location, created_at)
values
 ('Liam', 'Toronto', '2010-01-01'),
 ('Ava', 'New York', '2011-01-01'),
 ('Emma', 'London', '2012-01-01'),
 ('Noah', 'Singapore', '2012-01-01'),
 ('William', 'Tokyo', '2014-01-01'),
 ('Oliver', 'Beijing', '2015-01-01'),
 ('Olivia', 'Moscow', '2014-01-01'),
 ('Mia', 'Toronto', '2015-01-01');

insert into mailing_lists (first_name, email, created_at)
values
 ('Liam', 'liam@fake.com', '2010-01-01'),
 ('Ava', 'ava@fake.com', '2011-01-01');

insert into products (name, manufacturing_cost, data, created_at)
values
 ('laptop', 500, '{"in_stock":1}', '2010-01-01'),
 ('smart phone', 200, '{"in_stock":10}', '2010-01-01'),
 ('TV', 1000, '{}', '2010-01-01');

insert into sales (user_id, product_id, sale_price, created_at)
values
 (1, 1, 900, '2015-01-01'),
 (1, 2, 450, '2016-01-01'),
 (1, 3, 2500, '2017-01-01'),
 (2, 1, 800, '2017-01-01'),
 (2, 2, 600, '2017-01-01'),
 (3, 3, 2500, '2018-01-01'),
 (4, 3, 2400, '2018-01-01'),
 (null, 3, 2500, '2018-01-01');

Selects and Counts

Select

This is the basic query around which everything later will be based.
Get all sales data without filtering or manipulating it. Simple.

select * from sales;

select first_name, location
from users;

Distinct

select distinct user_id from sales;

-- cannot use distinct and order by together
select distinct order_name, order_id  from orders order by order_id desc;
-- work around group by (distinct_col) and max(order_col)

Count

select count(*) from products;

select count(*) from (
  select * from products
  left join sales on sales.product_id = products.id
) subquery;

Limit, Offset and Order By

select * from sales
order by user_id asc
limit 3 offset 0;

select * from sales
order by user_id asc --desc
limit 3 offset 3;

Joins

A NATURAL JOIN is a JOIN operation that creates an implicit join clause for you based on the common columns in the two tables being joined. Common columns are columns that have the same name in both tables. A NATURAL JOIN can be an INNER join, a LEFT OUTER join, or a RIGHT OUTER join. The default is INNER join.

-- cross join - cartesian product
select * from users
cross join sales
where sales.user_id = users.id;

-- self join  (same table join) example - team match fixture
select * from users as u1, users as u2
where u2.user_id < u1.user_id;

-- inner - Matches both
select * from users
inner join sales
on sales.user_id = users.id;

-- left - All left and matched right records based on column
select * from users
left join sales
on sales.user_id = users.id;

-- right - All right and matched left records based on column
select * from users
left join sales
on sales.user_id = users.id;

-- full outer - Both table data no matter matched or not
select * from users
full outer join sales
on sales.user_id = users.id;

Intersect, Union and Except

-- intersect - Only common set items matched on both column or table
select first_name from users
intersect
select first_name from mailing_lists;

-- Union - Set(A) U Set (B) all items and duplicates are not allowed
select first_name from users
union
select first_name from mailing_lists;

-- Union All - Set(A) U Set (B) all items and duplicates are allowed
select first_name from users
union all
select first_name from mailing_lists;

-- Except - Set(A) - Set (B). All Unique items in set(A) but no items common between A & B
select first_name from users
except
select first_name from mailing_lists;

Aggregating Data

Grouping and aggregating data is a pretty powerful feature. Postgres provides the standard functions like: sum(), avg(), min(), max() and count().

Group By

select
  products.name,
  sum(sale_price),
  avg(sale_price),
  min(sale_price),
  max(sale_price),
  count(sales.id)
from sales
left join products on products.id = sales.product_id
group by products.name
having count(sales.id) > 2;

String_agg

Can also use _agg functions (like string_agg) in combination with group by to build a comma delimited string of people who bought each product.

select
 products.name,
 string_agg(users.first_name, ‘, ‘)
from products
left join sales on sales.product_id = products.id
left join users on users.id = sales.user_id
group by products.name;

Modifying Selected Values

Casting

select
  name,
  manufacturing_cost / 3 cost_int,
  manufacturing_cost::decimal / 3 as cost2_dec,
  '2020–01–01'::text,
  '2020–01–01'::date
from products;

Round

select
  name,
  round( manufacturing_cost::decimal / 3, 2 )
from products;

case

select
  id,
  case
    when user_id is null then 100
    else user_id
  end
from sales;

Coalesce

Coalesce allows returning the value from a different column if the first column’s value is null.

select
  id,
  coalesce(user_id, product_id)
from sales;

SELECT COALESCE(NULL,NULL,'ByteScout',NULL,'Byte'); -- Output ByteScout

Concat

select
 concat(first_name, ‘ ‘, location)
from users

Upper and lower

Changes the case of a string. If this needs to be done somewhere in your data processing pipeline, doing it at the SQL level is significantly faster than at the python/app level.

select
  upper(first_name),
  lower(first_name)
from users;

Date casting

select
	to_char(current_date , 'YYYY')::int,
	to_char(current_date, 'MM')::int,
	to_char(current_date, 'DD')::int,
	to_char(current_date, 'day')::varchar,
	to_char(current_date + '1 day'::interval, 'YYYY-MM-DD')::date;

select to_char(current_date - interval '6 months', 'YYYY-MM-DD'):: date;
select to_char(current_date - interval '30 days', 'YYYY-MM-DD'):: date;

select
	date::date
from
	generate_series(
	'2022-01-24'::date, --START date
	'2022-07-26'::date, -- END date
	'15 days'::interval) date
/*
2022-01-26
2022-02-10
2022-02-25
2022-03-12
2022-03-27
2022-04-11
2022-04-26
2022-05-11
2022-05-26
2022-06-10
2022-06-25
2022-07-10
2022-07-25
*/

Where Clauses

Operators

We can use all the equality operators you’d expect in a where clause: =, <>, !=, <, <=, >=, >

select * from users where first_name = 'Liam'
select * from users where first_name != 'Liam'
select * from users where id >= 5

And, Or, Not

Chain multiple where clauses together with and, or and not. But notice we only write the where word once.

select * from users where first_name = 'Liam' or first_name = 'Olivia';
select * from users where first_name = 'Liam' and id = 5;
select * from users where first_name = 'Liam' and not id = 5;

In

Rather than chaining clauses with or, or, or… you can find records where a value exists in a given array.

select * from users where first_name in ('Liam', 'Olivia');

Null

We can also load records where a value is (or is not) null.

select * from sales where user_id is null;
select * from sales where user_id is not null;

Fuzzy matching

Sometimes we want to find values that roughly match a query. For this, we can search on partial strings or ignore capitalization.

select * from users where first_name like '%ia%';
-- ignore case
select * from users where first_name ilike ‘%IA%’;

'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false

The LIKE expression returns true if the string matches the supplied pattern.
An underscore (_) in pattern stands for (matches) any single character;
a percent sign (%) matches any sequence of zero or more characters.

Where in subqueries

select first_name
from ( select * from users where id > 5 ) subquery;

With

with cte as (
  select * from users where id > 5
)
select
  first_name
from cte

Date filtering

select * from sales where created_at > '2016–01–01';

select
  *
from sales
where created_at between '2016–01–01' and '2017–01–01';

JSON(B)s

Postgres has some pretty awesome functionality for working with JSON.

select * from products where data -> 'in_stock' is not null;
select * from products where (data -> 'in_stock')::int > 5;
select name, data -> 'in_stock' as stock from products;

Lag

Get a record in the table and attach the record immediately before (above) it.

select
 first_name as joined_user,
 lag(first_name) over (order by created_at) as prev_joined_user
from users;

Lead

The opposite of above. Load the user that joined immediately after (down, below) each other user.

select
 first_name as joined_user,
 lead(first_name) over (order by created_at) as next_joined_user
from users;

-- After two records
select
 first_name as joined_user,
 lead(first_name, 2) over (order by created_at) as next_joined_user
from users;

dense_rank() vs rank()

RANK skips the number of positions after records with the same rank number.
RANK_DENSE returns position numbers it doesn’t skip records with the same rank number

select sale_price,
rank() over (order by sale_price) as rank_func,
dense_rank() over (order by sale_price) as dense_rank_func
from sales;

Helpful

credit - https://learnsql.com/

credit - http://www.postgresqltutorial.com

Check !