top of page

Day 2 of 30: AlextheAnalyst SQL Data Exploration

  • bribrown11
  • Jul 28, 2022
  • 3 min read

Updated: Jul 30, 2022

Today is Day 2 of my 30-day data challenge. It was a bit more frustrating than I thought it would be. I had so much trouble trying to import the CSV file into BigQuery that I finally tried using bit.io which uses PostgreSQL. PostgreSQL is a little bit different from MySQL, Microsoft SQL Server and Oracle.


Here is the technical information for everything I used today.


Data source: from ourworldindata.org/covid-deaths in xlsx format

SQL platform: bit.io - online platform

Guide: AlextheAnalyst video

Excel: I split the dataset into two separate tables: covid_deaths, covid_vaccinations.


Here are the SQL queries that I ran today.


- - Looking at selected information

SELECT location, date, total_cases, new_cases, total_deaths, population

FROM "covid_deaths"

ORDER BY 1,2


- - Looking at Total Cases vs Total Deaths as a percentage

- - Shows the likelihood of dying from COVID

- - Note: must cast as numeric for division to work in bit.io

SELECT location, date, total_cases, total_deaths,

((total_deaths::numeric)/(total_cases::numeric))*100 as deathpercent

FROM "covid_deaths"

ORDER BY 1,2;


- - Looking at Total Cases vs. Total Deaths in the United States

- - Note: WHERE clause is case sensitive in PostgreSQL

SELECT location, date, total_cases, total_deaths,

((total_deaths::numeric)/(total_cases::numeric))*100 as deathpercent

FROM "covid_deaths"

WHERE location like '%States%'

ORDER BY 1,2;


- - Looking at the most recent numbers in the US of Total Cases vs. Total Deaths

- - Screenshot below

SELECT location, date, total_cases, total_deaths,

((total_deaths::numeric)/(total_cases::numeric))*100 as deathpercent

FROM "covid_deaths"

WHERE location like '%States%'

ORDER BY 2 desc;

ree

- - Looking at Total Cases vs. Population in the US.

- - Shows what percentage of population got COVID.

SELECT location, date, total_cases, population,

((total_cases::numeric)/(population::numeric))*100 as infectionrate

FROM "covid_deaths"

WHERE location like '%States%'

ORDER BY 1,2;


- - Order by most recent

SELECT location, date, total_cases, population,

((total_cases::numeric)/(population::numeric))*100 as infectionrate

FROM "covid_deaths"

WHERE location like '%States%'

ORDER BY 2 desc;


- - Looking at countries with highest infection rate compared to population

- - Screenshot below

SELECT location, population, max(total_cases) as highestinfectioncount,

max(((total_cases::numeric)/(population::numeric)))*100 as infectionrate

FROM "covid_deaths"

GROUP BY location, population

ORDER BY 4 desc;

ree

- - Showing countries with the highest death count per population

- - Screenshot below

SELECT location, max(total_deaths) as TotalDeathCount

FROM "covid_deaths"

WHERE continent is not null

GROUP BY location

ORDER BY 2 desc;

ree

- - Let’s look at information according to continent

- - Showing the continents with the highest death count per population

SELECT location, MAX(total_deaths) as TotalDeathCount

FROM "covid_deaths"

WHERE continent is null

GROUP BY location

Order by 2 desc;


- - Looking at global numbers

- - Death Percentage worldwide, not according to location or continent

SELECT date, SUM(new_cases) as TotalNewCases, SUM(new_deaths) as TotalNewDeaths,

((SUM(new_deaths)::numeric)/(SUM(new_cases)::numeric))*100 as deathpercent

FROM "covid_deaths"

WHERE new_deaths > 0

GROUP BY date

ORDER BY 1;


- - Basic join on table covid_deaths and covid_vaccinations

SELECT *

FROM "covid_deaths" dea

JOIN "covid_vaccinations" vac

ON dea.location = vac.location

and dea.date = vac.date;


- - Looking at Total Population vs. Vaccinations

SELECT dea.continent, dea.location, dea.date, population, vac.new_vaccinations

FROM "covid_deaths" dea

JOIN "covid_vaccinations" vac

ON dea.location = vac.location

and dea.date = vac.date

WHERE dea.continent is not null

and vac.continent is not null

ORDER BY 1,2,3;


- - Showing Rolling Count of Total Vaccinated

SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,

SUM(new_vaccinations::integer) OVER (Partition By dea.location ORDER BY dea.location, dea.date) as RollingCountVaccinated

FROM "covid_deaths" dea

JOIN "covid_vaccinations" vac

ON dea.location = vac.location

and dea.date = vac.date

WHERE dea.continent is not null

and vac.continent is not null

ORDER BY 2,3;


- - Use Temp Table

DROP TABLE if exists "rollingcountvaccinated"

CREATE TABLE "rollingcountvaccinated"

(

Continent varchar(255),

Location varchar(255),

Date text,

Population numeric,

New_vaccinations numeric,

RollingCountVaccinated numeric

)


INSERT INTO "rollingcountvaccinated"

(

SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,

SUM(new_vaccinations::integer) OVER (Partition By dea.location ORDER BY dea.location, dea.date) as RollingCountVaccinated

FROM "covid_deaths" dea

JOIN "covid_vaccinations" vac

ON dea.location = vac.location

and dea.date = vac.date

WHERE dea.continent is not null

and vac.continent is not null

)

note: There was an error with the new table. It would run the query correctly but not show any data in the columns and rows when I selected it.


Final thoughts:

I am very proud of my flexibility to use another SQL interface and still be able to follow along with Alex Freberg. If I wasn't able to write the query myself I looked online to find more information. I still feel like I'm a total beginner at data analytics, but I know six months ago I wasn't able to do what I did today. I'm learning a lot and loving this experience.

 
 
 

Recent Posts

See All
Day 4 of 30

Familiarizing myself better with Google's BigQuery Today I wanted to practice more with BigQuery. I searched through Google's public...

 
 
 

Comments


Send Me a Message &
I'll Send One Back

  • LinkedIn

Thanks for submitting!

© 2022 by Brittany Brown. Proudly created with Wix.com

bottom of page