Day 2 of 30: AlextheAnalyst SQL Data Exploration
- bribrown11
- Jul 28, 2022
- 3 min read
Updated: Jul 30, 2022
Today is Day 2 of my 30-day data challenge. It was a bit more frustrating than I thought it would be. I had so much trouble trying to import the CSV file into BigQuery that I finally tried using bit.io which uses PostgreSQL. PostgreSQL is a little bit different from MySQL, Microsoft SQL Server and Oracle.
Here is the technical information for everything I used today.
Data source: from ourworldindata.org/covid-deaths in xlsx format
SQL platform: bit.io - online platform
Guide: AlextheAnalyst video
Excel: I split the dataset into two separate tables: covid_deaths, covid_vaccinations.
Here are the SQL queries that I ran today.
- - Looking at selected information
SELECT location, date, total_cases, new_cases, total_deaths, population
FROM "covid_deaths"
ORDER BY 1,2
- - Looking at Total Cases vs Total Deaths as a percentage
- - Shows the likelihood of dying from COVID
- - Note: must cast as numeric for division to work in bit.io
SELECT location, date, total_cases, total_deaths,
((total_deaths::numeric)/(total_cases::numeric))*100 as deathpercent
FROM "covid_deaths"
ORDER BY 1,2;
- - Looking at Total Cases vs. Total Deaths in the United States
- - Note: WHERE clause is case sensitive in PostgreSQL
SELECT location, date, total_cases, total_deaths,
((total_deaths::numeric)/(total_cases::numeric))*100 as deathpercent
FROM "covid_deaths"
WHERE location like '%States%'
ORDER BY 1,2;
- - Looking at the most recent numbers in the US of Total Cases vs. Total Deaths
- - Screenshot below
SELECT location, date, total_cases, total_deaths,
((total_deaths::numeric)/(total_cases::numeric))*100 as deathpercent
FROM "covid_deaths"
WHERE location like '%States%'
ORDER BY 2 desc;

- - Looking at Total Cases vs. Population in the US.
- - Shows what percentage of population got COVID.
SELECT location, date, total_cases, population,
((total_cases::numeric)/(population::numeric))*100 as infectionrate
FROM "covid_deaths"
WHERE location like '%States%'
ORDER BY 1,2;
- - Order by most recent
SELECT location, date, total_cases, population,
((total_cases::numeric)/(population::numeric))*100 as infectionrate
FROM "covid_deaths"
WHERE location like '%States%'
ORDER BY 2 desc;
- - Looking at countries with highest infection rate compared to population
- - Screenshot below
SELECT location, population, max(total_cases) as highestinfectioncount,
max(((total_cases::numeric)/(population::numeric)))*100 as infectionrate
FROM "covid_deaths"
GROUP BY location, population
ORDER BY 4 desc;

- - Showing countries with the highest death count per population
- - Screenshot below
SELECT location, max(total_deaths) as TotalDeathCount
FROM "covid_deaths"
WHERE continent is not null
GROUP BY location
ORDER BY 2 desc;

- - Let’s look at information according to continent
- - Showing the continents with the highest death count per population
SELECT location, MAX(total_deaths) as TotalDeathCount
FROM "covid_deaths"
WHERE continent is null
GROUP BY location
Order by 2 desc;
- - Looking at global numbers
- - Death Percentage worldwide, not according to location or continent
SELECT date, SUM(new_cases) as TotalNewCases, SUM(new_deaths) as TotalNewDeaths,
((SUM(new_deaths)::numeric)/(SUM(new_cases)::numeric))*100 as deathpercent
FROM "covid_deaths"
WHERE new_deaths > 0
GROUP BY date
ORDER BY 1;
- - Basic join on table covid_deaths and covid_vaccinations
SELECT *
FROM "covid_deaths" dea
JOIN "covid_vaccinations" vac
ON dea.location = vac.location
and dea.date = vac.date;
- - Looking at Total Population vs. Vaccinations
SELECT dea.continent, dea.location, dea.date, population, vac.new_vaccinations
FROM "covid_deaths" dea
JOIN "covid_vaccinations" vac
ON dea.location = vac.location
and dea.date = vac.date
WHERE dea.continent is not null
and vac.continent is not null
ORDER BY 1,2,3;
- - Showing Rolling Count of Total Vaccinated
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
SUM(new_vaccinations::integer) OVER (Partition By dea.location ORDER BY dea.location, dea.date) as RollingCountVaccinated
FROM "covid_deaths" dea
JOIN "covid_vaccinations" vac
ON dea.location = vac.location
and dea.date = vac.date
WHERE dea.continent is not null
and vac.continent is not null
ORDER BY 2,3;
- - Use Temp Table
DROP TABLE if exists "rollingcountvaccinated"
CREATE TABLE "rollingcountvaccinated"
(
Continent varchar(255),
Location varchar(255),
Date text,
Population numeric,
New_vaccinations numeric,
RollingCountVaccinated numeric
)
INSERT INTO "rollingcountvaccinated"
(
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
SUM(new_vaccinations::integer) OVER (Partition By dea.location ORDER BY dea.location, dea.date) as RollingCountVaccinated
FROM "covid_deaths" dea
JOIN "covid_vaccinations" vac
ON dea.location = vac.location
and dea.date = vac.date
WHERE dea.continent is not null
and vac.continent is not null
)
note: There was an error with the new table. It would run the query correctly but not show any data in the columns and rows when I selected it.
Final thoughts:
I am very proud of my flexibility to use another SQL interface and still be able to follow along with Alex Freberg. If I wasn't able to write the query myself I looked online to find more information. I still feel like I'm a total beginner at data analytics, but I know six months ago I wasn't able to do what I did today. I'm learning a lot and loving this experience.
Recent Posts
See AllFamiliarizing myself better with Google's BigQuery Today I wanted to practice more with BigQuery. I searched through Google's public...
Comments