
Day 4 of 30

  • bribrown11
  • Jul 30, 2022
  • 2 min read

Familiarizing myself better with Google's BigQuery


Today I wanted to practice more with BigQuery. I searched through Google's public datasets and was surprised by the complexity of the data. Being from Texas, with our usual hurricane season, I started with the NOAA Hurricane dataset. It's amazingly detailed, and domain knowledge of weather and geography is a must. I quickly moved on to look for something a little more manageable and straightforward, and stumbled across a baseball dataset from 2016. Perfect! I love baseball!


So I queried the information that I found interesting and exported the data.

SELECT seasonType, year, attendance, dayNight, duration, awayTeamName, homeTeamName, venueName, venueSurface, homeFinalRuns, homeFinalHits, homeFinalRunsForInning, homeFinalErrors, awayFinalRuns, awayFinalHits, awayFinalRunsForInning, awayFinalErrors

FROM `bigquery-public-data.baseball.games_wide`


I used Google Sheets to save the new dataset, then imported it back into my personal project dataset in BigQuery for more analysis, cleaning, and exploration.


There were numerous rows for each game, surely because the original rows differed only in columns that I had already cut out. So I removed the duplicate rows with SELECT DISTINCT, checking that the correct information remained.

SELECT DISTINCT startTime, seasonType, year, attendance, dayNight, duration, awayTeamName, homeTeamName, venueName, venueSurface, homeFinalHits, homeFinalRuns, homeFinalErrors, awayFinalHits, awayFinalRuns, awayFinalErrors

FROM `named-embassy-340811.Baseball2016.Baseball`

ORDER BY startTime
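
A minimal sketch of the same deduplication idea, run locally with Python's built-in sqlite3 instead of BigQuery. The table name `games`, the team names, and the rows are made up for illustration, and only a few of the columns are included. It shows SELECT DISTINCT collapsing repeated rows, then a follow-up check that no game (identified here, as an assumption, by startTime plus homeTeamName) still appears more than once.

```python
import sqlite3

# Hypothetical toy rows standing in for the exported table: each game is
# repeated several times, as in the situation described above.
rows = [
    ("2016-04-03 13:00", "Team A", "Team B", 4, 1),
    ("2016-04-03 13:00", "Team A", "Team B", 4, 1),
    ("2016-04-03 13:00", "Team A", "Team B", 4, 1),
    ("2016-04-03 16:00", "Team C", "Team D", 3, 5),
    ("2016-04-03 16:00", "Team C", "Team D", 3, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (startTime TEXT, awayTeamName TEXT, homeTeamName TEXT,"
    " homeFinalRuns INTEGER, awayFinalRuns INTEGER)"
)
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?)", rows)

# Same shape as the query above: DISTINCT keeps one copy of each identical row.
deduped = conn.execute(
    "SELECT DISTINCT startTime, awayTeamName, homeTeamName,"
    " homeFinalRuns, awayFinalRuns FROM games ORDER BY startTime"
).fetchall()

# Sanity check: after DISTINCT, list any game that still has more than one row.
# An empty result means every game collapsed to a single row.
leftover = conn.execute(
    "SELECT startTime, homeTeamName, COUNT(*) FROM ("
    " SELECT DISTINCT startTime, awayTeamName, homeTeamName,"
    " homeFinalRuns, awayFinalRuns FROM games"
    ") AS d GROUP BY startTime, homeTeamName HAVING COUNT(*) > 1"
).fetchall()

print(len(rows), "rows before,", len(deduped), "rows after DISTINCT")
print("games still duplicated:", leftover)
```

If the second query returns any rows, the remaining copies differ in at least one selected column, which is worth inspecting before trusting the cleaned table.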


I continued to check for errors in the data or repeated information.

I found 31 distinct away team names and 31 distinct home team names, so the two counts match, which is good.

SELECT DISTINCT awayTeamName FROM `named-embassy-340811.Baseball2016.baseballstats` ORDER BY 1
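
Matching counts do not by themselves prove the two lists contain the same names, so a slightly stronger check is to compare them directly. Below is a small local sketch using Python's built-in sqlite3 with made-up rows; the table name `games` and the team names are hypothetical. It counts the distinct names in each column and then lists any name that appears in one column but not the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (awayTeamName TEXT, homeTeamName TEXT)")

# Hypothetical toy schedule: every team plays both away and at home.
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [
        ("Team A", "Team B"),
        ("Team B", "Team C"),
        ("Team C", "Team A"),
        ("Team A", "Team C"),
    ],
)

# Count the distinct names in each column in one pass.
away_count, home_count = conn.execute(
    "SELECT COUNT(DISTINCT awayTeamName), COUNT(DISTINCT homeTeamName) FROM games"
).fetchone()

# Names that only ever appear as the away team; a misspelled or
# inconsistently formatted name would show up here.
only_away = conn.execute(
    "SELECT awayTeamName FROM games EXCEPT SELECT homeTeamName FROM games"
).fetchall()

# The same comparison in the other direction.
only_home = conn.execute(
    "SELECT homeTeamName FROM games EXCEPT SELECT awayTeamName FROM games"
).fetchall()

print("away:", away_count, "home:", home_count)
print("only away:", only_away, "only home:", only_home)
```

In BigQuery the set operator is written EXCEPT DISTINCT rather than plain EXCEPT; the rest of the SQL should carry over, though I have not run it against the table above.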


I will continue working with this dataset and visualize it in Tableau.


Closing thoughts for today

I am so happy with what I am learning on my own. Some things seemed so easy in the Google Data Analytics Professional Certificate program, but they get more complicated when I do them alone. It's so easy to get frustrated when things don't immediately work, but it's so satisfying to finally reach an end product. I may have spent too much time looking at datasets and trying to format them to work with BigQuery, but I found a dataset that's really interesting, and it's going to make a very informative visualization.

 
 
 


© 2022 by Brittany Brown. Proudly created with Wix.com
