This is the ultimate GLOBAL data set for COVID-19. It provides population, deaths (non-COVID), land area, ISO codes, confirmed cases, COVID-19 deaths, ICU beds, workers, workers who take public transit, new and cumulative cases/deaths/recovered, day numbers (to compare regions from the onset of an outbreak or from quarantine dates), and more, all the way down to the county level!
Go to the bottom of the post for the download links!
I have been using Johns Hopkins (JH) data to track COVID-19 for quite a while now. JH has done a great job collecting all this data. However, tying other data into it has been challenging, and the schema has changed frequently. Again, I’m grateful for their work.
As you may know, I have been maintaining a consistent schema for a few months now and sharing that on my blog. However, I found myself manually combining and analyzing other data with JH’s data. So, this ultimate data set combines the hard work of Johns Hopkins with other useful data (e.g., population, land area, quarantine dates, outbreak dates, ICU beds, and more).
I have created an ultimate data set for COVID-19 that has the following:
- Global Confirmed Cases, Deaths, Recovered (Johns Hopkins)
- Temporarily removing Canada recovered cases for another reason
- Source: https://github.com/CSSEGISandData/COVID-19
- US Confirmed Cases, Deaths (Johns Hopkins)
- Note: US Recovered is currently omitted from the JH data set
- ISO Alpha 2, Alpha 3, Numeric
- Population (total and Urban), Deaths (non-COVID related), Median age, and Land area [provided for all reported Global regions and US counties (only population, deaths, land area), for super detailed analysis]; also allows for population density calculations
- Global Source (population, median age, land area): https://www.worldometers.info/world-population/population-by-country/
- Australia provinces source: https://www.ga.gov.au/scientific-topics/national-location-information/dimensions/area-of-australia-states-and-territories
- Canada Province Source: https://en.wikipedia.org/wiki/Provinces_and_territories_of_Canada
- China province source: https://en.wikipedia.org/wiki/Provinces_of_China#cite_note-7 and http://data.stats.gov.cn/english/easyquery.htm?cn=E0103
- US Source for Population/Deaths (county level): https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/
- US Source for land area (square miles): https://www.ers.usda.gov/data-products/rural-urban-commuting-area-codes/
- US Only (detailed to county level):
- State Abbreviation
- FIPS Code
- Total Workers, Workers from home, Total Workers that take Public Transit (US Census Data)
- Population 60 and over, ICU Beds – county level detail (KHN)
- Source: https://khn.org/news/as-coronavirus-spreads-widely-millions-of-older-americans-live-in-counties-with-no-icu-beds/#lookup (click on “Get the Data”)
- Political affiliation (e.g., Republican, Democrat)
- Only available in the file that has dates on each row:
- Day Numbers and Dates that events occurred [These let you compare regions side by side. This is available for all levels (Country, Region, County)]
- First 100, 500, 1000 cases
- First 10, 50, 100 deaths
- Quarantine dates
- New Cases, New Deaths, New Recovered (from previous day)
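To make the day-number and "new from previous day" fields concrete, here is a minimal sketch of how they could be derived from a cumulative series for a single region. The function name, the field names, and the sample numbers are all made up for illustration, and I am assuming that day 1 is the first date on which the cumulative count reaches the threshold; the exact rules used in the file may differ.

```python
from datetime import date


def add_new_and_day_number(rows, threshold=100):
    """Derive daily new cases and a day number for ONE region.

    rows: list of (date, cumulative_cases) tuples sorted by date.
    Assumption: day 1 is the first date with cumulative >= threshold,
    and the first row's "new" value is measured against zero.
    """
    result = []
    previous = 0
    first_day = None
    for day, cumulative in rows:
        new_cases = cumulative - previous
        previous = cumulative
        if first_day is None and cumulative >= threshold:
            first_day = day
        day_number = None if first_day is None else (day - first_day).days + 1
        result.append(
            {"date": day, "cumulative": cumulative,
             "new": new_cases, "day_number": day_number}
        )
    return result


# Made-up sample series, not real data.
series = [
    (date(2020, 3, 1), 40),
    (date(2020, 3, 2), 90),
    (date(2020, 3, 3), 130),
    (date(2020, 3, 4), 210),
]
result = add_new_and_day_number(series)
for row in result:
    print(row["date"], row["new"], row["day_number"])
```

Lining up two regions on `day_number` instead of the calendar date is what makes the side-by-side comparison possible.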
I was considering adding the following:
- US state-level test data (positive, negative, total)
This file is in beta at the moment and may change. Please provide feedback.
Download here (refreshed with 8/9/2021 data):
Note: Starting in June 2020, the “series2” file will only include a trailing 3 months due to an error I was getting, likely due to the size of the file and my computer’s ability to process. All other files have all dates available. If you need those dates, let me know and I’ll add a separate file that you can concatenate.
Dates are exploded into rows (my preference): (This file has more info available, like day numbers, dates that events occurred, and new cases/deaths/recovered, but is limited to the trailing three months)
https://www.soothsawyer.com/wp-content/uploads/2020/03/john-hopkins-ryan-format-time-series2.csv (~210MB)
Download gzip version here (~12MB) <-- FASTER DOWNLOAD
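If you grab the gzip version, you do not need to decompress it to disk first. The sketch below streams rows straight out of a gzip-compressed CSV using only the Python standard library. The helper name and the column names (`Country`, `Date`, `Confirmed`) are placeholders I picked for the example, and the sample rows are made up; check the header of the real file for the actual column names.

```python
import csv
import gzip
import io


def iter_csv_rows(binary_stream):
    """Yield each CSV row as a dict from a gzip-compressed binary stream."""
    with gzip.open(binary_stream, mode="rt", encoding="utf-8", newline="") as text:
        yield from csv.DictReader(text)


# Small in-memory stand-in for the downloaded file (made-up values).
sample_csv = (
    "Country,Date,Confirmed\n"
    "Region A,2020-03-01,10\n"
    "Region A,2020-03-02,25\n"
)
compressed = gzip.compress(sample_csv.encode("utf-8"))

rows = list(iter_csv_rows(io.BytesIO(compressed)))
for row in rows:
    print(row["Country"], row["Date"], row["Confirmed"])
```

For a file on disk, pass a handle opened in binary mode, for example `open(path, "rb")`, where `path` is wherever you saved the download. Because rows are yielded one at a time, the ~210MB of uncompressed data never has to sit in memory all at once.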
Dates are individual columns instead of rows (this makes the file way smaller) and this file has ALL dates:
https://www.soothsawyer.com/wp-content/uploads/2020/03/john-hopkins-ryan-format-time-series1.csv (~3MB)
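If you prefer dates on rows but need the full history from the column-per-date file, you can reshape it yourself. Here is a rough sketch, using only the standard library, that turns one-column-per-date rows into one-row-per-date records. The column names, the date header format, and the values are invented for the example, and it assumes every column you do not list in `id_columns` is a date column, so with the real file you would need to list all of the non-date columns.

```python
import csv
import io


def wide_to_long(wide_rows, id_columns):
    """Turn one-column-per-date rows into one-row-per-date records.

    Assumption: every column not named in id_columns holds a date's value.
    """
    for row in wide_rows:
        ids = {column: row[column] for column in id_columns}
        for column, value in row.items():
            if column in id_columns:
                continue
            yield {**ids, "Date": column, "Value": value}


# Made-up sample in the wide layout.
wide_csv = (
    "Country,1/22/20,1/23/20\n"
    "Region A,1,3\n"
    "Region B,0,2\n"
)
wide_rows = list(csv.DictReader(io.StringIO(wide_csv)))
long_rows = list(wide_to_long(wide_rows, id_columns=["Country"]))
for record in long_rows:
    print(record)
```

The long layout repeats the identifying columns on every row, which is a big part of why the rows-per-date file is so much larger than the columns-per-date one.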
I have been around IT since I was in high school (running a customized BBS, and hacking) and am not the typical person who finds one area of interest at work; I have designed databases, automated IT processes, written code from the driver level all the way up to the GUI level, run an international software engineering team, started an e-commerce business that generated over $1M, run a $5B product marketing team for one of the largest semiconductor players in the world, traveled as a sales engineer for the largest storage OEM in the world, researched and developed strategy for one of the top 5 enterprise storage providers, and traveled around the world helping various companies make investment decisions in startups. I am also extremely passionate about uncovering insights from any data set. I just like to have fun by making a notable difference, influencing others, and working with smart people.
Pretty cool – thank you.
Hi Ryan,
There is no new information for new or recovered cases in the whole of China for a few days, have you any information about this?
I’m wondering about this since I live in China and your site is my only information source.
Thanks!
Paul
Hi Paul, There were three new deaths on July 14/15, and there are about 20-30 new cases per day on average (7-day avg). However, to your point, there have been zero new cases for 4 days now. I read an article today (7/21/2020) https://www.wionews.com/world/chinese-mainland-reports-11-new-covid-19-cases-including-8-in-xinjiang-314891 that indicates there were 11 new cases in Xinjiang, where they only had 77 confirmed cases. I have not seen this show up in the data yet. Let’s see if it shows up today or tomorrow in the numbers.
Hi Ryan,
This is amazing – thank you.
Where can I find data on confirmed COVID cases for last summer, for May and June only?
Thanks!
Nathan