This is just a small post in which I share an experiment I did with the coronavirus package in R. The package provides access to data shared by Johns Hopkins University. Unfortunately, at the time of writing, the latest data entries were from Feb 16 2020, i.e. three weeks in the past.
Below I import the package (which can be installed from CRAN) and read in the data.
library(coronavirus)
library(dplyr)
data(coronavirus)
data_df = coronavirus
head(data_df)
## # A tibble: 6 x 7
## Province.State Country.Region Lat Long date cases type
## <chr> <chr> <dbl> <dbl> <date> <int> <chr>
## 1 "" Japan 35.7 140. 2020-01-22 2 confirmed
## 2 "" South Korea 37.6 127. 2020-01-22 1 confirmed
## 3 "" Thailand 13.8 101. 2020-01-22 2 confirmed
## 4 Anhui Mainland China 31.8 117. 2020-01-22 1 confirmed
## 5 Beijing Mainland China 40.2 116. 2020-01-22 14 confirmed
## 6 Chongqing Mainland China 30.1 108. 2020-01-22 6 confirmed
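A quick way to see how current the data is would be to look at the range of the date column and at the totals per case type. The snippet below is only a sketch: toy_df is a made-up stand-in with the same date, cases and type columns as data_df (the values, including the type labels, are invented for illustration), so that it runs without the package installed. Replacing toy_df with data_df should give the real picture.

```r
# Toy stand-in for data_df: same column names, invented values.
toy_df <- data.frame(
  date  = as.Date(c("2020-01-22", "2020-01-23", "2020-02-16")),
  cases = c(2L, 5L, 1L),
  type  = c("confirmed", "confirmed", "recovered"),
  stringsAsFactors = FALSE
)

# First and last date present in the data.
date_range <- range(toy_df$date)
latest_date <- date_range[2]

# Number of days between today and the latest entry.
days_since_update <- as.integer(Sys.Date() - latest_date)

# Total number of cases per type.
cases_by_type <- aggregate(cases ~ type, data = toy_df, FUN = sum)

print(date_range)
print(days_since_update)
print(cases_by_type)
```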
Next, I make some time series plots using ggplot2. This one is for mainland China.
library(ggplot2)
ggplot(dplyr::filter(data_df,
Country.Region=="Mainland China",
type == "confirmed")) +
geom_col(mapping = aes(x=date,
y = cases),
fill = "red") +
labs(title="confirmed covid19 cases in Mainland China")
Below is a time series for Germany.
library(ggplot2)
ggplot(dplyr::filter(data_df,
Country.Region=="Germany",
type == "confirmed")) +
geom_col(mapping = aes(x=date,
y = cases),
fill = "red") +
labs(title="confirmed covid19 cases in Germany")
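Since the two plots differ only in the country name, the plotting code could be wrapped in a small function. The sketch below is one possible way to do that; plot_confirmed and its country_name argument are hypothetical names I made up, and toy_df is an invented stand-in for data_df so that the snippet runs on its own.

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper: bar chart of confirmed cases per date for one country.
plot_confirmed <- function(df, country_name) {
  country_df <- dplyr::filter(df,
                              Country.Region == country_name,
                              type == "confirmed")
  ggplot(country_df) +
    geom_col(mapping = aes(x = date, y = cases), fill = "red") +
    labs(title = paste("confirmed covid19 cases in", country_name))
}

# Toy stand-in for data_df: same column names, invented values.
toy_df <- data.frame(
  Country.Region = c("Germany", "Germany", "Japan"),
  date  = as.Date(c("2020-01-27", "2020-01-28", "2020-01-22")),
  cases = c(1L, 3L, 2L),
  type  = "confirmed",
  stringsAsFactors = FALSE
)

germany_plot <- plot_confirmed(toy_df, "Germany")
print(germany_plot)
```

With the real data, the calls would be plot_confirmed(data_df, "Mainland China") and plot_confirmed(data_df, "Germany").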
The data set contains latitude and longitude coordinates and would thus allow for heatmapping with packages such as Leaflet, deckgl, webglobe, ggmap or even ggplot2. However, as stated at the start of this post, the latest data entries are from Feb 16 2020. It seems the dataset is not being updated correctly.
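As a rough idea of what a map-style plot with plain ggplot2 could look like, the sketch below sums the confirmed cases per location and draws one point per location, sized by the total. It is not a real heatmap and draws no country borders. toy_df is an invented stand-in for data_df with the same column names, and totals_df and total_cases are names I chose for illustration.

```r
library(ggplot2)
library(dplyr)

# Toy stand-in for data_df: same column names, invented values.
toy_df <- data.frame(
  Country.Region = c("Japan", "Japan", "Thailand"),
  Lat   = c(35.7, 35.7, 13.8),
  Long  = c(140, 140, 101),
  cases = c(2L, 3L, 2L),
  type  = "confirmed",
  stringsAsFactors = FALSE
)

# Total confirmed cases per location.
totals_df <- toy_df %>%
  filter(type == "confirmed") %>%
  group_by(Country.Region, Lat, Long) %>%
  summarise(total_cases = sum(cases), .groups = "drop")

# One point per location, with point size reflecting the total.
map_plot <- ggplot(totals_df) +
  geom_point(mapping = aes(x = Long, y = Lat, size = total_cases),
             colour = "red", alpha = 0.6) +
  coord_quickmap() +
  labs(title = "total confirmed covid19 cases by location")

print(map_plot)
```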
Data scientist focusing on simulation, optimization and modeling in R, SQL, VBA and Python