OECD R-package transport data in R

“OECD” is a package in R, providing an interface for performing a querry directly from R code. In this post I use the “OECD” package to analyze European transport data.

I begin by reading in the OECD package in R. It contains a function “search_dataset” and I want to read what it can do:

library(OECD)
?search_dataset

## starting httpd help server ... done

“search_dataset” allows us to search datasets, based on a search string provided to the function as first parameter. “search_dataset” will return a dataframe as return output. Lets use the the function to find a dataset that contains interesting transport data.

head(search_dataset("transport"))

##                          id
## 80             AIRTRANS_CO2
## 385                    ETCR
## 452                     MTC
## 482            CIF_FOB_ITIC
## 900 ITF_PASSENGER_TRANSPORT
## 934     ITF_GOODS_TRANSPORT
##                                                                       title
## 80                                              Air Transport CO2 Emissions
## 385                 Regulation in energy, transport and communications 2013
## 452                                                Maritime Transport Costs
## 482 International Transport and Insurance Costs of Merchandise Trade (ITIC)
## 900                                                     Passenger transport
## 934                                                       Freight transport

For the article in the post at hand I want to use the “ITF_GOODS_TRANSPORT” dataset (freight transport). I querry it by using the “get_data_structure” function:

data_struct <- get_data_structure("ITF_GOODS_TRANSPORT")
typeof(data_struct) # what type os the data structure returned in by the function?

## [1] "list"

Let us see the content of the list:

str(data_struct)

## List of 9
##  $ VAR_DESC       :'data.frame': 9 obs. of  2 variables:
##   ..$ id         : chr [1:9] "COUNTRY" "VARIABLE" "YEAR" "OBS_VALUE" ...
##   ..$ description: chr [1:9] "Country" "Variable" "Year" "Observation Value" ...
##  $ COUNTRY        :'data.frame': 58 obs. of  2 variables:
##   ..$ id   : chr [1:58] "ALB" "ARG" "ARM" "AUS" ...
##   ..$ label: chr [1:58] "Albania" "Argentina" "Armenia" "Australia" ...
##  $ VARIABLE       :'data.frame': 14 obs. of  2 variables:
##   ..$ id   : chr [1:14] "T-GOODS-TOT-INLD" "T-GOODS-RL-TOT" "T-GOODS-RD-TOT" "T-GOODS-RD-REW" ...
##   ..$ label: chr [1:14] "Total inland freight transport" "Rail freight transport" "Road freight transport" "Road freight transport for hire and reward" ...
##  $ YEAR           :'data.frame': 49 obs. of  2 variables:
##   ..$ id   : chr [1:49] "1970" "1971" "1972" "1973" ...
##   ..$ label: chr [1:49] "1970" "1971" "1972" "1973" ...
##  $ OBS_STATUS     :'data.frame': 16 obs. of  2 variables:
##   ..$ id   : chr [1:16] "c" "B" "C" "D" ...
##   ..$ label: chr [1:16] "Internal estimate" "Break" "Non-publishable and confidential value" "Difference in methodology" ...
##  $ UNIT           :'data.frame': 316 obs. of  2 variables:
##   ..$ id   : chr [1:316] "1" "GRWH" "AVGRW" "IDX" ...
##   ..$ label: chr [1:316] "RATIOS" "Growth rate" "Average growth rate" "Index" ...
##  $ POWERCODE      :'data.frame': 32 obs. of  2 variables:
##   ..$ id   : chr [1:32] "0" "1" "2" "3" ...
##   ..$ label: chr [1:32] "Units" "Tens" "Hundreds" "Thousands" ...
##  $ REFERENCEPERIOD:'data.frame': 92 obs. of  2 variables:
##   ..$ id   : chr [1:92] "2013_100" "2012_100" "2011_100" "2010_100" ...
##   ..$ label: chr [1:92] "2013=100" "2012=100" "2011=100" "2010=100" ...
##  $ TIME_FORMAT    :'data.frame': 5 obs. of  2 variables:
##   ..$ id   : chr [1:5] "P1Y" "P1M" "P3M" "P6M" ...
##   ..$ label: chr [1:5] "Annual" "Monthly" "Quarterly" "Half-yearly" ...

The data provided by OECD has many dimensions. This is why I used get_data_structure – to understand the dimensions and structure of the data provided in the “ITS_GOODS_TRANSPORT” dataset.

Let us proceed to querrying the actual dataset, applying relevant filters. For this I use the “get_dataset” function provided by the “OECD” R-package:

data_df <- as.data.frame(get_dataset(dataset = "ITF_GOODS_TRANSPORT")) 
# this may take a file since we are not using any filter setting in get_dataset
head(data_df)

##   COUNTRY       VARIABLE TIME_FORMAT  UNIT POWERCODE obsTime obsValue
## 1     MKD T-CONT-SEA-TON         P1Y TONNE         3    1970       NA
## 2     MKD T-CONT-SEA-TON         P1Y TONNE         3    1971       NA
## 3     MKD T-CONT-SEA-TON         P1Y TONNE         3    1972       NA
## 4     MKD T-CONT-SEA-TON         P1Y TONNE         3    1973       NA
## 5     MKD T-CONT-SEA-TON         P1Y TONNE         3    1974       NA
## 6     MKD T-CONT-SEA-TON         P1Y TONNE         3    1975       NA
##   OBS_STATUS
## 1          M
## 2          M
## 3          M
## 4          M
## 5          M
## 6          M

Lets filter out some relevant subset of data:

backup_df <- data_df # in case I make a mistake I do not want to querry again
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

colnames(data_df) <- c("country","variable","timeformat","unit","powercode","obsTime","obsValue","obsStatus")
data_df <- dplyr::filter(data_df,country=="DEU") # filter for Germany only
data_df <- dplyr::filter(data_df,timeformat=="P1Y") # filter for annual values only
data_df <- dplyr::filter(data_df,unit=="TONNEKM") # filter for figures in tonne kilometres only

# obsStatus indicates if the observation value is e.g. just an estimate or "ball-park-figure"
# you can see things like that by browsing through the entries of data_struct
data_df <- data_df[is.na(data_df$obsStatus),] # filtering out all estimates or "ball-park-figures"

# there are different variables in our dataset, as indicates by the "variable" column entries
unique(data_df$variable)

## [1] "T-GOODS-RL-TOT"   "T-GOODS-RD-REW"   "T-GOODS-TOT-INLD" "T-GOODS-PP-TOT"  
## [5] "T-GOODS-IW-TOT"   "T-GOODS-RD-TOT"   "T-SEA-CAB"        "T-GOODS-RD-OWN"

We subsetted heavily, but the remaining dataframe still has different variables in it. If we want to understand those in detail we must go back to the structure object, i.e. the “data_struct” object:

data_struct$VARIABLE

##                  id                                      label
## 1  T-GOODS-TOT-INLD             Total inland freight transport
## 2    T-GOODS-RL-TOT                     Rail freight transport
## 3    T-GOODS-RD-TOT                     Road freight transport
## 4    T-GOODS-RD-REW Road freight transport for hire and reward
## 5    T-GOODS-RD-OWN      Road freight transport on own account
## 6    T-GOODS-IW-TOT         Inland waterways freight transport
## 7    T-GOODS-PP-TOT                        Pipelines transport
## 8         T-SEA-CAB      Coastal shipping (national transport)
## 9             T-SEA                         Maritime transport
## 10    T-CONT-RL-TEU            Rail containers transport (TEU)
## 11           T-CONT                       Containers transport
## 12    T-CONT-RL-TON         Rail containers transport (weight)
## 13   T-CONT-SEA-TEU        Maritime containers transport (TEU)
## 14   T-CONT-SEA-TON     Maritime containers transport (weight)

This means that we do not just have figures for freight transport in sum, but we have figures by relevant sub-categories. We will use this for further analysis in other posts. In this post we want to continue analyzing total inland freight transport.

Below chart visualizes the timeseries of total inland freight transport tonne-kilometries [TONNEKM] in millions (powercode 6) for Germany:

library(ggplot2)
ggplot(filter(data_df,variable=="T-GOODS-TOT-INLD")) + 
  geom_col(mapping = aes(x=as.numeric(obsTime),y=obsValue),fill="black") +
  ggtitle("Total inland freight transport development in Germany [OECD data]") +
  xlab("year") +
  ylab("in millions of TONNEKM")

Besides the OECD R-package you might also want to check the fredr package in R. Fredr is another R-package for accessing public economic data, providing access to the FRED database.

Linnart Felkl

Data scientist focusing on simulation, optimization and modeling in R, SQL, VBA and Python

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

OECD R-package transport data in R

Leave a Reply

Leave a Reply Cancel reply

You May Also Like

Excel operations in SQL: 40 examples

Parking lot simulator with simmer in R

Backlog simulation of FIFO production

Leave a Reply

Leave a Reply Cancel reply