This vignette describes how to retrieve data from a coin. The main
functions to do this are get_dset() and the more flexible get_data().
These functions are important to understand, because many COINr functions use them to retrieve data for plotting, analysis and other purposes. Both functions are generics, which means that they have methods for coins and purses.
Every time a “building” operation, such as normalisation or
aggregation, is applied to a coin, a new data set is created. Data sets
live in the .$Data sub-list of the coin. We can retrieve a data set at
any time using the get_dset() function:
library(COINr)

# build full example coin
coin <- build_example_coin(quietly = TRUE)

# retrieve normalised data set
dset_norm <- get_dset(coin, dset = "Normalised")

# view first few rows and cols
head(dset_norm[1:5], 5)
#>   uCode      LPI    Flights     Ship      Bord
#> 1   AUS 79.96112 12.3223217 66.14497   0.00000
#> 2   AUT 94.07137 27.8763185  0.00000  42.01269
#> 3   BEL 94.56023 23.3967426 97.14314 100.00000
#> 4   BGD 27.63906  0.1243185 45.80661  10.85013
#> 5   BGR 34.29965 10.8828790 37.40495  16.34359
By default, a data set in the coin consists of indicator columns plus
the “uCode” column, which is the unique identifier of each row. You can
also ask to attach unit metadata columns, such as unit names, groups,
and anything else that was input when building the coin, using the
also_get argument:
# retrieve normalised data set, with unit names and GDP group attached
dset_norm2 <- get_dset(coin, dset = "Normalised", also_get = c("uName", "GDP_group"))

# view first few rows and cols
head(dset_norm2[1:5], 5)
#>   uCode      uName GDP_group      LPI    Flights
#> 1   AUS  Australia        XL 79.96112 12.3223217
#> 2   AUT    Austria         L 94.07137 27.8763185
#> 3   BEL    Belgium         L 94.56023 23.3967426
#> 4   BGD Bangladesh         M 27.63906  0.1243185
#> 5   BGR   Bulgaria         S 34.29965 10.8828790
While get_dset() is a quick way to retrieve an entire
data set and metadata, the get_data() function is a
generalisation: it can be used to obtain a whole data set, but also
subsets of data, based on e.g. indicator selection and grouping
(columns), as well as unit selection and grouping (rows).
A simple example is to extract one or more named indicators from a target data set:
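As a minimal sketch of such a call, assuming the example coin from build_example_coin() and the "Flights" and "LPI" indicators that appear in the normalised data above, named indicators can be requested through the iCodes argument:

```r
library(COINr)

# build the example coin used throughout this vignette
coin <- build_example_coin(quietly = TRUE)

# extract two named indicators from the raw data set
x <- get_data(coin, dset = "Raw", iCodes = c("Flights", "LPI"))

# view the first few rows
head(x, 5)
```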
By default, get_data() returns the requested indicators, plus the
uCode identifier column. We can also set
also_get = "none" to return only the indicator columns.
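The effect of also_get = "none" can be sketched as follows, again assuming the example coin and the "Flights" and "LPI" indicators; the variable name x_none is only illustrative:

```r
library(COINr)

# build the example coin used throughout this vignette
coin <- build_example_coin(quietly = TRUE)

# request the same indicators, but without the uCode identifier column
x_none <- get_data(coin, dset = "Raw", iCodes = c("Flights", "LPI"),
                   also_get = "none")

# only the indicator columns should remain
names(x_none)
```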
The iCodes argument can also accept groups of indicators,
based on the structure of the index. In our example, indicators are
aggregated into “pillars” (level 2) within groups. We can name an
aggregation group and extract the underlying indicators:
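A sketch of this kind of selection, assuming the example coin and its “Political” pillar, combines the group name in iCodes with the Level argument to say which level of the group's members should be returned:

```r
library(COINr)

# build the example coin used throughout this vignette
coin <- build_example_coin(quietly = TRUE)

# all level 1 indicators belonging to the "Political" pillar
x_pol <- get_data(coin, dset = "Raw", iCodes = "Political", Level = 1)

# view the first few rows
head(x_pol, 5)
```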
Here we have requested all the indicators in level 1 (the indicator level), that belong to the group called “Political” (one of the pillars). Specifying the level becomes more relevant when we look at the aggregated data set, which also includes the pillar, sub-index and index scores. Here, for example, we can ask for all the pillar scores (level 2) which belong to the sustainability sub-index (level 3):
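The pillar-score request just described might look like the sketch below, assuming the example coin, its "Aggregated" data set and the “Sust” sub-index code shown in the coin's structure:

```r
library(COINr)

# build the example coin used throughout this vignette
coin <- build_example_coin(quietly = TRUE)

# all level 2 (pillar) scores that belong to the "Sust" sub-index
x_sust_pillars <- get_data(coin, dset = "Aggregated", iCodes = "Sust", Level = 2)

# view the first few rows
head(x_sust_pillars, 5)
```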
If this isn’t clear, look at the structure of the example index using
plot_framework(coin). If we wanted to select all the
indicators within the “Sust” sub-index we would set
Level = 1. If we wanted to select the sub-index scores
themselves we would set
Level = 3, and so on.
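The two variations mentioned here can be sketched side by side, under the same assumptions (example coin, "Aggregated" data set, “Sust” sub-index); the variable names are illustrative only:

```r
library(COINr)

# build the example coin used throughout this vignette
coin <- build_example_coin(quietly = TRUE)

# Level = 1: all indicators that sit underneath the "Sust" sub-index
x_sust_inds <- get_data(coin, dset = "Aggregated", iCodes = "Sust", Level = 1)

# Level = 3: the "Sust" sub-index scores themselves
x_sust <- get_data(coin, dset = "Aggregated", iCodes = "Sust", Level = 3)

# compare the columns returned in each case
names(x_sust_inds)
names(x_sust)
```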
The idea of selecting indicators and aggregates based on the
structure of the index is useful in many places in COINr, for example
when examining correlations within aggregation groups.
Units (rows) of the data set can also be selected (also in combination with selecting indicators). Starting with a simple example, let’s select specified units for a specific indicator:
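A minimal sketch of unit selection, assuming the example coin, the “Goods” indicator and the unit codes "AUS" and "AUT" seen in the data above, passes the wanted unit codes to the uCodes argument:

```r
library(COINr)

# build the example coin used throughout this vignette
coin <- build_example_coin(quietly = TRUE)

# values of the "Goods" indicator for two specified units
x_units <- get_data(coin, dset = "Raw", iCodes = "Goods",
                    uCodes = c("AUS", "AUT"))

x_units
```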
Rows can also be sub-setted using groups, i.e. unit groupings that
are defined using variables input with
iMeta$Type = "Group"
when building the coin. Recall that for our example coin we have several
groups (a reminder that you can see some details about the coin using
its print method):
coin
#> --------------
#> A coin with...
#> --------------
#> Input:
#>   Units: 51 (AUS, AUT, BEL, ...)
#>   Indicators: 49 (Goods, Services, FDI, ...)
#>   Denominators: 4 (Area, Energy, GDP, ...)
#>   Groups: 4 (GDP_group, GDPpc_group, Pop_group, ...)
#>
#> Structure:
#>   Level 1 Indicator: 49 indicators (FDI, ForPort, Goods, ...)
#>   Level 2 Pillar: 8 groups (ConEcFin, Instit, P2P, ...)
#>   Level 3 Sub-index: 2 groups (Conn, Sust)
#>   Level 4 Index: 1 groups (Index)
#>
#> Data sets:
#>   Raw (51 units)
#>   Denominated (51 units)
#>   Imputed (51 units)
#>   Screened (51 units)
#>   Treated (51 units)
#>   Normalised (51 units)
#>   Aggregated (51 units)
The first way to subset by unit group is to name a grouping variable, and a group within that variable to select. For example, say we want to know the values of the “Goods” indicator for all the countries in the “XL” GDP group:
get_data(coin, dset = "Raw", iCodes = "Goods", use_group = list(GDP_group = "XL"))
#>    uCode GDP_group     Goods
#> 1    AUS        XL  288.4893
#> 8    CHN        XL 1713.6190
#> 11   DEU        XL 1919.1940
#> 13   ESP        XL  447.1229
#> 16   FRA        XL  849.3303
#> 17   GBR        XL  778.9052
#> 21   IDN        XL  222.4186
#> 22   IND        XL  288.9806
#> 24   ITA        XL  658.1981
#> 25   JPN        XL  732.2078
#> 28   KOR        XL  568.9920
#> 45   RUS        XL  343.8504
Since we have subsetted by group, this also returns the group column which was used.
Another way of sub-setting is to combine uCodes with
use_group. When these two arguments are both specified, the
result is to return the full group(s) to which the specified
uCodes belong. This can be used to put a unit in context
with its peers within a group. For example, we might want to see the
values of the “Flights” indicator for a specific unit, as well as all
other units within the same population group:
get_data(coin, dset = "Raw", iCodes = "Flights", uCodes = "MLT", use_group = "Pop_group")
#>    uCode Pop_group  Flights
#> 6    BRN         S  2.01900
#> 9    CYP         S  8.75467
#> 14   EST         S  3.12946
#> 19   HRV         S  9.24529
#> 23   IRL         S 34.17721
#> 30   LTU         S  5.37919
#> 31   LUX         S  4.84458
#> 32   LVA         S  6.77976
#> 33   MLT         S  6.75251
#> 35   MNG         S  0.98951
#> 38   NOR         S 25.64994
#> 39   NZL         S 13.37242
#> 48   SVN         S  1.51736
Here, we have to specify
use_group simply as a string
rather than a list. Since MLT is in the “S” population group, it returns
all units within that group.
Overall, the idea of get_data() is to flexibly return
subsets of indicator data, based on the structure of the index and unit
groups.
As a final point, it’s worth pointing out that a coin is simply a list of R objects such as data frames, other lists, vectors and so on. It has a particular format which allows things to be easily accessed by COINr functions. But other than that, it’s an ordinary R object. This means that even without the helper functions mentioned, you can get at the data simply by exploring the coin yourself.
The data sets live in the
.$Data sub-list of the coin.
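Since the coin is an ordinary list, the available data sets can be listed with base R. This is a small sketch assuming the example coin, which should show the data set names reported by the coin's print method:

```r
library(COINr)

# build the example coin used throughout this vignette
coin <- build_example_coin(quietly = TRUE)

# names of the data sets stored in the coin
dset_names <- names(coin$Data)
dset_names
```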
And we can access any of these directly:
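For instance, a sketch of pulling out the raw data set directly, assuming the example coin (the variable name data_raw is illustrative), uses standard list indexing:

```r
library(COINr)

# build the example coin used throughout this vignette
coin <- build_example_coin(quietly = TRUE)

# access the raw data set directly, without get_dset()
data_raw <- coin$Data$Raw

# view first few rows and cols
head(data_raw[1:5], 5)
```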
The metadata lives in the
.$Meta sub-list. For example,
the unit metadata, which includes groups, names etc:
str(coin$Meta$Unit)
#> 'data.frame':    51 obs. of  11 variables:
#>  $ uCode        : chr  "AUS" "AUT" "BEL" "BGD" ...
#>  $ uName        : chr  "Australia" "Austria" "Belgium" "Bangladesh" ...
#>  $ GDP_group    : chr  "XL" "L" "L" "M" ...
#>  $ GDPpc_group  : chr  "XL" "XL" "L" "S" ...
#>  $ Pop_group    : chr  "L" "M" "L" "XL" ...
#>  $ EurAsia_group: chr  "Asia" "Europe" "Europe" "Asia" ...
#>  $ Time         : num  2018 2018 2018 2018 2018 ...
#>  $ Area         : num  7741220 83871 30528 148460 110879 ...
#>  $ Energy       : num  81.3 27 41.83 27.92 9.96 ...
#>  $ GDP          : num  1304.5 390.8 468 220.8 53.2 ...
#>  $ Population   : num  24451 8735 11429 164670 7085 ...
The point is that if COINr tools don’t get you where you want to go, knowing your way around the coin allows you to access the data exactly how you want.