odns provides a base for exploring and obtaining data available through
the Scottish Health and Social Care Open Data platform. The package wraps
the underlying CKAN API and simplifies access to the data from R, so that
users can quickly explore what is available and start using it without
having to write complex queries.
CKAN, and by extension this package, refers to packages and resources.
A package is a dataset, that is, a collection of resources. A resource contains the data itself.
Example structure:
.
├── package_1
│   ├── resource_1
│   ├── resource_2
│   └── resource_3
│
└── package_2
    ├── resource_1
    └── resource_2
To view all available packages we can use all_packages().
#' view all available packages
> all_packages()
# package_name package_id
# covid-19-vaccination-in-scotland 6dbdd466-…
# enhanced-surveillance-of-covid-19-in-scotland 3c5231ee-…
# hospital-onset-covid-19-cases-in-scotland d67b13ef-…
# weekly-covid-19-statistical-data-in-scotland 524b42b4-…
# covid-19-in-scotland b318bddf-…
# … with 85 more rows
If we wish to search for packages by name, or by part of a name, we can
use the contains argument of all_packages(). For example, to find
packages relating to populations:
> all_packages(contains = "population")
# package_name package_id
# population-estimates 7f010430-6ce1-4813-b25c-f7f335bdc4dc
# standard-populations 4dd86111-7326-48c4-8763-8cc4aa190c3e
# population-projections 9e00b589-817e-45e6-b615-46c935bbace0
# gp-practice-populations e3300e98-cdd2-4f4e-a24e-06ee14fcc66c
# scottish-index-of-multiple-deprivation 78d41fa9-1a62-4f7b-9edb-3e8522a93378
If we prefer to see more detail, including the names of resources within
packages, we can use all_resources() with the package_contains argument.
#' view all resources within packages whose names contain "population"
> all_resources(package_contains = "population")
# resource_name resource_id package_name package_id url last_modified
# Data Zone (2011) Pop… c505f490-c… population-… 7f010430-… http… 2021-10-11T1…
# Intermediate Zone (2… 93df4c88-f… population-… 7f010430-… http… 2021-10-11T1…
# Council Area (2019) … 09ebfefb-3… population-… 7f010430-… http… 2021-07-06T0…
# Health and Social Ca… c3a393ce-2… population-… 7f010430-… http… 2021-07-06T0…
# Health Board (2019) … 27a72cc8-d… population-… 7f010430-… http… 2021-07-06T0…
# … with 53 more rows
The resource_contains argument of all_resources() can also be used to
narrow the results of a query by resource name.
#' view all resources whose names contain "european"
> all_resources(resource_contains = "european")
# resource_name resource_id package_name package_id url last_modified
# Population mortality… ec2af2be-8… hospital-st… c88a5231-… http… 2022-05-10T0…
# GP Practice Populati… 2c701f90-c… gp-practice… e3300e98-… http… 2022-05-10T0…
# GP Practice Populati… d07debcf-7… gp-practice… e3300e98-… http… 2022-02-07T1…
# GP Practice Populati… 4a3c438b-2… gp-practice… e3300e98-… http… 2021-11-02T1…
# GP Practice Populati… 0779e100-1… gp-practice… e3300e98-… http… 2022-02-17T1…
# … with 45 more rows
Search strings are case insensitive. The example above,
all_resources(resource_contains = "european"), would return resources
whose names contain ‘EUROPEAN’, ‘European’, or ‘european’.
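The case-insensitive matching described above can be illustrated with base R alone. This is a minimal sketch of the behaviour, not the package's actual implementation; the `resource_names` vector is made-up sample data.

```r
# Made-up resource names standing in for names held on the platform.
resource_names <- c(
  "European Standard Population",
  "EUROPEAN age groups",
  "World Standard Population"
)

# Case-insensitive, literal substring match. Lower-casing both sides and
# using fixed = TRUE avoids treating the search string as a regular expression.
matches <- grepl(tolower("european"), tolower(resource_names), fixed = TRUE)

# Keep only the names that matched.
resource_names[matches]
```

The first two names match regardless of case, while the third does not contain the search string at all.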
Let's say that we are interested in the ‘European Standard Population’
resource of the ‘standard-populations’ package. We can view the metadata
for both the package and the resource.
#' view metadata for a package using a valid package name
> package_metadata(package = "standard-populations")
# $nhs_language
# [1] "English"
#
# $license_title
# [1] "UK Open Government Licence (OGL)"
#
# $maintainer
# [1] ""
#
# $version
# [1] ""
#
#...
#' view metadata for a resource using a valid resource id
> resource_metadata(resource="edee9731-daf7-4e0d-b525-e4c1469b8f69")
# id type
# _id int
# AgeGroup text
# EuropeanStandardPopulation numeric
The package metadata contains useful information such as the time it was last modified, the publisher, a description, and notes.
The resource metadata provides details of the available fields and
their types. This information is particularly useful when putting
together a query where we want to return only a subset of data.
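As an illustration of how the field metadata can feed a query, the sketch below rebuilds the output shown above by hand, so that it runs offline, and derives vectors of field names from it. It assumes that resource_metadata() returns a data frame with `id` and `type` columns, as the printed output suggests; the variable names are illustrative only.

```r
# Hand-built copy of the metadata printed above (assumed to be a data frame).
meta <- data.frame(
  id   = c("_id", "AgeGroup", "EuropeanStandardPopulation"),
  type = c("int", "text", "numeric"),
  stringsAsFactors = FALSE
)

# All fields except the `_id` row identifier.
fields <- setdiff(meta$id, "_id")

# Only the fields whose type is numeric.
numeric_fields <- meta$id[meta$type == "numeric"]

fields
numeric_fields
```

A vector such as `fields` could then be supplied wherever a set of field names is needed when selecting a subset of the data.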
We can import all of the resources within a package using get_resource().
This is often the quickest and simplest way to import data where all
resources within one or more packages are required.
#' get all resources in a package
> get_resource(package = "4dd86111-7326-48c4-8763-8cc4aa190c3e")
#' get the first 10 rows of each resource in a package
> get_resource(package = "4dd86111-7326-48c4-8763-8cc4aa190c3e", limit = 10L)
#' both package IDs and names can be used
> get_resource(package = "standard-populations", limit = 10L)
#' multiple packages can be specified returning all resources under each
> get_resource(package = c("standard-populations", "population-projections"))
We can also use the resource argument of get_resource() to import a
specific resource within a package. This is often the simplest way to get
a single resource in its entirety.
#' get specific resources
> get_resource(
    resource = c("European Standard Population",
                 "9e00b589-817e-45e6-b615-46c935bbace0"),
    limit = 5L
  )
#' get a specific resource, if it exists within a specified package
> get_resource(
    package = "standard-populations",
    resource = "European Standard Population"
  )
get_resource() always returns a list, even when only one resource is
being imported. Where multiple resources have been imported, each
resource is its own list element.
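Because the result is always a list, a single resource still has to be extracted before it can be used as a data frame. The sketch below uses a hand-built list in place of a real get_resource() call so that it runs offline. It assumes each element is a data frame; whether the elements are named is not stated above, so the example indexes by position.

```r
# Stand-in for the value returned by get_resource() for one resource.
# The two rows are copied from the sample output in this document.
res <- list(
  data.frame(
    AgeGroup = c("0-4 years", "5-9 years"),
    EuropeanStandardPopulation = c(5000, 5500),
    stringsAsFactors = FALSE
  )
)

# Extract the first (and here only) resource as a data frame.
esp <- res[[1]]

# With several resources, each element can be inspected in turn,
# for example by counting the rows of every element.
row_counts <- vapply(res, nrow, integer(1))

esp
row_counts
```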
Where more granular control over the imported data is desired, the
get_data() function can be used. get_data() allows us to import selected
fields and to filter the data. If we want to import the fields ‘AgeGroup’
and ‘EuropeanStandardPopulation’ from the ‘European Standard Population’
resource, we can achieve this with get_data().
# import specified fields from a resource
> get_data(
    resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69",
    fields = c("AgeGroup", "EuropeanStandardPopulation")
  )
# AgeGroup EuropeanStandardPopulati…
# 0-4 years 5000
# 5-9 years 5500
# 10-14 years 5500
# 15-19 years 5500
# 20-24 years 6000
# … with 14 more rows
The where argument of get_data() can be used to exert further control
over the data we import. If we only want to retrieve rows from ‘European
Standard Population’ where the age group is 45-49 years, we can write a
SQL-style where clause to achieve this.
#' import specified fields from a data set utilising a SQL style where query
> get_data(
    resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69",
    fields = c("AgeGroup", "EuropeanStandardPopulation"),
    where = "\"AgeGroup\" = \'45-49 years\'"
  )
# AgeGroup EuropeanStandardPopulation
# 45-49 years 7000
The where argument of get_data() requires specific formatting to allow
for compatibility with the CKAN API. Field names must be double quoted
("), non-numeric values must be single quoted ('), and the quotes are
escaped with backslashes inside the R string, for example:
where = "\"AgeGroup\" = \'45-49 years\'"
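The quoting rules above are easy to get wrong by hand, so a small helper can assemble the string. The sketch below is a hypothetical convenience function, not part of odns; `build_where` and its arguments are names invented for illustration, and it only covers a single equality condition.

```r
# Hypothetical helper (not part of odns): builds one equality condition
# with the field name double quoted and text values single quoted.
build_where <- function(field, value) {
  if (is.numeric(value)) {
    # Numeric values are left unquoted.
    sprintf("\"%s\" = %s", field, format(value, scientific = FALSE))
  } else {
    # Non-numeric values are wrapped in single quotes.
    sprintf("\"%s\" = '%s'", field, value)
  }
}

clause <- build_where("AgeGroup", "45-49 years")

# cat() prints the string as it will be sent, without R's escape characters.
cat(clause, "\n")
```

The resulting string holds the same characters as the hand-written example above and could be passed as the where argument of get_data().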