This vignette explains the purpose and usage of
token_fetch()
and the functions it subsequently calls. The
goal of token_fetch()
is to secure a token for use in
downstream requests.
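The target audience is someone who works directly with a Google API. These people roughly fall into two camps: authors of packages that wrap a Google API, and users who work with an API directly, without such a wrapper.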
token_fetch()
is aimed at whoever is going to manage the
returned token, e.g., incorporate it into downstream requests. It can be
very nice for users if wrapper packages assume this responsibility, as
opposed to requiring users to explicitly acquire and manage their
tokens. We give a few design suggestions here and cover this in more
depth in How
to use gargle for auth in a client package.
library(gargle)
token_fetch()
token_fetch()
is a rather magical function for getting a
token. The goal is to make auth relatively painless for users, while
allowing developers and power users to take control when and if they
need to. Most users will presumably interact with
token_fetch()
only in an indirect way, mediated through an
API wrapper package. That is not because the interface of
token_fetch()
is unfriendly – it’s very flexible! The
objective of token_fetch()
is to allow package developers
to take responsibility for managing the user’s token, without
having to implement all the different ways of obtaining that
token in the first place.
The signature of token_fetch()
is very simple and,
therefore, not very informative:
token_fetch(scopes, ...)
Under the hood, token_fetch()
calls a sequence of much
more specific credential functions, each wrapped in a
tryCatch()
and returning NULL
if unsuccessful.
The only formal argument these functions have in common is
scopes
, with the rest being passed via
...
.
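The try-each-in-turn pattern described above can be sketched in a few lines of base R. This is only an illustration of the idea, not gargle's actual implementation: `fetch_first()`, `cred_fail()`, and `cred_ok()` are hypothetical names invented for this example.

```r
# Hypothetical credential functions: one that errors, one that succeeds.
cred_fail <- function(scopes, ...) stop("no credentials here")
cred_ok <- function(scopes, ...) list(scopes = scopes, source = "cred_ok")

# Try each function in order. An error is converted to NULL, and the
# first non-NULL result is returned.
fetch_first <- function(funs, scopes = NULL, ...) {
  for (f in funs) {
    token <- tryCatch(f(scopes, ...), error = function(e) NULL)
    if (!is.null(token)) {
      return(token)
    }
  }
  NULL
}

token <- fetch_first(
  list(cred_fail, cred_ok),
  scopes = "https://www.googleapis.com/auth/drive"
)
token$source
```

Because every failure collapses to `NULL`, a function early in the sequence can give up quietly and leave the job to a later one.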
This gives a sense of the credential functions and reflects the order in which they are called:
names(cred_funs_list())
#> [1] "credentials_service_account" "credentials_external_account"
#> [3] "credentials_app_default" "credentials_gce"
#> [5] "credentials_byo_oauth2" "credentials_user_oauth2"
It is possible to manipulate this registry of functions. The help for
cred_funs_list()
is a good place to learn more.
From now on, however, we assume you’re working with the default registry that ships with gargle.
Note also that these credential functions are exported and can be called directly.
To see more information about what gargle is up to, set the option
named “gargle_verbosity” to “debug”. Read more in the docs for
gargle_verbosity()
.
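As a minimal sketch, the option can be set with base R's `options()`. The example assumes that "info" is the level you want to return to afterwards.

```r
# Turn on debug-level messaging for the rest of the session.
options(gargle_verbosity = "debug")
getOption("gargle_verbosity")

# Return to a quieter level when you are done troubleshooting
# (assumption: "info" is the level you normally use).
options(gargle_verbosity = "info")
```

gargle also provides `with_gargle_verbosity()` and `local_gargle_verbosity()` for changing the level temporarily; see the docs for `gargle_verbosity()` for details.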
credentials_service_account()
The first function tried is
credentials_service_account()
. Here’s how a call to
token_fetch()
with service account inputs plays out:
token_fetch(scopes = <SCOPES>, path = "/path/to/your/service-account.json")
# leads to this call:
credentials_service_account(
scopes = <SCOPES>,
path = "/path/to/your/service-account.json"
)
The scopes
are often provided by the API wrapper
function that is mediating the calls to token_fetch()
and
credentials_service_account()
. The path
argument is presumably coming from the user. It is treated as a JSON
representation of service account credentials, in any form that is
acceptable to jsonlite::fromJSON()
. In the above example,
that is a file path, but it could also be a JSON string. If there is no
named path
argument or if it can’t be parsed as a service
account credential, we fail and token_fetch()
’s execution
moves on to the next function in the registry.
Here is some Google documentation about service accounts:
For R users, a service account is a great option for credentials that will be used in a script or application running remotely or in an unattended fashion. In particular, this is a better approach than trying to move OAuth2 credentials from one machine to another. For example, a service account is the preferred method of auth when testing and documenting a package on a continuous integration service.
The JSON key file must be managed securely. In particular, it should not be kept in, e.g., a GitHub repository (unless it is encrypted). The encryption strategy used by gargle and other packages is described in the article Managing tokens securely.
Note that fetching a token for a service account requires a reasonably accurate system clock. This is of particular importance for users running gargle inside a Docker container, as Docker for Windows has intermittently seen problems with clock drift. If your service account token requests fail with “Bad Request” inside a container but succeed locally, check that the container’s system clock is accurate.
credentials_external_account()
The second function tried is
credentials_external_account()
. Here’s how a call to
token_fetch()
with external account inputs plays
out:
token_fetch(scopes = <SCOPES>, path = "/path/to/your/external-account.json")
# leads to this call:
credentials_external_account(
scopes = <SCOPES>,
path = "/path/to/your/external-account.json"
)
credentials_external_account()
implements something
called workload identity federation and is available to
applications running on specific non-Google Cloud platforms. At the time
of writing, gargle only supports AWS, but this could be expanded to
other providers, such as Azure, if there is a documented need.
Similar to credentials_service_account()
, the
path
is treated as a JSON representation of the account’s
configuration and it’s probably a file path. However, in contrast to
credentials_service_account()
, this JSON only contains
non-sensitive metadata, which is, indeed, the main point of this flow.
The secrets needed to complete auth are obtained “on-the-fly” from,
e.g., the running EC2 instance.
credentials_external_account()
will fail for many
reasons: there is no named path
argument, the JSON at
path
can’t be parsed as configuration for an external AWS
account, we don’t appear to be running on AWS, suggested packages for AWS
functionality are not installed, or the workload identity pool is
misconfigured. If any of that happens, we fail and
token_fetch()
’s execution moves on to the next function in
the registry.
Here is some Google documentation about workload identity federation and the specifics for AWS:
credentials_app_default()
The third function tried is credentials_app_default()
.
Here’s how a call to token_fetch()
might work:
token_fetch(scopes = <SCOPES>)
# credentials_service_account() fails because no `path`,
# credentials_external_account() fails because no `path`,
# which leads to this call:
credentials_app_default(
scopes = <SCOPES>
)
credentials_app_default()
loads credentials from a file
identified via a search strategy known as Application
Default Credentials (ADC). The credentials themselves are
conventional service account, external account, or user credentials that
happen to be stored in a pre-ordained location and format.
The hope is to make auth “just work” for someone working on
Google-provided infrastructure or who has used Google tooling to get
started, such as the gcloud
command
line tool. A sequence of paths is consulted, which we describe here,
with some abuse of notation. ALL_CAPS represents the value of an
environment variable.
${GOOGLE_APPLICATION_CREDENTIALS}
${CLOUDSDK_CONFIG}/application_default_credentials.json
# on Windows:
%APPDATA%\gcloud\application_default_credentials.json
%SystemDrive%\gcloud\application_default_credentials.json
C:\gcloud\application_default_credentials.json
# on not-Windows:
~/.config/gcloud/application_default_credentials.json
If the above search successfully identifies a JSON file, it is parsed
and ingested either as a service account token, an external account
configuration, or an OAuth2 user credential. In the case of an OAuth2
credential, the requested scopes
must also meet certain
criteria. Note that this will NOT work for OAuth2 credentials initiated
by gargle, which are stored on disk in .rds
files. The
storage of OAuth2 user credentials as JSON is unique to certain Google
tools – possibly just the gcloud
CLI – and should probably be regarded as deprecated. It is
recommended to use ADC with a service account or workload identity
federation. If this quest is unsuccessful, we fail and
token_fetch()
’s execution moves on to the next function in
the registry.
The main takeaway lesson: credentials stored in one of the locations above are discovered automatically when token_fetch() is called with only the scopes argument specified.
Again, remember that the JSON key file for a conventional service account must be managed securely and should NOT live in a directory that syncs to the cloud. The JSON configuration for an external account is not actually sensitive and this is one of the benefits of this flow, but it’s only available in a very narrow set of circumstances.
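The search order described in this section can be sketched in base R. This is an illustration under stated assumptions, not gargle's own code: `adc_candidates()` and its inner helper `env_path()` are hypothetical names, and the sketch only lists candidate paths without parsing any JSON.

```r
# Hypothetical helper: list candidate ADC paths in the order given above.
adc_candidates <- function() {
  json <- "application_default_credentials.json"

  # Build a path from an environment variable, or NULL if it is unset.
  env_path <- function(var, ...) {
    val <- Sys.getenv(var, unset = "")
    if (nzchar(val)) file.path(val, ...) else NULL
  }

  platform <- if (.Platform$OS.type == "windows") {
    c(
      env_path("APPDATA", "gcloud", json),
      env_path("SystemDrive", "gcloud", json),
      file.path("C:", "gcloud", json)
    )
  } else {
    path.expand(file.path("~", ".config", "gcloud", json))
  }

  c(
    env_path("GOOGLE_APPLICATION_CREDENTIALS"),
    env_path("CLOUDSDK_CONFIG", json),
    platform
  )
}

# Keep only the candidates that exist on this machine.
candidates <- adc_candidates()
candidates[file.exists(candidates)]
```

Listing the candidates this way can help when troubleshooting why a credential file is or is not being picked up.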
credentials_gce()
The next function tried is credentials_gce()
. Here’s how
a call to token_fetch()
might work:
token_fetch(scopes = <SCOPES>)
# or perhaps
token_fetch(scopes = <SCOPES>, service_account = <SERVICE_ACCOUNT>)
# credentials_service_account() fails because no `path`,
# credentials_external_account() fails because no `path`,
# credentials_app_default() fails because no ADC found,
# which leads to one of these calls:
credentials_gce(
scopes = <SCOPES>,
service_account = "default"
)
# or
credentials_gce(
scopes = <SCOPES>,
service_account = <SERVICE_ACCOUNT>
)
credentials_gce()
retrieves service account credentials
from a metadata service that is specific to virtual machine instances
running on Google Compute Engine (GCE). Basically, if you have to ask what
this is about, this is not the auth method for you. Let us move on.
credentials_byo_oauth2()
The next function tried is credentials_byo_oauth2()
.
Here’s how a call to token_fetch()
might work:
token_fetch(token = <TOKEN2.0>)
# credentials_service_account() fails because no `path`,
# credentials_external_account() fails because no `path`,
# credentials_app_default() fails because no ADC found,
# credentials_gce() fails because not on GCE,
# which leads to this call:
credentials_byo_oauth2(
token = <TOKEN2.0>
)
credentials_byo_oauth2()
provides a back door for a
“bring your own token” workflow. This function accounts for the scenario
where an OAuth token has been obtained through external means and it’s
convenient to be able to put it into force.
credentials_byo_oauth2()
checks that token
is of class httr::Token2.0
and that it appears to be
associated with Google. A token
of class
request
is also acceptable, in which case the
auth_token
component is extracted and treated as the input.
This is how a Token2.0
object would present, if processed
with httr::config()
, as functions like
googledrive::drive_token()
and
bigrquery::bq_token()
do.
If token
is not provided or if it doesn’t satisfy these
requirements, we fail and token_fetch()
’s execution moves
on to the next function in the registry.
credentials_user_oauth2()
The next and final function tried is
credentials_user_oauth2()
. Here’s how a call to
token_fetch()
might work:
token_fetch(scopes = <SCOPES>)
# credentials_service_account() fails because no `path`,
# credentials_external_account() fails because no `path`,
# credentials_app_default() fails because no ADC found,
# credentials_gce() fails because not on GCE,
# credentials_byo_oauth2() fails because no `token`,
# which leads to this call:
credentials_user_oauth2(
scopes = <SCOPES>,
app = <OAUTH_APP>,
package = "<PACKAGE>"
)
credentials_user_oauth2()
is where the vast majority of
users will end up. This is the function that choreographs the
traditional “OAuth dance” in the browser. User credentials are cached
locally, at the user level, by default. Therefore, after first use,
there are scenarios in which gargle can determine unequivocally that it
already has a suitable token on hand and can load (and possibly refresh)
it, without additional user intervention.
The scopes
, app
, and package
are generally provided by the API wrapper function that is mediating the
calls to token_fetch()
. Do not “borrow” an OAuth app (OAuth
client ID and secret) from gargle or any other package; always use
credentials associated with your package or provided by your user. Per
the Google User Data Policy https://developers.google.com/terms/api-services-user-data-policy,
your application must accurately represent itself when authenticating to
Google API services.
The wrapper package would presumably also declare itself as the
package requesting a token (this is used in messages). So here’s how a
call to token_fetch()
and
credentials_user_oauth2()
might look when initiated from
THINGY_auth()
, a function in the fictional thingyr wrapper
package:
# user initiates auth or does something that triggers it indirectly
THINGY_auth()
# which then calls
gargle::token_fetch(
scopes = <SCOPES_NEEDED_FOR_THE_THINGY_API>,
app = thingy_app(),
package = "thingyr"
)
# which leads to this call:
credentials_user_oauth2(
scopes = <SCOPES_NEEDED_FOR_THE_THINGY_API>,
app = thingy_app(),
package = "thingyr"
)
See How
to use gargle for auth in a client package for design ideas for a
function like THINGY_auth()
.
What happens tomorrow or next week? Do we make this user go through the browser dance again? How do we get to that happy place where we don’t bug them constantly about auth?
First, we define “suitable”, i.e. what it means to find a matching
token in the cache. credentials_user_oauth2()
is a thin
wrapper around gargle2.0_token()
which is the constructor
for the gargle::Gargle2.0
class used to hold an OAuth2
token. And that call might look something like this (simplified for
communication purposes):
gargle2.0_token(
email = gargle_oauth_email(),
app = thingy_app(),
package = "thingyr",
scope = <SCOPES_NEEDED_FOR_THE_THINGY_API>,
cache = gargle_oauth_cache()
)
gargle looks in the cache specified by
gargle_oauth_cache()
for a token that has these scopes,
this app, and the Google identity specified by email
. By
default email
is NA
, so we might find one or
more tokens that have the necessary scopes and app. In that case, gargle
reveals the email
associated with the matching token(s) and
asks the user for explicit instructions about how to proceed. That looks
something like this:
The thingyr package is requesting access to your Google account. Select a
pre-authorised account or enter '0' to obtain a new token. Press Esc/Ctrl + C to
abort.
1: janedoe_personal@gmail.com
2: janedoe@example.com
3: janedoe_work@gmail.com
Selection: 3
If none of the tokens has the right scopes and app (or if the user declines to use a pre-existing token), we head to the browser to initiate OAuth2 flow de novo.
A user can reduce the need for interaction by passing the target
email
to THINGY_auth()
:
THINGY_auth(email = "janedoe_work@gmail.com")
or by specifying same in the gargle_oauth_email
option.
A value of email = TRUE
, passed directly or via the option,
is an alternative strategy: TRUE
means that gargle is
allowed to use a matching token whenever there is exactly one match.
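A minimal sketch of the option-based approach, using only base R's `options()`. The email address is the placeholder from the example above. In practice you would pick one of the two settings, perhaps in a startup file.

```r
# Pre-select a specific Google identity ...
options(gargle_oauth_email = "janedoe_work@gmail.com")

# ... or allow gargle to use a cached token whenever exactly one matches.
options(gargle_oauth_email = TRUE)

# Inspect the current setting.
getOption("gargle_oauth_email")
```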
The elevated status of email
for
gargle::Gargle2.0
tokens is motivated by the fact that many
of us have multiple Google identities and need them to be very prominent
when working with Google APIs. This is one of the main motivations for
gargle::Gargle2.0
, which extends
httr::Token2.0
. The gargle::Gargle2.0
class
also defaults to a user-level token cache, as opposed to project-level.
An overview of the current OAuth cache is available via
gargle_oauth_sitrep()
and the output looks something like
this:
gargle_oauth_sitrep()
#' gargle OAuth cache path:
#' /Users/janedoe/.R/gargle/gargle-oauth
#'
#' 14 tokens found
#'
#' email                         app         scope                          hash...
#' ----------------------------- ----------- ------------------------------ ----------
#' abcdefghijklm@gmail.com       thingy      ...bigquery, ...cloud-platform 128f9cc...
#' buzzy@example.org             gargle-demo                                15acf95...
#' stella@example.org            gargle-demo ...drive                       4281945...
#' abcdefghijklm@gmail.com       gargle-demo ...drive                       48e7e76...
#' abcdefghijklm@gmail.com       tidyverse                                  69a7353...
#' nopqr@ABCDEFG.com             tidyverse   ...spreadsheets.readonly       86a70b9...
#' abcdefghijklm@gmail.com       tidyverse   ...drive                       d9443db...
#' nopqr@HIJKLMN.com             tidyverse   ...drive                       d9443db...
#' nopqr@ABCDEFG.com             tidyverse   ...drive                       d9443db...
#' stuvwzyzabcd@gmail.com        tidyverse   ...drive                       d9443db...
#' efghijklmnopqrtsuvw@gmail.com tidyverse   ...drive                       d9443db...
#' abcdefghijklm@gmail.com       tidyverse   ...drive.readonly              ecd11fa...
#' abcdefghijklm@gmail.com       tidyverse   ...bigquery, ...cloud-platform ece63f4...
#' nopqr@ABCDEFG.com             tidyverse   ...spreadsheets                f178dd8...