This major release introduces a significant new feature that allows users to perform data cleaning operations in OpenRefine through an API query.
The new functionality passes JSON-specified operations to the running instance via the /command/core/apply-operations endpoint. In addition to the generic refine_operations() function, which can flexibly accept any valid JSON operation, the rrefine package includes a series of wrapper functions to perform common data cleaning procedures:
- refine_remove_column(): Remove a column from a project
- refine_add_column(): Add a column to a project
- refine_rename_column(): Rename an existing column in a project
- refine_move_column(): Move a column to a new index
- refine_transform(): Apply arbitrary text transformations
- refine_to_lower(): Coerce text to lowercase
- refine_to_upper(): Coerce text to uppercase
- refine_to_title(): Coerce text to title case
- refine_to_null(): Set values to NULL
- refine_to_empty(): Set text values to empty string ("")
- refine_to_text(): Coerce value to string
- refine_to_number(): Coerce value to numeric
- refine_to_date(): Coerce value to date
- refine_trim_whitespace(): Remove leading and trailing whitespace
- refine_collapse_whitespace(): Collapse consecutive whitespace characters to a single space
- refine_unescape_html(): Unescape HTML in string

In addition to the data cleaning operations functionality, the documentation has been updated throughout to point to the current OpenRefine user manual (https://docs.openrefine.org/).
Tested with OpenRefine 3.4.1 and 3.5.0 running on Linux and Mac OSX.
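To illustrate what a "JSON-specified operation" might look like, here is a minimal sketch in base R that builds the payload for a single column removal. The helper name `build_remove_column_op()` and the column name `theDate` are hypothetical and not part of rrefine. The `core/column-removal` operation name follows the format OpenRefine uses when it exports an operation history. The exact arguments accepted by refine_operations() are not shown here; consult its help page before passing a string like this one.

```r
# Hypothetical helper (not part of rrefine): build the JSON array for one
# OpenRefine "core/column-removal" operation using base R only.
build_remove_column_op <- function(column_name) {
  # Escape backslashes first, then double quotes, so the column name
  # remains a valid JSON string value.
  escaped <- gsub("\\", "\\\\", column_name, fixed = TRUE)
  escaped <- gsub("\"", "\\\"", escaped, fixed = TRUE)
  # apply-operations expects a JSON array of operation objects
  sprintf('[{"op":"core/column-removal","columnName":"%s"}]', escaped)
}

# "theDate" is an assumed column name used only for illustration
ops <- build_remove_column_op("theDate")
cat(ops, "\n")
```

A wrapper such as refine_remove_column() presumably assembles an equivalent payload internally, so most users should not need to write the JSON by hand.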
Minor release to incorporate new features.

- refine_export: the user can now specify “col_types” for the tabular format returned. Thanks to @joelnitta for the contribution!

The only update in this release is the removal of one of the package dependencies (rlist), which has been scheduled to be archived per the CRAN team. This change is required for continued distribution of rrefine via CRAN. Functions from rlist were only used in an unexported rrefine helper, and there are no anticipated user-facing changes in this release.
This release includes a number of new features, more robust checks and internal logic, and many improvements to package documentation. Most significantly, this version introduces support for the Cross-Site Request Forgery (CSRF) token in OpenRefine API requests, which is required in certain API calls as of OpenRefine v3.3. This feature is now included in rrefine but operates internally and should be invisible to users. For more information on OpenRefine CSRF protection, see: https://github.com/OpenRefine/OpenRefine/wiki/Changes-for-3.3#csrf-protection-changes
- refine_metadata() function is now exported and user-facing.
- [...] http://127.0.0.1:3333)
- refine_upload() function now checks file format based on extension, and allows both .csv and .tsv files to be uploaded.
- refine_upload(..., open.browser=TRUE) will now redirect the user to the newly created project in the OpenRefine instance.
- [...] tibble with the data in R.
- refine_upload() and refine_delete() functions now confirm success of operations by comparing metadata before/after POST requests.
- [...] refine_query() internal helper function.
- refine_id() helper function now validates “project.id” against the list of project ids in the running instance.
- [...] news.md to track release notes!
- refine_delete() and refine_upload() now generate a CSRF token internally.

Tested with OpenRefine 3.2, 3.3, and 3.4.1 running on Linux and Mac OSX.
NOTE: The rrefine package was released to CRAN under version 1.0. However, the DESCRIPTION file for that release noted the version as 0.1. All releases from v1.1.0 onwards will maintain consistency between the version in the DESCRIPTION file and the version number on the CRAN release.