content
column
(#80)xlex()
(#57).include_blank_cells = FALSE
had a bug that returned blank
cells as an empty row in the xlsx_cells()
data frame.-F10
(#26 @cablegui)._
as dates when the underscore is followed by a date-ish
character like M
(#24).include_blank_cells = FALSE
in xlsx_cells()
(#25).maybe_xlsx()
is provided for checking whether
a file might be in the xlsx format. It is impossible to be sure from the
magic number alone, because the magic numbers are either common to all
zip files, or common to other Microsoft Office files (e.g. .doc,
.ppt).xlsx_cells()
and xlsx_formats()
replace
tidy_xlsx()
, which has been deprecated.
xlsx_cells()
returns a single data frame of all the cells
in scope (the whole workbook, or chosen sheets), rather than a list of
separate data frames for each sheet. xlsx_formats()
performs orders of magnitude faster.xlsx_validation()
imports validation rules from cells
that restrict data input, such as cells that require a selection from a
drop-down list. See the vignette
vignette("data-validation-rules", package = "tidyxl")
.xlsx_names()
imports defined names (aka named
ranges/formulas), which can be used to filter for particular ranges of
cells by name. Use is_range()
to filter for ones that are
named ranges, and then read joining
rules to cells for how to join cell ranges to cell addresses. This
will become easier in a future release.is_range()
checks whether a formula is simply ranges of
cells.xlex()
tokenises formulas. This is useful for detecting
spreadsheet smells like embedded constants and deep nesting. There is a
demo Shiny
app, and a vignette
vignette("smells", package = "tidyxl")
. A vector of Excel
function names excel_functions
can be used to separated
built-in functions from custom functions. More experimental features
will be implemented in the off-CRAN package lexl before becoming part of
tidyxl.xlsx_cells()$character_formatted
is a new column for
the in-cell formatting of text (#5). This is for when different parts of
text in a single cell have been formatted differently from one
another.is_date_format()
checks whether a number format string
is a date format. This is useful if a cell formula contains a number
formatting string (e.g. TEXT(45678,"yyyy")
), and you need
to know that the constant 45678 is a date in order to recover it at full
resolution (rather than parsing the character output “2025” as a
year).xlsx_color_theme()
and it’s British alias
xlsx_colour_theme()
returns the theme colour palette used
in a file. This is useful to monitor use of a corporate standard
theme.xlsx_color_standard
and it’s British alias
xlsx_colour_standard
are data frames of the standard Excel
palette (red
, blue
, etc.).=A1*2
in cell B1
becomes
=A2*2
in cell B2
(for more details see issue
#7).tidy_xlsx()
has been deprecated in favour of
xlsx_cells()
, which returns a data frame of all the cells
in the workbook (or in the requested sheets), and
xlsx_formats()
, which returns a lookup list of cell
formats.tidy_xlsx()
and one of it’s replacments
xlsx_cells()
content
has been replaced by
is_blank
, a logical value indicating whether the cell
contains data. Please replace !is.na(content)
with
!is_blank
to filter out blank cells (ones with formatting
but no value).formula_type
has been replaced by
is_array
, a logical value indicating whether the cell’s
formula is an array formula or not. In Excel array formulas are
represented visually by being surrounded by curly braces
{}
.xlsx_formats()
and tidy_xlsx()
, theme
colours are given by name rather than by number,
e.g. "accent6"
instead of 4
."[Cyan]0%"
) are no longer mis-detect as dates (#21).
is_date_format()
tests whether a number format is a date
format.xlsx_formats()
is now thoroughly tested, and several
relatively minor bugs fixed. For example,
xlsx_formats(path)$local$fill$patternFill$patternType
consistently returns NA
and never "none"
when
a pattern fill has not been set, and escape-backslashes are consistently
omitted from numFmts.xlex()
, is_range()
and the handling of
relative references in shared formulas requires a dependency on the piton package, which
wraps the PEGTL C++ parser
generator.
date1904
attribute for
"false"
or "1"
to support files created by the
openxlsx
package (#8)."Normal"
) instead of by index integer. All the vectors
under x$formats$style
are named according to the style
names. x$data$sheet$style_format_id
has been renamed to
x$data$sheet$style_format
and its type changed from integer
(index into style formats) to character (still an index, but looking up
the named vectors by name). There are examples in the README and
vignette.NEWS.md
file to track changes to the
package.