This vignettes gives some detailed information regarding the Lua filters including in the rmarkdown package. You don’t have to do anything to use them usually as they just works usaully. This document presents the features they power.
To know more about Lua filters in general, you can jump directly to last section
This filter is available since rmarkdown 1.15 and works with Pandoc 2.1+ (shipped in RStudio >= 1.2).
Adding a pagebreak in document was always possible using custom output specific syntax in a rmarkdown file but one drawback was the compatibility with several output format.
With this Lua filter, it is possible to add a \newpage
or \pagebreak
command in a new line to include a pagebreak
in any of these formats: pdf_document()
,
html_document()
, word_document()
and
odt_document()
.
# Header 1
Some text
\newpage
# Header 2 on a new page
Some other text
\pagebreak
# Header 3 on a third page
rmarkdown will convert those commands in the correct output format syntax using a Lua filter during pandoc conversion.
As the commands are the ones already used in latex syntax, this works
as expected in a tex output document, and thus with pdf. Adding a
pagebreak was already possible with rmarkdown when output is
pdf_document()
or latex_document()
, without
any restriction about the version of pandoc.
A \newpage
or \pagebreak
command in a
rmarkdown document with output as HTML will be converted by default in
this html code with inline style using CSS rule page-break-after
<div style="page-break-after: always;"></div>
This will always insert a pagebreak after this div.
To get more flexibility, you can use a HTML class and some custom CSS
instead of an inline style. You need to add a metadata field
newpage_html_class
in your yaml header to set the
class.
Then you can control the behavior using custom CSS as in this example
---
output:
html_document: default
newpage_html_class: page-break
---
```{css, echo = FALSE}
// display the pagebreak only when printing the html page
@media all {
.page-break { display: none; }
}
@media print {
.page-break { display: block; break-after: page; }
}
```
# Header 1
Some text
\newpage
# Header 2 on a new page
Some other text
\newpage
will be converted here to
<div class="page-break"></div>
and the style will be applied to this class from the CSS included in the chunk.
This customisation can also be achieved by setting the environnement
variable PANDOC_NEWPAGE_HTML_CLASS
in the R session that
will render the document (or in .Renviron
file for
example)
Let’s note that in this example we use break-after
property instead of page-break-after
as it is recommended
now to use the former which is the replacement. The latter is kept
around for compatibility reason with
browsers.
A \newpage
or \pagebreak
command in a
rmarkdown document with output as Word document will be converted in a
pagebreak for word document. Manually, this would mean adding this in
your rmarkdown
```{=openxml}
<w:p><w:r><w:br w:type="page"/></w:r></w:p>
```
For example, using the pagebreak feature, this will add the first header in the second page of the work document
---
title: My main title
output: word_document
---
\newpage
# First Header
To use the pagebreak feature with odt_document()
, you
need to provide a reference document that includes a paragraph style
with, by default, the name Pagebreak. This named paragraph
style should have no extra space before or after and have a pagebreak
after it. (see libre office
documentation on how to create a style).
The name of the named paragrah style could be customized using
newpage_odt_style
metadata in yaml header or
PANDOC_NEWPAGE_ODT_STYLE
environment variable (as in html document).
As the previous one, this example will lead to a two pages document, with first header on the second page.
---
title: My main title
output:
odt_document:
reference_odt: reference.odt
---
\newpage
# First Header
This filter is available since rmarkdown 2.4 and works with Pandoc 2.1+ (shipped in RStudio >= 1.2).
Numbering sections are supported by Pandoc for limited formats (e.g.,
html and pdf). The rmarkdown package adds
number_sections.lua
to support this feature in other
formats (e.g., docx, odt, and so on). Users do not have to know which
formats use the Pandoc’s feature or the Lua filter.
---
title: My main title
output:
md_document:
number_sections: true # implemented by Lua filter
html_document
number_sections: true # implemented by Pandoc
---
# First Header
## A Header Belonging the First Header
# Second Header
In general, numbers and titles of sections are separated by a space.
An exception is the word_document
function, which separates
them by a tab in order to be consistent with Pandoc’s number sections
for docx format in Pandoc >= 2.10.1. If one wants to have fine
controls on the format of section numbers, prepare customized docx file
and specify it to the reference_docx
argument of the
word_document
function.
This filter is available since rmarkdown 1.15 and works with Pandoc 2.1+ (shipped in RStudio >= 1.2).
Pandoc supports a syntax called Fenced Divs
that allows to create <div>
in HTML. For example, the
syntax is
::: {#special .sidebar}
Here is a paragraph.
And another. :::
By default, this special syntax will be ignored in LaTeX conversion for PDF output and the content of the block will be used as-is. However, this syntax could be used to create LaTeX environment.
That is why rmarkdown uses a custom Lua filter to
allow any Fenced Divs to be converted as a LaTeX environment
(\begin{env}
… \end{env}
) when desired by
adding a special attribute.
::: {.verbatim data-latex=""}
We show some _verbatim_ text here. :::
Its LaTeX output will be:
\begin{verbatim}
We show some \emph{verbatim} text here.
\end{verbatim}
This feature we call Custom blocks is documented in details in the R Markdown Cookbook
Since pandoc 2.0, it is possible to use Lua filters to add some extra functionality to pandoc document conversion. Adding a pagebreak command in markdown to be compatible with several output documents is one of them. You can find some more informations about Lua filters in pandoc’s documentation and also some examples in a collection of Lua filters for pandoc. These examples, and any other Lua filters, can be use in your Rmarkdown document directly by adding a pandoc argument in yaml header
---
output:
html_document:
pandoc_args: ["--lua-filter=filter.lua"] ---
You can also use a special helper using a !expr
syntax
for yaml in your header
The package rmdfiltr provides a collection of Lua filters and helpers functions to use them.
Before pandoc 2.0, using
filter with pandoc was already available through programs that
modifies the AST. pandoc-citeproc
is an example used to
deal with citations. The package pandocfilter
is useful to create filters using R.
If you want to create a format that uses a special filter, you can add it as a pandoc option if the output format.
Either by modifying an existing format:
<- function(...) {
custom_format <- rmarkdown::html_document(...)
base_format # prepending a new Lua filter to html_document() ones
$pandoc$lua_filters <- c(
base_format::pandoc_path_arg("new.lua"),
rmarkdown$pandoc$lua_filters)
base_format
base_format
}
basename(custom_format()$pandoc$lua_filters)
#> [1] "new.lua" "pagebreak.lua" "latex-div.lua"
Or by creating a new format :
<- function(toc = TRUE, ...) {
custom_format ::output_format(
rmarkdownknitr = rmarkdown::knitr_options(),
# a new filter will be appended to base_format ones
pandoc = rmarkdown::pandoc_options(to = "html", lua_filters = "new.lua"),
base_format = rmarkdown::html_document(toc = toc, ...)
)
}
basename(custom_format()$pandoc$lua_filters)
#> [1] "pagebreak.lua" "latex-div.lua" "new.lua"
As this is an advanced feature, if you want to go further, we encourage you to look at the examples of formats in packages such as rmarkdown, bookdown or pagedown.