The inpdfr package analyses and compares PDF and/or TXT documents using both classical text mining tools and tools from theoretical ecology. In the latter, words are treated as species and documents as communities, which allows analysis at the community and metacommunity levels.
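To make the species/community analogy concrete, here is a minimal base R sketch that computes a Shannon diversity index per document from a toy word-occurrence table. The table, its counts, and the `shannon` helper are illustrative assumptions. The Shannon index is used only because it is a familiar ecological measure, and it is not necessarily what inpdfr computes internally.

```r
# Toy word-occurrence table: rows are words ("species"),
# columns are documents ("communities"). Counts are made up for illustration.
word_occ <- data.frame(
  word = c("model", "data", "species", "text"),
  docA = c(4, 4, 4, 4),
  docB = c(13, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: Shannon index H = -sum(p * log(p)),
# computed over the words with non-zero counts.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# One value per document; an evenly used vocabulary yields a higher H
# than a vocabulary dominated by a single word.
diversity <- sapply(word_occ[, c("docA", "docB")], shannon)
print(round(diversity, 3))
```

In this toy table, docA uses its four words evenly while docB is dominated by one word, so docA gets the higher diversity value.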
Gather some PDF and/or TXT files in a folder. With the working directory pointed at this folder, the inpdfr package extracts the text and produces a word-occurrence data.frame, which is then used to analyse and compare the documents. An easy way to start is to use the RGtk2 GUI through the loadGUI function (only available in the GitHub version, not on CRAN).
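As a rough illustration of what a word-occurrence data.frame looks like, the following base R sketch builds one from the TXT files in a folder. It does not use inpdfr. The helpers `count_words` and `word_occurrence`, the tokenisation rule, and the demo files are all assumptions made for this example, and the package's own processing is likely more elaborate.

```r
# Hypothetical helper: count word occurrences in one text file.
# Tokenisation is deliberately crude: lower-case, split on anything
# that is not an ASCII letter (accented characters are therefore dropped).
count_words <- function(path) {
  txt <- tolower(paste(readLines(path, warn = FALSE), collapse = " "))
  words <- unlist(strsplit(txt, "[^a-z]+"))
  words <- words[nzchar(words)]
  counts <- table(words)
  data.frame(word = as.character(names(counts)),
             freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: merge per-file counts into one table,
# with one row per word and one column per document.
word_occurrence <- function(folder) {
  files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE,
                      ignore.case = TRUE)
  if (length(files) == 0) return(data.frame(word = character(0)))
  per_file <- lapply(files, function(f) {
    d <- count_words(f)
    names(d)[2] <- basename(f)  # name the count column after the file
    d
  })
  merged <- Reduce(function(a, b) merge(a, b, by = "word", all = TRUE),
                   per_file)
  merged[is.na(merged)] <- 0L   # a word absent from a document counts as 0
  merged
}

# Usage with two small files written to a temporary folder.
demo_dir <- file.path(tempdir(), "inpdfr_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("Words are species. Documents are communities.",
           file.path(demo_dir, "a.txt"))
writeLines("Species and communities; species again.",
           file.path(demo_dir, "b.txt"))
occ <- word_occurrence(demo_dir)
print(occ)
```

Using a full outer merge keeps every word seen in any document, which is what makes documents comparable column by column.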
The package uses XPDF (http://www.foolabs.com/xpdf/download.html) for PDF-to-text extraction. You need to install XPDF before using the inpdfr package. Depending on your operating system, you may need to restart your computer after installing XPDF. If you do not want to use XPDF, you can extract the content of your PDF files with the method of your choice and store it in TXT files. The only function that makes use of XPDF is getPDF, which can be substituted with the getTXT function.
To install inpdfr from CRAN:
install.packages("inpdfr")
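For readers who prefer to prepare TXT files themselves, the sketch below shows one way to convert every PDF in a folder from R by calling a `pdftotext` command-line tool (XPDF ships one, and so does Poppler). The helper `pdf_to_txt` is hypothetical and is not part of inpdfr. It assumes the executable is on the PATH and that the default extraction options are acceptable. Any other extractor can be swapped in by changing the command.

```r
# Hypothetical helper: convert every PDF in `folder` to a TXT file next to it,
# using whichever `pdftotext` executable is found on the PATH by default.
pdf_to_txt <- function(folder, exe = Sys.which("pdftotext")) {
  if (!nzchar(exe)) stop("pdftotext not found on the PATH")
  pdfs <- list.files(folder, pattern = "\\.pdf$", full.names = TRUE,
                     ignore.case = TRUE)
  txts <- sub("\\.pdf$", ".txt", pdfs, ignore.case = TRUE)
  for (i in seq_along(pdfs)) {
    # pdftotext takes the input PDF followed by the output text file.
    system2(exe, args = shQuote(c(pdfs[i], txts[i])))
  }
  invisible(txts)  # paths of the TXT files that were requested
}

# Example call (the folder name is a placeholder):
# pdf_to_txt("path/to/my/documents")
```

The resulting TXT files can then be processed without going through getPDF.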
The inpdfr package provides three categories of functions:
- functions to extract and process text into a word-occurrence data.frame,
- functions to analyse the word-occurrence data.frame with standard and ecological tools, and
- functions to use inpdfr through a Gtk2 graphical user interface.
Further instructions and a complete example are provided in the vignette.