Introduction

This vignette explains the basic functionality of the package litRiddle, part of the Riddle of Literary Quality project.

The package contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data.

See: https://literaryquality.huygens.knaw.nl/ for further details. Information in Dutch about the package can be found at https://karinavdo.github.io/RaadselLiteratuur/02_07_data_en_R_package.html.

If you use litRiddle in your academic publications, please consider citing the following reference:

Karina van Dalen-Oskam (2021). Het raadsel literatuur. Is literaire kwaliteit meetbaar? Amsterdam University Press.

Installation

Install the package from the CRAN repository:

install.packages("litRiddle")

Alternatively, try installing it directly from the current GitHub repository:

library(devtools)
install_github("karinavdo/LitRiddleData", build_vignettes = TRUE)

Usage

First, load the package so that its functions become available to the user:

library(litRiddle)

## litRiddle version: 0.4.1

The dataset

To activate the dataset, type one of the following lines (or all of them):

data(books)
data(respondents)
data(reviews)

From now on, the dataset, divided into three data tables, is visible to the user. Please note that the functions discussed below do not need the dataset to be activated (they take care of it themselves), so you can skip this step if you plan to analyze the data using the functions from the package.

Time to explore some of the data tables. Typing the name of a table prints all of its data points, e.g. for the table books:

books

Quite a lot of stuff dumped on the screen, right? It is usually a better idea to select one portion of information at a time, typically one variable or one observation. We assume here that the user has some basic knowledge of R, and in particular knows how to access values in vectors and tables (matrices). To get the titles of the books rated in the survey (or, say, the first 10 titles), one might type:

books$title[1:10]

##  [1] "Haar naam was Sarah"           "Duel"                         
##  [3] "Het Familieportret"            "De kraai"                     
##  [5] "Mannen die vrouwen haten"      "Heldere hemel"                
##  [7] "Vijftig tinten grijs"          "Gerechtigheid"                
##  [9] "De verrekijker"                "De vrouw die met vuur speelde"
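Beyond numeric ranges, base R offers several other ways to pick elements from a column. The sketch below stores the ten titles printed above in a plain character vector so that it runs without litRiddle; with the package loaded, the same calls apply to books$title. The vector name titles is an arbitrary choice.

```r
# The ten titles printed above, stored in a plain character vector so that
# this sketch runs without litRiddle; with the package loaded, use books$title.
titles <- c("Haar naam was Sarah", "Duel", "Het Familieportret", "De kraai",
            "Mannen die vrouwen haten", "Heldere hemel", "Vijftig tinten grijs",
            "Gerechtigheid", "De verrekijker", "De vrouw die met vuur speelde")

head(titles, 3)                       # first three elements, same as titles[1:3]
titles[c(2, 4)]                       # arbitrary positions
grep("vrouw", titles, value = TRUE)   # titles containing the string "vrouw"
titles[nchar(titles) > 20]            # logical indexing: titles longer than 20 characters
```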

Well, but how do I know that the variable I want is called title, rather than something else? A dedicated function lists all the variables from the three data tables.

Print column names

The function that lists all the column names from the three data tables is named get.columns() and needs no arguments. In other words, you simply type the following code, remembering the parentheses at the end of the function name:

get.columns()

## $books
##  [1] "short.title"              "author"                  
##  [3] "title"                    "genre"                   
##  [5] "book.id"                  "riddle.code"             
##  [7] "translated"               "gender.author"           
##  [9] "origin.author"            "original.language"       
## [11] "inclusion.criterion"      "publication.date"        
## [13] "first.print"              "publisher"               
## [15] "english.title"            "word.count"              
## [17] "type.count"               "sentence.length.mean"    
## [19] "sentence.length.variance" "paragraph.count"         
## [21] "sentence.count"           "paragraph.length.mean"   
## [23] "raw.TTR"                  "sampled.TTR"             
## 
## $respondents
##  [1] "respondent.id"   "gender.resp"     "age.resp"        "zipcode"        
##  [5] "education"       "books.per.year"  "typically.reads" "how.literary"   
##  [9] "s.4a1"           "s.4a2"           "s.4a3"           "s.4a4"          
## [13] "s.4a5"           "s.4a6"           "s.4a7"           "s.4a8"          
## [17] "s.12b1"          "s.12b2"          "s.12b3"          "s.12b4"         
## [21] "s.12b5"          "s.12b6"          "s.12b7"          "s.12b8"         
## [25] "remarks.survey"  "date.time"       "week.nr"         "day"            
## 
## $reviews
## [1] "respondent.id"        "book.id"              "quality.read"        
## [4] "literariness.read"    "quality.notread"      "literariness.notread"
## [7] "motivations"          "book.read"

Not bad indeed. However, how can I know what s.4a2 stands for?

Explain variables

The function explain() gives a short explanation of what the different column names refer to and what their levels consist of. To work properly, it needs an argument, which means that the user has to specify which dataset s/he is interested in. The options are as follows:

explain("books")

## The 'books' dataset contains information on several details of the 401 
## different books used in the survey.
##         
## Here follows a list with the different column names and an explanation of
## the information they contain:
## 
## 1. short.title        A short name containing the author's name and 
##                       (a part of) the title;
## 2. author             Last name and first name of the author of the book;
## 3. title              Full title of the book;
## 4. genre              Genre of the book. There are four different genres:
##                       a) Fiction; b) Romantic; c) Suspense; d) Other;
## 5. book.id            Unique number to identify each book;
## 6. riddle.code        More complete list of genres of the books. 
##                       Contains 13 categories --- to see which, type
##                       'levels(books$riddle.code' in the terminal;
## 7. translated         'yes' if the book has been translated, 'no' if not;
## 8. gender.author      The gender of the author: Female, Male, Unknown/Multiple;
## 9. origin.author      The country of origin of the author. Note that short
##                       country codes have been used instead of the full
##                       country names;
## 10. original.language The original language of the book. Note that short
##                       language codes have been used, instead of the full
##                       language names;
## 11. inclusion.criterion   In what category a book has been placed, either
##                       a) bestseller; b) boekenweekgeschenk; c) library; or
##                       d) literair juweeltje;
## 12. publication.date  Publication date of the book, using a YYYY-MM-DD format;
## 13. first.print       Year in which the first print appeared;
## 14. publisher         Publishers of the books;
## 15. english.title     Title of the book in English;
## 16. word.count        Word count, or total number of words (tokens) used 
##                       in a book;
## 17. type.count        Total number of unique words (types) used in a book;
## 
## 18. sentence.length.mean   Average sentence lengh in a book (in words);
## 19. sentence.length.variance   Standard deviation of the sentence lenght;
## 20. paragraph.count   Total number of paragraphs in a book;
## 21. sentence.count    Total number of sentences in a book;
## 22. paragraph.length.mean   Average paragraph length in a book (in words); 
## 23. raw.TTR           Lexical diversity, or type-token ratio, which gives an
##                       indication of how diverse the word use in a book is;
## 24. sampled.TTR      Unlike the raw type-token ratio, the sampled TTR is 
##                      significantly more resistant to text size, and thus
##                      it should be preferred over the raw TTR.
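The difference between the raw.TTR and sampled.TTR columns can be illustrated with a short sketch. The raw ratio is simply the number of types divided by the number of tokens, which tends to fall as a text grows longer because frequent words keep repeating. For the sampled variant, the sketch assumes one common approach (the mean TTR over consecutive, equally sized chunks); the exact procedure used for litRiddle is not described here, so treat the function names, the chunking scheme, and the chunk size as assumptions.

```r
# Raw type-token ratio: distinct word forms (types) divided by running words (tokens).
raw_ttr <- function(tokens) {
  length(unique(tokens)) / length(tokens)
}

# One common "sampled" variant (standardized TTR): the mean TTR of consecutive,
# non-overlapping chunks of chunk_size tokens; a trailing partial chunk is dropped.
# This is an assumption for illustration, not necessarily litRiddle's procedure.
sampled_ttr <- function(tokens, chunk_size = 1000) {
  n_chunks <- length(tokens) %/% chunk_size
  if (n_chunks == 0) stop("text is shorter than one chunk")
  chunk_ttrs <- vapply(seq_len(n_chunks), function(i) {
    idx <- ((i - 1) * chunk_size + 1):(i * chunk_size)
    raw_ttr(tokens[idx])
  }, numeric(1))
  mean(chunk_ttrs)
}

# Toy example: a crude whitespace tokenizer on a short lower-cased sentence.
tokens <- strsplit(tolower("de kat zat op de mat en de hond zat op de bank"), "\\s+")[[1]]
raw_ttr(tokens)                      # 8 types / 13 tokens
sampled_ttr(tokens, chunk_size = 5)  # mean of two 5-token chunks: (0.8 + 1.0) / 2
```

Because every chunk has the same length, the sampled value no longer depends on how long the whole book is, which is why it is easier to compare across novels of different sizes.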

explain("reviews")

## The 'reviews' dataset contains four different ratings that were given 
## to 401 different books.
##         
## Here follows a list with the different column names and an explanation of
## what information they contain:
## 
## 1. respondent.id          Unique number for each respondent of the survey;
## 2. book.id                Unique number to identify each book;
## 3. quality.read           Rating on the quality of a book that a respondent
##                           has read. Scale from 1 - 7, with 1 meaning 
##                           'very bad' and 7 meaning 'very good';
## 4. literariness.read      Rating on how literary a respondent found a book
##                           that s/he has read. Scale from 1 - 7, with 1 meaning 
##                           'not literary at all' and 7 meaning 'very literary';
## 5. quality.notread        Rating on the quality of a book that a respondent
##                           has not read. Scale from 1 - 7, with 1 meaning 
##                           'very bad' and 7 meaning 'very good';
## 6. literariness.notread   Rating on how literary a respondent found a book that
##                           s/he has not read. Scale from 1 - 7, with 1 meaning 
##                           'not literary at all' and 7 meaning 'very literary';
## 7. motivations            Written explanations of why a respondent gave a
##                           a certain rating to a certain book.
## 8. book.read              1 or 0: 1 indicates that the respondent read 
##                           the book, 0 indicates the respondent did not 
##                           read the book but had an opinion about 
##                           the literary quality of the book.
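Because the reviews table stores one row per respondent-book pair, with either the *.read or the *.notread columns filled in depending on book.read, summaries usually start by choosing which ratings to use. A minimal base-R sketch on a few invented rows (the object name reviews_toy and all values are hypothetical):

```r
# Hypothetical toy rows in the layout of the reviews table: one row per
# respondent-book pair; the *.read or the *.notread column is filled depending
# on book.read (values invented for illustration only).
reviews_toy <- data.frame(
  respondent.id   = c(1, 1, 2, 3, 3),
  book.id         = c(10, 20, 10, 10, 20),
  quality.read    = c(6, NA, 5, NA, 3),
  quality.notread = c(NA, 4, NA, 7, NA),
  book.read       = c(1, 0, 1, 0, 1)
)

# Mean quality per book among respondents who actually read it.
read_only  <- subset(reviews_toy, book.read == 1)
read_means <- tapply(read_only$quality.read, read_only$book.id, mean)
read_means

# Alternatively, merge both columns into one score and keep book.read as a flag.
reviews_toy$quality <- ifelse(reviews_toy$book.read == 1,
                              reviews_toy$quality.read,
                              reviews_toy$quality.notread)
aggregate(quality ~ book.id + book.read, data = reviews_toy, FUN = mean)
```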

explain("respondents")

## The 'respondents' dataset contains information on the people that participated 
## in the survey.
##         
## Here follows a list with the different column names and an explanation of
## what information they contain:
## 
## 1. respondent.id      Unique number for each respondent of the survey;
## 2. gender.resp        Gender of the respondent: Female, Male, NA;
## 3. age.resp           Age of the respondent;
## 4. zipcode            Zipcode of the respondent;
## 5. education          Education level, containing 8 levels (see which
##                       levels by typing 'levels(respondents$education)'
##                       in the terminal);
## 6. books.per.year     Number of books read per year by each respondent;
## 7. typically.reads    Typical genre of books that a respondent reads, 
##                       with three levels a) Fiction; b) Non-fiction;
##                       c) both;
## 8. how.literary       Answer to the question 'How literary a reader do 
##                       you consider yourself to be?', where respondents
##                       could fill in a number from 1 - 7, with 1 meaning
##                       'not literary at all' and 7 meaning 'very literary';
## 9. s.4a1              Answer to the question: 'I like reading novels that 
##                       I can relate to my own life'. Scale from 1 - 5, with 
##                       1 meaning 'completely disagree', and 5 meaning 
##                       'completely agree';
## 10. s.4a2             Answer to the question: 'The story of a novel is what 
##                       matters most to me'. Scale from 1 - 5; 
## 11. s.4a3             Answer to the question: 'The writing style in a book 
##                       is important to me'. Scale from 1 - 5;
## 12. s.4a4             Answer to the question: 'I like searching for deeper 
##                       layers in a novel'. Scale from 1 - 5;
## 13. s.4a5             Answer to the question: 'I like reading literature'. 
##                       Scale from 1 - 5;
## 14. s.4a6             Answer to the question: 'I read novels to discover new 
##                       worlds and unknown time periods'. Scale from 1 - 5;
## 15. s.4a7             Answer to the question: 'I mostly read novels during my 
##                       vacation'. Scale from 1 - 5;
## 16. s.4a8             Answer to the question: 'I usually read several novels at 
##                       the same time'. Scale from 1 - 5;
## 17. s.12b1            Answer to the question: 'I like novels based on real 
##                       events'. Scale from 1 - 5;
## 18. s.12b2            Answer to the question: 'I like thinking about a novel's 
##                       structure'. Scale from 1 - 5;
## 19. s.12b3            Answer to the question: 'The writing style in a novel 
##                       is of more importance to me than its story'. 
##                       Scale from 1 - 5;  
## 20. s.12b4            Answer to the question: 'I like to get carried away by 
##                       a novel'. Scale from 1 - 5;
## 21. s.12b5            Answer to the question: 'I like to pick my books from 
##                       the top 10 list of best sold books'. Scale from 1 - 5;
## 22. s.12b6            Answer to the question: 'I read novels to be challenged 
##                       intellectually'. Scale from 1 - 5;
## 23. s.12b7            Answer to the question: 'I love novels that are easy 
##                       to read'. Scale from 1 - 5;
## 24. s.12b8            Answer to the question: 'In the evening, I prefer 
##                       to read books over watching TV'. Scale from 1 - 5;
## 25. remarks.survey    Any additional remarks that respondents filled in
##                       at the end of the survey;
## 26. date.time         Date and time of the moment a respondent filled in
##                       the survey, format in YYYY-MM-DD HH:MM:SS;
## 27. week.nr           Number of week in which the respondent filled in 
##                       the survey;
## 28. day               Day of the week in which the respondent filled in
##                       the survey.
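The statement columns (s.4a1 to s.12b8) hold answers on a 1-5 scale, so they are naturally summarized as counts or shares per answer category. The sketch below uses an invented vector of answers (s4a2_toy is a hypothetical name) and base R only:

```r
# Hypothetical answers to one 1-5 statement (e.g. s.4a2), invented for illustration.
s4a2_toy <- c(4, 5, 5, 3, 4, NA, 2, 5)

# Fixing the levels keeps unused answer categories visible as zero counts.
answers <- factor(s4a2_toy, levels = 1:5)
table(answers, useNA = "ifany")         # counts per category, plus missing answers
round(prop.table(table(answers)), 3)    # shares among non-missing answers
mean(s4a2_toy >= 4, na.rm = TRUE)       # share answering 4 or 5 (agree or completely agree)
```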

Combine all data tables

The package provides a function, combine.all(), that combines all the information from the survey, the reviews, and the books into one big data frame. The user can specify whether s/he also wants to load the frequency table (freqTable) with the counts of the word n-grams in the books.

To combine and load all the data from the books, respondents, and reviews tables into a new data frame (in tibble format):

dat = combine.all(load.freq.table = FALSE)

## Joining, by = "book.id"

## Joining, by = "respondent.id"

To do the same and, additionally, load the frequency table of all word 1-grams (single words) in the corpus:

dat = combine.all(load.freq.table = TRUE)

## Joining, by = "book.id"

## Joining, by = "respondent.id"
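The "Joining, by = ..." messages indicate that combine.all() links the tables through book.id and respondent.id (the wording matches the messages printed by dplyr's join functions). The underlying idea can be sketched in base R with merge() on three miniature tables; the table contents are invented, and the choice of a left join (all.x = TRUE) is an assumption rather than a description of how the package does it.

```r
# Hypothetical miniature versions of the three tables (values invented).
books_toy       <- data.frame(book.id = c(1, 2), genre = c("Fiction", "Suspense"))
respondents_toy <- data.frame(respondent.id = c(10, 11), age.resp = c(34, 61))
reviews_toy     <- data.frame(respondent.id = c(10, 10, 11),
                              book.id       = c(1, 2, 1),
                              quality.read  = c(6, 4, 7))

# Attach book information to each rating, then respondent information.
step1    <- merge(reviews_toy, books_toy, by = "book.id", all.x = TRUE)
combined <- merge(step1, respondents_toy, by = "respondent.id", all.x = TRUE)
combined   # one row per rating, with book and respondent columns added
```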

Find dataset

The function find.dataset() returns the name of the data table(s) in which a given column can be found:

find.dataset("book.id")

## [1] "books"   "reviews"

find.dataset("age.resp")

## [1] "respondents"

It is useful in combination with the function get.columns() discussed above.
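The lookup performed by find.dataset() boils down to checking which tables have a column of the given name. A sketch of that logic, using a hypothetical helper find_column() (not part of litRiddle) and miniature stand-in tables:

```r
# Hypothetical helper (not part of litRiddle): return the names of all tables
# in a named list that contain the requested column.
find_column <- function(column, tables) {
  hits <- vapply(tables, function(tbl) column %in% names(tbl), logical(1))
  names(tables)[hits]
}

# Miniature stand-ins for the three data tables.
tables_toy <- list(
  books       = data.frame(book.id = 1, title = "Duel"),
  respondents = data.frame(respondent.id = 1, age.resp = 40),
  reviews     = data.frame(respondent.id = 1, book.id = 1)
)
find_column("book.id", tables_toy)    # "books" "reviews"
find_column("age.resp", tables_toy)   # "respondents"
```

With litRiddle loaded, the same helper could be called as find_column("book.id", list(books = books, respondents = respondents, reviews = reviews)).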

Make table (and plot it!)

The function make.table() makes a table of frequency counts for one variable and plots a histogram of the results. Not sure which variable you want to plot? Invoke get.columns() once more to see which variables you can choose from:

get.columns()

Now the fun stuff:

make.table(table.of = 'age.resp')

## 
##  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35 
##  63  56  67  66  83 104 126 150 160 156 152 153 142 128 145 143 141 128 126 139 
##  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55 
## 123 139 135 124 147 148 181 178 209 196 208 231 229 258 283 312 331 343 372 384 
##  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75 
## 389 409 419 394 389 389 407 362 382 445 459 309 312 272 222 159 143 130  96 107 
##  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  93  94  98 
##  70  54  62  42  49  18  25  19   8  10   7   5   8   3   4   1   1   1   1

You can also adjust the x label, y label, title, and colors:

make.table(table.of = 'age.resp', xlab = 'age respondent', 
           ylab = 'number of people', 
           title = 'Distribution of respondent age', 
           barcolor = 'red', barfill = 'white')

## 
##  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35 
##  63  56  67  66  83 104 126 150 160 156 152 153 142 128 145 143 141 128 126 139 
##  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55 
## 123 139 135 124 147 148 181 178 209 196 208 231 229 258 283 312 331 343 372 384 
##  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75 
## 389 409 419 394 389 389 407 362 382 445 459 309 312 272 222 159 143 130  96 107 
##  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  93  94  98 
##  70  54  62  42  49  18  25  19   8  10   7   5   8   3   4   1   1   1   1
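The counts printed by make.table() have the shape of an ordinary one-way frequency table: each distinct value of the variable, followed by how often it occurs. A rough base-R counterpart, with invented ages and base barplot() standing in for the package's own plotting code; the border and col arguments play roughly the role of barcolor and barfill, which is an analogy rather than a documented equivalence:

```r
# Hypothetical respondent ages (invented for illustration).
ages <- c(23, 45, 45, 61, 23, 45, 70)

age_counts <- table(ages)   # one-way frequency table, named by the distinct ages
age_counts

# Bar chart of the counts: border sets the outline color, col the fill color.
barplot(age_counts, xlab = "age respondent", ylab = "number of people",
        main = "Distribution of respondent age",
        border = "red", col = "white")
```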

Note: in the above examples we used single quotes to indicate arguments (e.g. xlab = 'age respondent'), whereas at the beginning of the document we used double quotes (explain("books")). We did so on purpose, to emphasize that the functions provided by the package litRiddle follow the generic R syntax, which allows either single or double quotes to delimit strings.

Make table of X split by Y

make.table2(table.of = 'age.resp', split = 'gender.resp')

## Joining, by = "book.id"

## Joining, by = "respondent.id"

##         
##             16    17    18    19    20    21    22    23    24    25    26
##   female   704   748   791   735  1238  1889  2536  2507  2879  3205  2701
##   male      95    59   215   100   437   194   212   227   267   357   535
##   NA        12     0     0     0     2    22    33    10    18     7    14
##         
##             27    28    29    30    31    32    33    34    35    36    37
##   female  3265  2826  2871  3472  2961  3621  3136  3519  3445  2963  3073
##   male     405   429   480   517   487   621   362   401   675   380   909
##   NA         0    19    21    48     0    12     1     0    17    15    10
##         
##             38    39    40    41    42    43    44    45    46    47    48
##   female  3618  2519  4296  4020  5024  5253  5855  5859  5557  6392  6630
##   male     602   606   568   619  1111   852  1103  1036   786   709  1750
##   NA         0     0    42     0   119    44    16    41    75     4     5
##         
##             49    50    51    52    53    54    55    56    57    58    59
##   female  8439  9399  9957 10284 10012 11748 13228 12400 13059 12023 12659
##   male    1055  1455  1669  1920  2074  2066  2014  2522  2045  2459  3206
##   NA       101   148    87    34   194    36    56    39    89     1    34
##         
##             60    61    62    63    64    65    66    67    68    69    70
##   female 11663 12296 11626  9625 11363 12173 10903  8509  8469  6240  5049
##   male    3136  3206  2695  2696  2747  3659  4631  2548  2728  3564  2221
##   NA       100   144    76    56    42     0    51     0     0     6   147
##         
##             71    72    73    74    75    76    77    78    79    80    81
##   female  3530  3495  2991  1944  1905  1863   849   912   758   955   342
##   male    1021  1031  1649   812  1129   471   731   618   237   343   113
##   NA         0     0     0     0     0     0     0     0     0     0     0
##         
##             82    83    84    85    86    87    88    89    90    91    93
##   female   440   190   123   183   294    51   115    53    48    27     0
##   male     216   119    32    88   110    34    23     9    32     0    28
##   NA         0     0     0     0     0     0     0     0     0     0     0
##         
##             94    98
##   female     5    16
##   male       0     0
##   NA         0     0

make.table2(table.of = 'literariness.read', split = 'gender.author')

## Joining, by = "book.id"

## Joining, by = "respondent.id"

## Warning: Removed 309688 rows containing non-finite values (stat_count).

##                   
##                        1     2     3     4     5     6     7
##   female            3565  7145 10667  9259 12221 10630  2530
##   male              1591  4532  7679  8553 16334 26154 13090
##   unknown/multiple   206   491   837   817   977   570   101
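make.table2() prints a two-way table: one row per value of the split variable, one column per value of the tabulated variable. The NA row in the age example presumably collects respondents whose gender is missing, and the warning about removed rows in the literariness example most likely comes from empty ratings (for instance literariness.read for books a respondent did not read). In base R, the same layout comes from table() with useNA = "ifany", sketched here on invented records:

```r
# Hypothetical respondent records (invented); NA marks an unknown gender.
age    <- c(25, 25, 40, 40, 40, 61)
gender <- c("female", "male", "female", NA, "female", "male")

# Rows: split variable; columns: tabulated variable. useNA = "ifany" adds an
# NA row only when missing values are present.
age_by_gender <- table(gender, age, useNA = "ifany")
age_by_gender
```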

Note that the variable passed to the split argument must have fewer than 31 unique values; this avoids uninterpretable output. Consider, for example, the following code:

make.table2(table.of = 'age.resp', split = 'zipcode')

## Joining, by = "book.id"

## Joining, by = "respondent.id"

## The 'split-by' variable has many unique values, which will make the output 
## very hard to process. Please providea 'split-by' variable that contains 
## less unique values.
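The check described above can be sketched as a guard that counts the distinct values of the split variable before building the table. safe_crosstab() is a hypothetical helper, not a litRiddle function, and whether the package counts NA as a distinct value is not stated, so the sketch simply uses unique(), which does count it:

```r
# Hypothetical guard mirroring the rule described above: refuse to cross-tabulate
# when the split variable has 31 or more distinct values.
safe_crosstab <- function(x, split, max_levels = 30) {
  n_levels <- length(unique(split))
  if (n_levels > max_levels) {
    message("The split variable has ", n_levels,
            " unique values; please choose one with fewer.")
    return(invisible(NULL))
  }
  table(split, x, useNA = "ifany")
}

safe_crosstab(c(1, 2, 2), c("a", "b", "b"))   # small 2 x 2 table
safe_crosstab(1:100, 1:100)                   # prints a message, returns NULL invisibly
```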

You can also adjust the x label, y label, title, and colors:

make.table2(table.of = 'age.resp', split = 'gender.resp', 
            xlab = 'age respondent', ylab = 'number of people', 
            barcolor = 'purple', barfill = 'yellow')

## Joining, by = "book.id"

## Joining, by = "respondent.id"

##         
##             16    17    18    19    20    21    22    23    24    25    26
##   female   704   748   791   735  1238  1889  2536  2507  2879  3205  2701
##   male      95    59   215   100   437   194   212   227   267   357   535
##   NA        12     0     0     0     2    22    33    10    18     7    14
##         
##             27    28    29    30    31    32    33    34    35    36    37
##   female  3265  2826  2871  3472  2961  3621  3136  3519  3445  2963  3073
##   male     405   429   480   517   487   621   362   401   675   380   909
##   NA         0    19    21    48     0    12     1     0    17    15    10
##         
##             38    39    40    41    42    43    44    45    46    47    48
##   female  3618  2519  4296  4020  5024  5253  5855  5859  5557  6392  6630
##   male     602   606   568   619  1111   852  1103  1036   786   709  1750
##   NA         0     0    42     0   119    44    16    41    75     4     5
##         
##             49    50    51    52    53    54    55    56    57    58    59
##   female  8439  9399  9957 10284 10012 11748 13228 12400 13059 12023 12659
##   male    1055  1455  1669  1920  2074  2066  2014  2522  2045  2459  3206
##   NA       101   148    87    34   194    36    56    39    89     1    34
##         
##             60    61    62    63    64    65    66    67    68    69    70
##   female 11663 12296 11626  9625 11363 12173 10903  8509  8469  6240  5049
##   male    3136  3206  2695  2696  2747  3659  4631  2548  2728  3564  2221
##   NA       100   144    76    56    42     0    51     0     0     6   147
##         
##             71    72    73    74    75    76    77    78    79    80    81
##   female  3530  3495  2991  1944  1905  1863   849   912   758   955   342
##   male    1021  1031  1649   812  1129   471   731   618   237   343   113
##   NA         0     0     0     0     0     0     0     0     0     0     0
##         
##             82    83    84    85    86    87    88    89    90    91    93
##   female   440   190   123   183   294    51   115    53    48    27     0
##   male     216   119    32    88   110    34    23     9    32     0    28
##   NA         0     0     0     0     0     0     0     0     0     0     0
##         
##             94    98
##   female     5    16
##   male       0     0
##   NA         0     0

make.table2(table.of = 'literariness.read', split = 'gender.author', 
            xlab = 'Overall literariness scores', 
            ylab = 'number of people', barcolor = 'black', 
            barfill = 'darkred')

## Joining, by = "book.id"

## Joining, by = "respondent.id"

## Warning: Removed 309688 rows containing non-finite values (stat_count).

##                   
##                        1     2     3     4     5     6     7
##   female            3565  7145 10667  9259 12221 10630  2530
##   male              1591  4532  7679  8553 16334 26154 13090
##   unknown/multiple   206   491   837   817   977   570   101

Order responses

The original survey about Dutch fiction asked respondents to answer using descriptive terms, e.g. “very bad”, “neutral”, “a bit good”, etc. To make the analyses possible, the responses were then converted to numerical scales ranging from 1 to 7 (the questions about literariness and literary quality) or from 1 to 5 (the questions about the respondents’ reading habits). If you want the responses converted back to their original form, invoke the function order.responses(), which transforms the survey responses into ordered factors. Pass either “bookratings” or “readingbehavior” to specify which of the survey questions should be turned into ordered factors. (We assume here that the user knows what ordered factors are; otherwise this function will not seem very useful.)

Levels of quality.read and quality.notread: “very bad”, “bad”, “a bit bad”, “neutral”, “a bit good”, “good”, “very good”, “NA”.

Levels of literariness.read and literariness.notread: “absolutely not literary”, “non-literary”, “not very literary”, “between literary and non-literary”, “a bit literary”, “literary”, “very literary”, “NA”.

Levels of statements 4 and 12: “completely disagree”, “disagree”, “neutral”, “agree”, “completely agree”, “NA”.
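The kind of conversion that order.responses() performs can be sketched with base factor(), passing the numeric codes as levels and the descriptive terms listed above as labels. The ratings below are invented, and the sketch treats a missing answer as R's NA rather than as a separate level:

```r
# Labels for the 1-7 quality scale, as listed above.
quality_labels <- c("very bad", "bad", "a bit bad", "neutral",
                    "a bit good", "good", "very good")

# Hypothetical numeric ratings (invented); NA stands for a missing answer.
quality_num <- c(6, 5, 7, NA, 2)

quality_ord <- factor(quality_num, levels = 1:7, labels = quality_labels,
                      ordered = TRUE)
quality_ord
quality_ord >= "good"   # comparisons respect the order of the scale
table(quality_ord)      # all seven categories, including unused ones
```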

To create a data frame with ordered factor levels of the questions on reading behavior:

dat.reviews = order.responses('readingbehavior')
str(dat.reviews)

## tibble [13,541 × 28] (S3: tbl_df/tbl/data.frame)
##  $ respondent.id  : num [1:13541] 0 1 2 3 4 5 6 7 8 9 ...
##  $ gender.resp    : Factor w/ 3 levels "female","male",..: 1 1 1 1 1 2 1 1 2 1 ...
##  $ age.resp       : num [1:13541] 18 24 78 77 71 58 38 51 66 32 ...
##  $ zipcode        : num [1:13541] 4834 5625 2272 2151 NA ...
##  $ education      : Ord.factor w/ 8 levels "none/primary school"<..: 5 7 7 5 5 7 6 6 7 7 ...
##  $ books.per.year : num [1:13541] 20 30 30 12 15 60 25 30 50 2 ...
##  $ typically.reads: Factor w/ 3 levels "both","only fiction",..: 1 1 1 1 1 1 1 2 1 1 ...
##  $ how.literary   : Ord.factor w/ 5 levels "completely disagree"<..: 3 3 3 3 4 2 3 3 1 3 ...
##  $ s.4a1          : Ord.factor w/ 5 levels "completely disagree"<..: 4 4 4 3 4 2 3 2 3 2 ...
##  $ s.4a2          : Ord.factor w/ 5 levels "completely disagree"<..: 4 4 5 4 3 4 5 3 4 5 ...
##  $ s.4a3          : Ord.factor w/ 5 levels "completely disagree"<..: 4 5 4 4 4 5 4 5 4 4 ...
##  $ s.4a4          : Ord.factor w/ 5 levels "completely disagree"<..: 4 5 4 3 4 3 1 4 4 4 ...
##  $ s.4a5          : Ord.factor w/ 5 levels "completely disagree"<..: 5 5 4 3 4 4 3 5 5 4 ...
##  $ s.4a6          : Ord.factor w/ 5 levels "completely disagree"<..: 4 5 4 4 4 4 4 4 3 4 ...
##  $ s.4a7          : Ord.factor w/ 5 levels "completely disagree"<..: 4 3 3 2 2 1 3 2 2 5 ...
##  $ s.4a8          : Ord.factor w/ 5 levels "completely disagree"<..: 4 5 3 4 2 3 1 5 4 1 ...
##  $ s.12b1         : Ord.factor w/ 5 levels "completely disagree"<..: 2 4 4 3 4 2 3 2 3 3 ...
##  $ s.12b2         : Ord.factor w/ 5 levels "completely disagree"<..: 4 1 4 4 3 4 2 3 5 3 ...
##  $ s.12b3         : Ord.factor w/ 5 levels "completely disagree"<..: 3 3 3 3 3 3 2 3 3 3 ...
##  $ s.12b4         : Ord.factor w/ 5 levels "completely disagree"<..: 4 3 4 4 4 4 5 4 4 4 ...
##  $ s.12b5         : Ord.factor w/ 5 levels "completely disagree"<..: 1 2 3 2 3 1 2 2 4 2 ...
##  $ s.12b6         : Ord.factor w/ 5 levels "completely disagree"<..: 4 4 4 3 3 4 2 4 3 2 ...
##  $ s.12b7         : Ord.factor w/ 5 levels "completely disagree"<..: 2 3 4 4 4 2 5 3 2 2 ...
##  $ s.12b8         : Ord.factor w/ 5 levels "completely disagree"<..: 3 4 3 4 3 3 2 4 3 3 ...
##  $ remarks.survey : chr [1:13541] "" "" "" "" ...
##  $ date.time      : POSIXct[1:13541], format: "2013-06-04 11:12:00" "2013-04-10 15:33:00" ...
##  $ week.nr        : num [1:13541] 23 15 15 27 15 29 15 15 15 15 ...
##  $ day            : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 3 4 5 6 4 2 4 4 4 5 ...

To create a data frame with ordered factor levels of the book ratings:

dat.ratings = order.responses('bookratings')
str(dat.ratings)

## tibble [448,055 × 8] (S3: tbl_df/tbl/data.frame)
##  $ respondent.id       : num [1:448055] 0 0 0 0 0 0 0 0 0 0 ...
##  $ book.id             : num [1:448055] 1 9 11 19 30 34 82 116 300 372 ...
##  $ quality.read        : Ord.factor w/ 7 levels "very bad"<"bad"<..: 6 5 7 5 5 7 5 5 6 6 ...
##  $ literariness.read   : Ord.factor w/ 7 levels "absolutely not literary"<..: 5 6 6 6 4 6 3 5 6 6 ...
##  $ quality.notread     : Ord.factor w/ 7 levels "very bad"<"bad"<..: NA NA NA NA NA NA NA NA NA NA ...
##  $ literariness.notread: Ord.factor w/ 7 levels "absolutely not literary"<..: NA NA NA NA NA NA NA NA NA NA ...
##  $ motivations         : chr [1:448055] "" "" "" "" ...
##  $ book.read           : num [1:448055] 1 1 1 1 1 1 1 1 1 1 ...

Likert plots

Future versions of the litRiddle package will support Likert plots. Visit https://github.com/jbryer/likert to learn more about the general idea and its implementation in R.

Topic modeling

Future versions of the litRiddle package will support topic modeling of the motivations given by the reviewers.

Documentation

Each function provided by the package has its own help page; the same applies to the datasets:

help(books)
help(respondents)
help(reviews)
help(frequencies)
help(combine.all)
help(explain)
help(find.dataset)
help(get.columns)
help(make.table)
help(make.table2)
help(order.responses)
help(litRiddle) # for the general description of the package

Possible issues

All the datasets use the UTF-8 encoding of Unicode. This should normally not cause any problems on macOS and Linux machines, but Windows can be trickier in this respect. We have not experienced any problems in our testing environment, but we cannot guarantee the same for every other machine.
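If accented characters look garbled on your machine, a few base-R functions help check what is going on. The sketch uses an arbitrary example string written with a Unicode escape, so the script itself contains only ASCII characters:

```r
# An arbitrary string with one non-ASCII character, written as a Unicode escape.
title <- "Caf\u00e9 de Flore"

validUTF8(title)               # TRUE if the bytes form valid UTF-8
Encoding(title)                # the encoding R has declared for this string
nchar(title, type = "chars")   # 13 characters
nchar(title, type = "bytes")   # 14 bytes: the accented e takes two bytes in UTF-8
l10n_info()$`UTF-8`            # whether the current R session uses a UTF-8 locale

# When reading your own exported copies of the data from disk, declaring the
# file encoding explicitly can help, e.g. read.csv(path, fileEncoding = "UTF-8").
```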

References

Karina van Dalen-Oskam (2021). Het raadsel literatuur. Is literaire kwaliteit meetbaar? Amsterdam University Press.

Karina van Dalen-Oskam (2014). Prehistory of The Riddle. English translation (‘The Riddle of Literary Quality: The search for conventions of literariness’) of ‘The Riddle of Literary Quality. Op zoek naar conventies van literariteit’, published in Vooys: tijdschrift voor letteren 32(3): 25-33, https://literaryquality.huygens.knaw.nl/?p=537.

Corina Koolen, Karina van Dalen-Oskam, Andreas van Cranenburgh, Erica Nagelhout (2020). Literary quality in the eye of the Dutch reader: The National Reader Survey. Poetics 79: 101439, https://doi.org/10.1016/j.poetic.2020.101439.

More publications from the project: see https://literaryquality.huygens.knaw.nl/?page_id=588.

Literary Quality of Dutch Novels with litRiddle

Joris van Zundert, Maciej Eder, Karina van Dalen-Oskam, Saskia Lensink