This vignette explains the basic functionality of the package litRiddle, a part of the Riddle of Literary Quality project.
The package contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data.
See: https://literaryquality.huygens.knaw.nl/ for further details. Information in Dutch about the package can be found at https://karinavdo.github.io/RaadselLiteratuur/02_07_data_en_R_package.html.
If you use litRiddle in your academic publications, please consider citing the following reference:
Karina van Dalen-Oskam (2021). Het raadsel literatuur. Is literaire kwaliteit meetbaar? Amsterdam University Press.
Install the package from the CRAN repository:
install.packages("litRiddle")
Alternatively, try installing it directly from the current GitHub repository:
library(devtools)
install_github("karinavdo/LitRiddleData", build_vignettes = TRUE)
First, one has to activate the package so that its functions become visible to the user:
library(litRiddle)
## litRiddle version: 0.4.1
To activate the dataset, type one of the following lines (or all of them):
data(books)
data(respondents)
data(reviews)
From now on, the dataset, divided into three data tables, is visible to the user. Please note that the functions discussed below do not need the dataset to be activated (they take care of it themselves), so you can skip this step if you plan to analyze the data using the functions from the package.
Time to explore some of the data tables. This generic command will list all the data points from the table books:
books
Quite a lot of stuff dumped on the screen, right? It’s usually a better idea to select one portion of information at a time, usually one variable or one observation. We assume here that the user has some basic knowledge about R, and particularly s/he knows how to access values in vectors and tables (matrices). To get the titles of the books scored in the survey (or, say, the first 10 titles), one might type:
books$title[1:10]
## [1] "Haar naam was Sarah" "Duel"
## [3] "Het Familieportret" "De kraai"
## [5] "Mannen die vrouwen haten" "Heldere hemel"
## [7] "Vijftig tinten grijs" "Gerechtigheid"
## [9] "De verrekijker" "De vrouw die met vuur speelde"
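For readers who want a quick refresher on accessing values in tables, here is a minimal sketch in base R. It uses a small made-up table, toy.books, whose titles and values are hypothetical stand-ins and not taken from the survey data; the same indexing patterns should carry over to the real books table.

```r
# Toy stand-in for the 'books' table (hypothetical rows, not the real data)
toy.books = data.frame(
  title = c("Book A", "Book B", "Book C"),
  genre = c("Fiction", "Suspense", "Fiction"),
  word.count = c(30000, 90000, 25000),
  stringsAsFactors = FALSE
)

# One variable (column), limited to its first two values
first.titles = toy.books$title[1:2]

# One observation (row), with all its variables
second.book = toy.books[2, ]

# Rows that meet a condition, restricted to selected columns
fiction.books = toy.books[toy.books$genre == "Fiction", c("title", "word.count")]

print(first.titles)
print(second.book)
print(fiction.books)
```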
Well, but how do I know that the name of the particular variable I want to get is title, rather than anything else? There exists a function that lists all the variables from the three data tables.
The function that creates a list of all the column names from all three datasets is named get.columns() and needs no arguments to be run. This means that you simply type the following code, remembering the parentheses at the end of the function:
get.columns()
## $books
## [1] "short.title" "author"
## [3] "title" "genre"
## [5] "book.id" "riddle.code"
## [7] "translated" "gender.author"
## [9] "origin.author" "original.language"
## [11] "inclusion.criterion" "publication.date"
## [13] "first.print" "publisher"
## [15] "english.title" "word.count"
## [17] "type.count" "sentence.length.mean"
## [19] "sentence.length.variance" "paragraph.count"
## [21] "sentence.count" "paragraph.length.mean"
## [23] "raw.TTR" "sampled.TTR"
##
## $respondents
## [1] "respondent.id" "gender.resp" "age.resp" "zipcode"
## [5] "education" "books.per.year" "typically.reads" "how.literary"
## [9] "s.4a1" "s.4a2" "s.4a3" "s.4a4"
## [13] "s.4a5" "s.4a6" "s.4a7" "s.4a8"
## [17] "s.12b1" "s.12b2" "s.12b3" "s.12b4"
## [21] "s.12b5" "s.12b6" "s.12b7" "s.12b8"
## [25] "remarks.survey" "date.time" "week.nr" "day"
##
## $reviews
## [1] "respondent.id" "book.id" "quality.read"
## [4] "literariness.read" "quality.notread" "literariness.notread"
## [7] "motivations" "book.read"
Not bad indeed. However, how can I know what s.4a2 stands for?
The function that prints a short explanation of what the different column names refer to, and what their levels consist of, is called explain(). To work properly, this function needs an argument, which means that the user has to specify which dataset s/he is interested in. The options are as follows:
explain("books")
## The 'books' dataset contains information on several details of the 401
## different books used in the survey.
##
## Here follows a list with the different column names and an explanation of
## the information they contain:
##
## 1. short.title A short name containing the author's name and
## (a part of) the title;
## 2. author Last name and first name of the author of the book;
## 3. title Full title of the book;
## 4. genre Genre of the book. There are four different genres:
## a) Fiction; b) Romantic; c) Suspense; d) Other;
## 5. book.id Unique number to identify each book;
## 6. riddle.code More complete list of genres of the books.
## Contains 13 categories --- to see which, type
## 'levels(books$riddle.code' in the terminal;
## 7. translated 'yes' if the book has been translated, 'no' if not;
## 8. gender.author The gender of the author: Female, Male, Unknown/Multiple;
## 9. origin.author The country of origin of the author. Note that short
## country codes have been used instead of the full
## country names;
## 10. original.language The original language of the book. Note that short
## language codes have been used, instead of the full
## language names;
## 11. inclusion.criterion In what category a book has been placed, either
## a) bestseller; b) boekenweekgeschenk; c) library; or
## d) literair juweeltje;
## 12. publication.date Publication date of the book, using a YYYY-MM-DD format;
## 13. first.print Year in which the first print appeared;
## 14. publisher Publishers of the books;
## 15. english.title Title of the book in English;
## 16. word.count Word count, or total number of words (tokens) used
## in a book;
## 17. type.count Total number of unique words (types) used in a book;
##
## 18. sentence.length.mean Average sentence lengh in a book (in words);
## 19. sentence.length.variance Standard deviation of the sentence lenght;
## 20. paragraph.count Total number of paragraphs in a book;
## 21. sentence.count Total number of sentences in a book;
## 22. paragraph.length.mean Average paragraph length in a book (in words);
## 23. raw.TTR Lexical diversity, or type-token ratio, which gives an
## indication of how diverse the word use in a book is;
## 24. sampled.TTR Unlike the raw type-token ratio, the sampled TTR is
## significantly more resistant to text size, and thus
## it should be preferred over the raw TTR.
explain("reviews")
## The 'reviews' dataset contains four different ratings that were given
## to 401 different books.
##
## Here follows a list with the different column names and an explanation of
## what information they contain:
##
## 1. respondent.id Unique number for each respondent of the survey;
## 2. book.id Unique number to identify each book;
## 3. quality.read Rating on the quality of a book that a respondent
## has read. Scale from 1 - 7, with 1 meaning
## 'very bad' and 7 meaning 'very good';
## 4. literariness.read Rating on how literary a respondent found a book
## that s/he has read. Scale from 1 - 7, with 1 meaning
## 'not literary at all' and 7 meaning 'very literary';
## 5. quality.notread Rating on the quality of a book that a respondent
## has not read. Scale from 1 - 7, with 1 meaning
## 'very bad' and 7 meaning 'very good';
## 6. literariness.notread Rating on how literary a respondent found a book that
## s/he has not read. Scale from 1 - 7, with 1 meaning
## 'not literary at all' and 7 meaning 'very literary';
## 7. motivations Written explanations of why a respondent gave a
## a certain rating to a certain book.
## 8. book.read 1 or 0: 1 indicates that the respondent read
## the book, 0 indicates the respondent did not
## read the book but had an opinion about
## the literary quality of the book.
explain("respondents")
## The 'respondents' dataset contains information on the people that participated
## in the survey.
##
## Here follows a list with the different column names and an explanation of
## what information they contain:
##
## 1. respondent.id Unique number for each respondent of the survey;
## 2. gender.resp Gender of the respondent: Female, Male, NA;
## 3. age.resp Age of the respondent;
## 4. zipcode Zipcode of the respondent;
## 5. education Education level, containing 8 levels (see which
## levels by typing 'levels(respondents$education)'
## in the terminal);
## 6. books.per.year Number of books read per year by each respondent;
## 7. typically.reads Typical genre of books that a respondent reads,
## with three levels a) Fiction; b) Non-fiction;
## c) both;
## 8. how.literary Answer to the question 'How literary a reader do
## you consider yourself to be?', where respondents
## could fill in a number from 1 - 7, with 1 meaning
## 'not literary at all' and 7 meaning 'very literary';
## 9. s.4a1 Answer to the question: 'I like reading novels that
## I can relate to my own life'. Scale from 1 - 5, with
## 1 meaning 'completely disagree', and 5 meaning
## 'completely agree';
## 10. s.4a2 Answer to the question: 'The story of a novel is what
## matters most to me'. Scale from 1 - 5;
## 11. s.4a3 Answer to the question: 'The writing style in a book
## is important to me'. Scale from 1 - 5;
## 12. s.4a4 Answer to the question: 'I like searching for deeper
## layers in a novel'. Scale from 1 - 5;
## 13. s.4a5 Answer to the question: 'I like reading literature'.
## Scale from 1 - 5;
## 14. s.4a6 Answer to the question: 'I read novels to discover new
## worlds and unknown time periods'. Scale from 1 - 5;
## 15. s.4a7 Answer to the question: 'I mostly read novels during my
## vacation'. Scale from 1 - 5;
## 16. s.4a8 Answer to the question: 'I usually read several novels at
## the same time'. Scale from 1 - 5;
## 17. s.12b1 Answer to the question: 'I like novels based on real
## events'. Scale from 1 - 5;
## 18. s.12b2 Answer to the question: 'I like thinking about a novel's
## structure'. Scale from 1 - 5;
## 19. s.12b3 Answer to the question: 'The writing style in a novel
## is of more importance to me than its story'.
## Scale from 1 - 5;
## 20. s.12b4 Answer to the question: 'I like to get carried away by
## a novel'. Scale from 1 - 5;
## 21. s.12b5 Answer to the question: 'I like to pick my books from
## the top 10 list of best sold books'. Scale from 1 - 5;
## 22. s.12b6 Answer to the question: 'I read novels to be challenged
## intellectually'. Scale from 1 - 5;
## 23. s.12b7 Answer to the question: 'I love novels that are easy
## to read'. Scale from 1 - 5;
## 24. s.12b8 Answer to the question: 'In the evening, I prefer
## to read books over watching TV'. Scale from 1 - 5;
## 25. remarks.survey Any additional remarks that respondents filled in
## at the end of the survey;
## 26. date.time Date and time of the moment a respondent filled in
## the survey, format in YYYY-MM-DD HH:MM:SS;
## 27. week.nr Number of week in which the respondent filled in
## the survey;
## 28. day Day of the week in which the respondent filled in
## the survey.
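The raw.TTR and sampled.TTR columns described for the books table can be illustrated with a short sketch. The vignette does not spell out how the package computes the sampled TTR, so the definition below (the mean raw TTR over consecutive, non-overlapping chunks of a fixed size) is an assumption, and the function names raw.ttr and sampled.ttr as well as the toy texts are hypothetical.

```r
# Raw type-token ratio: number of unique words divided by number of words
raw.ttr = function(tokens) {
  length(unique(tokens)) / length(tokens)
}

# Sampled TTR (assumed definition): mean raw TTR over consecutive,
# non-overlapping chunks of 'chunk.size' tokens; leftover tokens are dropped
sampled.ttr = function(tokens, chunk.size = 5) {
  n.chunks = length(tokens) %/% chunk.size
  if (n.chunks == 0) stop("text is shorter than one chunk")
  used = tokens[seq_len(n.chunks * chunk.size)]
  chunk.id = rep(seq_len(n.chunks), each = chunk.size)
  chunks = split(used, chunk.id)
  mean(vapply(chunks, raw.ttr, numeric(1)))
}

# A toy text of 10 tokens, and the same text repeated 10 times
short.text = c("the", "cat", "sat", "on", "the", "mat", "and", "the", "dog", "sat")
long.text = rep(short.text, 10)

print(raw.ttr(short.text))      # 7 types / 10 tokens
print(raw.ttr(long.text))       # 7 types / 100 tokens
print(sampled.ttr(short.text))
print(sampled.ttr(long.text))
```

In this toy case the raw TTR drops from 0.7 to 0.07 merely because the text got longer, while the sampled value stays the same, which is the kind of size dependence the explanation of sampled.TTR refers to.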
The package provides a function to combine all information of the survey, reviews, and books into one big dataframe. The user can specify whether or not s/he also wants to load the freqTable with the frequency counts of the word n-grams of the books.
Combine and load all data from the books, respondents and reviews into a new dataframe (tibble format)
dat = combine.all(load.freq.table = FALSE)
## Joining, by = "book.id"
## Joining, by = "respondent.id"
Combine and load all data from the books, respondents and reviews into a new dataframe (tibble format), and also load the frequency table of all word 1-grams of the corpus:
dat = combine.all(load.freq.table = TRUE)
## Joining, by = "book.id"
## Joining, by = "respondent.id"
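The "Joining, by" messages above indicate that the three tables are linked through the keys book.id and respondent.id. As a rough illustration of that idea, and not of the package's actual implementation, the sketch below joins three tiny made-up tables with base R's merge(). All toy.* objects and their values are hypothetical.

```r
# Three toy tables that share the same keys as the real ones (made-up values)
toy.books = data.frame(book.id = c(1, 2),
                       title = c("Book A", "Book B"),
                       stringsAsFactors = FALSE)
toy.respondents = data.frame(respondent.id = c(10, 11),
                             age.resp = c(34, 58))
toy.reviews = data.frame(respondent.id = c(10, 10, 11),
                         book.id = c(1, 2, 2),
                         quality.read = c(6, 4, 7))

# Step 1: attach book information to every review, matching on book.id
toy.dat = merge(toy.reviews, toy.books, by = "book.id")

# Step 2: attach respondent information, matching on respondent.id
toy.dat = merge(toy.dat, toy.respondents, by = "respondent.id")

# One row per review, now carrying book and respondent columns
print(toy.dat)
```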
Return the name of the dataset where a column can be found.
find.dataset("book.id")
## [1] "books" "reviews"
find.dataset("age.resp")
## [1] "respondents"
It’s useful to combine it with the already-discussed function get.columns().
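To see how such a lookup relates to a list of column names, here is a minimal sketch. It assumes a named list shaped like the get.columns() output shown earlier (only a few of the column names are copied here), and the helper find.in.columns is a hypothetical stand-in, not the package's own code.

```r
# A shortened list of column names, shaped like the get.columns() output
toy.columns = list(
  books = c("book.id", "title", "author"),
  respondents = c("respondent.id", "age.resp"),
  reviews = c("respondent.id", "book.id", "quality.read")
)

# Hypothetical helper: names of all tables that contain the given column
find.in.columns = function(column, columns) {
  hits = vapply(columns, function(cols) column %in% cols, logical(1))
  names(columns)[hits]
}

print(find.in.columns("book.id", toy.columns))
print(find.in.columns("age.resp", toy.columns))
```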
Make a table of frequency counts for one variable, and plot a histogram of the results. Not sure which variable you want to plot? Invoke the above-discussed function get.columns() once more, to see which variables you can choose from:
get.columns()
Now the fun stuff:
make.table(table.of = 'age.resp')
##
## 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## 63 56 67 66 83 104 126 150 160 156 152 153 142 128 145 143 141 128 126 139
## 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
## 123 139 135 124 147 148 181 178 209 196 208 231 229 258 283 312 331 343 372 384
## 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## 389 409 419 394 389 389 407 362 382 445 459 309 312 272 222 159 143 130 96 107
## 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 93 94 98
## 70 54 62 42 49 18 25 19 8 10 7 5 8 3 4 1 1 1 1
You can also adjust the x label, y label, title, and colors:
make.table(table.of = 'age.resp', xlab = 'age respondent',
ylab = 'number of people',
title = 'Distribution of respondent age',
barcolor = 'red', barfill = 'white')
##
## 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## 63 56 67 66 83 104 126 150 160 156 152 153 142 128 145 143 141 128 126 139
## 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
## 123 139 135 124 147 148 181 178 209 196 208 231 229 258 283 312 331 343 372 384
## 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## 389 409 419 394 389 389 407 362 382 445 459 309 312 272 222 159 143 130 96 107
## 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 93 94 98
## 70 54 62 42 49 18 25 19 8 10 7 5 8 3 4 1 1 1 1
Note: in the above examples we used single quotes to indicate arguments (e.g. xlab = 'age respondent'), whereas at the beginning of the document we used double quotes (explain("books")). We did so on purpose, to emphasize that the functions provided by the package litRiddle are fully compliant with the generic R syntax, which allows either single or double quotes to delimit strings.
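For comparison, the counting step behind such a frequency table can be sketched with base R alone. The ages below are made up, and the sketch only mirrors the idea of make.table(); it makes no claim about how the package function is implemented. The last lines also confirm that R treats single-quoted and double-quoted strings as the same value.

```r
# Made-up respondent ages
toy.ages = c(34, 58, 34, 61, 58, 34)

# Frequency counts per unique value, sorted by value
age.counts = table(toy.ages)
print(age.counts)

# Single and double quotes produce identical strings
same.string = identical('age.resp', "age.resp")
print(same.string)
```

Passing age.counts to base R's barplot() would draw a comparable bar chart.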
make.table2(table.of = 'age.resp', split = 'gender.resp')
## Joining, by = "book.id"
## Joining, by = "respondent.id"
##
## 16 17 18 19 20 21 22 23 24 25 26
## female 704 748 791 735 1238 1889 2536 2507 2879 3205 2701
## male 95 59 215 100 437 194 212 227 267 357 535
## NA 12 0 0 0 2 22 33 10 18 7 14
##
## 27 28 29 30 31 32 33 34 35 36 37
## female 3265 2826 2871 3472 2961 3621 3136 3519 3445 2963 3073
## male 405 429 480 517 487 621 362 401 675 380 909
## NA 0 19 21 48 0 12 1 0 17 15 10
##
## 38 39 40 41 42 43 44 45 46 47 48
## female 3618 2519 4296 4020 5024 5253 5855 5859 5557 6392 6630
## male 602 606 568 619 1111 852 1103 1036 786 709 1750
## NA 0 0 42 0 119 44 16 41 75 4 5
##
## 49 50 51 52 53 54 55 56 57 58 59
## female 8439 9399 9957 10284 10012 11748 13228 12400 13059 12023 12659
## male 1055 1455 1669 1920 2074 2066 2014 2522 2045 2459 3206
## NA 101 148 87 34 194 36 56 39 89 1 34
##
## 60 61 62 63 64 65 66 67 68 69 70
## female 11663 12296 11626 9625 11363 12173 10903 8509 8469 6240 5049
## male 3136 3206 2695 2696 2747 3659 4631 2548 2728 3564 2221
## NA 100 144 76 56 42 0 51 0 0 6 147
##
## 71 72 73 74 75 76 77 78 79 80 81
## female 3530 3495 2991 1944 1905 1863 849 912 758 955 342
## male 1021 1031 1649 812 1129 471 731 618 237 343 113
## NA 0 0 0 0 0 0 0 0 0 0 0
##
## 82 83 84 85 86 87 88 89 90 91 93
## female 440 190 123 183 294 51 115 53 48 27 0
## male 216 119 32 88 110 34 23 9 32 0 28
## NA 0 0 0 0 0 0 0 0 0 0 0
##
## 94 98
## female 5 16
## male 0 0
## NA 0 0
make.table2(table.of = 'literariness.read', split = 'gender.author')
## Joining, by = "book.id"
## Joining, by = "respondent.id"
## Warning: Removed 309688 rows containing non-finite values (stat_count).
##
## 1 2 3 4 5 6 7
## female 3565 7145 10667 9259 12221 10630 2530
## male 1591 4532 7679 8553 16334 26154 13090
## unknown/multiple 206 491 837 817 977 570 101
Note that the variable passed to the split argument must have fewer than 31 unique values, to avoid uninterpretable output. For example, consider the following code:
make.table2(table.of = 'age.resp', split = 'zipcode')
## Joining, by = "book.id"
## Joining, by = "respondent.id"
## The 'split-by' variable has many unique values, which will make the output
## very hard to process. Please providea 'split-by' variable that contains
## less unique values.
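The idea of a cross-tabulation guarded by a limit on unique values can be sketched in base R as follows. The function cross.table, its max.levels parameter, and the toy data are all hypothetical. Unlike the package, which prints a message, this sketch raises an error, and the limit is lowered to 3 in the second call only so that the tiny toy table can trigger it.

```r
# Hypothetical helper: counts of 'table.of' split by 'split', with a guard
cross.table = function(data, table.of, split, max.levels = 30) {
  n.levels = length(unique(data[[split]]))
  if (n.levels > max.levels) {
    stop("the 'split' variable has ", n.levels,
         " unique values; choose one with at most ", max.levels)
  }
  # Rows: values of the split variable; columns: values of the counted variable
  table(data[[split]], data[[table.of]], useNA = "ifany")
}

# Made-up respondents
toy = data.frame(
  age.resp = c(34, 58, 34, 61),
  gender.resp = c("female", "male", "female", NA),
  zipcode = c(1011, 2511, 3011, 9711),
  stringsAsFactors = FALSE
)

tab = cross.table(toy, table.of = "age.resp", split = "gender.resp")
print(tab)

# Splitting by zipcode exceeds the (artificially low) limit of 3
result = tryCatch(
  cross.table(toy, table.of = "age.resp", split = "zipcode", max.levels = 3),
  error = function(e) conditionMessage(e)
)
print(result)
```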
You can also adjust the x label, y label, title, and colors:
make.table2(table.of = 'age.resp', split = 'gender.resp',
xlab = 'age respondent', ylab = 'number of people',
barcolor = 'purple', barfill = 'yellow')
## Joining, by = "book.id"
## Joining, by = "respondent.id"
##
## 16 17 18 19 20 21 22 23 24 25 26
## female 704 748 791 735 1238 1889 2536 2507 2879 3205 2701
## male 95 59 215 100 437 194 212 227 267 357 535
## NA 12 0 0 0 2 22 33 10 18 7 14
##
## 27 28 29 30 31 32 33 34 35 36 37
## female 3265 2826 2871 3472 2961 3621 3136 3519 3445 2963 3073
## male 405 429 480 517 487 621 362 401 675 380 909
## NA 0 19 21 48 0 12 1 0 17 15 10
##
## 38 39 40 41 42 43 44 45 46 47 48
## female 3618 2519 4296 4020 5024 5253 5855 5859 5557 6392 6630
## male 602 606 568 619 1111 852 1103 1036 786 709 1750
## NA 0 0 42 0 119 44 16 41 75 4 5
##
## 49 50 51 52 53 54 55 56 57 58 59
## female 8439 9399 9957 10284 10012 11748 13228 12400 13059 12023 12659
## male 1055 1455 1669 1920 2074 2066 2014 2522 2045 2459 3206
## NA 101 148 87 34 194 36 56 39 89 1 34
##
## 60 61 62 63 64 65 66 67 68 69 70
## female 11663 12296 11626 9625 11363 12173 10903 8509 8469 6240 5049
## male 3136 3206 2695 2696 2747 3659 4631 2548 2728 3564 2221
## NA 100 144 76 56 42 0 51 0 0 6 147
##
## 71 72 73 74 75 76 77 78 79 80 81
## female 3530 3495 2991 1944 1905 1863 849 912 758 955 342
## male 1021 1031 1649 812 1129 471 731 618 237 343 113
## NA 0 0 0 0 0 0 0 0 0 0 0
##
## 82 83 84 85 86 87 88 89 90 91 93
## female 440 190 123 183 294 51 115 53 48 27 0
## male 216 119 32 88 110 34 23 9 32 0 28
## NA 0 0 0 0 0 0 0 0 0 0 0
##
## 94 98
## female 5 16
## male 0 0
## NA 0 0
make.table2(table.of = 'literariness.read', split = 'gender.author',
xlab = 'Overall literariness scores',
ylab = 'number of people', barcolor = 'black',
barfill = 'darkred')
## Joining, by = "book.id"
## Joining, by = "respondent.id"
## Warning: Removed 309688 rows containing non-finite values (stat_count).
##
## 1 2 3 4 5 6 7
## female 3565 7145 10667 9259 12221 10630 2530
## male 1591 4532 7679 8553 16334 26154 13090
## unknown/multiple 206 491 837 817 977 570 101
The original survey about Dutch fiction was designed to rank the responses using descriptive terms, e.g. “very bad”, “neutral”, “a bit good” etc. In order to conduct the analyses, the responses were then converted to numerical scales ranging from 1 to 7 (the questions about literariness and literary quality) or from 1 to 5 (the questions about the reviewer’s reading patterns). However, if you want the responses converted back to their original form, invoke the function order.responses(), which transforms the survey responses into ordered factors. Use either “bookratings” or “readingbehavior” to specify which of the survey questions should be changed into ordered factors. (We assume here that the user knows what ordered factors are, because otherwise this function will not seem very useful.)
Levels of quality.read and quality.notread: “very bad”, “bad”, “a bit bad”, “neutral”, “a bit good”, “good”, “very good”, “NA”.
Levels of literariness.read and literariness.notread: “absolutely not literary”, “non-literary”, “not very literary”, “between literary and non-literary”, “a bit literary”, “literary”, “very literary”, “NA”.
Levels of statements 4/12: “completely disagree”, “disagree”, “neutral”, “agree”, “completely agree”, “NA”.
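For readers less familiar with ordered factors, the following sketch shows the kind of conversion involved, using base R's factor() on a handful of made-up quality scores. The seven labels are the quality levels listed above; the object names are hypothetical, and the code is an illustration rather than the body of order.responses().

```r
# The seven quality labels, from lowest to highest
quality.levels = c("very bad", "bad", "a bit bad", "neutral",
                   "a bit good", "good", "very good")

# Made-up numeric ratings on the 1 to 7 scale, with one missing value
toy.scores = c(6, 5, 7, NA, 2)

# Map each number to its label and keep the ordering of the scale
toy.quality = factor(toy.scores, levels = 1:7,
                     labels = quality.levels, ordered = TRUE)
print(toy.quality)

# Because the factor is ordered, comparisons against a level are meaningful
above.neutral = toy.quality > "neutral"
print(above.neutral)

# The underlying numeric codes can be recovered when needed
print(as.integer(toy.quality))
```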
To create a data frame with ordered factor levels of the questions on reading behavior:
dat.reviews = order.responses('readingbehavior')
str(dat.reviews)
## tibble [13,541 × 28] (S3: tbl_df/tbl/data.frame)
## $ respondent.id : num [1:13541] 0 1 2 3 4 5 6 7 8 9 ...
## $ gender.resp : Factor w/ 3 levels "female","male",..: 1 1 1 1 1 2 1 1 2 1 ...
## $ age.resp : num [1:13541] 18 24 78 77 71 58 38 51 66 32 ...
## $ zipcode : num [1:13541] 4834 5625 2272 2151 NA ...
## $ education : Ord.factor w/ 8 levels "none/primary school"<..: 5 7 7 5 5 7 6 6 7 7 ...
## $ books.per.year : num [1:13541] 20 30 30 12 15 60 25 30 50 2 ...
## $ typically.reads: Factor w/ 3 levels "both","only fiction",..: 1 1 1 1 1 1 1 2 1 1 ...
## $ how.literary : Ord.factor w/ 5 levels "completely disagree"<..: 3 3 3 3 4 2 3 3 1 3 ...
## $ s.4a1 : Ord.factor w/ 5 levels "completely disagree"<..: 4 4 4 3 4 2 3 2 3 2 ...
## $ s.4a2 : Ord.factor w/ 5 levels "completely disagree"<..: 4 4 5 4 3 4 5 3 4 5 ...
## $ s.4a3 : Ord.factor w/ 5 levels "completely disagree"<..: 4 5 4 4 4 5 4 5 4 4 ...
## $ s.4a4 : Ord.factor w/ 5 levels "completely disagree"<..: 4 5 4 3 4 3 1 4 4 4 ...
## $ s.4a5 : Ord.factor w/ 5 levels "completely disagree"<..: 5 5 4 3 4 4 3 5 5 4 ...
## $ s.4a6 : Ord.factor w/ 5 levels "completely disagree"<..: 4 5 4 4 4 4 4 4 3 4 ...
## $ s.4a7 : Ord.factor w/ 5 levels "completely disagree"<..: 4 3 3 2 2 1 3 2 2 5 ...
## $ s.4a8 : Ord.factor w/ 5 levels "completely disagree"<..: 4 5 3 4 2 3 1 5 4 1 ...
## $ s.12b1 : Ord.factor w/ 5 levels "completely disagree"<..: 2 4 4 3 4 2 3 2 3 3 ...
## $ s.12b2 : Ord.factor w/ 5 levels "completely disagree"<..: 4 1 4 4 3 4 2 3 5 3 ...
## $ s.12b3 : Ord.factor w/ 5 levels "completely disagree"<..: 3 3 3 3 3 3 2 3 3 3 ...
## $ s.12b4 : Ord.factor w/ 5 levels "completely disagree"<..: 4 3 4 4 4 4 5 4 4 4 ...
## $ s.12b5 : Ord.factor w/ 5 levels "completely disagree"<..: 1 2 3 2 3 1 2 2 4 2 ...
## $ s.12b6 : Ord.factor w/ 5 levels "completely disagree"<..: 4 4 4 3 3 4 2 4 3 2 ...
## $ s.12b7 : Ord.factor w/ 5 levels "completely disagree"<..: 2 3 4 4 4 2 5 3 2 2 ...
## $ s.12b8 : Ord.factor w/ 5 levels "completely disagree"<..: 3 4 3 4 3 3 2 4 3 3 ...
## $ remarks.survey : chr [1:13541] "" "" "" "" ...
## $ date.time : POSIXct[1:13541], format: "2013-06-04 11:12:00" "2013-04-10 15:33:00" ...
## $ week.nr : num [1:13541] 23 15 15 27 15 29 15 15 15 15 ...
## $ day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 3 4 5 6 4 2 4 4 4 5 ...
To create a data frame with ordered factor levels of the book ratings:
dat.ratings = order.responses('bookratings')
str(dat.ratings)
## tibble [448,055 × 8] (S3: tbl_df/tbl/data.frame)
## $ respondent.id : num [1:448055] 0 0 0 0 0 0 0 0 0 0 ...
## $ book.id : num [1:448055] 1 9 11 19 30 34 82 116 300 372 ...
## $ quality.read : Ord.factor w/ 7 levels "very bad"<"bad"<..: 6 5 7 5 5 7 5 5 6 6 ...
## $ literariness.read : Ord.factor w/ 7 levels "absolutely not literary"<..: 5 6 6 6 4 6 3 5 6 6 ...
## $ quality.notread : Ord.factor w/ 7 levels "very bad"<"bad"<..: NA NA NA NA NA NA NA NA NA NA ...
## $ literariness.notread: Ord.factor w/ 7 levels "absolutely not literary"<..: NA NA NA NA NA NA NA NA NA NA ...
## $ motivations : chr [1:448055] "" "" "" "" ...
## $ book.read : num [1:448055] 1 1 1 1 1 1 1 1 1 1 ...
Future versions of the litRiddle package will support likert plots. Visit https://github.com/jbryer/likert to learn more about the general idea and the implementation in R.
Future versions of the litRiddle package will also support topic modeling of the motivations indicated by the reviewers.
Each function provided by the package has its own help page; the same applies to the datasets:
help(books)
help(respondents)
help(reviews)
help(frequencies)
help(combine.all)
help(explain)
help(find.dataset)
help(get.columns)
help(make.table)
help(make.table2)
help(order.responses)
help(litRiddle) # for the general description of the package
All the datasets use the UTF-8 encoding of Unicode. This should normally not cause any problems on macOS and Linux machines, but Windows might be trickier in this respect. We haven’t experienced any inconveniences in our testing environment, but we cannot guarantee the same for every other machine.
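If you suspect an encoding problem on your machine, a few base R functions can help to inspect character data. The sketch below runs them on a made-up vector of titles (the strings are hypothetical, not taken from the datasets); the same calls could be applied to a character column such as books$title.

```r
# Made-up titles; \u00e9 and \u00e8 are the Unicode escapes for e-acute and e-grave
toy.titles = c("Caf\u00e9 Lumi\u00e8re", "Plain title")

# TRUE for every element that is valid UTF-8
valid = validUTF8(toy.titles)
print(valid)

# Declared encoding of each element ("unknown" is normal for plain ASCII)
print(Encoding(toy.titles))

# Convert to UTF-8 explicitly and count characters rather than bytes
toy.titles.utf8 = enc2utf8(toy.titles)
n.chars = nchar(toy.titles.utf8, type = "chars")
print(n.chars)
```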
Karina van Dalen-Oskam (2021). Het raadsel literatuur. Is literaire kwaliteit meetbaar? Amsterdam University Press.
Karina van Dalen-Oskam (2014). Prehistory of The Riddle. (‘The Riddle of Literary Quality: The search for conventions of literariness’, transl. of: ‘The Riddle of Literary Quality. Op zoek naar conventies van literariteit’, published in: Vooys: tijdschrift voor letteren 32(3): 25-33.), https://literaryquality.huygens.knaw.nl/?p=537.
Corina Koolen, Karina van Dalen-Oskam, Andreas van Cranenburgh, Erica Nagelhout (2020). Literary quality in the eye of the Dutch reader: The National Reader Survey. Poetics 79: 101439, https://doi.org/10.1016/j.poetic.2020.101439.
More publications from the project: see https://literaryquality.huygens.knaw.nl/?page_id=588.