charlatan makes fake data, inspired by and borrowing some code from Python’s faker.
Why would you want to make fake data? Here are some possible use cases to give you a sense of what you can do with this package:
See the Contributing to charlatan vignette
Low level interfaces: R6 objects that a user can initialize and then call methods on. These contain all the logic that the interfaces below use.
ch_*() functions: wrap the low level interfaces, and are meant to be easier to use and to provide an easy way to make many instances of a thing.
ch_generate() - generates a data.frame with fake data, choosing which columns to include from the data types provided in charlatan.
fraudster() - a single interface to all fake data methods; returns vectors/lists of data. This function wraps the ch_*() functions described above.
Stable version from CRAN
install.packages("charlatan")
Development version from GitHub
devtools::install_github("ropensci/charlatan")
library("charlatan")
fraudster() is the single interface for all fake data operations:
x <- fraudster()
x$job()
#> [1] "Exercise physiologist"
x$name()
#> [1] "Mahalie Volkman"
x$job()
#> [1] "Pharmacist, hospital"
x$color_name()
#> [1] "Salmon"
More locales are being added over time. For example, locale support for job data:
ch_job(locale = "en_US", n = 3)
#> [1] "Comptroller" "Applications developer"
#> [3] "Conservator, museum/gallery"
ch_job(locale = "fr_FR", n = 3)
#> [1] "Conseiller en fusion-acquisition" "Technicien automobile"
#> [3] "Économe de flux"
ch_job(locale = "hr_HR", n = 3)
#> [1] "Revident" "Dentalni asistent"
#> [3] "Inženjer medicinske radiologije"
ch_job(locale = "uk_UA", n = 3)
#> [1] "Астроном" "Антрополог" "Ріелтор"
ch_job(locale = "zh_TW", n = 3)
#> [1] "排版人員" "英文翻譯/口譯人員" "染整技術人員"
For colors:
ch_color_name(locale = "en_US", n = 3)
#> [1] "DarkSeaGreen" "Violet" "DarkSlateGray"
ch_color_name(locale = "uk_UA", n = 3)
#> [1] "Дерева" "Темно-лазурний" "Зелений"
More coming soon …
ch_generate()
#> # A tibble: 10 × 3
#>    name                     job                                    phone_number
#>    <chr>                    <chr>                                  <chr>
#>  1 Shona Howell             Tax adviser                            (297)747-155…
#>  2 Missouri Hoppe-Gerlach   Dispensing optician                    657-442-1745…
#>  3 Miss Exa Lindgren DVM    Surveyor, insurance                    1-393-703-89…
#>  4 Marilou Hilll            Civil Service administrator            03436190752
#>  5 Mr. Hoke Hansen          Teacher, adult education               816.530.1379
#>  6 Ms. Violeta Ebert        Psychiatric nurse                      577-572-9397…
#>  7 Kieth Kiehn IV           Teacher, English as a foreign language 030-742-7026…
#>  8 Mikaila Wintheiser-Bruen Museum/gallery conservator             (305)030-717…
#>  9 Baylie Armstrong         Recruitment consultant                 178.968.3856…
#> 10 Gerard Quigley Jr.       Airline pilot                          765.313.3182
ch_generate('job', 'phone_number', n = 30)
#> # A tibble: 30 × 2
#>    job                                          phone_number
#>    <chr>                                        <chr>
#>  1 Paramedic                                    916.806.7738
#>  2 Engineer, electronics                        1-558-738-5481x4670
#>  3 Education officer, environmental             1-777-390-2418x927
#>  4 Production designer, theatre/television/film 177.776.3900
#>  5 Marine scientist                             06918434812
#>  6 Control and instrumentation engineer         089-980-1061x5075
#>  7 Licensed conveyancer                         082.027.2276x7008
#>  8 Operational researcher                       134.883.0475x1393
#>  9 Management consultant                        844-455-4954x0276
#> 10 Optometrist                                  264.670.0122x820
#> # … with 20 more rows
#> # ℹ Use `print(n = ...)` to see more rows
ch_name()
#> [1] "Akeem Schaden"
ch_name(10)
#> [1] "Allen Littel" "Dr. Yadira Mosciski" "Ms. Teela Rath"
#> [4] "Griselda Macejkovic" "Loula Moen" "Geri Hermiston"
#> [7] "Latrice Mueller-Purdy" "Bessie Greenfelder" "Brittnay Beahan"
#> [10] "Marely Harvey-Haag"
ch_phone_number()
#> [1] "694.142.2479x8330"
ch_phone_number(10)
#> [1] "729.696.4235x432" "(410)948-5947x078" "558.791.9758"
#> [4] "(019)599-4783x5009" "204-331-9183" "(478)845-1133"
#> [7] "631.713.7642" "1-647-843-4062x1199" "705-007-7353x9342"
#> [10] "(178)325-2557x56382"
ch_job()
#> [1] "Nurse, children's"
ch_job(10)
#> [1] "Commercial art gallery manager"
#> [2] "IT technical support officer"
#> [3] "Adult guidance worker"
#> [4] "Librarian, public"
#> [5] "Diplomatic Services operational officer"
#> [6] "Control and instrumentation engineer"
#> [7] "Engineer, manufacturing"
#> [8] "Chartered certified accountant"
#> [9] "Administrator, Civil Service"
#> [10] "Designer, industrial/product"
ch_credit_card_provider()
#> [1] "Diners Club / Carte Blanche"
ch_credit_card_provider(n = 4)
#> [1] "JCB 16 digit" "Voyager" "American Express" "JCB 16 digit"
ch_credit_card_number()
#> [1] "53179035548247676"
ch_credit_card_number(n = 10)
#> [1] "502059506159116" "3088615926186023845" "3002877866616823"
#> [4] "4383703635318773" "180088446044038768" "869901859495367367"
#> [7] "675928121803937" "3096439026343250946" "4850304535923"
#> [10] "52642312402226446"
ch_credit_card_security_code()
#> [1] "417"
ch_credit_card_security_code(10)
#> [1] "071" "824" "5435" "775" "2961" "976" "2281" "363" "188" "732"
charlatan makes it very easy to generate fake data with missing entries. First, create a provider with MissingDataProvider$new(), then call its make_missing() method on the vector that should receive missing values. The method picks a random number (N) of slots in the input vector and then picks N random positions that will be replaced with NA matching the input class.
testVector <- MissingDataProvider$new()
testVector$make_missing(x = ch_generate()$name)
#> [1] NA "Marge Bogan" "Ester Hilll"
#> [4] "Rome Barton" "Mervyn Huels" NA
#> [7] NA "Sonya Pollich" "Mr. Bernard Raynor"
#> [10] "Dr. Aniya Waters"
testVector$make_missing(x = ch_integer(10))
#> [1] NA NA NA NA NA NA NA NA NA NA
set.seed(123)
testVector$make_missing(x = sample(c(TRUE, FALSE), 10, replace = TRUE))
#> [1] TRUE NA NA FALSE TRUE NA FALSE FALSE NA TRUE
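The selection logic described for make_missing() can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not charlatan's actual implementation: the function name make_missing_sketch is hypothetical, and drawing N between 1 and the length of the input is an assumption.

```r
# Hypothetical helper (not part of charlatan): replace a random number of
# randomly chosen positions in `x` with NA.
make_missing_sketch <- function(x) {
  len <- length(x)
  if (len == 0) return(x)  # nothing to replace in an empty vector
  # Pick N, the number of slots to blank out (assumed range: 1..length(x)).
  n <- sample.int(len, size = 1)
  # Pick N distinct positions.
  idx <- sample.int(len, size = n)
  # Assigning NA keeps the vector's type: NA is coerced to the matching
  # NA_character_, NA_integer_, etc.
  x[idx] <- NA
  x
}

set.seed(42)
make_missing_sketch(c("a", "b", "c", "d", "e"))
make_missing_sketch(1:5)
```

Because N can be as large as the input length, a result that is entirely NA, as in the ch_integer(10) example above, is a possible outcome.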
Real data is messy, right? charlatan makes it easy to create messy data. This feature is still in its early stages, so it is not yet available for most data types and languages, but we’re working on it.
For example, create messy names:
ch_name(50, messy = TRUE)
#> [1] "Destiney Dicki" "Mrs Freddie Pouros d.d.s."
#> [3] "Jefferey Lesch" "Inga Dach"
#> [5] "Keyshawn Schaefer" "Ferdinand Bergstrom"
#> [7] "Justen Simonis" "Ms. Doloris Stroman md"
#> [9] "Mrs Ermine Heidenreich" "Marion Corwin"
#> [11] "Jalen Grimes" "Mr. Sullivan Hammes IV"
#> [13] "Adrien Vandervort-Dickens" "Dr Sharif Kunde"
#> [15] "Marlena Reichert d.d.s." "Mr. Brandan Oberbrunner"
#> [17] "Lloyd Adams Sr" "Keesha Schowalter"
#> [19] "Randy Ziemann" "Gina Sanford"
#> [21] "Cornell Funk" "Yadiel Collier"
#> [23] "Kamryn Johnson" "Tyesha Schmeler"
#> [25] "Ernie Hegmann-Graham" "Zackery Runolfsdottir"
#> [27] "Cleveland Predovic" "Melvyn Hickle"
#> [29] "Larry Nienow I" "Nicola Langosh Ph.D."
#> [31] "Ebenezer Fadel V" "Andrae Hand-Eichmann"
#> [33] "Shamar Harvey" "Miss Lynn Altenwerth"
#> [35] "Willene McLaughlin-Mohr" "Kyree Kutch"
#> [37] "Ms Delpha Grant" "Ms. Icie Crooks"
#> [39] "Loney Jenkins-Lindgren" "Shania Donnelly DVM"
#> [41] "Dr Patric Veum" "Amirah Rippin DVM"
#> [43] "Randle Hilpert" "Soren Dare"
#> [45] "Roderic Walter" "Farah Daugherty DDS"
#> [47] "Ryland Ledner" "Girtha Harvey DVM"
#> [49] "Tyrique Spencer" "Mr Olan Bernhard"
Right now only prefixes and suffixes for names in the en_US locale are supported. Notice the variation in prefixes and suffixes above, for example "Mr." vs. "Mr" and "d.d.s." vs. "DDS".