Title: | Mock Data Generator |
---|---|
Description: | Generate mock data in R using YAML configuration. |
Authors: | Jakub Nowicki [aut, cre] |
Maintainer: | Jakub Nowicki <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3 |
Built: | 2025-02-28 05:02:29 UTC |
Source: | https://github.com/jakubnowicki/fixtures |
Generate a vector of values that follow a specified distribution
distribution_vector(size, distribution_type, distribution_arguments = list())
size |
integer, size of the output vector |
distribution_type |
character, type of distribution. You can use a function name directly (e.g. "rnorm") or a common name (e.g. "normal", "gaussian"). All standard distributions from the stats package are covered. For the full list, see Distributions. |
distribution_arguments |
list of arguments required by the distribution function |
distribution_vector(10, "normal", list(mean = 2, sd = 0.5))
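To illustrate how a distribution name, a size, and an argument list can combine into a single call, here is a minimal base R sketch. It is not the package's implementation: the helper `sketch_distribution_vector` and its small alias table are hypothetical, and only generators from the `stats` package are used.

```r
# Hypothetical helper, not part of fixtures: maps a few friendly names
# to stats generators and forwards the remaining arguments via do.call().
sketch_distribution_vector <- function(size, distribution_type,
                                       distribution_arguments = list()) {
  aliases <- c(normal = "rnorm", gaussian = "rnorm",
               uniform = "runif", poisson = "rpois")
  # Fall back to the given name so direct names such as "rnorm" also work.
  fun_name <- if (distribution_type %in% names(aliases)) {
    aliases[[distribution_type]]
  } else {
    distribution_type
  }
  generator <- get(fun_name, envir = asNamespace("stats"), mode = "function")
  # Every stats generator used here takes the sample size as `n`.
  do.call(generator, c(list(n = size), distribution_arguments))
}

set.seed(1)
x <- sketch_distribution_vector(10, "normal", list(mean = 2, sd = 0.5))
y <- sketch_distribution_vector(5, "rpois", list(lambda = 3))
```

Passing the arguments as a list keeps the wrapper independent of any one distribution's parameter names, which is presumably why `distribution_arguments` is a list.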
ID vector containing a sequence of integers
id_vector(size, start = 1)
size |
integer, size of the output vector |
start |
integer, value of the first element |
id_vector(10, 2)
Object that stores mock data configurations and generated datasets
new()
Create a new MockDataGenerator object
MockDataGenerator$new(configuration)
configuration
list, or path to a YAML file, with dataset configurations. See configuration for details and examples for a sample YAML file.
A new MockDataGenerator object
get_data()
Get a dataset (if it does not exist, generate it)
MockDataGenerator$get_data(data_name, size = NULL, refresh = FALSE)
data_name
string, data set name to retrieve
size
integer, size of the dataset (if provided, the dataset will be refreshed)
refresh
boolean, refresh existing data?
mock dataset
get_all_data()
Get all datasets
MockDataGenerator$get_all_data(refresh = FALSE, sizes = NULL)
refresh
boolean, refresh existing data?
sizes
integer, or vector of integers with data sizes
list with all datasets
clone()
The objects of this class are cloneable with this method.
MockDataGenerator$clone(deep = FALSE)
deep
Whether to make a deep clone.
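The generate-on-demand behaviour described for get_data() (return the stored dataset, and regenerate it when refresh is TRUE or a size is given) can be sketched with a plain closure. This is only an illustration in base R, not the class itself: `make_cache`, the `users` generator, and the default size of 5 are all hypothetical.

```r
# Hypothetical sketch of generate-on-demand caching; not the fixtures class.
make_cache <- function(generators) {
  cache <- list()
  function(data_name, size = NULL, refresh = FALSE) {
    # Regenerate when nothing is cached, on request, or when a size is given.
    if (is.null(cache[[data_name]]) || refresh || !is.null(size)) {
      n <- if (is.null(size)) 5 else size  # default size is an assumption
      cache[[data_name]] <<- generators[[data_name]](n)
    }
    cache[[data_name]]
  }
}

get_data <- make_cache(list(
  users = function(n) data.frame(id = seq_len(n), score = runif(n))
))

first <- get_data("users")             # generated on first access
second <- get_data("users")            # served from the cache
resized <- get_data("users", size = 3) # a size forces regeneration
```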
Generate random boolean
random_boolean()
random boolean
random_boolean()
Generate a random data frame from given configuration
random_data_frame(configuration, size)
configuration |
list, a configuration of columns, with all arguments required by the vector generator passed as sublists of the sublist "columns". A column can also be generated with a custom function: pass "custom_column" as the column type and a function (or function name) as custom_column_generator. The column generator has to accept the argument size and return a vector of that size. A third option is to pass an expression that involves existing columns. This can be a simple expression or a call to an existing function. |
size |
integer, number of rows to generate. |
data.frame
conf <- list(
  columns = list(
    first_column = list(
      type = "string",
      length = 3
    ),
    second_column = list(
      type = "integer",
      max = 10
    ),
    third_column = list(
      type = "calculated",
      formula = "second_column * 2"
    )
  )
)
random_data_frame(conf, size = 10)
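One way to read the "calculated" column in the example above is as an expression evaluated with the existing columns in scope. The base R sketch below shows that idea only. It makes no claim about how the package evaluates formulas, and the data frame `df` is made up for illustration.

```r
# Hypothetical illustration of a "calculated" column; not the fixtures source.
set.seed(42)
df <- data.frame(second_column = sample(0:10, size = 10, replace = TRUE))

formula <- "second_column * 2"
# Parse the formula text and evaluate it with the data frame's columns in scope.
df$third_column <- eval(parse(text = formula), envir = df)
```

Because the formula is evaluated against the data frame, it can only refer to columns that already exist at that point, so the order of column definitions may matter.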
Get random date from an interval
random_date(min_date, max_date, format = NULL)
min_date |
character or date, beginning of the time interval to sample from |
max_date |
character or date, ending of the time interval to sample from |
format |
character, check |
random_date("2012-12-04", "2020-10-31")
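Sampling a date from an interval can be pictured as choosing one element from the sequence of days between the two bounds. The following base R sketch assumes daily granularity and ISO-formatted input. It is an illustration, not the package's code.

```r
# Hypothetical sketch: pick one day uniformly from a closed date interval.
set.seed(3)
min_date <- as.Date("2012-12-04")
max_date <- as.Date("2020-10-31")

days <- seq(min_date, max_date, by = "day")
# Index by a sampled position so the result keeps its Date class.
picked <- days[sample.int(length(days), 1)]
```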
Get random date vector from an interval
random_date_vector(size, min_date, max_date, format = NULL, unique = FALSE)
size |
integer, vector length |
min_date |
character or date, beginning of the time interval to sample from |
max_date |
character or date, ending of the time interval to sample from |
format |
character, check |
unique |
boolean, should the output be unique? |
random_date_vector(12, "2012-12-04", "2020-10-31")
Get random datetime
random_datetime(
  min_date,
  max_date,
  date_format = NULL,
  min_time = "00:00:00",
  max_time = "23:59:59",
  time_resolution = "seconds",
  tz = "UTC"
)
min_date |
character or date, beginning of the dates interval to sample from |
max_date |
character or date, ending of the dates interval to sample from |
date_format |
character, check |
min_time |
character, beginning of the time interval to sample from |
max_time |
character, ending of the time interval to sample from |
time_resolution |
character, one of "seconds", "minutes", "hours", time resolution |
tz |
character, time zone to use |
random_datetime("2012-12-04", "2020-10-31", min_time = "7:00:00", max_time = "17:00:00")
Get random datetime vector
random_datetime_vector(
  size,
  min_date,
  max_date,
  date_format = NULL,
  date_unique = FALSE,
  min_time = "00:00:00",
  max_time = "23:59:59",
  time_resolution = "seconds",
  time_unique = FALSE,
  tz = "UTC"
)
size |
integer, vector length |
min_date |
character or date, beginning of the dates interval to sample from |
max_date |
character or date, ending of the dates interval to sample from |
date_format |
character, check |
date_unique |
boolean, should the date part of the output be unique? |
min_time |
character, beginning of the time interval to sample from |
max_time |
character, ending of the time interval to sample from |
time_resolution |
character, one of "seconds", "minutes", "hours", time resolution |
time_unique |
boolean, should the time part of the output be unique? |
tz |
character, time zone to use |
random_datetime_vector(12, "2012-12-04", "2020-10-31", min_time = "7:00:00", max_time = "17:00:00")
Choose random element from set
random_from_set(set)
set |
vector, set of values to choose from |
a single element from a given set
random_from_set(c("a", "b", "c"))
Generate random integer
random_integer(min = 0, max = 999999)
min |
integer, minimum |
max |
integer, maximum |
random integer
random_integer(min = 2, max = 10)
Generate random numeric
random_numeric(min = 0, max = 999999)
min |
numeric, minimum |
max |
numeric, maximum |
random numeric
random_numeric(min = 1.5, max = 4.45)
Generate random string
random_string(
  length = NULL,
  min_length = 1,
  max_length = 15,
  pattern = "[A-Za-z0-9]"
)
length |
integer or NULL (default), output string length. If NULL, length will be random |
min_length |
integer, minimum length if length is random. Default: 1. |
max_length |
integer, maximum length if length is random. Default: 15. |
pattern |
string, pattern for string to follow.
Check |
random string
random_string(length = 5)
Get random time from an interval
random_time(
  min_time = "00:00:00",
  max_time = "23:59:59",
  resolution = "seconds"
)
min_time |
character, beginning of the time interval to sample from |
max_time |
character, ending of the time interval to sample from |
resolution |
character, one of "seconds", "minutes", "hours", time resolution |
random_time("12:23:00", "15:48:32")
Get random time vector from an interval
random_time_vector(
  size,
  min_time = "00:00:00",
  max_time = "23:59:59",
  resolution = "seconds",
  unique = FALSE
)
size |
integer, vector length |
min_time |
character, beginning of the time interval to sample from |
max_time |
character, ending of the time interval to sample from |
resolution |
character, one of "seconds", "minutes", "hours", time resolution |
unique |
boolean, should the output be unique? |
random_time_vector(12, "12:23:00", "15:48:32")
Generate a random vector of desired type
random_vector(size, type, custom_generator = NULL, unique = FALSE, ...)
size |
integer, vector length |
type |
type of vector values, one of: "integer", "string", "boolean", "date", "time", "datetime" or "numeric". If a custom generator is provided, type should be set to "custom". |
custom_generator |
function or string, custom value generator. Can be a function or a string with function name. Default: NULL |
unique |
boolean, should the output contain only unique values? Default: FALSE. |
... |
arguments passed to function responsible for generating values.
Check |
vector of random values of chosen type
random_vector(5, "boolean")
random_vector(10, "numeric", min = 1.5, max = 5)
random_vector(4, "string", length = 4, pattern = "[ACGT]")
random_vector(2, "integer", max = 10)

# custom generator
custom_generator <- function() sample(c("A", "B"), 1)
random_vector(3, type = "custom", custom_generator = custom_generator)
Generate a vector of values from a set
set_vector(size, set = NULL, set_type = NULL, set_size = NULL, ...)
size |
integer, vector length |
set |
vector, a set of values to pick from; default: NULL |
set_type |
string, type of the random set to generate if set is NULL ("integer", "string", "boolean", "numeric"); default: NULL |
set_size |
integer, number of elements in random set; default: NULL |
... |
additional arguments for random set generator.
For details check |
When using a random set, be aware that the set has to consist of unique values. If the arguments passed to the generator cannot produce enough unique values, the function can end up in an infinite loop.
set_vector(10, set = c("a", "b", "c"))
set_vector(size = 5, set_type = "string", set_size = 3)
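The infinite-loop caveat for random sets is easiest to see with a generator that cannot produce enough distinct values. The base R sketch below builds a unique set by repeated drawing and adds an attempt cap so that the infeasible case fails instead of looping. `sketch_unique_set` and `max_attempts` are hypothetical and are not part of the package.

```r
# Hypothetical sketch: draw values until the set holds `set_size` unique
# elements. A cap on attempts replaces the potential infinite loop.
sketch_unique_set <- function(set_size, generator, max_attempts = 1000) {
  values <- c()
  attempts <- 0
  while (length(values) < set_size) {
    if (attempts >= max_attempts) {
      stop("Could not build ", set_size, " unique values in ",
           max_attempts, " attempts")
    }
    values <- unique(c(values, generator()))
    attempts <- attempts + 1
  }
  values
}

set.seed(7)
# Feasible: three unique integers out of 1..10.
ok <- sketch_unique_set(3, function() sample(1:10, 1))

# Infeasible: only two booleans exist, so a set of three can never be completed.
failed <- inherits(
  try(sketch_unique_set(3, function() sample(c(TRUE, FALSE), 1)), silent = TRUE),
  "try-error"
)
```

In practice this means checking that set_size does not exceed the number of distinct values the chosen type and arguments can produce, for example at most 2 for "boolean".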
Wrapper that allows generating special types of vectors
special_vector(size, type, configuration)
size |
integer, vector length |
type |
type of vector, one of: "id", "distribution" |
configuration |
list of arguments required by vector function |
special_vector(10, "id", list(start = 3))
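A wrapper of this kind typically selects a generator by type and forwards the configuration list as arguments. The base R sketch below illustrates that dispatch pattern for the "id" case only. The `sketch_*` functions are hypothetical stand-ins, not the package's functions.

```r
# Hypothetical sketch of type-based dispatch; not the fixtures source.
sketch_id_vector <- function(size, start = 1) {
  seq(from = start, length.out = size)
}

sketch_special_vector <- function(size, type, configuration = list()) {
  generator <- switch(
    type,
    id = sketch_id_vector,
    stop("Unsupported special vector type: ", type)
  )
  # The configuration list supplies the generator's remaining arguments.
  do.call(generator, c(list(size = size), configuration))
}

ids <- sketch_special_vector(10, "id", list(start = 3))
```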