This function reads all files in a directory using the chosen import function. Use the 'pattern' argument to select a subset of files, such as a single file type. If collapse = TRUE, the imported data are matched by column name and bound into a single object. All ingest functions add the source file name as an identifying column to track provenance and to relate data and metadata read from the same file, so check that your file names are unique.

ingest_directory(
  directory = getwd(),
  ingest.function = utils::read.csv,
  pattern = "*",
  collapse = TRUE,
  recursive = FALSE,
  check.duplicates = "warn",
  ...
)

Arguments

directory

A character vector with the name of the directory that contains your data files. Defaults to the working directory.

ingest.function

The function used to read the files. Defaults to utils::read.csv, but can be any ingestr or standard import function.

pattern

A character string giving the pattern used to match file names, as in list.files. Defaults to "*", which matches all files.

collapse

A logical argument. When TRUE, a single object is returned; when FALSE, one object is returned for each file. Defaults to TRUE.

recursive

A logical argument. When TRUE, files in subdirectories are read as well. Defaults to FALSE. See list.files for more information.

check.duplicates

A character argument specifying the action that should be taken if files with duplicate contents are detected. One of "warn", "remove", or NULL to disable checking. Defaults to "warn".

...

Additional arguments passed to ingest.function.
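Taken together, the arguments above might be used as in the following sketch. It assumes the function is exported from the ingestr package and that 'pattern' is interpreted as a regular expression, as in list.files; the temporary directory, file names, and the stringsAsFactors argument forwarded through ... are illustrative choices only.

```r
# Illustrative only: write two small CSV files to a temporary directory.
example_dir <- tempfile("ingest_example")
dir.create(example_dir)
write.csv(data.frame(x = 1:2), file.path(example_dir, "site_a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(example_dir, "site_b.csv"), row.names = FALSE)

# Run the call only if ingestr is installed (assumed package name).
if (requireNamespace("ingestr", quietly = TRUE)) {
  all_sites <- ingestr::ingest_directory(
    directory = example_dir,
    ingest.function = utils::read.csv,
    pattern = "\\.csv$",        # assumed to be a regular expression, as in list.files
    collapse = TRUE,            # return one combined object
    recursive = FALSE,
    check.duplicates = "warn",
    stringsAsFactors = FALSE    # forwarded to utils::read.csv through ...
  )
}
```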

Value

When collapse = TRUE, a single object matching the output class of ingest.function is returned. When collapse = FALSE, one object per file, each matching the output class of ingest.function, is created in the parent environment of the function. The names of the input sources are used as object names.
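The difference between the two return modes can be sketched in base R. This is not the ingestr implementation: read_with_source is a hypothetical helper, the named list stands in for the per-file objects, and rbind stands in for whatever binding step the package performs.

```r
# Base R sketch of the two return modes; NOT the ingestr implementation.
demo_dir <- tempfile("collapse_demo")
dir.create(demo_dir)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(demo_dir, "site_a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4, y = c("c", "d")),
          file.path(demo_dir, "site_b.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and record its file name for provenance.
read_with_source <- function(path) {
  data <- utils::read.csv(path)
  data$input_source <- basename(path)
  data
}

# collapse = FALSE analogue: one object per file, named after its source.
per_file <- lapply(files, read_with_source)
names(per_file) <- basename(files)

# collapse = TRUE analogue: bind the per-file objects into a single object.
combined <- do.call(rbind, per_file)
```

Because every row carries its input_source, the combined object can still be split back into per-file pieces, which is why unique file names matter.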

Details

If check.duplicates = "remove", only a single set of records is retained when files have identical contents. This does not provide row-wise checking for duplicates. A separate data.frame is created specifying the removed input_source, the number of records removed, and the reason for removal.
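File-level duplicate removal of this kind could look roughly like the sketch below. It assumes duplicates are found by comparing MD5 checksums of whole files with tools::md5sum, which may differ from what ingestr does internally; the log columns n_records and reason are hypothetical names, and only input_source comes from the description above.

```r
# Sketch of file-level duplicate detection; NOT the ingestr implementation.
dup_dir <- tempfile("dup_demo")
dir.create(dup_dir)
records <- data.frame(id = 1:3, value = c(10, 20, 30))
write.csv(records, file.path(dup_dir, "a.csv"), row.names = FALSE)
write.csv(records, file.path(dup_dir, "b_copy.csv"), row.names = FALSE)  # same contents as a.csv
write.csv(data.frame(id = 4:5, value = c(40, 50)),
          file.path(dup_dir, "c.csv"), row.names = FALSE)

files <- list.files(dup_dir, full.names = TRUE)

# Files with identical contents share a checksum; keep the first of each set.
hashes <- tools::md5sum(files)
is_duplicate <- duplicated(hashes)
kept <- files[!is_duplicate]
removed <- files[is_duplicate]

# Log of what was dropped (column names other than input_source are assumptions).
removal_log <- data.frame(
  input_source = basename(removed),
  n_records = vapply(removed, function(p) nrow(utils::read.csv(p)), integer(1)),
  reason = rep("duplicate file contents", length(removed)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
```

Two files that differ by a single byte are treated as distinct here, which matches the note that rows are not checked individually.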