This plugin pulls metadata from a previously generated file. The file sink can produce such files, and a number of samples are included in the examples/mce_files directory.

CLI based Ingestion

Install the Plugin

The file source works out of the box with acryl-datahub.

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: file
  config:
    # Coordinates
    filename: ./path/to/mce/file.json

sink:
  # sink configs
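Since the field list marks filename as deprecated in favor of path, a variant of the starter recipe might look like the sketch below. It points path at a folder and sets file_extension explicitly. The folder name is a placeholder, and the datahub-rest sink with a localhost server is an assumption chosen for illustration, not something this page prescribes.

```yaml
source:
  type: file
  config:
    # Placeholder folder: files in it with the chosen extension are processed
    path: ./path/to/mce/
    # Quote the wildcard: a bare * would be read as a YAML alias indicator
    file_extension: "*"

sink:
  # Assumed sink for illustration: a DataHub REST endpoint on localhost
  type: datahub-rest
  config:
    server: http://localhost:8080
```

Saved as, say, recipe.yml, a recipe like this would typically be run with datahub ingest -c recipe.yml.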

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

Field [Required] | Type | Description | Default | Notes
--- | --- | --- | --- | ---
path | UnionType (see Notes for variants) | Path to the folder or file to ingest. If it points to a folder, all files within that folder whose extension matches file_extension (default json) are processed. It can also be a URL that points to a single file. | | One of string, string(path)
aspect | string | Set to an aspect name to read only that aspect for ingestion. | |
count_all_before_starting | boolean | When enabled, counts the total number of records in the file before starting, which allows an accurate estimate of completion time. Turn it off if startup takes too long. | True |
file_extension | string | When path is a folder, controls which file extensions the source processes. * is a special value that processes every file regardless of extension. | json |
filename | string | [deprecated in favor of path] The file to ingest. | |
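As a rough illustration of how the optional fields combine, the sketch below reads a single file, keeps only one aspect, and turns off the up-front record count. The aspect name status is only an example value, and the file path is the same placeholder used in the starter recipe.

```yaml
source:
  type: file
  config:
    path: ./path/to/mce/file.json
    # Example aspect name: only this aspect is read from the file
    aspect: status
    # Skip the initial record count to shorten startup
    # (progress estimates become less accurate)
    count_all_before_starting: false

sink:
  # sink configs
```

All of these fields sit directly under config, so no dotted (nested) field names are needed here.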

Code Coordinates

  • Class Name: datahub.ingestion.source.file.GenericFileSource
  • Browse on GitHub


If you've got any questions on configuring ingestion for File, feel free to ping us on our Slack.