Data configuration file (config.json) reference
Here is the general spec for the config.json file.
{  
  "includeInputSubdirs": true | false,
  "inputFiles": {  
    "CSV_FILE_EXPRESSION1": {  
      "format": "variablePerColumn" | "variablePerRow",
      "provenance": "NAME",
    
      # For implicit schema only
      "importType": "variables" | "entities",
      "ignoreColumns": ["COLUMN_HEADING1", "COLUMN_HEADING2", ...],
      # Variables only
      "entityType": "ENTITY_TYPE_DCID",
      # For implicit schema only, custom entities only
      "rowEntityType": "ENTITY_TYPE_DCID",
      "idColumn": "COLUMN_HEADING",
      "entityColumns": ["COLUMN_HEADING_DCID1", "COLUMN_HEADING_DCID2", ...],
      # For explicit schema only
      "columnMappings": {
        "variable": "NAME",
        "entity": "NAME",
        "date": "NAME",
        "value": "NAME",
        "unit": "NAME",
        "scalingFactor": "NAME",
        "measurementMethod": "NAME",
        "observationPeriod": "NAME"
      },
      # For implicit schema only
      "observationProperties": {
        "unit": "MEASUREMENT_UNIT",
        "observationPeriod": "OBSERVATION_PERIOD",
        "scalingFactor": "DENOMINATOR_VALUE",
        "measurementMethod": "METHOD"
      }
    },
    "CSV_FILE_EXPRESSION2": {
      ...
    }
  },
  ...
  # For implicit schema only, custom entities only
  "entities": {
    "ENTITY_TYPE_DCID": {
      "name": "ENTITY_TYPE_NAME",
      "description": "ENTITY_TYPE_DESCRIPTION"
    }
    ...
  },
   
  # For implicit schema only
  "variables": {
    "VARIABLE1": {
      "group": "GROUP_NAME1",
      "name": "DISPLAY_NAME",
      "description": "DESCRIPTION",
      "searchDescriptions": ["SENTENCE1", "SENTENCE2", ...],
      "properties": {
        "PROPERTY_NAME1": "VALUE",
        "PROPERTY_NAME2": "VALUE",
        ...
      }
    },
    "VARIABLE2": {"group": "GROUP_NAME1", ...},
    "VARIABLE3": {"group": "GROUP_NAME2", ...},
    ...
  },
  # For explicit schema only
  "groupStatVarsByProperty": false | true,
  "sources": {  
    "SOURCE_NAME1": {  
      "url": "URL",  
      "provenances": {  
        "PROVENANCE_NAME1": "URL",  
        "PROVENANCE_NAME2": "URL",  
        ...  
      }  
    }  
  }  
}  
Each section contains some required and optional fields, which are described in detail below.
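As a starting point, the following Python sketch assembles a minimal implicit-schema configuration from the fields marked as required in the sections below, then serializes it to JSON. The file name, provenance name, source name, and URLs are hypothetical placeholders, not values from a real dataset.

```python
import json

# All names and URLs below are hypothetical placeholders.
config = {
    "inputFiles": {
        "population.csv": {
            "provenance": "Example Provenance",  # must be defined under "sources"
            "entityType": "Country",             # type of the entities in this file
        }
    },
    "sources": {
        "Example Source": {
            "url": "https://example.org",
            "provenances": {
                "Example Provenance": "https://example.org/data",
            },
        }
    },
}

# Serialize to the text you would save as config.json.
config_json = json.dumps(config, indent=2)
print(config_json)
```

Building the configuration as a dictionary and serializing it avoids hand-written JSON mistakes such as missing commas or unbalanced braces.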
Enable subdirectories
If you are using subdirectories, specify the file names using paths relative to the top-level directory (which you specify in the env.list file as the input directory), and be sure to set "includeInputSubdirs": true. If the option is not specified, it defaults to false. For example:
{
  "inputFiles": {
    "foo.csv": {...},
    "bar*.csv": {...},
    "*.csv": {...},
    "data/*.csv": {...}
  },
  "includeInputSubdirs": true
}
Note: Although you don’t need to specify the names of MCF files in the inputFiles block, if you want to store them in subdirectories, you must still set "includeInputSubdirs": true here.
Input files
The top-level inputFiles section lists the CSV input files and the options specific to each file. Each file expression is either a file name (including relative subdirectories, where applicable) or a wildcard pattern, used when the same configuration applies to multiple files.
You can use the * wildcard; matches are applied in the order in which they are specified in the config. For example, in the following:
{
  "inputFiles": {
    "foo.csv": {...},
    "bar*.csv": {...},
    "*.csv": {...}
  }
}
The first set of parameters only applies to foo.csv. The second set of parameters applies to bar.csv, bar1.csv, bar2.csv, etc. The third set of parameters applies to all CSVs except the previously specified ones, namely foo.csv and bar*.csv.
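The ordering rule can be modeled as "the first matching pattern wins". The Python sketch below illustrates that reading with the standard fnmatch module; the helper name is hypothetical, and the importer's actual matching logic may differ in its details.

```python
from fnmatch import fnmatchcase

# Patterns in the order they appear under "inputFiles" in the example above.
PATTERNS = ["foo.csv", "bar*.csv", "*.csv"]

def first_matching_pattern(file_name, patterns=PATTERNS):
    """Return the first pattern that matches file_name, or None.

    Hypothetical helper that models "first match wins"; it is not the
    importer's own code.
    """
    for pattern in patterns:
        if fnmatchcase(file_name, pattern):
            return pattern
    return None

for name in ["foo.csv", "bar1.csv", "baz.csv", "notes.txt"]:
    print(name, "->", first_matching_pattern(name))
```

Under this model, listing "*.csv" first would capture every CSV file, so more specific expressions belong earlier in the block.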
Input file parameters
- format
    Only needed to specify variablePerRow for explicit schemas. The assumed default is variablePerColumn (implicit schema).
- provenance
    Required: The provenance (named source) of this input file. Provenances map from a source to a dataset. The name here must correspond to the name defined as a provenance in the sources section. For example, the WorldDevelopmentIndicators provenance (or dataset) is from the WorldBank source.
    You must specify the provenance details under sources.provenances; this field associates one of the provenances defined there with this file.
- ignoreColumns (implicit schema only)
    Optional: A list of headings representing columns that should be ignored by the importer, if any.
- importType (implicit schema only)
    Only needed to specify entities for custom entity imports. The assumed default is variables.
- entityType (implicit schema only, variables only)
    Required for CSV files containing observations: All entities in a given file must be of a specific type. The importer tries to resolve entities to DCIDs of that type. In most cases, the entityType will be a supported place type; see Place types for a list. For CSV files containing custom entities, use the rowEntityType option instead.
- rowEntityType (implicit schema only, entities only)
    Required for CSV files containing custom entities: The DCID of the entity type (new or existing) of the custom entities you are importing. It must match the DCID specified in the entities section(s). For example, if you are importing a set of hospital entities, the entity type could be the existing entity type Hospital.
- idColumn (implicit schema only, entities only)
    Optional: The heading of the column representing DCIDs of custom entities that the importer should create. If you don’t specify this, the importer will auto-generate DCIDs for each row in the file. It is strongly recommended that you specify this to define your own DCIDs.
- entityColumns (implicit schema only, entities only)
    Optional: A list of headings of columns that represent existing DCIDs in the knowledge graph. The heading must be the DCID of the entity type of the column (e.g. City, Country) and each row must be the DCID of the entity (e.g. country/CAN, country/PAN).
- columnMappings (explicit schema only)
    Optional: If headings in the observations CSV file do not use the required names for these columns (variable, entity, etc.), provide the equivalent names for each column. For example, if your headings are SERIES, GEOGRAPHY, TIME_PERIOD, OBS_VALUE, you would specify: "variable": "SERIES", "entity": "GEOGRAPHY", "date": "TIME_PERIOD", "value": "OBS_VALUE"
- observationProperties (implicit schema only)
    Optional: Additional information about each observation contained in the CSV file. Whatever setting(s) you specify will apply to all observations in the file.
Currently, the following properties are supported:
- unit: The unit of measurement used in the observations. This is a string representing a currency, area, weight, volume, etc. For example, SquareFoot, USD, Barrel, etc.
- observationPeriod: The period of time in which the observations were recorded. This must be in ISO duration format, namely P[0-9][Y|M|D|h|m|s]. For example, P1Y is 1 year, P3M is 3 months, P3h is 3 hours.
- measurementMethod: The method used to gather the observations. This can be an arbitrary string or an existing DCID of MeasurementMethodEnum type; for example, EDA_Estimate or WorldBankEstimate.
- scalingFactor: An integer representing the denominator used in measurements involving ratios or percentages. For example, for percentages, the denominator would be 100.
Note that you cannot mix different property values in a single CSV file. If you have observations using different properties, you must put them in separate CSV files.
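To catch mistakes in observationProperties before running an import, a small pre-flight check along the following lines may help. This Python sketch is a hypothetical helper, not part of the importer: it flags keys outside the four supported properties and checks observationPeriod against a literal reading of the pattern given above, assuming the count may have more than one digit.

```python
import re

# Literal reading of the documented pattern P[0-9][Y|M|D|h|m|s].
# Assumption: the numeric part may contain more than one digit.
DURATION = re.compile(r"P[0-9]+[YMDhms]")

# The four properties listed as supported above.
SUPPORTED = {"unit", "observationPeriod", "measurementMethod", "scalingFactor"}

def check_observation_properties(props):
    """Return a list of problems found in an observationProperties dict."""
    problems = [
        f"unsupported property: {key}" for key in sorted(props.keys() - SUPPORTED)
    ]
    period = props.get("observationPeriod")
    if period is not None and not DURATION.fullmatch(period):
        problems.append(f"bad observationPeriod: {period}")
    return problems

print(check_observation_properties({"unit": "USD", "observationPeriod": "P1Y"}))
print(check_observation_properties({"units": "USD", "observationPeriod": "1Y"}))
```

An empty list means the check found nothing to report; it does not guarantee that the importer will accept the values.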
Entities (implicit schema only)
This is required for custom entity imports. Whether you are referencing an existing entity type or creating a new entity type, specify its DCID here. Note that it must match the DCID specified in the input file’s rowEntityType field.
Entity parameters
- name
    If you are creating a new entity type, provide a human-readable name for it. If you are referencing an existing entity type, omit this parameter.
- description
    If you are creating a new entity type, provide a longer description for it. If you are referencing an existing entity type, omit this parameter.
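Because each rowEntityType must match a DCID declared under entities, a quick consistency check can catch mismatches early. The Python sketch below is a hypothetical helper; the file names, the provenance name, and the Clinic type are illustrative assumptions, and only Hospital comes from the text above.

```python
def unmatched_row_entity_types(config):
    """Return rowEntityType values used in inputFiles but not declared in entities."""
    declared = set(config.get("entities", {}))
    used = {
        options["rowEntityType"]
        for options in config.get("inputFiles", {}).values()
        if "rowEntityType" in options
    }
    return sorted(used - declared)

# Hypothetical config: Hospital is declared (as an existing type, so name and
# description are omitted), while Clinic is used but never declared.
config = {
    "inputFiles": {
        "hospitals.csv": {
            "importType": "entities",
            "rowEntityType": "Hospital",
            "provenance": "Example Provenance",
        },
        "clinics.csv": {
            "importType": "entities",
            "rowEntityType": "Clinic",
            "provenance": "Example Provenance",
        },
    },
    "entities": {"Hospital": {}},
}

print(unmatched_row_entity_types(config))  # prints ['Clinic']
```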
Variables (implicit schema only)
The variables section is optional. You can use it to define names and associate additional properties with the statistical variables in the files, using the parameters described below. All parameters are optional. If you don’t provide this section, the importer will automatically derive the variable names from the CSV file headings.
Variable parameters
- name
    The display name of the variable, which will show up throughout the UI. If not specified, the column name is used as the display name.
    The name should be concise and precise; that is, the shortest possible name that allows humans to uniquely identify a given variable. The name is used to generate NL embeddings.
- description
    A long-form description of the variable.
- properties
    Additional Data Commons properties associated with this variable. These can be any properties, required or optional, in the MCF Node definition of a variable. The value of each property must be a DCID.
    Each property is specified as a key:value pair. Here are some examples:
{
  "populationType": "schema:Person",
  "measuredProperty": "age",
  "statType": "medianValue",
  "gender": "Female"
}
Note that the measuredProperty property has an effect on the display: if it is not set for any variable, the importer assumes that it is different for every defined variable, so that each variable will be shown in a different chart in the UI tools. If you would like multiple variables to show up in the same chart, be sure to set this property on all of the relevant variables, to the same (DCID) value. For example, if you wanted Adult_curr_cig_smokers_female and Adult_curr_cig_smokers_male to appear on the same Timeline chart, set measuredProperty to a common property of the two variables, for example percent.
"variables": {
  "Adult_curr_cig_smokers_female": {
    "properties": {
      "measuredProperty": "percent"
    }
  },
  "Adult_curr_cig_smokers_male": {
    "properties": {
      "measuredProperty": "percent"
    }
  }
}
- group
    By default, the Statistical Variable Explorer displays all custom variables in a group called “Custom Variables”. You can use this option to create one or more custom group names and assign different variables to groups. The value of the group option is used as the heading of the group. For example, in the sample data, the group name OECD is used to group together the two variables from the two CSV files.
You can have a multi-level group hierarchy by using / as a separator between each group.
Note: You can only assign a variable to one group. If you would like to assign the same variable to multiple groups, you will need to define the groups as nodes in MCF; see Define a statistical variable group node for details.
- searchDescriptions
    An array of descriptions to be used for creating more NL embeddings for the variable. This is only needed if the variable name is not sufficient for generating embeddings.
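The effect of measuredProperty on chart grouping, described under properties above, can be modeled roughly as follows. This Python sketch is a simplified illustration, not the importer's logic: it groups variables that share a measuredProperty value and treats every variable without one as its own group. The variable Other_variable is a hypothetical addition.

```python
from collections import defaultdict

def chart_groups(variables):
    """Group variable names by measuredProperty.

    Simplified model: variables with no measuredProperty each get a
    separate group, since they are assumed to differ from one another.
    """
    groups = defaultdict(list)
    for name, options in variables.items():
        prop = options.get("properties", {}).get("measuredProperty")
        key = prop if prop is not None else f"(unset) {name}"
        groups[key].append(name)
    return dict(groups)

variables = {
    "Adult_curr_cig_smokers_female": {"properties": {"measuredProperty": "percent"}},
    "Adult_curr_cig_smokers_male": {"properties": {"measuredProperty": "percent"}},
    "Other_variable": {},  # hypothetical variable with no properties set
}

for key, names in chart_groups(variables).items():
    print(key, "->", names)
```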
groupStatVarsByProperty (explicit schema only)
Optional: When set to true, causes the Statistical Variable Explorer to display a top-level category called “Custom Variables” and to group together variables with the same population types and measured properties.
Sources
The sources section encodes the sources and provenances associated with the input dataset. Each named source is a mapping of provenances to URLs.
Source parameters
- url
    Required: The URL of the named source. For example, for the named source U.S. Social Security Administration, it would be https://www.ssa.gov.
- provenances
    Required: A set of NAME:URL pairs. Here are some examples:
{
  "USA Top Baby Names 2022": "https://www.ssa.gov/oact/babynames/",
  "USA Top Baby Names 1923-2022": "https://www.ssa.gov/oact/babynames/decades/century.html"
}
The named provenances should be used to identify the provenance field(s) of input files.
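Since every input file's provenance must name an entry under some source's provenances, a short cross-check can flag dangling references. The Python sketch below is a hypothetical helper; the source and provenance names come from the examples above, while the CSV file names and the misspelled provenance are illustrative assumptions.

```python
def undefined_provenances(config):
    """Map each file expression to its provenance when that provenance
    is not defined under any source (a missing provenance maps to None)."""
    defined = {
        name
        for source in config.get("sources", {}).values()
        for name in source.get("provenances", {})
    }
    return {
        file_expr: options.get("provenance")
        for file_expr, options in config.get("inputFiles", {}).items()
        if options.get("provenance") not in defined
    }

config = {
    "inputFiles": {
        # Hypothetical file names.
        "names_2022.csv": {"provenance": "USA Top Baby Names 2022"},
        "names_century.csv": {"provenance": "USA Top Baby Names"},  # incomplete name
    },
    "sources": {
        "U.S. Social Security Administration": {
            "url": "https://www.ssa.gov",
            "provenances": {
                "USA Top Baby Names 2022": "https://www.ssa.gov/oact/babynames/",
                "USA Top Baby Names 1923-2022": "https://www.ssa.gov/oact/babynames/decades/century.html",
            },
        }
    },
}

print(undefined_provenances(config))
```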
Page last updated: October 24, 2025