
Configuration File Reference

config.yml is the Metal configuration file, a YAML file used to configure the server, sources, schemas, plans, and schedules. This document describes the config.yml file format up to version 0.3.

Example config.yml

yaml
version: "0.3"

server:
  port: 3000

sources:
  my-source:
    provider: postgres
    host: localhost
    port: 5432
    user: myuser
    password: myStr@ngpa$$w0rd
    database: mydatabase

schemas:
  my-schema:
    source: my-source

version ^0.1

Defines the format version of the configuration file. Accepted values: 0.1, 0.2, 0.3.

Example:

yaml
version: "0.3"

server ^0.1

Defines the configuration of the Metal server.

The parameters that can be configured inside the server section include:

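| Parameter | Type | Required | Description | Metal version |
|---|---|---|---|---|
| port | integer | N | Server TCP port | ^0.1 |
| verbosity | string | N | Logging level | ^0.1 |
| cache | object | N | Cache configuration | ^0.1 |
| timezone | string | N | Timezone setting | ^0.1 |
| authentication | object | N | User authentication configuration | ^0.3 |
| request-limit | integer or string | N | Request body size limit | ^0.1 |
| response-limit | integer or string | N | Response body size limit | ^0.3 |
| response-rate | object | N | Response rate limit | ^0.3 |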

Example:

yaml
server:
  port: 3000
  verbosity: debug
  cache:
    provider: mongodb
    uri: mongodb://localhost:27017/
    database: metal_cache
    options:
      connectTimeoutMS: 5000
      serverSelectionTimeoutMS: 5000

port

Defines the Metal server's TCP port for API exposure.

Default: 3000

verbosity

Sets the console logging verbosity, which can be one of the following values:

  • trace
  • debug
  • info
  • warn
  • error

Default: warn

cache

Sets the database server used to store cached objects. It is configured the same way as a source (see: sources).

⚠️ IMPORTANT

This parameter must be configured if you plan to use the cache feature in Metal.

timezone

Sets the server's timezone.

A list of acceptable timezone values can be found here.

Default: UTC
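For example, the following sketch assumes that accepted values are IANA time zone names such as Europe/Paris. Verify the name against the list of acceptable values.

```yaml
server:
  timezone: Europe/Paris   # assumed IANA time zone name
```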

authentication

Sets the authentication configuration for the Metal server.

The parameters that can be configured inside the authentication section include:

| Parameter | Type | Required | Default value | Description | Metal version |
|---|---|---|---|---|---|
| provider | enum(string) | Y | local | Authentication provider | ^0.3 |
| default-role | string | N | (empty) | Default role assigned to a user once authenticated (see: roles) | ^0.3 |
| autocreate | string | N | false | Automatically adds the authenticated user to users if it does not already exist | ^0.3 |

Authentication providers:

| Provider | Description | Metal version |
|---|---|---|
| local | Enables local authentication through the Metal server, using the declared users | ^0.3 |

Example:

yaml
server:
  authentication:
    provider: local

ℹ️ TIP

For more information about authentication, roles, and users, refer to the Authentication guide.
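The following sketch combines the authentication parameters from the table above with the roles and users sections. It assumes the local provider. The role name, user name, and password are illustrative placeholders.

```yaml
server:
  authentication:
    provider: local
    default-role: guest      # role assigned to users once authenticated (see: roles)
    autocreate: false        # do not add unknown authenticated users automatically

roles:
  guest: r                   # read-only role

users:
  admin: "my-secret-password"   # placeholder; quoted so YAML keeps it a string
```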

request-limit

Controls the maximum request body size. If the value is a number, it specifies the number of bytes. If it is a string (for example, 10mb), it is parsed by the bytes library. For supported values, see the bytes library documentation.

When exceeded, an error PAYLOAD TOO LARGE (413) is returned.

Default: 10mb
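For example, this sketch lowers the request body limit to 1 MB. The value is illustrative.

```yaml
server:
  request-limit: 1mb   # bytes-library string; the integer 1048576 expresses the same size in bytes
```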

response-limit

Controls the maximum response body size. If the value is a number, it specifies the number of bytes. If it is a string (for example, 10mb), it is parsed by the bytes library. For supported values, see the bytes library documentation.

When exceeded, an error CONTENT TOO LARGE (413) is returned.

Default: 10mb
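Similarly, this sketch raises the response body limit to 50 MB. The value is illustrative.

```yaml
server:
  response-limit: 50mb   # larger responses fail with CONTENT TOO LARGE (413)
```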

response-rate

Limits the number of requests allowed per time window. The parameters that can be configured inside the response-rate section include:

| Parameter | Type | Required | Description | Metal version |
|---|---|---|---|---|
| windowMs | integer | Y | Time window for rate limiting, in milliseconds. For example, 60000 (60 seconds). | ^0.3 |
| max | integer | Y | Maximum number of requests allowed within the windowMs time window. For example, 600. | ^0.3 |
| message | string | N | Message sent when the rate limit is exceeded, for example "Too many requests from this IP, please try again later." | ^0.3 |

If exceeded, an error TOO MANY REQUESTS (429) is returned.

Default:

yaml
windowMs: 60000
max: 600
message: Too many requests from this IP, please try again later
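The following sketch sets a stricter limit under the server section, using the same keys as the default values. The numbers and the message are illustrative.

```yaml
server:
  response-rate:
    windowMs: 60000      # 1-minute window
    max: 100             # at most 100 requests per window
    message: "Too many requests, please retry in a minute."
```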

roles ^0.3

Sets the list of roles with associated permissions used when authentication is enabled with server.authentication. Each role is defined by a unique name and a string of permissions where each character represents a specific permission:

| Permission | Description | Metal version |
|---|---|---|
| c | Create data | ^0.3 |
| r | Read data | ^0.3 |
| u | Update data | ^0.3 |
| d | Delete data | ^0.3 |
| a | Administrate server | ^0.3 |
| l | List schema entities | ^0.3 |

Example:

yaml
roles:
  admin: arl
  all-rights: crudla
  guest: r

users ^0.1

Declares a list of Metal users used when authentication is enabled with server.authentication.

Example:

yaml
users:
  admin: 123456
  guest: "654321"

sources ^0.1

This section declares the sources. Each source holds the connection settings for one database server endpoint, much like a connection string.

Every source is declared with a name, followed by the appropriate data provider configuration and options if needed.

ℹ️ NOTE

When declaring a source, the parameters that can be configured inside are: provider, host, port, user, password, database, and options.

Example:

yaml
sources:
  my-postgresql-db:
    provider: postgres
    host: 192.168.1.113
    port: 5433
    user: root
    password: Azerty123!
    database: sampledb

  my-ms-sql-db:
    provider: mssql
    host: 192.168.1.123
    port: 1433
    user: sa
    password: Azerty123!
    database: SampleDB

The parameters that can be configured inside a source include:

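| Parameter | Required | Description | Metal version |
|---|---|---|---|
| provider | Y | Provider type | ^0.1 |
| host | N | Host server | ^0.1 |
| port | N | Host port | ^0.1 |
| user | N | Provider user | ^0.1 |
| password | N | Provider user password | ^0.1 |
| database | Y | Provider database | ^0.1 |
| options | N | Additional options | ^0.1 |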

provider

Defines the data provider type.

The table below describes the different values that can be configured in the provider parameter:

| Value | DBMS provider | Metal version |
|---|---|---|
| postgres | PostgreSQL | ^0.1 |
| mssql | Azure SQL Database, Microsoft SQL Server | ^0.1 |
| mongodb | MongoDB | ^0.1 |
| plan | Connect to a Metal plan | ^0.2 |
| files | Files exposed as tables (abstraction data provider) | ^0.2 |
| metal | Metal server via REST | ^0.2 |
| memory | Local memory storage (non-persistent) | ^0.2 |

For more detailed information about how to configure a data provider, See: Data Providers Configurations

ℹ️ NOTE

When using plan as a data provider, you only need to provide the name of the plan as the database parameter. Example:

yaml
sources:
  my-source-from-plan:
    provider: plan
    database: my-plan

Example:

If we want to declare a source named my-postgresql-db using the PostgreSQL data provider, we write:

yaml
sources:
  my-postgresql-db:
    provider: postgres

host

Defines the DBMS server host.

Example:

yaml
sources:
  my-postgresql-db:
    host: 10.11.12.13

⚠️ IMPORTANT

For MongoDB, the host must be provided in the URI form mongodb://my-server:my-server-port/. Example: mongodb://localhost:27017/.

ℹ️ NOTE

For MS SQL Server, the host can include a named instance, in the form MY-SERVER\MY-INSTANCE.
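The following sketch shows a source that points at a named SQL Server instance. The server, instance, and credentials are placeholders. The port is left out on the assumption that the instance's port is resolved from the instance name. Add it if your setup uses a fixed port.

```yaml
sources:
  my-ms-sql-db:
    provider: mssql
    host: MY-SERVER\MY-INSTANCE   # the backslash stays literal in a plain (unquoted) YAML scalar
    user: sa
    password: MySecretPassword
    database: SampleDB
```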

port

Defines the DBMS TCP port.

Example:

yaml
sources:
  my-postgresql-db:
    port: 5432

⚠️ IMPORTANT

This parameter is not needed for MongoDB, since the port is part of the host URI.

user

Defines the user to connect to the DBMS server.

Example:

yaml
sources:
  my-postgresql-db:
    user: root

password

Defines the DBMS user password.

Example:

yaml
sources:
  my-postgresql-db:
    password: MySecretPassword

database

Defines the name of the database to connect to.

Example:

yaml
sources:
  my-postgresql-db:
    database: mydatabase

options

Defines additional options passed to the data provider.

Example:

yaml
sources:
  mongo-db1:
    provider: mongodb
    host: mongodb://localhost:27017/
    database: myDatabase
    options:
      connectTimeoutMS: 5000
      serverSelectionTimeoutMS: 5000

ℹ️ NOTE

For more information about how to configure a data provider and its options, See: Data Providers Configurations

schemas ^0.1

This section is used to declare virtual schemas.

A schema maps a source and its tables (entities), and it serves as the main access point to the data. To allow access to a database, or to a combination of databases, declare schemas here to expose them through the API.

ℹ️ NOTE

When declaring a schema, the parameters that can be configured inside are: source and entities.

⚠️ IMPORTANT

If config.yml has no schemas section, Metal exposes nothing through the API. This is the intended setup when Metal is used only as a scheduled ETL tool (see: Use Case, CRON ETL).

Example:

yaml
schemas:
  my-schema1:
    source: my-mssql-db

  my-schema2:
    entities:
      my-entity1:
        source: my-mongodb-source
        entity: entity1
      my-entity2:
        source: my-postgres-source
        entity: entity2

The parameters that can be configured inside a schema include:

| Parameter | Description | Metal version |
|---|---|---|
| source | Source to use | ^0.1 |
| entities | Detailed entities configuration | ^0.1 |

source

Declares which source from the sources section to use (see: sources).

⚠️ IMPORTANT

Only one source is allowed in a schema declaration.

Example:

yaml
schemas:
  my-schema1:
    source: my-mssql-db

entities

Declares the entities exposed by the schema. Entities can come from different sources, and only declared entities are visible.

When declaring an entity, two parameters must be configured inside:

  • source: the name of a declared source
  • entity: the name of an entity that is in the source

⚠️ IMPORTANT

Only one entities section is allowed in a schema declaration.

Example:

yaml
schemas:
  my-schema2:
    entities:
      my-entity1:
        source: my-mongodb-source
        entity: entity1
      my-entity2:
        source: my-plan
        entity: entity2

ℹ️ TIP

It is possible to combine source and entities in the same schema declaration.

Example:

yaml
schemas:
  my-merged-schema:
    source: my-mssql-source
    entities:
      my-entity1:
        source: my-mongodb-source
        entity: entity1
      my-entity2:
        source: my-postgres-source
        entity: entity2

ai-engines ^0.1

This section declares AI engines, such as Tesseract.js and NLP.js, to be used in plans with the run command (see: run).

The parameters that can be configured inside an AI engine include:

| Parameter | Description | Metal version |
|---|---|---|
| engine | AI engine to use | ^0.1 |
| model | Model handled by the AI engine | ^0.1 |
| options | Additional options | ^0.1 |

For more detailed information about how to configure an AI engine, See: AI Engines Configurations

Example

yaml
ai-engines:
  my-sentiment-analyzer:
    engine: nlpjs
    model: sentiment
    options:
      lang: en

plans ^0.1

This section declares plans, which are sequences of ETL steps. A plan can run on the fly, when a schema connected to it is called, or on a schedule as a job (see: schedules).

Each plan must declare at least one entity, in which its steps are executed.

Example

yaml
plans:
  my-plan:
    my-first-entity:
    my-second-entity:
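A plan can be called on the fly through a schema. This minimal sketch reuses the plan provider described in the sources section. The source and schema names are illustrative.

```yaml
sources:
  my-plan-source:
    provider: plan
    database: my-plan        # name of the plan declared in the plans section

schemas:
  my-plan-schema:
    source: my-plan-source
```

With this configuration, requesting an entity of my-plan-schema (for example, my-first-entity) is expected to execute that entity's steps in my-plan.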

ℹ️ TIP

You can call a declared entity from within the plan. In that case, the steps hooked to that entity are executed. In the example below, my-second-entity calls my-first-entity through a join.

Example

yaml
plans:
  my-plan:
    my-first-entity:
      - select:
          schema: demo
          entity: users
          fields: login, partner_id
    my-second-entity:
      - select:
          schema: demo
          entity: contacts
          fields: id, name, display_name
      - join:
          type: left
          entity: my-first-entity
          left-field: partner_id
          right-field: id

The following steps can be configured in a plan:

| Step command | Description | Metal version |
|---|---|---|
| select | Selects data from an entity. If schema is not provided, the current plan entity's data is used | ^0.1 |
| insert | Inserts data into an entity. If schema is not provided, the current plan entity's data is used | ^0.1 |
| delete | Deletes data from an entity. If schema is not provided, the current plan entity's data is used | ^0.1 |
| update | Updates data of an entity. If schema is not provided, the current plan entity's data is used | ^0.1 |
| debug | Enables step debugging | ^0.1 |
| break | Stops plan execution at this step | ^0.1 |
| join | Performs data joins (left, right, inner, full outer, and cross) | ^0.1 |
| sort | Sorts the current data | ^0.1 |
| fields | Keeps the given fields from the current data | ^0.1 |
| run | Runs an AI engine | ^0.1 |
| sync | Synchronizes data from a data source to a data destination | ^0.2 |
| anonymize | Anonymizes the data of the given fields | ^0.3 |
| remove-duplicates | Removes duplicate rows | ^0.3 |
| list-entities | Lists the entities in a schema | ^0.3 |

list-entities

Lists the entities in a schema. If schema is not provided, the entities of the current plan are listed.

The parameters that can be configured inside the list-entities step are:

| Name | Description | Metal version |
|---|---|---|
| schema | Name of the schema | ^0.3 |

Example

yaml
plans:
  my-plan:
    my-entity:
      - list-entities:
          schema: my-schema

select

Selects data from an entity. If schema is not provided, the current plan entity's data is returned.

The parameters that can be configured inside the select step are:

| Name | Description | Metal version |
|---|---|---|
| schema | Name of the schema | ^0.1 |
| entity | Name of the entity in the schema | ^0.1 |
| fields | Fields to keep, comma separated (see: Optional Parameters) | ^0.1 |
| filter | key:value condition to filter data (see: Optional Parameters) | ^0.1 |
| filter-expression | Free-form condition to filter data (see: Optional Parameters) | ^0.1 |
| sort | Sort order, asc or desc (see: Optional Parameters) | ^0.1 |
| cache | Time in seconds to cache data (see: Optional Parameters) | ^0.1 |

Example

yaml
plans:
  my-plan:
    my-entity:
      - select:
          schema: demo
          entity: users
          fields: login, partner_id
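The following sketch combines more of the select parameters. The filter-expression syntax is borrowed from the delete example below, and the cache duration is illustrative. Adapt the field names to your entity.

```yaml
plans:
  my-plan:
    my-entity:
      - select:
          schema: demo
          entity: users
          fields: id, login, partner_id
          filter-expression: "id >= 100"   # free-form condition, same form as in the delete example
          cache: 300                       # cache the selected data for 300 seconds
```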

insert

Inserts data into an entity. If schema is not provided, the current plan entity's data is modified.

The parameters that can be configured inside the insert step are:

| Name | Description | Metal version |
|---|---|---|
| schema | Name of the schema | ^0.1 |
| entity | Name of the entity in the schema | ^0.1 |
| data | Data to insert into the entity (see: Optional Parameters) | ^0.1 |

Example

yaml
plans:
  my-plan:
    my-entity:
      - insert:
          schema: my-schema
          entity: search-engine
          data:
            - name: Google
              url: https://www.google.com
            - name: Yahoo
              url: https://www.yahoo.com
            - name: Bing
              url: https://www.bing.com

delete

Deletes data from an entity. If schema is not provided, the current plan entity's data is modified.

The parameters that can be configured inside the delete step are:

| Name | Description | Metal version |
|---|---|---|
| schema | Name of the schema | ^0.1 |
| entity | Name of the entity in the schema | ^0.1 |
| filter | key:value condition to filter data (see: Optional Parameters) | ^0.1 |
| filter-expression | Free-form condition to filter data (see: Optional Parameters) | ^0.1 |

Example

yaml
plans:
  my-plan:
    my-entity:
      - delete:
          schema: my-schema
          entity: users
          filter-expression: "id >= 100"

update

Updates data of an entity. If schema is not provided, the current plan entity's data is modified.

The parameters that can be configured inside the update step are:

| Name | Description | Metal version |
|---|---|---|
| schema | Name of the schema | ^0.1 |
| entity | Name of the entity in the schema | ^0.1 |
| filter | key:value condition to filter data (see: Optional Parameters) | ^0.1 |
| filter-expression | Free-form condition to filter data (see: Optional Parameters) | ^0.1 |
| data | Data to replace in the entity (see: Optional Parameters) | ^0.1 |

Example

yaml
plans:
  my-plan:
    my-entity:
      - update:
          schema: my-schema
          entity: users
          filter:
            anonymize: true
          data:
            name: "******"

debug

Enables debugging of plan steps and makes the debug information visible in the metadata of the JSON response. It accepts one of the following values: nothing, error.

Example

yaml
plans:
  my-plan:
    my-entity:
      - debug:

break

Stops plan execution at this step.

Example

yaml
plans:
  my-plan:
    my-entity:
      - break:

join

Performs data joins (left, right, inner, full outer, and cross).

The parameters that can be configured inside the join step are:

| Name | Description | Metal version |
|---|---|---|
| schema | Name of the schema. If not provided, the current plan is used as the schema | ^0.1 |
| entity | Name of the entity in the schema | ^0.1 |
| type | Join type: left, right, inner, full-outer, or cross | ^0.1 |
| left-field | Left field, compared for equality with right-field | ^0.1 |
| right-field | Right field | ^0.1 |

The type parameter accepts the following values:

| Value | Description | Metal version |
|---|---|---|
| left | Left join | ^0.1 |
| right | Right join | ^0.1 |
| inner | Inner join | ^0.1 |
| full-outer | Full outer join | ^0.1 |
| cross | Cross join | ^0.1 |

Example

yaml
plans:
  my-plan:
    my-first-entity:
      - select:
          schema: demo
          entity: users
          fields: login, partner_id
    my-second-entity:
      - select:
          schema: demo
          entity: contacts
          fields: id, name, display_name
      - join:
          type: left
          entity: my-first-entity
          left-field: partner_id
          right-field: id

sort

Sorts the current plan entity's data.

This command accepts one or more entity fields, each with a sorting order:

  • asc for ascending
  • desc for descending

If no sorting order is provided, ascending order is used.

Example

yaml
plans:
  my-plan:
    my-second-entity:
      - select:
          schema: my-schema
          entity: contacts
          fields: id, name, display_name
      - sort:
          id:
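Several fields can be listed. This sketch uses illustrative fields and assumes that fields are sorted in the order they are listed. As stated above, a field without a value falls back to ascending order.

```yaml
plans:
  my-plan:
    my-second-entity:
      - select:
          schema: my-schema
          entity: contacts
          fields: id, name, display_name
      - sort:
          name: desc     # sort by name, descending
          id:            # then by id, ascending (no order given)
```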

fields

Keeps only the given fields, as a comma-separated list, from the current plan entity's data.

Example

yaml
plans:
  my-plan:
    my-first-entity:
      - select:
          schema: demo
          entity: users
    my-second-entity:
      - select:
          schema: demo
          entity: contacts
      - join:
          type: left
          entity: my-first-entity
          left-field: partner_id
          right-field: id
      - fields: id, name, display_name

run

Runs an AI engine on the current plan entity's data.

The parameters that can be configured inside the run step are:

| Name | Description | Metal version |
|---|---|---|
| ai | Name of a declared AI engine (see: ai-engines) | ^0.1 |
| input | Input field to process | ^0.1 |
| output | Where to store the result. If not provided, the entire result object is stored in a field named after the AI engine. Accepts a list of key: value pairs, where the key is a child of the result and the value is the field name to use in the plan's entity | ^0.1 |

Example

yaml
plans:
  my-plan:
    my-entity:
      - insert:
          data:
            - url: https://tesseract.projectnaptha.com/img/eng_bw.png
            - url: https://jeroen.github.io/images/testocr.png
            - url: https://www.srcmake.com/uploads/5/3/9/0/5390645/ocr_orig.png
      - run:
          ai: my-ocr
          input: url
          output:
            confidence: ocr_confidence
            text: ocr_text

sync

Synchronizes data from a source to a destination. It performs update, insert, and delete operations on the destination entity so that it becomes an exact copy of the source.

The parameters that can be configured inside the sync step are:

| Name | Description | Metal version |
|---|---|---|
| from.schema | Name of the source schema. If not provided, the current plan is used as the schema | ^0.3 |
| from.entity | Name of the source entity in from.schema | ^0.3 |
| to.schema | Name of the destination schema. If not provided, the current plan is used as the schema | ^0.3 |
| to.entity | Name of the destination entity in to.schema | ^0.3 |
| id | Field present in both the source and destination entities, used as the unique identifier for synchronization | ^0.3 |

Example

yaml
plans:
  my-plan:
    my-entity:
      - sync:
          from:
            schema: srcschema
            entity: users
          to:
            schema: destschema
            entity: users
          id: user_id

anonymize

Anonymizes the data of the given fields.

The value can be a single field or a comma-separated list of fields.

Example

yaml
plans:
  my-plan:
    my-entity:
      - anonymize: contact_name, company_name

remove-duplicates

The remove-duplicates step removes duplicate rows from a dataset, based on the following parameters:

| Parameter | Type | Default value | Required | Description | Metal version |
|---|---|---|---|---|---|
| keys | Array(String) | (empty) | No | List of key(s) used for comparison | ^0.3 |
| method | String | hash | No | Comparison method | ^0.3 |
| strategy | String | first | No | Strategy to adopt when duplicates are found | ^0.3 |
| condition | String | (empty) | No | Condition to apply according to the selected strategy | ^0.3 |

Parameters

keys

An array of strings representing the keys to be used for identifying duplicates in the rows. If no keys are provided, the entire row will be considered for duplicate checking.

method

Defines the approach for comparing rows to identify duplicates. Options include:

  • hash: Uses a hash function to generate unique values for each row based on the specified key(s).
  • exact: Compares the specified key(s) directly to find exact matches.
  • ignorecase: Performs a case-insensitive comparison of the specified key(s).

strategy

Specifies the action to take when duplicates are identified. Possible values are:

  • first: Retains the first occurrence of each duplicate row.
  • last: Retains the last occurrence of each duplicate row.
  • lowest: Keeps the duplicate row with the lowest value in the field given by condition.
  • highest: Keeps the duplicate row with the highest value in the field given by condition.
  • custom: Applies user-defined logic, given in condition, to decide which row to keep.

condition

Determines the condition to apply based on the chosen strategy.

It can be:

  • For lowest and highest, the name of the field to evaluate.
  • For custom, a SQL predicate expression that defines the condition to retain the row (e.g., age is not null and salary > 30000).

These parameters provide flexible options for removing duplicates based on specific requirements and ensuring the integrity of the dataset.

Example

To detect duplicates with the hash method among rows that share the same id and contact_name, keeping the first row found:

yaml
plans:
  my-plan:
    my-entity:
      - remove-duplicates:
          keys:              # <- fields in the row to be used for comparison
            - id
            - contact_name
          method: hash       # <-  method of comparison
          strategy: first    # <-  'first' for keeping the first found row
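Here are two more sketches with illustrative field names. The first keeps, for each email compared case-insensitively, the row with the highest updated_at value. The second uses the custom strategy with the SQL predicate form described for condition.

```yaml
plans:
  my-plan:
    my-entity:
      - remove-duplicates:
          keys:
            - email
          method: ignorecase     # compare emails case-insensitively
          strategy: highest      # keep the row with the highest value...
          condition: updated_at  # ...in this field
    my-other-entity:
      - remove-duplicates:
          keys:
            - id
          strategy: custom
          condition: age is not null and salary > 30000   # predicate deciding which row to retain
```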

schedules ^0.1

This section defines the scheduled execution of plans according to a Cron expression.

The parameters that can be configured inside a schedule are:

| Name | Type | Required | Description | Metal version |
|---|---|---|---|---|
| plan | String | Y | Name of the plan | ^0.1 |
| entity | String | Y | Name of the entity in the plan | ^0.1 |
| cron | String | Y | A cron expression, or @start to run once at Metal startup | ^0.1 |

Example

yaml
schedules:
  run my-plan every 5 minutes:
    plan: my-plan
    entity: contact
    cron: "0 */5 * * * *"

ℹ️ TIP

By using @start as a cron expression, you can start the job once at Metal startup.

Example:

yaml
schedules:
  run at startup:
    plan: my-plan
    entity: contact
    cron: "@start"
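The following sketch assumes the six-field format used in the first schedules example (seconds, minutes, hours, day of month, month, day of week). It declares a job that runs every day at 02:00. The schedule name and time are illustrative.

```yaml
schedules:
  nightly run of my-plan:
    plan: my-plan
    entity: contact
    cron: "0 0 2 * * *"   # second 0, minute 0, hour 2, every day
```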

Released under the GNU v3 License.