Configuration File Reference

config.yml is the Metal configuration file, a YAML file used to configure sources, schemas, and plans. This document describes version 1 of the config.yml file format.

Example config.yml

yaml
version: '0.2'

server:
  port: 3000

sources:
  my-source:
    provider: postgres
    host: localhost
    port: 5432
    user: myuser
    password: myStr@ngpa$$w0rd
    database: mydatabase

schemas:
  my-schema:
    sourceName: my-source

version

Defines the version of the configuration format. Accepted values: 0.1, 0.2

Example:

yaml
version: '0.2'

server

Defines the configuration of the Metal server.

ℹ️ NOTE

The parameters that can be configured inside the server section include:

  • port
  • verbosity
  • cache
  • timezone
  • authentication
  • request-limit

Example:

yaml
server:
  port: 3000
  verbosity: debug
  cache:
    provider: mongodb
    uri: mongodb://localhost:27017/
    database: metal_cache
    options:
      connectTimeoutMS: 5000
      serverSelectionTimeoutMS: 5000

port

Defines the Metal server's TCP port for API exposure. Default: 3000

verbosity

Sets the console logging verbosity, which can be one of the following values:

  • trace
  • debug
  • info
  • warn
  • error
  • silent

Default: warn

cache

Sets the database server used to store cache objects. It is configured the same way as a source (see: sources).

⚠️ IMPORTANT

This parameter must be configured if you plan to use the cache feature in Metal.

timezone

Sets the server's timezone.

A list of acceptable timezone values can be found here.

Default: UTC
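
As a minimal sketch, the timezone can be set alongside the other server parameters. The value Europe/Paris below is an assumption (an IANA-style timezone name); check it against the list of acceptable values before using it.

```yaml
server:
  port: 3000
  # Assumed value: an IANA-style timezone name, not confirmed by this reference
  timezone: Europe/Paris
```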

authentication

Enables authentication on the Metal server, using the users declared in the users section (see: users).

⚠️ IMPORTANT

At this stage, this parameter does not take a value.

yaml
server:
  authentication:

request-limit

Controls the maximum request body size. If this is a number, then the value specifies the number of bytes; if it is a string, the value is passed to the bytes library for parsing. For supported values, see here.

Default: 100mb
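
A minimal sketch of a lower limit, assuming request-limit is declared inside the server section like the parameters above (this reference does not state its placement explicitly). The value 10mb is illustrative.

```yaml
server:
  # Assumption: request-limit sits under server
  # String form, parsed by the bytes library
  request-limit: 10mb
  # Numeric alternative, in bytes: request-limit: 10485760
```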

users

Declares a list of Metal users used when authentication is enabled with server.authentication.

Example:

yaml
users:
  admin: 123456
  guest: "654321"
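
Putting the two sections together, a minimal sketch of a configuration with authentication enabled might look like the following. The password is quoted here so that YAML reads it as a string; whether Metal requires this is not stated in this reference.

```yaml
server:
  port: 3000
  # No value: declaring the key is what enables authentication
  authentication:

users:
  admin: "123456"
```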

sources

This section contains all source declarations and configurations that are applied to each database server endpoint connection, much like a connection string used in development.

Every source is declared with a name, followed by the appropriate data provider configuration and options if needed.

ℹ️ NOTE

When declaring a source, the parameters that can be configured inside are: provider, host, port, user, password, database, and options.

Example:

yaml
sources:
  my-postgresql-db:
    provider: postgres
    host: 192.168.1.113
    port: 5433
    user: root
    password: Azerty123!
    database: sampledb

  my-ms-sql-db:
    provider: mssql
    host: 192.168.1.123
    port: 1433
    user: sa
    password: Azerty123!
    database: SampleDB

provider

Defines the data provider type.

The table below describes the different values that can be configured in the provider parameter:

| Value | DBMS Provider |
| --- | --- |
| postgres | PostgreSQL |
| mssql | Azure SQL Database, Microsoft SQL Server |
| mongodb | MongoDB |
| plan | Connect to a Metal plan (see: plans) |
| files | Files-as-tables abstraction data provider (see: plans) |

For more detailed information about how to configure a data provider, see: Data Providers Configurations

ℹ️ NOTE

When using plan as a data provider, you only need to provide the name of the plan as database parameter. Example:

yaml
sources:
  my-source-from-plan:
    provider: plan
    database: my-plan
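
To expose such a source through the API, it can then be referenced from a schema like any other source. This is a sketch; the schema name my-plan-schema is hypothetical.

```yaml
sources:
  my-source-from-plan:
    provider: plan
    database: my-plan          # name of a plan declared in the plans section

schemas:
  my-plan-schema:              # hypothetical schema name
    sourceName: my-source-from-plan
```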

Example:

If we want to declare a source named my-postgresql-db using the PostgreSQL data provider, we write:

yaml
sources:
  my-postgresql-db:
    provider: postgres

host

Defines the DBMS server host.

Example:

yaml
sources:
  my-postgresql-db:
    host: 10.11.12.13

⚠️ IMPORTANT

For MongoDB, the host must be provided in the URI form mongodb://my-server:my-server-port/. Example: mongodb://localhost:27017/.

ℹ️ NOTE

MS SQL Server can be provided in the form MY-SERVER\MY-INSTANCE.
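
A sketch of a source that targets a named MS SQL Server instance. The source name is hypothetical, and port is left out on the assumption that the instance name is enough to locate the server; add it if your setup requires it. In an unquoted YAML value the backslash is kept as-is, whereas inside double quotes it would have to be written as \\.

```yaml
sources:
  my-ms-sql-instance:          # hypothetical source name
    provider: mssql
    host: MY-SERVER\MY-INSTANCE
    user: sa
    password: Azerty123!
    database: SampleDB
```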

port

Defines the DBMS TCP port.

Example:

yaml
sources:
  my-postgresql-db:
    port: 5432

⚠️ IMPORTANT

This parameter is unnecessary for MongoDB.

user

Defines the user to connect to the DBMS server.

Example:

yaml
sources:
  my-postgresql-db:
    user: root

password

Defines the DBMS user password.

Example:

yaml
sources:
  my-postgresql-db:
    password: MySecretPassword

database

Defines the name of the database to connect to.

Example:

yaml
sources:
  my-postgresql-db:
    database: mydatabase

options

This parameter defines optional parameters to be passed to the data provider.

Example:

yaml
sources:
  mongo-db1:
    provider: mongodb
    host: mongodb://localhost:27017/
    database: myDatabase
    options:
      connectTimeoutMS: 5000
      serverSelectionTimeoutMS: 5000

ℹ️ NOTE

For more information about how to configure a data provider and its options, see: Data Providers Configurations

schemas

This section is used to declare virtual schemas.

A schema maps DBMS sources and their tables, and it serves as the main access point. If you want to allow access to a database or a combination of databases, you must declare your schemas here to expose them to the API.

ℹ️ NOTE

When declaring a schema, the parameters that can be configured inside are: sourceName and entities.

⚠️ IMPORTANT

If config.yml has no schemas declaration, Metal exposes nothing to the API. This setup is intended for using Metal as a scheduled ETL tool (see: Use Case, CRON ETL).

Example:

yaml
schemas:
  my-schema1:
    sourceName: my-mssql-db

  my-schema2:
    entities:
      my-entity1:
        sourceName: my-mongodb-source
        entityName: entity1
      my-entity2:
        sourceName: my-postgres-source
        entityName: entity2

sourceName

Declares which source to use from the sources section (see: sources).

⚠️ IMPORTANT

Only one sourceName is allowed in a schema declaration.

Example:

yaml
schemas:
  my-schema1:
    sourceName: my-mssql-db

entities

Used to declare each entity from sources. It is possible to declare entities from different sources. Only declared entities are visible.

When declaring an entity, two parameters must be configured inside:

  • sourceName: the name of a declared source
  • entityName: the name of an entity that is in the source

⚠️ IMPORTANT

Only one entities section is allowed in a schema declaration.

Example:

yaml
schemas:
  my-schema2:
    entities:
      my-entity1:
        sourceName: my-mongodb-source
        entityName: entity1
      my-entity2:
        sourceName: my-plan
        entityName: entity2

ℹ️ TIP

It is possible to combine sourceName and entities in the same schema declaration.

Example:

yaml
schemas:
  my-merged-schema:
    sourceName: my-mssql-source
    entities:
      my-entity1:
        sourceName: my-mongodb-source
        entityName: entity1
      my-entity2:
        sourceName: my-postgres-source
        entityName: entity2

ai-engines

This section declares AI engine processors to be used in plans with the command run (see: run). Implemented AI engines and models include:

  • Tesseract.js: a pure JavaScript port of the popular Tesseract OCR engine (implemented models: All languages)
  • TensorFlow.js: a JavaScript library for training and deploying machine learning models (implemented models: Image classification)
  • NLP.js: a general natural language utility (implemented models: Sentiment Analysis, Guess the language)

ℹ️ NOTE

When declaring an AI engine, the parameters that can be configured inside are: engine, model, and options.

The table below lists the accepted values for engine. The values for model are described in each engine's section.

| Value | AI Engine |
| --- | --- |
| tesseractjs | Tesseract.js |
| tensorflowjs | TensorFlow.js |
| nlpjs | NLP.js |

Tesseract.js

This AI supports more than 100 languages, automatic text orientation and script detection, and a simple interface for reading paragraph, word, and character bounding boxes.

To use this AI, set the engine to tesseractjs and the model to the desired language (see: Tesseract OCR Data Files). If model is not specified, Tesseract.js defaults to eng for English.

ℹ️ NOTE

No options are available for this AI engine.

Example

yaml
ai-engines:
  my-ocr:
    engine: tesseractjs
    model: eng
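
For another language, only the model changes. The sketch below assumes fra, the Tesseract language code for French; the engine name is hypothetical.

```yaml
ai-engines:
  my-french-ocr:               # hypothetical engine name
    engine: tesseractjs
    model: fra                 # assumed: Tesseract language code for French
```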

TensorFlow.js

TensorFlow.js is an open-source JavaScript library that allows you to develop machine learning and deep learning models. It is a part of the TensorFlow ecosystem and provides a way to run machine learning models, including neural networks, in JavaScript.

Only image classification is implemented in this Metal version.

To use this AI, set the engine to tensorflowjs and the model to image-classify.

ℹ️ NOTE

No options are available for this AI engine.

Example

yaml
ai-engines:
  my-image-classifier:
    engine: tensorflowjs
    model: image-classify

NLP.js

NLP.js is an open-source JavaScript library for natural language processing (NLP) and natural language understanding (NLU). It provides a set of tools and functionalities to work with natural language text, including tasks such as tokenization, part-of-speech tagging, named entity recognition, sentiment analysis, and more. NLP.js is designed to help developers build applications that can understand and interact with human language.

To use this AI, set the engine to nlpjs and the model to:

  • sentiment for Sentiment Analysis
  • guess-lang for Guessing the language

sentiment

This model accepts a lang option that defines the language of the text to be analyzed. For supported languages, see NLP.js Sentiment Analysis Languages.

Example

yaml
ai-engines:
  my-sentiment-analyzer:
    engine: nlpjs
    model: sentiment
    options:
      lang: en

guess-lang

This model accepts an accept option that defines a comma-separated list of languages to guess from. For supported languages, see NLP.js Language Support.

Example

yaml
ai-engines:
  my-language-guesser:
    engine: nlpjs
    model: guess-lang
    options:
      accept: en,fr

plans

This section is used to declare plans, which are sets of ETL steps. A plan can be run on the fly by calling a schema connected to it, or it can be scheduled as a job.

In each plan, you must declare at least one entity in which the steps will be executed.

Example

yaml
plans:
  my-plan:
    my-first-entity:
    my-second-entity:

ℹ️ TIP

You may call a declared entity in the plan; in that case, the steps hooked to the second entity will be executed.

Example

yaml
plans:
  my-plan:
    my-first-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: login, partner_id
    my-second-entity:
      - select:
          schemaName: demo
          entityName: contacts
          fields: id, name, display_name
      - join:
          type: left
          entityName: my-first-entity
          leftField: partner_id
          rightField: id

ℹ️ NOTE

A plan is a list of steps. Each step can be one of the following commands:

| Step command | Description |
| --- | --- |
| select | to select data from an entity. If no schema is provided, the current plan's entity data is used |
| insert | to insert data into an entity. If no schema is provided, the current plan's entity data is used |
| delete | to delete data from an entity. If no schema is provided, the current plan's entity data is used |
| update | to update the data of an entity. If no schema is provided, the current plan's entity data is used |
| debug | to enable step debugging |
| break | to stop plan execution at this step |
| join | to perform data joins (Left, Right, Inner, Full outer, and Cross) |
| sort | to sort the current data |
| fields | fields to keep from the current data |
| run | to run an AI engine |
| sync | to synchronize data from a data source to a data destination |

select

To select data from an entity. If no schema is provided, the current plan's entity data is used.

The parameters that can be configured inside the select tag are:

| Name | Description |
| --- | --- |
| schemaName | name of the schema |
| entityName | name of the entity in the schemaName |
| fields | fields to keep, comma-separated (see: Optional Parameters) |
| filter | key:value condition to filter data (see: Optional Parameters) |
| filterExpression | free-form condition to filter data (see: Optional Parameters) |
| sort | sort order, either asc or desc (see: Optional Parameters) |
| cache | time in seconds to cache data (see: Optional Parameters) |

Example

yaml
plans:
  my-plan:
    my-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: login, partner_id
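
The optional parameters can be combined in the same step. The sketch below assumes that filter takes a key:value mapping, as in the update example of this reference, and uses a hypothetical active field; cache keeps the result for 60 seconds.

```yaml
plans:
  my-plan:
    my-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: login, partner_id
          filter:
            active: true       # hypothetical field used as a key:value condition
          cache: 60            # time in seconds
```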

insert

To insert data into an entity. If no schema is provided, the current plan's entity data is used.

The parameters that can be configured inside the insert tag are:

| Name | Description |
| --- | --- |
| schemaName | name of the schema |
| entityName | name of the entity in the schemaName |
| data | data to be inserted into the entityName (see: Optional Parameters) |

Example

yaml
plans:
  my-plan:
    my-entity:
      - insert:
          schemaName: my-schema
          entityName: search-engine
          data:
            - name: Google
              url: https://www.google.com
            - name: Yahoo
              url: https://www.yahoo.com
            - name: Bing
              url: https://www.bing.com

delete

To delete data from an entity. If no schema is provided, the current plan's entity data is used.

The parameters that can be configured inside the delete tag are:

| Name | Description |
| --- | --- |
| schemaName | name of the schema |
| entityName | name of the entity in the schemaName |
| filter | key:value condition to filter data (see: Optional Parameters) |
| filterExpression | free-form condition to filter data (see: Optional Parameters) |

Example

yaml
plans:
  my-plan:
    my-entity:
      - delete:
          schemaName: my-schema
          entityName: users
          filterExpression: "id >= 100"

update

To update the data of an entity. If no schema is provided, the current plan's entity data is used.

The parameters that can be configured inside the update tag are:

| Name | Description |
| --- | --- |
| schemaName | name of the schema |
| entityName | name of the entity in the schemaName |
| filter | key:value condition to filter data (see: Optional Parameters) |
| filterExpression | free-form condition to filter data (see: Optional Parameters) |
| data | data to be replaced in the entityName (see: Optional Parameters) |

Example

yaml
plans:
  my-plan:
    my-entity:
      - update:
          schemaName: my-schema
          entityName: users
          filter:
            anonymize: true
          data:
            name: "******"

debug

Enables debugging of plan steps; the debug output is visible in the metadata of the JSON response. It can take one of the following values: nothing, error

Example

yaml
plans:
  my-plan:
    my-entity:
      - debug:
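
A sketch of the same step with the error value. Placing debug before the other steps is an assumption about typical use, not a documented requirement.

```yaml
plans:
  my-plan:
    my-entity:
      - debug: error
      - select:
          schemaName: demo
          entityName: users
```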

break

To stop execution of the plan at this step.

Example

yaml
plans:
  my-plan:
    my-entity:
      - break:
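
As a sketch, break can be placed between two steps while testing a plan. Based on the description above, the sort step that follows it would not be executed.

```yaml
plans:
  my-plan:
    my-entity:
      - select:
          schemaName: demo
          entityName: users
      - break:
      # Not reached: execution stops at the break step above
      - sort:
          login:
```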

join

To perform data joins (Left, Right, Inner, Full outer, and Cross).

The parameters that can be configured inside the join tag are:

| Name | Description |
| --- | --- |
| schemaName | name of the schema. If not provided, the current plan is used as the schema |
| entityName | name of the entity in the schemaName |
| type | join type: left, right, inner, fullOuter, or cross |
| leftField | left field, compared for equality with rightField |
| rightField | right field |

type parameter

  • left : for performing a Left Join
  • right : for performing a Right Join
  • inner : for performing an Inner Join
  • fullOuter : for performing a Full Outer Join
  • cross : for performing a Cross Join

Example

yaml
plans:
  my-plan:
    my-first-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: login, partner_id
    my-second-entity:
      - select:
          schemaName: demo
          entityName: contacts
          fields: id, name, display_name
      - join:
          type: left
          entityName: my-first-entity
          leftField: partner_id
          rightField: id

sort

To sort the current plan's entity data.

This command accepts a list of one or more entity fields, each with a sorting order:

  • asc for ascending
  • desc for descending

If the sorting order is not provided, ascending is used.

Example

yaml
plans:
  my-plan:
    my-second-entity:
      - select:
          schemaName: my-schema
          entityName: contacts
          fields: id, name, display_name
      - sort:
          id:
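
A sketch with explicit orders on several fields. The field: order mapping is inferred from the example above, where id is given without a value and therefore sorts ascending.

```yaml
plans:
  my-plan:
    my-second-entity:
      - select:
          schemaName: my-schema
          entityName: contacts
          fields: id, name, display_name
      - sort:
          name: asc
          id: desc
```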

fields

To keep only the listed fields from the current plan's entity data.

yaml
plans:
  my-plan:
    my-first-entity:
      - select:
          schemaName: demo
          entityName: users
    my-second-entity:
      - select:
          schemaName: demo
          entityName: contacts
      - join:
          type: left
          entityName: my-first-entity
          leftField: partner_id
          rightField: id
      - fields: id, name, display_name

run

To run an AI engine on the current plan's entity data.

The parameters that can be configured inside the run tag are:

| Name | Description |
| --- | --- |
| ai | name of a declared AI engine (see: ai-engines) |
| input | input field on which to perform the processing |
| output | result fields to store. If nothing is provided, the entire result object is stored in a field named after the AI engine. It accepts a list of key:value pairs, where the key is a child of the result and the value is the name of the field in the plan's entity |

Example

yaml
plans:
  my-plan:
    my-entity:
      - insert:
          data:
          - url: https://tesseract.projectnaptha.com/img/eng_bw.png
          - url: https://jeroen.github.io/images/testocr.png
          - url: https://www.srcmake.com/uploads/5/3/9/0/5390645/ocr_orig.png
      - run:
          ai: my-ocr
          input: url
          output:
            confidence: ocr_confidence
            text: ocr_text
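
A sketch of run without output, using the sentiment analyzer declared as in the ai-engines section. The comment field and its text are hypothetical. Per the table above, the whole result object would then be stored in a field named after the AI engine (here, my-sentiment-analyzer).

```yaml
ai-engines:
  my-sentiment-analyzer:
    engine: nlpjs
    model: sentiment
    options:
      lang: en

plans:
  my-plan:
    my-entity:
      - insert:
          data:
            - comment: I really like this product   # hypothetical input row
      - run:
          ai: my-sentiment-analyzer
          input: comment
          # no output: the full result is kept under the engine name
```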

sync

To synchronize data from a source to a destination. This performs Update, Insert, and Delete operations on the destination entity so that it becomes an exact copy of the data source.

The parameters that can be configured inside the sync tag are:

| Name | Description |
| --- | --- |
| source.schemaName | name of the source schema. If not provided, the current plan is used as the schema |
| source.entityName | name of the source entity in the source.schemaName |
| destination.schemaName | name of the destination schema. If not provided, the current plan is used as the schema |
| destination.entityName | name of the destination entity in the destination.schemaName |
| on | field that exists in both the source and destination entities. It is used as the unique identifier for synchronization |

Example

yaml
plans:
  my-plan:
    my-entity:
      - sync:
          source:
            schemaName: srcschema
            entityName: users
          destination:
            schemaName: destschema
            entityName: users
          on: user_id
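
Because source.schemaName is optional, a sync step can also read from an entity of the current plan. The sketch below assumes a hypothetical staged-users entity that prepares the data before it is synchronized.

```yaml
plans:
  my-plan:
    staged-users:              # hypothetical entity that prepares the data
      - select:
          schemaName: srcschema
          entityName: users
          fields: user_id, login
    my-entity:
      - sync:
          source:
            entityName: staged-users   # no schemaName: the current plan is used
          destination:
            schemaName: destschema
            entityName: users
          on: user_id
```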

schedules

This section defines the scheduled execution of plans according to a Cron expression.

The parameters that can be configured inside a schedule are:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| planName | String | Y | name of the plan |
| entityName | String | Y | name of the entity in the planName |
| cron | String | Y | a cron expression string, or @start to run once at Metal startup |

Example

yaml
schedules:
  run my-plan every 5 minutes:
    planName: my-plan
    entityName: contact
    cron: "*/5 * * * * *"

ℹ️ TIP

By using @start as a cron expression, you can start the job once at Metal startup.

Example:

yaml
schedules:
  run at startup:
    planName: my-plan
    entityName: contact
    cron: "@start"

Released under the GNU v3 License.