Configuration File Reference
config.yml is the Metal configuration file, a YAML file used to configure sources, schemas, and plans. This document describes version 1 of the config.yml file format.
Example config.yml
```yaml
version: '0.2'
server:
  port: 3000
sources:
  my-source:
    provider: postgres
    host: localhost
    port: 5432
    user: myuser
    password: myStr@ngpa$$w0rd
    database: mydatabase
schemas:
  my-schema:
    sourceName: my-source
```
version
Defines the version used for the configuration. Accepted values: 0.1, 0.2.
Example:
```yaml
version: '0.2'
```
server
Defines the configuration of the Metal server.
ℹ️ NOTE
The parameters that can be configured inside the server section include:
port
verbosity
cache
timezone
authentication
request-limit
Example:
```yaml
server:
  port: 3000
  verbosity: debug
  cache:
    provider: mongodb
    uri: mongodb://localhost:27017/
    database: metal_cache
    options:
      connectTimeoutMS: 5000
      serverSelectionTimeoutMS: 5000
```
port
Defines the Metal server's TCP port for API exposure. Default: 3000
verbosity
Sets the console logging verbosity, which can be one of the following values:
trace
debug
info
warn
error
silent
Default: warn
cache
Sets the database server used to store cache objects. Its configuration is the same as a source's (see: sources).
⚠️ IMPORTANT
This parameter must be configured if you plan to use the cache feature in Metal.
timezone
Sets the server's timezone.
A list of acceptable timezone values can be found here.
Default: UTC
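For illustration, a server block that pins the timezone might look like the following. The value Europe/Paris is an assumption: it is an IANA time zone name, and the accepted names should be checked against the list referenced above.

```yaml
server:
  port: 3000
  # Assumption: IANA time zone names such as Europe/Paris are accepted
  timezone: Europe/Paris
```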
authentication
Enables authentication on the Metal server for the users declared in the users section (see: users).
⚠️ IMPORTANT
At this stage, this parameter does not take a value: declaring the key is enough to enable authentication.
```yaml
server:
  authentication:
```
request-limit
Controls the maximum request body size. If the value is a number, it specifies the number of bytes; if it is a string, it is passed to the bytes library for parsing. For supported values, see the bytes library documentation.
Default: 100mb
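Both accepted forms can be sketched as follows. Because the parameter is documented in the server section, it is assumed to be declared under server. A string such as 10mb is parsed by the bytes library, while a plain number is read as a byte count.

```yaml
# Assumption: request-limit is declared under server
server:
  # String form: parsed by the bytes library (here, 10 megabytes)
  request-limit: 10mb
  # Number form, an alternative to the line above: a raw byte count
  # request-limit: 1048576
```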
users
Declares the list of Metal users used when authentication is enabled with server.authentication.
Example:
```yaml
users:
  admin: 123456
  guest: "654321"
```
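Putting the two settings together, a minimal sketch of a server with authentication enabled and two declared users could look like this. Quoting the passwords is a precaution rather than a documented requirement: unquoted, YAML reads 123456 as a number, not as a string.

```yaml
server:
  port: 3000
  # No value: declaring the key enables authentication
  authentication:
users:
  # Quoted so that YAML keeps the passwords as strings
  admin: "123456"
  guest: "654321"
```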
sources
This section contains the source declarations: one configuration per database server connection, much like a connection string used in development.
Every source is declared with a name, followed by its data provider configuration and, if needed, its options.
ℹ️ NOTE
When declaring a source, the parameters that can be configured inside are: provider, host, port, user, password, database, and options.
Example:
```yaml
sources:
  my-postgresql-db:
    provider: postgres
    host: 192.168.1.113
    port: 5433
    user: root
    password: Azerty123!
    database: sampledb
  my-ms-sql-db:
    provider: mssql
    host: 192.168.1.123
    port: 1433
    user: sa
    password: Azerty123!
    database: SampleDB
```
provider
Defines the data provider type.
The table below describes the values that can be configured in the provider parameter:
Value | DBMS Provider |
---|---|
postgres | PostgreSQL |
mssql | Azure SQL Database, Microsoft SQL Server |
mongodb | MongoDB |
plan | Connect to Metal Plan (see: plans) |
files | Files as tables abstraction data provider (see: plans) |
For more detailed information about how to configure a data provider, see: Data Providers Configurations
ℹ️ NOTE
When using plan as a data provider, you only need to provide the name of the plan as the database parameter. Example:
```yaml
sources:
  my-source-from-plan:
    provider: plan
    database: my-plan
```
Example:
If we want to declare a source named my-postgresql-db using the PostgreSQL data provider, we write:
```yaml
sources:
  my-postgresql-db:
    provider: postgres
```
host
Defines the DBMS server host.
Example:
```yaml
sources:
  my-postgresql-db:
    host: 10.11.12.13
```
⚠️ IMPORTANT
For MongoDB, the host must be provided in the URI form mongodb://my-server:my-server-port/. Example: mongodb://localhost:27017/.
ℹ️ NOTE
For MS SQL Server, the host can be provided in the form MY-SERVER\MY-INSTANCE.
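A minimal sketch of an MS SQL Server source that targets a named instance follows. The server, instance, and credential values are placeholders, and single quotes keep the backslash literal. port is left out on the assumption that the instance is resolved by name; check this against your SQL Server setup.

```yaml
sources:
  my-ms-sql-db:
    provider: mssql
    # Placeholder server and instance names; single quotes keep the backslash literal
    host: 'MY-SERVER\MY-INSTANCE'
    user: sa
    password: MySecretPassword
    database: SampleDB
```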
port
Defines the DBMS TCP port.
Example:
```yaml
sources:
  my-postgresql-db:
    port: 5432
```
⚠️ IMPORTANT
This parameter is unnecessary for MongoDB, because the port is part of the host URI.
user
Defines the user to connect to the DBMS server.
Example:
```yaml
sources:
  my-postgresql-db:
    user: root
```
password
Defines the DBMS user password.
Example:
```yaml
sources:
  my-postgresql-db:
    password: MySecretPassword
```
database
Defines the name of the database to connect to.
Example:
```yaml
sources:
  my-postgresql-db:
    database: mydatabase
```
options
This parameter defines optional parameters to be passed to the data provider.
Example:
```yaml
sources:
  mongo-db1:
    provider: mongodb
    host: mongodb://localhost:27017/
    database: myDatabase
    options:
      connectTimeoutMS: 5000
      serverSelectionTimeoutMS: 5000
```
ℹ️ NOTE
For more information about how to configure a data provider and its options, see: Data Providers Configurations
schemas
This section is used to declare virtual schemas.
A schema maps either a whole DBMS source or a selection of entities (tables) from one or more sources, and it serves as the main access point. To allow access to a database or a combination of databases, you must declare the corresponding schemas here to expose them through the API.
ℹ️ NOTE
When declaring a schema, the parameters that can be configured inside are: sourceName and entities.
⚠️ IMPORTANT
If config.yml contains no schemas declaration, Metal exposes nothing through the API. This is the expected setup when Metal is used only as a scheduled ETL tool (see: Use Case, CRON ETL).
Example:
```yaml
schemas:
  my-schema1:
    sourceName: my-mssql-db
  my-schema2:
    entities:
      my-entity1:
        sourceName: my-mongodb-source
        entityName: entity1
      my-entity2:
        sourceName: my-postgres-source
        entityName: entity2
```
sourceName
Declares which source from the sources section to use (see: sources).
⚠️ IMPORTANT
Only one sourceName is allowed in a schema declaration.
Example:
```yaml
schemas:
  my-schema1:
    sourceName: my-mssql-db
```
entities
Declares the entities to expose from sources. Entities can come from different sources, and only declared entities are visible.
When declaring an entity, two parameters must be configured inside:
- sourceName: the name of a declared source
- entityName: the name of an entity in that source
⚠️ IMPORTANT
Only one entities section is allowed in a schema declaration.
Example:
```yaml
schemas:
  my-schema2:
    entities:
      my-entity1:
        sourceName: my-mongodb-source
        entityName: entity1
      my-entity2:
        sourceName: my-plan
        entityName: entity2
```
ℹ️ TIP
It is possible to combine sourceName and entities in the same schema declaration.
Example:
```yaml
schemas:
  my-merged-schema:
    sourceName: my-mssql-source
    entities:
      my-entity1:
        sourceName: my-mongodb-source
        entityName: entity1
      my-entity2:
        sourceName: my-postgres-source
        entityName: entity2
```
ai-engines
This section declares the AI engine processors used in plans by the run command (see: run). Implemented AI engines and models include:
- Tesseract.js: a pure JavaScript port of the popular Tesseract OCR engine (implemented models: All languages)
- TensorFlow.js: a JavaScript library for training and deploying machine learning models (implemented models: Image classification)
- NLP.js: a general natural language utility (implemented models: Sentiment Analysis, Guess the language)
ℹ️ NOTE
When declaring an AI engine, the parameters that can be configured inside are: engine, model, and options.
The accepted values for engine are listed in the table below; the accepted model values are described for each engine.
Value | AI Engine |
---|---|
tesseractjs | Tesseract.js |
tensorflowjs | TensorFlow.js |
nlpjs | NLP.js |
Tesseract.js
This engine supports more than 100 languages, automatic text orientation and script detection, and a simple interface for reading paragraph, word, and character bounding boxes.
To use this engine, set engine to tesseractjs and model to the desired language (see: Tesseract OCR Data Files). If model is not specified, Tesseract.js defaults to eng (English).
ℹ️ NOTE
No options are available for this AI engine.
Example
```yaml
ai-engines:
  my-ocr:
    engine: tesseractjs
    model: eng
```
TensorFlow.js
TensorFlow.js is an open-source JavaScript library that allows you to develop machine learning and deep learning models. It is a part of the TensorFlow ecosystem and provides a way to run machine learning models, including neural networks, in JavaScript.
Only image classification is implemented in this version of Metal.
To use this engine, set engine to tensorflowjs and model to image-classify.
ℹ️ NOTE
No options are available for this AI engine.
Example
```yaml
ai-engines:
  my-image-classifier:
    engine: tensorflowjs
    model: image-classify
```
NLP.js
NLP.js is an open-source JavaScript library for natural language processing (NLP) and natural language understanding (NLU). It provides a set of tools and functionalities to work with natural language text, including tasks such as tokenization, part-of-speech tagging, named entity recognition, sentiment analysis, and more. NLP.js is designed to help developers build applications that can understand and interact with human language.
To use this engine, set engine to nlpjs and model to one of:
- sentiment: for sentiment analysis
- guess-lang: for guessing the language
sentiment
This model accepts a lang option that defines the language of the text to analyze. For supported languages, see NLP.js Sentiment Analysis Languages.
Example
```yaml
ai-engines:
  my-sentiment-analyzer:
    engine: nlpjs
    model: sentiment
    options:
      lang: en
```
guess-lang
This model accepts an accept option: a comma-separated list of the languages to guess from. For supported languages, see NLP.js Language Support.
Example
```yaml
ai-engines:
  my-language-guesser:
    engine: nlpjs
    model: guess-lang
    options:
      accept: en,fr
```
plans
This section declares plans, which are sequences of ETL steps. A plan can run on the fly, when a schema connected to the plan is called, or on a schedule as a job.
Each plan must declare at least one entity, in which the steps are executed.
Example
```yaml
plans:
  my-plan:
    my-first-entity:
    my-second-entity:
```
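To run a plan on the fly, the sources, schemas, and plans sections are wired together: a source using the plan provider names the plan in database, and a schema exposes that source. The sketch below assumes that the plan's entities (here my-entity) then become the entities of the schema; all names and connection values are placeholders.

```yaml
sources:
  my-postgresql-db:
    provider: postgres
    host: localhost
    port: 5432
    user: root
    password: MySecretPassword
    database: mydatabase
  my-plan-source:
    provider: plan
    database: my-plan            # name of the plan to run
schemas:
  demo:
    sourceName: my-postgresql-db
  my-plan-schema:
    sourceName: my-plan-source   # assumption: calling this schema runs my-plan
plans:
  my-plan:
    my-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: login, partner_id
```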
ℹ️ TIP
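An entity can reference another entity declared in the same plan; in that case, the steps hooked to the referenced entity are executed. In the following example, the join step of my-second-entity references my-first-entity.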
Example
```yaml
plans:
  my-plan:
    my-first-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: login, partner_id
    my-second-entity:
      - select:
          schemaName: demo
          entityName: contacts
          fields: id, name, display_name
      - join:
          type: left
          entityName: my-first-entity
          leftField: partner_id
          rightField: id
```
ℹ️ NOTE
A plan is a list of steps. The available step commands are:
Step command | Description |
---|---|
select | selects data from an entity. If schemaName is not provided, the current plan's entity data is used |
insert | inserts data into an entity. If schemaName is not provided, the current plan's entity data is used |
delete | deletes data from an entity. If schemaName is not provided, the current plan's entity data is used |
update | updates data of an entity. If schemaName is not provided, the current plan's entity data is used |
debug | enables step debugging |
break | stops the plan execution at this step |
join | performs data joins (left, right, inner, full outer, and cross) |
sort | sorts the current data |
fields | keeps only the listed fields of the current data |
run | runs an AI engine |
sync | synchronizes data from a data source to a data destination |
select
Selects data from an entity. If schemaName is not provided, the current plan's entity data is used.
The parameters that can be configured inside the select step are:
Name | Description |
---|---|
schemaName | name of the schema |
entityName | name of the entity in schemaName |
fields | fields to keep, comma-separated (see: Optional Parameters) |
filter | key:value condition to filter data (see: Optional Parameters) |
filterExpression | free-form condition to filter data (see: Optional Parameters) |
sort | sorts data, can be asc or desc (see: Optional Parameters) |
cache | time in seconds to cache data (see: Optional Parameters) |
Example
```yaml
plans:
  my-plan:
    my-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: login, partner_id
```
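The optional parameters can be combined in a single step. In this sketch, filterExpression reuses the free-form syntax of the delete example below, and cache keeps the result for 60 seconds, which assumes server.cache is configured as required for the cache feature. The demo schema and its columns are placeholders.

```yaml
plans:
  my-plan:
    my-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: id, login, partner_id
          filterExpression: "id >= 100"
          cache: 60   # seconds; requires server.cache to be configured
```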
insert
Inserts data into an entity. If schemaName is not provided, the current plan's entity data is used.
The parameters that can be configured inside the insert step are:
Name | Description |
---|---|
schemaName | name of the schema |
entityName | name of the entity in schemaName |
data | data to insert into entityName (see: Optional Parameters) |
Example
```yaml
plans:
  my-plan:
    my-entity:
      - insert:
          schemaName: my-schema
          entityName: search-engine
          data:
            - name: Google
              url: https://www.google.com
            - name: Yahoo
              url: https://www.yahoo.com
            - name: Bing
              url: https://www.bing.com
```
delete
Deletes data from an entity. If schemaName is not provided, the current plan's entity data is used.
The parameters that can be configured inside the delete step are:
Name | Description |
---|---|
schemaName | name of the schema |
entityName | name of the entity in schemaName |
filter | key:value condition to filter data (see: Optional Parameters) |
filterExpression | free-form condition to filter data (see: Optional Parameters) |
Example
```yaml
plans:
  my-plan:
    my-entity:
      - delete:
          schemaName: my-schema
          entityName: users
          filterExpression: "id >= 100"
```
update
Updates data of an entity. If schemaName is not provided, the current plan's entity data is used.
The parameters that can be configured inside the update step are:
Name | Description |
---|---|
schemaName | name of the schema |
entityName | name of the entity in schemaName |
filter | key:value condition to filter data (see: Optional Parameters) |
filterExpression | free-form condition to filter data (see: Optional Parameters) |
data | new values to write to the matching records of entityName (see: Optional Parameters) |
Example
```yaml
plans:
  my-plan:
    my-entity:
      - update:
          schemaName: my-schema
          entityName: users
          filter:
            anonymize: true
          data:
            name: "******"
```
debug
Enables debugging of the plan steps; the debug information is returned in the metadata of the JSON response. Accepted values: nothing (leave the value empty, as in the example below) or error.
Example
```yaml
plans:
  my-plan:
    my-entity:
      - debug:
```
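As a sketch, the step can also be given the error value listed above. This assumes debug is placed before the steps it should report on; the exact shape of the returned metadata is not documented here.

```yaml
plans:
  my-plan:
    my-entity:
      - debug: error   # assumption: reports on the steps that follow
      - select:
          schemaName: demo
          entityName: users
```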
break
Stops the plan execution at this step.
Example
```yaml
plans:
  my-plan:
    my-entity:
      - break:
```
join
Performs data joins (left, right, inner, full outer, and cross).
The parameters that can be configured inside the join step are:
Name | Description |
---|---|
schemaName | name of the schema. If not provided, the current plan is used as the schema |
entityName | name of the entity in schemaName |
type | join type: left, right, inner, fullOuter, or cross |
leftField | left field, compared for equality with rightField |
rightField | right field |
The type parameter accepts the following values:
- left: performs a left join
- right: performs a right join
- inner: performs an inner join
- fullOuter: performs a full outer join
- cross: performs a cross join
Example
```yaml
plans:
  my-plan:
    my-first-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: login, partner_id
    my-second-entity:
      - select:
          schemaName: demo
          entityName: contacts
          fields: id, name, display_name
      - join:
          type: left
          entityName: my-first-entity
          leftField: partner_id
          rightField: id
```
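When schemaName is provided, entityName is looked up in that schema instead of in the current plan. This hypothetical sketch performs an inner join of users with contacts from the demo schema, so only users with a matching contact are kept; schema, entity, and field names are placeholders.

```yaml
plans:
  my-plan:
    my-entity:
      - select:
          schemaName: demo
          entityName: users
          fields: login, partner_id
      - join:
          type: inner
          schemaName: demo        # entityName is resolved in this schema
          entityName: contacts
          leftField: partner_id
          rightField: id
```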
sort
Sorts the current plan's entity data.
This command accepts a list of one or more fields of the entity, each with a sorting order:
- asc: for ascending
- desc: for descending
If no sorting order is provided, ascending is used.
Example
```yaml
plans:
  my-plan:
    my-second-entity:
      - select:
          schemaName: my-schema
          entityName: contacts
          fields: id, name, display_name
      - sort:
          id:
```
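Several fields can be listed, each with its own order. This sketch sorts contacts by name in ascending order, then by id in descending order; it assumes the fields are applied in the order they are written.

```yaml
plans:
  my-plan:
    my-second-entity:
      - select:
          schemaName: my-schema
          entityName: contacts
          fields: id, name, display_name
      - sort:
          name: asc   # primary key (assumed to be applied first)
          id: desc    # tie-breaker
```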
fields
Keeps only the listed fields (comma-separated) of the current plan's entity data.
Example
```yaml
plans:
  my-plan:
    my-first-entity:
      - select:
          schemaName: demo
          entityName: users
    my-second-entity:
      - select:
          schemaName: demo
          entityName: contacts
      - join:
          type: left
          entityName: my-first-entity
          leftField: partner_id
          rightField: id
      - fields: id, name, display_name
```
run
Runs an AI engine on the current plan's entity data.
The parameters that can be configured inside the run step are:
Name | Description |
---|---|
ai | name of a declared AI engine (see: ai-engines) |
input | input field to process |
output | result fields to store. If not provided, the entire result object is stored in a field named after the AI engine. Otherwise, a list of key: value pairs, where each key is a child of the result and each value is the name of the field to create in the plan's entity |
Example
```yaml
plans:
  my-plan:
    my-entity:
      - insert:
          data:
            - url: https://tesseract.projectnaptha.com/img/eng_bw.png
            - url: https://jeroen.github.io/images/testocr.png
            - url: https://www.srcmake.com/uploads/5/3/9/0/5390645/ocr_orig.png
      - run:
          ai: my-ocr
          input: url
          output:
            confidence: ocr_confidence
            text: ocr_text
```
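When output is omitted, the whole result is stored in a field named after the AI engine. The sketch below runs the sentiment engine from the ai-engines section on a comment field, so the result should land in a field named my-sentiment-analyzer. The comment values are made-up sample data.

```yaml
ai-engines:
  my-sentiment-analyzer:
    engine: nlpjs
    model: sentiment
    options:
      lang: en
plans:
  my-plan:
    my-entity:
      - insert:
          data:
            - comment: I love this product
            - comment: This is the worst service ever
      - run:
          ai: my-sentiment-analyzer
          input: comment
          # No output mapping: the full result goes to a field named my-sentiment-analyzer
```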
sync
Synchronizes data from a source to a destination. It performs update, insert, and delete operations on the destination entity so that it becomes an exact copy of the source.
The parameters that can be configured inside the sync step are:
Name | Description |
---|---|
source.schemaName | name of the source schema. If not provided, the current plan is used as the schema |
source.entityName | name of the source entity in source.schemaName |
destination.schemaName | name of the destination schema. If not provided, the current plan is used as the schema |
destination.entityName | name of the destination entity in destination.schemaName |
on | field that exists in both the source and destination entities; it is used as the unique key for synchronization |
Example
```yaml
plans:
  my-plan:
    my-entity:
      - sync:
          source:
            schemaName: srcschema
            entityName: users
          destination:
            schemaName: destschema
            entityName: users
          on: user_id
```
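Because source.schemaName may be omitted, the source can also be an entity computed in the same plan. In this hypothetical sketch, active-users selects active users, and mirror-active-users makes destschema's active_users an exact copy of them; the active column and entity names are placeholders.

```yaml
plans:
  my-plan:
    active-users:
      - select:
          schemaName: srcschema
          entityName: users
          filter:
            active: true
    mirror-active-users:
      - sync:
          source:
            entityName: active-users   # no schemaName: the current plan is used as the schema
          destination:
            schemaName: destschema
            entityName: active_users
          on: user_id
```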
schedules
This section defines the scheduled execution of plans according to a Cron expression.
Each schedule is declared with a descriptive name, followed by these parameters:
Name | Type | Required | Description |
---|---|---|---|
planName | String | Y | name of the plan |
entityName | String | Y | name of the entity in planName |
cron | String | Y | a cron expression, or @start to run once at Metal startup |
Example
```yaml
schedules:
  run my-plan every 5 minutes:
    planName: my-plan
    entityName: contact
    # Six fields, seconds first: second 0 of every 5th minute
    cron: "0 */5 * * * *"
```
ℹ️ TIP
By using @start as a cron expression, you can run the job once at Metal startup.
Example:
```yaml
schedules:
  run at startup:
    planName: my-plan
    entityName: contact
    cron: "@start"
```
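More schedules follow the same pattern. The sketch below assumes the six-field form of the first schedules example above, in which the first field is seconds; check the expressions against the scheduler your Metal version uses before relying on them. The schedule names are placeholders.

```yaml
schedules:
  nightly contact export:
    planName: my-plan
    entityName: contact
    cron: "0 0 2 * * *"      # every day at 02:00:00
  weekday morning refresh:
    planName: my-plan
    entityName: contact
    cron: "0 30 6 * * 1-5"   # Monday to Friday at 06:30:00
```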