Configuration File Reference
config.yml is the Metal configuration file, a YAML file used to configure sources, schemas, and plans. This document describes version 1 of the config.yml file format.
Example config.yml
version: "0.3"
server:
  port: 3000
sources:
  my-source:
    provider: postgres
    host: localhost
    port: 5432
    user: myuser
    password: myStr@ngpa$$w0rd
    database: mydatabase
schemas:
  my-schema:
    source: my-source
version
^0.1
Defines the version used for the configuration. Accepted values: 0.1, 0.2, 0.3
Example:
version: "0.3"
server
^0.1
Defines the configuration of the Metal server.
The parameters that can be configured inside the server section include:
Parameter | Type | Required | Description | Metal version |
---|---|---|---|---|
port | integer | N | Server TCP port | ^0.1 |
verbosity | string | N | Logging level | ^0.1 |
cache | object | N | Cache configuration | ^0.1 |
timezone | string | N | Timezone setting | ^0.1 |
authentication | string | Y | Configure user authentication | ^0.3 |
request-limit | string | N | Define request limit | ^0.1 |
response-limit | string | N | Define response limit | ^0.3 |
response-rate | object | N | Define response rate limit | ^0.3 |
Example:
server:
  port: 3000
  verbosity: debug
  cache:
    provider: mongodb
    uri: mongodb://localhost:27017/
    database: metal_cache
    options:
      connectTimeoutMS: 5000
      serverSelectionTimeoutMS: 5000
port
Defines the Metal server's TCP port for API exposure.
Default: 3000
verbosity
Sets the console logging verbosity, which can be one of the following values:
trace
debug
info
warn
error
Default: warn
cache
Sets the Database server for storing cache objects. The configuration is the same as a source. (See: sources)
⚠️ IMPORTANT
This parameter must be configured if you plan to use the cache feature in Metal.
timezone
Sets the server's timezone.
A list of acceptable timezone values can be found here.
Default: UTC
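As a minimal sketch, the timezone can be set alongside the other server parameters. The value Europe/Paris is an assumption chosen for illustration (an IANA-style zone name); check the list of acceptable values before relying on it.

```yaml
server:
  port: 3000
  # Assumed example value; replace with an entry from the accepted timezone list
  timezone: Europe/Paris
```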
authentication
Sets the authentication configuration for the Metal server.
The parameters that can be configured inside the authentication section include:
Parameter | Type | Required | Default value | Description | Metal version |
---|---|---|---|---|---|
provider | enum(string) | Y | local | Authentication Provider | ^0.3 |
default-role | string | N | (empty) | Default role assigned to a user once authenticated. (see: roles) | ^0.3 |
autocreate | string | N | false | Automatically add the authenticated user to users if it does not already exist | ^0.3 |
Authentication providers:
Provider | Description | Metal version |
---|---|---|
local | Enables local authentication through Metal server using declared users | ^0.3 |
Example:
server:
  authentication:
    provider: local
ℹ️ TIP
For more information about authentication, roles, and users, please refer to the Authentication guide.
request-limit
Controls the maximum request body size. If this is a number, then the value specifies the number of bytes; if it is a string, the value is passed to the bytes library for parsing. For supported values, see here.
When exceeded, a PAYLOAD TOO LARGE (413) error will occur.
Default: 10mb
response-limit
Controls the maximum response body size. If this is a number, then the value specifies the number of bytes; if it is a string, the value is passed to the bytes library for parsing. For supported values, see here.
When exceeded, a CONTENT TOO LARGE (413) error will occur.
Default: 10mb
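The two limits can be sketched together as follows. The values are illustrative assumptions, not recommendations; the first uses a string parsed by the bytes library, the second a plain number of bytes, as described above.

```yaml
server:
  # String value, parsed by the bytes library
  request-limit: 5mb
  # Numeric value, read as a byte count (1048576 bytes = 1024 * 1024)
  response-limit: 1048576
```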
response-rate
Controls the maximum number of requests allowed per time window. The parameters that can be configured inside the response-rate section include:
Parameter | Type | Required | Description | Metal version |
---|---|---|---|---|
windowMs | integer | Y | The time window for rate limiting, in milliseconds. For example, 60000 milliseconds (60 seconds). | ^0.3 |
max | integer | Y | The maximum number of requests allowed within the windowMs time window. For example, 600 requests. | ^0.3 |
message | string | N | The message to be sent when the rate limit is exceeded. This can be a custom message indicating that the user has made too many requests. For example, "Too many requests from this IP, please try again later." | ^0.3 |
If exceeded, a TOO MANY REQUESTS (429) error will occur.
Default:
windowMs: 60000
max: 600
message: Too many requests from this IP, please try again later
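A minimal sketch of a stricter limit than the default, nested under server as listed in the server parameters table. The numbers and the message are assumptions chosen for illustration.

```yaml
server:
  response-rate:
    windowMs: 30000   # 30-second window
    max: 100          # at most 100 requests per window
    message: Too many requests, please slow down
```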
roles
^0.3
Sets the list of roles with associated permissions used when authentication is enabled with server.authentication. Each role is defined by a unique name and a string of permissions where each character represents a specific permission:
Permission | Description | Metal version |
---|---|---|
c | Create data | ^0.3 |
r | Read data | ^0.3 |
u | Update data | ^0.3 |
d | Delete data | ^0.3 |
a | Administrate server | ^0.3 |
l | List schema entities | ^0.3 |
Example:
roles:
  admin: arl
  all-rights: crudla
  guest: r
users
^0.1
Declares a list of Metal users used when authentication is enabled with server.authentication.
Example:
users:
  admin: 123456
  guest: "654321"
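The three authentication-related sections can be sketched together as below. This assumes the provider and default-role parameters listed in the authentication table; how an individual user is bound to a specific role is not described here, so the sketch relies only on default-role. Passwords are quoted so that YAML reads them as strings rather than integers.

```yaml
server:
  authentication:
    provider: local
    # Role given to authenticated users (assumed to match a name under roles)
    default-role: guest
roles:
  admin: arl
  guest: r
users:
  admin: "123456"
  guest: "654321"
```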
sources
^0.1
This section contains all source declarations and configurations that are applied to each database server endpoint connection, much like a connection string used in development.
Every source is declared with a name, followed by the appropriate data provider configuration and options if needed.
ℹ️ NOTE
When declaring a source, the parameters that can be configured inside are: provider, host, port, user, password, database, and options.
Example:
sources:
  my-postgresql-db:
    provider: postgres
    host: 192.168.1.113
    port: 5433
    user: root
    password: Azerty123!
    database: sampledb
  my-ms-sql-db:
    provider: mssql
    host: 192.168.1.123
    port: 1433
    user: sa
    password: Azerty123!
    database: SampleDB
The parameters that can be configured inside a source include:
Parameter | Required | Description | Metal version |
---|---|---|---|
provider | Y | Provider type | ^0.1 |
host | N | Host server | ^0.1 |
port | N | Host port | ^0.1 |
user | N | Provider user | ^0.1 |
password | N | Provider user password | ^0.1 |
database | Y | Provider database | ^0.1 |
options | N | Additional options | ^0.1 |
provider
Defines the data provider type.
The table below describes the different values that can be configured in the provider parameter:
Value | DBMS Provider | Metal version |
---|---|---|
postgres | PostgreSQL | ^0.1 |
mssql | Azure Sql Database, Microsoft SQL Server | ^0.1 |
mongodb | MongoDB | ^0.1 |
plan | Connect to Metal Plan | ^0.2 |
files | Files as tables abstraction data provider | ^0.2 |
metal | Metal Server via REST | ^0.2 |
memory | Local Memory storage (Non-persistent) | ^0.2 |
For more detailed information about how to configure a data provider, see: Data Providers Configurations
ℹ️ NOTE
When using plan as a data provider, you only need to provide the name of the plan as the database parameter. Example:
sources:
  my-source-from-plan:
    provider: plan
    database: my-plan
Example:
If we want to declare a source named my-postgresql-db using the PostgreSQL data provider, we write:
sources:
  my-postgresql-db:
    provider: postgres
host
Defines the DBMS server host.
Example:
sources:
  my-postgresql-db:
    host: 10.11.12.13
⚠️ IMPORTANT
For MongoDB, the host must be provided in the URI form mongodb://my-server:my-server-port/. Example: mongodb://localhost:27017/.
ℹ️ NOTE
MS SQL Server can be provided in the form MY-SERVER\MY-INSTANCE.
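A minimal sketch of a named-instance SQL Server source follows. The source name, server, and instance names are placeholders. The host is wrapped in single quotes so that YAML keeps the backslash literal (inside double quotes a backslash starts an escape sequence).

```yaml
sources:
  my-ms-sql-instance:
    provider: mssql
    # Single quotes keep the backslash as-is
    host: 'MY-SERVER\MY-INSTANCE'
    user: sa
    password: Azerty123!
    database: SampleDB
```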
port
Defines the DBMS TCP port.
Example:
sources:
  my-postgresql-db:
    port: 5432
⚠️ IMPORTANT
This parameter is unnecessary for MongoDB.
user
Defines the user to connect to the DBMS server.
Example:
sources:
  my-postgresql-db:
    user: root
password
Defines the DBMS user password.
Example:
sources:
  my-postgresql-db:
    password: MySecretPassword
database
Defines the name of the database to connect to.
Example:
sources:
  my-postgresql-db:
    database: mydatabase
options
This parameter defines optional parameters to be passed to the data provider.
Example:
sources:
  mongo-db1:
    provider: mongodb
    host: mongodb://localhost:27017/
    database: myDatabase
    options:
      connectTimeoutMS: 5000
      serverSelectionTimeoutMS: 5000
ℹ️ NOTE
For more information about how to configure a data provider and its options, See: Data Providers Configurations
schemas
^0.1
This section is used to declare virtual schemas.
A schema maps one or more DBMS sources and their tables, and it serves as the main access point. If you want to allow access to a database or a combination of databases, you must declare your schemas here to expose them to the API.
ℹ️ NOTE
When declaring a schema, the parameters that can be configured inside are: source and entities.
⚠️ IMPORTANT
If there is no schemas declaration in config.yml, Metal will expose nothing to the API. This is only appropriate if you plan to use Metal as a scheduled ETL tool (see: Use Case, CRON ETL).
Example:
schemas:
  my-schema1:
    source: my-mssql-db
  my-schema2:
    entities:
      my-entity1:
        source: my-mongodb-source
        entity: entity1
      my-entity2:
        source: my-postgres-source
        entity: entity2
The parameters that can be configured inside a schema include:
Parameter | Description | Metal version |
---|---|---|
source | Source to use | ^0.1 |
entities | Detailed entities configuration | ^0.1 |
source
Declares which source to use from the sources section. (See: sources)
⚠️ IMPORTANT
Only one source is allowed in a schema declaration.
Example:
schemas:
  my-schema1:
    source: my-mssql-db
entities
Used to declare each entity from sources. It is possible to declare entities from different sources. Only declared entities are visible.
When declaring an entity, two parameters must be configured inside:
- source: the name of a declared source
- entity: the name of an entity that is in the source
⚠️ IMPORTANT
Only one entities section is allowed in a schema declaration.
Example:
schemas:
  my-schema2:
    entities:
      my-entity1:
        source: my-mongodb-source
        entity: entity1
      my-entity2:
        source: my-plan
        entity: entity2
ℹ️ TIP
It is possible to combine source and entities in the same schema declaration.
Example:
schemas:
  my-merged-schema:
    source: my-mssql-source
    entities:
      my-entity1:
        source: my-mongodb-source
        entity: entity1
      my-entity2:
        source: my-postgres-source
        entity: entity2
ai-engines
^0.1
This section declares AI engine processors, such as Tesseract.js and NLP.js, to be used in plans with the run command (see: run).
The parameters that can be configured inside an AI engine include:
Parameter | Description | Metal version |
---|---|---|
engine | AI engine to use | ^0.1 |
model | Model handled by the AI engine | ^0.1 |
options | Additional options | ^0.1 |
For more detailed information about how to configure an AI engine, See: AI Engines Configurations
Example
ai-engines:
  my-sentiment-analyzer:
    engine: nlpjs
    model: sentiment
    options:
      lang: en
plans
^0.1
This section is used to declare plans, which are sequences of ETL steps. A plan can be used on the fly by calling a schema connected to a plan, or by scheduling it as a job.
In each plan, you must declare at least one entity in which the steps will be executed.
Example
plans:
  my-plan:
    my-first-entity:
    my-second-entity:
ℹ️ TIP
An entity may call another entity declared in the same plan; in that case, the steps attached to the called entity will be executed.
Example
plans:
  my-plan:
    my-first-entity:
      - select:
          schema: demo
          entity: users
          fields: login, partner_id
    my-second-entity:
      - select:
          schema: demo
          entity: contacts
          fields: id, name, display_name
      - join:
          type: left
          entity: my-first-entity
          left-field: partner_id
          right-field: id
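To use a plan on the fly through the API, the plan can be wired to a schema through a source that uses the plan provider. The sketch below combines the plan, source, and schema declarations described in this document; the names my-source-from-plan and my-plan-schema are placeholders, and the demo schema read by the plan is assumed to be declared elsewhere.

```yaml
plans:
  my-plan:
    my-entity:
      - select:
          schema: demo      # assumed to exist in the schemas section
          entity: users
sources:
  my-source-from-plan:
    provider: plan
    database: my-plan       # name of the plan declared above
schemas:
  my-plan-schema:
    source: my-source-from-plan
```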
The steps that can be configured inside a plan can be:
Step command | Description | Metal version |
---|---|---|
select | to select data from an entity. If schema is not provided, the current plan's entity data will be used | ^0.1 |
insert | to insert data to an entity. If schema is not provided, the current plan's entity data will be used | ^0.1 |
delete | to delete data from an entity. If schema is not provided, the current plan's entity data will be used | ^0.1 |
update | to update data of an entity. If schema is not provided, the current plan's entity data will be used | ^0.1 |
debug | to enable steps debug | ^0.1 |
break | to stop plan execution at this step | ^0.1 |
join | to perform data joins (left, right, inner, full outer, and cross) | ^0.1 |
sort | to sort the current data | ^0.1 |
fields | fields to keep from the current data | ^0.1 |
run | to run an AI Engine | ^0.1 |
sync | to synchronize data from data source to a data destination | ^0.2 |
anonymize | to anonymize data of given fields | ^0.3 |
remove-duplicates | to remove duplicated rows | ^0.3 |
list-entities | to list entities in a schema | ^0.3 |
list-entities
To list entities in a schema. If schema is not provided, a list of the current plan's entities will be returned.
The parameters that can be configured inside the list-entities tag are:
Name | Description | Metal version |
---|---|---|
schema | name of schema | ^0.3 |
Example
plans:
  my-plan:
    my-entity:
      - list-entities:
          schema: my-schema
select
To select data from an entity. If schema is not provided, the current plan's entity data will be returned.
The parameters that can be configured inside the select tag are:
Name | Description | Metal version |
---|---|---|
schema | name of schema | ^0.1 |
entity | name of entity in the schema | ^0.1 |
fields | fields to keep, comma separated. (see: Optional Parameters) | ^0.1 |
filter | condition key:value to filter data. (see: Optional Parameters) | ^0.1 |
filter-expression | free form condition to filter data. (see: Optional Parameters) | ^0.1 |
sort | sort data, can be asc or desc. (see: Optional Parameters) | ^0.1 |
cache | time in seconds to cache data. (see: Optional Parameters) | ^0.1 |
Example
plans:
  my-plan:
    my-entity:
      - select:
          schema: demo
          entity: users
          fields: login, partner_id
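The optional parameters can be combined with the basic form. The sketch below assumes that filter-expression takes the same string form as in the delete example and that cache is a number of seconds, as stated in the table; the condition and the duration are illustrative values. Caching also requires server.cache to be configured.

```yaml
plans:
  my-plan:
    my-entity:
      - select:
          schema: demo
          entity: users
          fields: login, partner_id
          # Free-form condition (illustrative)
          filter-expression: "partner_id >= 100"
          # Keep the result in the cache for 60 seconds
          cache: 60
```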
insert
To insert data to an entity. If schema is not provided, the current plan's entity data will be modified.
The parameters that can be configured inside the insert tag are:
Name | Description | Metal version |
---|---|---|
schema | name of schema | ^0.1 |
entity | name of entity in the schema | ^0.1 |
data | data to be inserted in the entity. (see: Optional Parameters) | ^0.1 |
Example
plans:
  my-plan:
    my-entity:
      - insert:
          schema: my-schema
          entity: search-engine
          data:
            - name: Google
              url: https://www.google.com
            - name: Yahoo
              url: https://www.yahoo.com
            - name: Bing
              url: https://www.bing.com
delete
To delete data from an entity. If schema is not provided, the current plan's entity data will be modified.
The parameters that can be configured inside the delete tag are:
Name | Description | Metal version |
---|---|---|
schema | name of schema | ^0.1 |
entity | name of entity in the schema | ^0.1 |
filter | condition key:value to filter data. (see: Optional Parameters) | ^0.1 |
filter-expression | free form condition to filter data. (see: Optional Parameters) | ^0.1 |
Example
plans:
  my-plan:
    my-entity:
      - delete:
          schema: my-schema
          entity: users
          filter-expression: "id >= 100"
update
To update data of an entity. If schema is not provided, the current plan's entity data will be modified.
The parameters that can be configured inside the update tag are:
Name | Description | Metal version |
---|---|---|
schema | name of schema | ^0.1 |
entity | name of entity in the schema | ^0.1 |
filter | condition key:value to filter data. (see: Optional Parameters) | ^0.1 |
filter-expression | free form condition to filter data. (see: Optional Parameters) | ^0.1 |
data | data to be replaced in the entity. (see: Optional Parameters) | ^0.1 |
Example
plans:
  my-plan:
    my-entity:
      - update:
          schema: my-schema
          entity: users
          filter:
            anonymize: true
          data:
            name: "******"
debug
Enables debugging of the plan's steps; the debug output is visible in the metadata of the JSON response. It can be one of the following values: nothing, error
Example
plans:
  my-plan:
    my-entity:
      - debug:
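A sketch with an explicit value is shown below. It assumes that the value is given directly after the debug key; only the values nothing and error come from the description above.

```yaml
plans:
  my-plan:
    my-entity:
      # Assumed form: one of the documented values given as a scalar
      - debug: error
      - select:
          schema: demo
          entity: users
```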
break
To stop execution of the plan at this step.
Example
plans:
  my-plan:
    my-entity:
      - break:
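Placed between other steps, break is useful for inspecting intermediate data. In the sketch below, which reuses the demo schema from the earlier examples, the fields step after break would not be executed according to the description above.

```yaml
plans:
  my-plan:
    my-entity:
      - select:
          schema: demo
          entity: users
      # Execution stops here
      - break:
      # Not reached while the break step is present
      - fields: login
```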
join
To perform data joins (left, right, inner, full outer, and cross).
The parameters that can be configured inside the join tag are:
Name | Description | Metal version |
---|---|---|
schema | name of schema. If not provided, the current plan will be used as a schema | ^0.1 |
entity | name of entity in the schema | ^0.1 |
type | Join type: left, right, inner, full-outer, or cross | ^0.1 |
left-field | Left field for equality with right-field | ^0.1 |
right-field | Right field | ^0.1 |
The type parameter can be:
Value | Description | Metal version |
---|---|---|
left | Left Join | ^0.1 |
right | Right Join | ^0.1 |
inner | Inner Join | ^0.1 |
full-outer | Full Outer Join | ^0.1 |
cross | Cross Join | ^0.1 |
Example
plans:
  my-plan:
    my-first-entity:
      - select:
          schema: demo
          entity: users
          fields: login, partner_id
    my-second-entity:
      - select:
          schema: demo
          entity: contacts
          fields: id, name, display_name
      - join:
          type: left
          entity: my-first-entity
          left-field: partner_id
          right-field: id
sort
To sort the current plan's entity data.
This command accepts a list of one or more entity fields, each with a sorting order:
- asc for ascending
- desc for descending
If the sorting order is not provided, ascending will be used.
Example
plans:
  my-plan:
    my-second-entity:
      - select:
          schema: my-schema
          entity: contacts
          fields: id, name, display_name
      - sort:
          id:
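A sketch with explicit sorting orders on several fields follows. It assumes that the order is written as the value of each field key, by analogy with the empty id: key in the example; the field names are taken from the select step.

```yaml
plans:
  my-plan:
    my-second-entity:
      - select:
          schema: my-schema
          entity: contacts
          fields: id, name, display_name
      - sort:
          name: asc   # ascending (also the default)
          id: desc    # descending
```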
fields
To keep only the listed fields from the current plan's entity data.
Example
plans:
  my-plan:
    my-first-entity:
      - select:
          schema: demo
          entity: users
    my-second-entity:
      - select:
          schema: demo
          entity: contacts
      - join:
          type: left
          entity: my-first-entity
          left-field: partner_id
          right-field: id
      - fields: id, name, display_name
run
To run an AI Engine on the current plan's entity data.
The parameters that can be configured inside the run tag are:
Name | Description | Metal version |
---|---|---|
ai | name of a declared AI Engine (see: ai-engines) | ^0.1 |
input | input field to perform the processing | ^0.1 |
output | Output result to be stored. If nothing is provided, the entire object will be stored in a field that has the AI Engine name. It accepts a list of key:value pairs where the key is a child of the result and the value is the renamed field in the plan's entity | ^0.1 |
Example
plans:
  my-plan:
    my-entity:
      - insert:
          data:
            - url: https://tesseract.projectnaptha.com/img/eng_bw.png
            - url: https://jeroen.github.io/images/testocr.png
            - url: https://www.srcmake.com/uploads/5/3/9/0/5390645/ocr_orig.png
      - run:
          ai: my-ocr
          input: url
          output:
            confidence: ocr_confidence
            text: ocr_text
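When output is omitted, the whole result is stored in a field named after the AI engine, as described in the table above. The sketch below reuses the my-sentiment-analyzer engine from the ai-engines example; the comment field and its sample values are hypothetical.

```yaml
ai-engines:
  my-sentiment-analyzer:
    engine: nlpjs
    model: sentiment
    options:
      lang: en
plans:
  my-plan:
    my-entity:
      - insert:
          data:
            - comment: I love this product   # hypothetical sample row
            - comment: This is terrible      # hypothetical sample row
      # No output mapping: the full result should land in a field
      # named my-sentiment-analyzer
      - run:
          ai: my-sentiment-analyzer
          input: comment
```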
sync
To synchronize data from source to destination. This performs Update, Insert, and Delete operations on the destination entity so that it becomes an exact copy of the data source.
The parameters that can be configured inside the sync tag are:
Name | Description | Metal version |
---|---|---|
from.schema | name of source schema. If not provided, the current plan will be used as a schema | ^0.3 |
from.entity | name of source entity in the from.schema | ^0.3 |
to.schema | name of destination schema. If not provided, the current plan will be used as a schema | ^0.3 |
to.entity | name of destination entity in the to.schema | ^0.3 |
id | field that exists in both the source and destination entities. It will be used as the unique identifier for synchronization | ^0.3 |
Example
plans:
  my-plan:
    my-entity:
      - sync:
          from:
            schema: srcschema
            entity: users
          to:
            schema: destschema
            entity: users
          id: user_id
anonymize
To anonymize data of given fields.
It can be a single field or a comma-separated list of fields.
Example
plans:
  my-plan:
    my-entity:
      - anonymize: contact_name, company_name
remove-duplicates
The remove-duplicates function is designed to remove duplicate rows from a dataset based on specified parameters. Here are the details:
Parameters | Type | Default value | Required | Description | Metal version |
---|---|---|---|---|---|
keys | Array(String) | (empty) | No | List of key(s) used for comparison | ^0.3 |
method | String | hash | No | Method of comparison | ^0.3 |
strategy | String | first | No | Strategy to adopt when duplicates are found | ^0.3 |
condition | String | (empty) | No | Condition to apply according to the selected strategy | ^0.3 |
Parameters
keys
An array of strings representing the keys to be used for identifying duplicates in the rows. If no keys are provided, the entire row will be considered for duplicate checking.
method
Defines the approach for comparing rows to identify duplicates. Options include:
- hash: Uses a hash function to generate unique values for each row based on the specified key(s).
- exact: Compares the specified key(s) directly to find exact matches.
- ignorecase: Performs a case-insensitive comparison of the specified key(s).
strategy
Specifies the action to take when duplicates are identified. Possible values are:
- first: Retains the first occurrence of each duplicate row.
- last: Retains the last occurrence of each duplicate row.
- lowest: Keeps the duplicate row with the lowest value in a specified field defined in condition.
- highest: Keeps the duplicate row with the highest value in a specified field defined in condition.
- custom: Applies a user-defined logic to decide which row to keep.
condition
Determines the condition to apply based on the chosen strategy.
It can be:
- For lowest and highest, the name of the field to evaluate.
- For custom, a SQL predicate expression that defines the condition to retain the row (e.g., age is not null and salary > 30000).
These parameters provide flexible options for removing duplicates based on specific requirements and ensuring the integrity of the dataset.
Example
If we want to check for duplicates with the hash method among rows that have the same id and contact_name, and then keep the first row:
plans:
  my-plan:
    my-entity:
      - remove-duplicates:
          keys: # <- fields in the row to be used for comparison
            - id
            - contact_name
          method: hash # <- method of comparison
          strategy: first # <- 'first' for keeping the first found row
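The strategies that rely on condition can be sketched the same way. In the first entity below, updated_at is a hypothetical field name used to pick the row to keep; the second entity reuses the predicate quoted in the condition description.

```yaml
plans:
  my-plan:
    keep-latest:
      - remove-duplicates:
          keys:
            - contact_name
          method: ignorecase      # case-insensitive comparison of the keys
          strategy: highest       # keep the row with the highest value...
          condition: updated_at   # ...of this (hypothetical) field
    keep-custom:
      - remove-duplicates:
          keys:
            - id
          strategy: custom
          # SQL predicate deciding which row to retain
          condition: "age is not null and salary > 30000"
```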
schedules
^0.1
This section defines the scheduled execution of plans according to a Cron expression.
The parameters that can be configured inside a schedule are:
Name | Type | Required | Description | Metal version |
---|---|---|---|---|
plan | String | Y | name of the plan | ^0.1 |
entity | String | Y | name of the entity in the plan | ^0.1 |
cron | String | Y | A cron expression string, or @start for once at Metal startup | ^0.1 |
Example
schedules:
  run my-plan every 5 minutes:
    plan: my-plan
    entity: contact
    cron: "*/5 * * * * *"
ℹ️ TIP
By using @start as a cron expression, you can start the job once at Metal startup.
Example:
schedules:
  run at startup:
    plan: my-plan
    entity: contact
    cron: "@start"