
Data Provider Configurations

These configurations define the settings required to connect to data providers such as Microsoft SQL Server, PostgreSQL, MongoDB, or files used as data sources.

Each configuration specifies the necessary parameters, such as host, port, and user credentials, plus additional options for performance and security.

Azure SQL Database/Microsoft SQL Server v0.1+

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to mssql for Azure SQL Database/Microsoft SQL Server. |
| host | String | Y | Server to connect to. Use localhost\instance for named instances. |
| port | Integer | N | Port to connect to (default: 1433). Don't set when connecting to a named instance. |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Database to connect to (default: dependent on server configuration). |

Optional parameters:

| Parameter | Type | Description |
|---|---|---|
| domain | String | Domain for domain login to SQL Server. |
| connectionTimeout | Integer | Connection timeout in milliseconds (default: 15000). |
| requestTimeout | Integer | Request timeout in milliseconds (default: 15000). |
| stream | Boolean | Stream recordsets/rows instead of returning them all at once as an argument of the callback. |
| parseJSON | Boolean | Parse JSON recordsets to JS objects. |
| arrayRowMode | Boolean | Return row results as an array instead of a keyed object. |
| encrypt | Boolean | Use true for Azure. |
| trustServerCertificate | Boolean | Use true for local development or self-signed certificates. |

Example:

```yaml
sources:
  my-sql-db:
    provider: mssql
    host: mydbserver
    port: 1433
    user: sa
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      encrypt: false
      trustServerCertificate: true
      connectionTimeout: 15000
      requestTimeout: 15000
```
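The host/port rule in the primary parameters table can be checked before a connection is attempted. The following is a minimal Python sketch and not part of Metal. `check_mssql_source` and `effective_port` are hypothetical helpers that treat a source as a plain dictionary. They assume that a backslash in `host` marks a named instance, whose port SQL Server normally resolves itself (typically through the SQL Server Browser service).

In YAML, leave a value such as `localhost\SQLEXPRESS` unquoted or use single quotes. Inside double quotes, `\S` would be read as an escape sequence.

```python
DEFAULT_MSSQL_PORT = 1433  # documented default port


def is_named_instance(host):
    # "server\instance" (e.g. localhost\SQLEXPRESS) addresses a named instance.
    return "\\" in host


def check_mssql_source(source):
    """Return human-readable problems with host/port (hypothetical helper)."""
    problems = []
    host = source.get("host")
    if not host:
        problems.append("host is required")
    elif is_named_instance(host) and "port" in source:
        problems.append("port must not be set when host names an instance")
    return problems


def effective_port(source):
    """Port the client would dial, or None when a named instance resolves it."""
    if is_named_instance(source["host"]):
        return None
    return source.get("port", DEFAULT_MSSQL_PORT)


issues = check_mssql_source({"host": r"localhost\SQLEXPRESS", "port": 1433})
```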

PostgreSQL v0.1+

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to postgres for PostgreSQL. |
| host | String | Y | Server to connect to. |
| port | Integer | N | Port to connect to (default: 5432). |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Database to connect to (default: dependent on server configuration). |

Optional parameters:

| Parameter | Type | Description |
|---|---|---|
| connectionString | String | Connection string. Example: postgres://user:password@host:5432/database |
| ssl | Boolean or Object | Options passed directly to node.TLSSocket. Supports all tls.connect options. |
| types | String | Custom type parsers. |
| statement_timeout | Number | Number of milliseconds before a statement in a query times out (default: no timeout). |
| query_timeout | Number | Number of milliseconds before a query call times out (default: no timeout). |
| application_name | String | The name of the application that created this Client instance. |
| connectionTimeoutMillis | Number | Number of milliseconds to wait for a connection (default: no timeout). |
| idle_in_transaction_session_timeout | Number | Number of milliseconds before terminating any session with an open idle transaction (default: no timeout). |
| idleTimeoutMillis | Number | Number of milliseconds a client must sit idle in the pool, without being checked out, before it is disconnected (default: 10000, i.e. 10 seconds). Set to 0 to disable auto-disconnection of idle clients. |
| max | Number | Maximum number of clients the pool should contain (default: 10). |
| allowExitOnIdle | Boolean | Setting true allows the Node event loop to exit as soon as all clients in the pool are idle (default: true). |

Example:

```yaml
sources:
  my-postgres-db:
    provider: postgres
    host: mydbserver
    port: 5432
    user: admin
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      connectionTimeoutMillis: 30000
      idleTimeoutMillis: 10000
      max: 10
      allowExitOnIdle: true
```
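If you use connectionString instead of the individual fields, the credentials must be percent-encoded like any other URL component. The example password myStr@ngpa$$w0rd contains `@`, which would otherwise end the user-info part of the URL.

Below is a minimal Python sketch using only the standard library. The helper names are hypothetical. It assumes the string is handed to a client that parses standard URLs and decodes percent-escapes, as node-postgres is generally expected to do.

```python
from urllib.parse import quote, unquote, urlsplit


def build_pg_connection_string(host, port, user, password, database):
    """Assemble a postgres:// URL, percent-encoding user, password and database."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


def parse_pg_connection_string(url):
    """Split a postgres:// URL back into the individual source fields."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # documented default port
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "database": unquote(parts.path.lstrip("/")),
    }


# Same values as the YAML example above.
url = build_pg_connection_string("mydbserver", 5432, "admin", "myStr@ngpa$$w0rd", "mydatabase")
```

Here `@` becomes `%40` and each `$` becomes `%24`, so the URL has exactly one unescaped `@` separating credentials from host.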

Metal Server v0.2+

This is used to connect to another instance of Metal Server via REST.

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to metal for Metal Server. |
| host | String | Y | URL of the target server to connect to. |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Name of the schema on the remote Metal server. |

Example:

```yaml
sources:
  my-metal-schema:
    provider: metal
    host: http://metalserver:3001
    user: myapiuser
    password: myStr@ngpa$$w0rd
    database: myschema
```

MongoDB v0.1+

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to mongodb for MongoDB. |
| host | String | Y | URI to connect to. |
| database | String | Y | Database to connect to (default: dependent on server configuration). |

Optional parameters:

| Parameter | Type | Description |
|---|---|---|
| connectTimeoutMS | Integer | Specifies the number of milliseconds to wait before timeout on a TCP connection (default: 30000). |
| directConnection | Boolean | Specifies whether to force dispatch of all operations to the host specified in the connection URI (default: false). |
| family | Number | Specifies the version of the Internet Protocol (IP). Valid values are: 4, 6, 0, or null. The 0 and null settings attempt to connect with IPv6 and fall back to IPv4 upon failure (default: null). |
| forceServerObjectId | Boolean | Specifies whether to force the server to assign _id values to documents instead of the driver (default: false). |
| ignoreUndefined | Boolean | Specifies whether the BSON serializer should ignore undefined fields (default: false). |
| keepAlive | Boolean | Specifies whether to enable keepAlive on the TCP socket (default: true). |
| keepAliveInitialDelay | Integer | Specifies the number of milliseconds to wait before initiating keepAlive on the TCP socket (default: 120000). |
| maxPoolSize | Integer | Specifies the maximum number of connections that a connection pool may have at a given time (default: 100). |
| maxIdleTimeMS | Integer | Specifies the maximum amount of time a connection can remain idle in the connection pool before being removed and closed. |
| minPoolSize | Integer | Specifies the minimum number of connections that must exist at any moment in a single connection pool (default: 0). |
| noDelay | Boolean | Specifies whether to use the TCP socket no-delay option (default: true). |
| socketTimeoutMS | Integer | Specifies the number of milliseconds to wait before timeout on a TCP socket (default: 360000). |
| tls | Boolean | Specifies whether to establish a Transport Layer Security (TLS) connection with the instance. This is automatically set to true when using a DNS seedlist (SRV) in the connection string. You can override this behavior by setting the value to false (default: false). |
| waitQueueTimeoutMS | Integer | Specifies the maximum amount of time in milliseconds that operation execution can wait for a connection to become available (default: 0). |

Example:

```yaml
sources:
  my-mongodb-db:
    provider: mongodb
    host: mongodb://my-mongodb-server:27017/
    database: my-database
    options:
      maxIdleTimeMS: 15000
      connectTimeoutMS: 5000
```

Plan v0.2+

Used to connect to a Metal ETL Plan.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to plan for Plan. |
| database | String | Y | Name of the plan to connect to. |

Example:

```yaml
sources:
  my-plan-source:
    provider: plan
    database: my-plan
```

Memory v0.2+

Non-persistent in-memory database.

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to memory for the in-memory data provider. |

Optional parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| autocreate | Boolean | N | If set to true, the entity is created automatically (default: false). |

Example:

```yaml
sources:
  my-memory:
    provider: memory
    options:
      autocreate: true
```
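The effect of autocreate can be illustrated with a toy model. The sketch below is hypothetical: `MemoryStore` and its methods are invented for illustration and are not Metal's implementation. It only mirrors the two documented behaviors:

- data lives in process memory and is lost when the process exits;
- with autocreate an unknown entity is created on first access, and without it access fails.

```python
class MemoryStore:
    """Toy model of a non-persistent store with an autocreate flag (hypothetical)."""

    def __init__(self, autocreate=False):
        self.autocreate = autocreate
        self.entities = {}  # entity name -> list of rows; lost when the process exits

    def rows(self, entity):
        if entity not in self.entities:
            if not self.autocreate:
                raise KeyError(f"entity {entity!r} does not exist")
            self.entities[entity] = []  # created on first access
        return self.entities[entity]

    def insert(self, entity, row):
        self.rows(entity).append(dict(row))


store = MemoryStore(autocreate=True)
store.insert("users", {"id": 1, "name": "Ann"})
```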

MySQL v0.4+

The MySQL data provider connects to a MySQL database and supports several parameters for customizing the connection and data retrieval.

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to mysql for the MySQL data provider. |
| host | String | Y | Server to connect to (default: localhost). |
| port | Integer | N | Port to connect to (default: 3306). |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Database to connect to. |

Optional parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| waitForConnections | Boolean | N | Determines the pool's action when no connections are available and the limit has been reached. If true, the pool queues the connection request and calls it when one becomes available. If false, the pool immediately calls back with an error (default: true). |
| connectionLimit | Number | N | The maximum number of connections to create at once (default: 10). |
| maxIdle | Number | N | The maximum number of idle connections (default: same as connectionLimit). |
| idleTimeout | Number | N | The idle connection timeout, in milliseconds (default: 60000). |
| queueLimit | Number | N | The maximum number of connection requests the pool will queue before returning an error from getConnection. If set to 0, there is no limit to the number of queued connection requests (default: 0). |
| enableKeepAlive | Boolean | N | Enable keep-alive on the socket (default: true). |
| keepAliveInitialDelay | Number | N | Sets the initial delay (in milliseconds) before sending the first TCP keepalive probe on an idle socket (default: 0). |

Example:

```yaml
sources:
  my-mysql-db:
    provider: mysql
    host: mydbserver
    port: 3306
    user: root
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      waitForConnections: true
      connectionLimit: 10
      maxIdle: 10
      idleTimeout: 60000
      queueLimit: 0
      enableKeepAlive: true
      keepAliveInitialDelay: 0
```
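The interplay of connectionLimit, waitForConnections and queueLimit described in the table can be modeled without a database. The sketch below is a hypothetical toy (`PoolModel` and `PoolError` are invented names). It opens no sockets and ignores maxIdle and idleTimeout. It only follows the documented rules:

- grant a connection while under the limit;
- otherwise queue the request, or fail immediately when waitForConnections is false;
- fail once the queue reaches a non-zero queueLimit.

```python
from collections import deque


class PoolError(Exception):
    """Raised where the real pool would call back with an error."""


class PoolModel:
    """Toy model of the documented pool options (no real connections)."""

    def __init__(self, connection_limit=10, wait_for_connections=True, queue_limit=0):
        self.connection_limit = connection_limit
        self.wait_for_connections = wait_for_connections
        self.queue_limit = queue_limit  # 0 means an unbounded queue
        self.in_use = 0
        self.queue = deque()

    def get_connection(self, request_id):
        """Return "granted" or "queued", or raise PoolError."""
        if self.in_use < self.connection_limit:
            self.in_use += 1
            return "granted"
        if not self.wait_for_connections:
            raise PoolError("no connection available")
        if self.queue_limit and len(self.queue) >= self.queue_limit:
            raise PoolError("connection queue is full")
        self.queue.append(request_id)
        return "queued"

    def release(self):
        """Hand the freed connection to the oldest waiter, or return it to the pool."""
        if self.queue:
            return self.queue.popleft()  # connection stays in use by the waiter
        self.in_use -= 1
        return None
```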

CosmosDB v0.4+

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to cosmosdb for Azure CosmosDB. |
| host | String | Y | The endpoint URL of your CosmosDB account (e.g., https://your-account.documents.azure.com). |
| database | String | Y | The name of the database to connect to. |

Optional parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| key | String | none (required) | The primary or secondary key for your CosmosDB account. |
| partitionKey | String | none | The partition key path for the container. |
| maxRetries | Integer | 3 | Maximum number of retries for failed operations. |
| requestTimeout | Integer | 60000 | Request timeout in milliseconds. |
| connectionMode | String | Direct | Connection mode (Direct or Gateway). |
| protocol | String | Tcp | Protocol to use (Tcp or Http). |
| retryAfter | Integer | 1000 | Time to wait between retries in milliseconds. |

Example:

```yaml
sources:
  my-cosmosdb:
    provider: cosmosdb
    host: https://mycosmos.documents.azure.com
    database: mydatabase
    options:
      key: your-primary-key-here
      partitionKey: /id
      maxRetries: 3
      requestTimeout: 60000
      connectionMode: Direct
      protocol: Tcp
```
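One plausible reading of maxRetries and retryAfter is a fixed-delay retry loop, sketched below. This is an assumption, not the documented algorithm:

- maxRetries counts retries after the first attempt;
- the wait is a constant retryAfter milliseconds;
- every exception counts as retryable.

A real client should retry only transient failures such as throttling. `call_with_retries` is a hypothetical name, and `sleep` is injectable so the behavior can be tested without waiting.

```python
import time


def call_with_retries(operation, max_retries=3, retry_after_ms=1000, sleep=time.sleep):
    """Run operation(); on failure wait retry_after_ms and retry, up to max_retries times."""
    retries = 0
    while True:
        try:
            return operation()
        except Exception:
            # Sketch only: a real client would re-raise non-transient errors at once.
            if retries >= max_retries:
                raise
            retries += 1
            sleep(retry_after_ms / 1000)  # seconds, as time.sleep expects
```

With the defaults, an operation that always fails is attempted four times (one call plus three retries) before the last error is re-raised.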

Files v0.3+

The Files data provider lets you work with file-based data as if it were stored in tables. It supports several content types (JSON, CSV, XLSX, XML) and storage backends (local file system, FTP, SMB/CIFS, Azure storage, Amazon S3).

Example:

```yaml
sources:
  my-files:
    provider: files
```

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to files for the Files data provider. |

Optional parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| storage | String | Y | The storage where the files are stored; see: Storage Types. |
| content | Object | Y | Maps file patterns to content types (JSON, CSV, XLS, and others), with optional parameters for customizing each content type; see: Content Types. |
| autocreate | Boolean | N | If set to true, accessing an entity that does not exist automatically creates a file with the same name as the entity (default: false). |

storage v0.3+

Storage types can be set with the parameter options.storage as shown in the example below:

```yaml
sources:
  my-files:
    provider: files
    options:
      storage: fs
```

List of managed storage types:

| Parameter | Description | Metal version |
|---|---|---|
| az-blob | Azure Blob Storage | v0.3+ |
| az-file | Azure File Share | v0.4+ |
| az-datalake | Azure Data Lake Storage Gen2 | v0.4+ |
| fs | Local file system | v0.3+ |
| ftp | FTP server | v0.3+ |
| smb | SMB/CIFS | v0.4+ |
| s3 | Amazon S3 | v0.4+ |

fs (Filesystem) v0.3+

This refers to the local file system.

Optional Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| storage | String | Y | Set to fs for the local file system. |
| autocreate | Boolean | N | If set to true, the entity is created automatically (default: false). |
| fs-folder | String | Y | The path where files are stored (default: .). |

Example:

```yaml
sources:
  my-local-files:
    provider: files
    options:
      storage: fs
      fs-folder: ./data/
      # ...
```

ftp (FTP Server) v0.3+

This refers to an FTP server.

Optional Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| storage | String | Y | Set to ftp for an FTP server. |
| autocreate | Boolean | N | If set to true, the entity is created automatically (default: false). |
| ftp-host | String | Y | FTP server host. |
| ftp-port | Number (1-65535) | N | FTP server port (default: 21). |
| ftp-user | String | Y | FTP server username. |
| ftp-password | String | Y | FTP server password. |
| ftp-secure | Boolean | N | Enable a secure FTP connection (FTPS) (default: false). |
| ftp-folder | String | N | Remote folder on the FTP server (default: /). |

Example:

```yaml
sources:
  my-ftp-files:
    provider: files
    options:
      storage: ftp
      ftp-host: ftp.server.com
      ftp-port: 21
      ftp-user: ftp-user
      ftp-password: ftppass
      ftp-folder: /
      # ...
```

smb (SMB/CIFS) v0.4+

This refers to an SMB/CIFS server.

Optional Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| storage | String | Y | Set to smb for an SMB/CIFS server. |
| autocreate | Boolean | N | If set to true, the entity is created automatically (default: false). |
| smb-host | String | Y | SMB/CIFS server host. |
| smb-port | Number (1-65535) | N | SMB/CIFS server port (default: 445). |
| smb-user | String | Y | SMB/CIFS server username. |
| smb-password | String | Y | SMB/CIFS server password. |
| smb-share | String | Y | SMB/CIFS share name. |
| smb-folder | String | N | Remote folder on the SMB/CIFS server (default: /). |

Example:

```yaml
sources:
  my-smb-files:
    provider: files
    options:
      storage: smb
      smb-host: smb.server.com
      smb-port: 445
      smb-user: smb-user
      smb-password: smb-pass
      smb-share: share
      smb-folder: /path/to/folder
      # ...
```

az-blob (Azure Blob Storage) v0.3+

This refers to Azure Blob Storage.

Optional Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| storage | String | Y | Set to az-blob for Azure Blob Storage. |
| az-blob-connection-string | String | Y | Azure Blob connection string. |
| az-blob-container | String | Y | Azure Blob container name. |
| az-blob-autocreate | Boolean | N | If set to true, the container is created with the provided name (default: false). |

Example:

```yaml
sources:
  my-blob-files:
    provider: files
    options:
      storage: az-blob
      az-blob-connection-string: UseDevelopmentStorage=true
      az-blob-container: datacontainer1
      az-blob-autocreate: true
      # ...
```

az-file (Azure File Share) v0.4+

This refers to Azure File Share storage.

Optional Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| storage | String | Y | Set to az-file for Azure File Share. |
| autocreate | Boolean | N | If set to true, the entity is created automatically (default: false). |
| az-file-connection-string | String | Y | Azure Storage connection string. |
| az-file-share-name | String | Y | Azure File Share name. |
| az-file-directory | String | N | Remote directory in the share (default: /). |

Example:

```yaml
sources:
  my-az-files:
    provider: files
    options:
      storage: az-file
      az-file-connection-string: "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=accountkey;EndpointSuffix=core.windows.net"
      az-file-share-name: myshare
      az-file-directory: /path/to/files
      # ...
```

az-datalake (Azure Data Lake Storage Gen2) v0.4+

This refers to Azure Data Lake Storage Gen2.

Required Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| storage | String | Y | Set to az-datalake for Azure Data Lake Storage Gen2. |
| autocreate | Boolean | N | If set to true, the entity is created automatically (default: false). |
| az-datalake-storage-account | String | Y | Azure storage account name for authentication. |
| az-datalake-storage-key | String | Y | Azure storage account key for authentication. |
| az-datalake-container-name | String | Y | Name of the container to store files in. |
| az-datalake-endpoint | String | Y | Azure endpoint (default: core.windows.net for Azure; can differ for Azure Stack). |

Example:

```yaml
sources:
  my-az-datalake-files:
    provider: files
    options:
      storage: az-datalake
      az-datalake-storage-account: your-storage-account
      az-datalake-storage-key: your-storage-key
      az-datalake-container-name: your-container
      az-datalake-endpoint: core.windows.net
      # ...
```

s3 (Amazon S3) v0.4+

This refers to Amazon S3 storage.

Required Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| storage | String | Y | Set to s3 for Amazon S3 storage. |
| autocreate | Boolean | N | If set to true, the entity is created automatically (default: false). |
| s3-access-key-id | String | Y | AWS access key ID for authentication. |
| s3-secret-access-key | String | Y | AWS secret access key for authentication. |
| s3-region | String | Y | AWS region where the S3 bucket is located. |
| s3-bucket | String | Y | Name of the S3 bucket to store files in. |
| s3-endpoint | String | N | Endpoint URL for S3-compatible services (default: AWS S3 endpoint). |

Example:

```yaml
sources:
  my-s3-files:
    provider: files
    options:
      storage: s3
      s3-access-key-id: your-access-key-id
      s3-secret-access-key: your-secret-access-key
      s3-region: us-east-1
      s3-bucket: your-bucket-name
      s3-endpoint: http://localhost:9000 # Optional for S3-compatible services
      # ...
```
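Each storage type reads its own prefixed options, so a missing key is easy to overlook. The following hypothetical pre-flight check is not part of Metal. Its REQUIRED_OPTIONS table is transcribed from the parameter tables above. It lists only keys marked Y that have no documented default, which is why fs-folder and az-datalake-endpoint are left out.

```python
# Keys marked "Y" without a documented default, per storage type (from the tables above).
REQUIRED_OPTIONS = {
    "fs": [],  # fs-folder defaults to "."
    "ftp": ["ftp-host", "ftp-user", "ftp-password"],
    "smb": ["smb-host", "smb-user", "smb-password", "smb-share"],
    "az-blob": ["az-blob-connection-string", "az-blob-container"],
    "az-file": ["az-file-connection-string", "az-file-share-name"],
    "az-datalake": [  # az-datalake-endpoint defaults to core.windows.net
        "az-datalake-storage-account",
        "az-datalake-storage-key",
        "az-datalake-container-name",
    ],
    "s3": ["s3-access-key-id", "s3-secret-access-key", "s3-region", "s3-bucket"],
}


def missing_storage_options(options):
    """Return the required keys absent from a Files source's options (hypothetical helper)."""
    storage = options.get("storage")
    if storage not in REQUIRED_OPTIONS:
        raise ValueError(f"unknown storage type: {storage!r}")
    return [key for key in REQUIRED_OPTIONS[storage] if key not in options]
```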

content v0.3+

Content types can be set with the parameter options.content where you can associate a content type to a file pattern, as shown in the example below. This feature allows for flexible data processing and supports various file formats, including JSON, CSV, and XLS.

It also acts as a filter that determines which files are processed (see: REST API Entity Listing). Assigning a content type to each file pattern lets you control which files are read and how they are parsed.

Example:

```yaml
sources:
  my-files:
    provider: files
    options:
      content:
        "*.json":
          type: json
        "*.csv":
          type: csv
        "sample_*.xlsx":
          type: xls
          xls-sheet: Sheet2
        "my-other-files_*.xlsx":
          type: xls
          xls-sheet: Sheet1
```
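The filtering role of content can be pictured as glob matching. The sketch below uses Python's `fnmatch.fnmatchcase` against the patterns from the example above. Two points are assumptions, since the source does not specify them:

- a file takes the settings of the first pattern it matches, in declaration order;
- matching is case-sensitive.

Files that match no pattern are left out, which is the filter effect described above. `assign_content_types` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Same patterns as the YAML example above.
CONTENT = {
    "*.json": {"type": "json"},
    "*.csv": {"type": "csv"},
    "sample_*.xlsx": {"type": "xls", "xls-sheet": "Sheet2"},
    "my-other-files_*.xlsx": {"type": "xls", "xls-sheet": "Sheet1"},
}


def assign_content_types(filenames, content):
    """Map each file to the settings of the first matching pattern; skip the rest."""
    selected = {}
    for name in filenames:
        for pattern, settings in content.items():
            if fnmatchcase(name, pattern):
                selected[name] = settings
                break
    return selected


files = ["orders.json", "people.csv", "sample_2024.xlsx",
         "my-other-files_a.xlsx", "readme.txt", "report.xlsx"]
selected = assign_content_types(files, CONTENT)
```

Here readme.txt and report.xlsx match no pattern, so they are not treated as entities.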

List of managed content types:

| Parameter | Description | Metal version |
|---|---|---|
| json | JSON files | v0.3+ |
| csv | CSV files | v0.3+ |
| xls | XLSX files (Excel 2007+) | v0.3+ |
| xml | XML files | v0.4+ |

json v0.3+

| Parameter | Type | Description |
|---|---|---|
| 📜 json-path | String | The JSON path of the data array in the JSON file (default: empty string). |

Example:

```yaml
sources:
  my-json-files:
    provider: files
    options:
      content:
        "*.json":
          type: json
          json-path: rows
      # ...
```
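With `json-path: rows`, the rows are expected under the top-level `rows` key; an empty path means the file itself holds the data array. The sketch below assumes a simple dot-separated notation, with numeric segments indexing arrays. Whether Metal accepts this or full JSONPath syntax is not stated here, and `select_json_path` is a hypothetical helper.

```python
import json


def select_json_path(document, path=""):
    """Follow a dot-separated path such as "rows" or "result.items"; "" returns the root."""
    value = document
    for part in filter(None, path.split(".")):
        # Numeric segments index into arrays, other segments into objects.
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value


sample = json.loads('{"rows": [{"id": 1}, {"id": 2}], "meta": {"count": 2}}')
rows = select_json_path(sample, "rows")
```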

csv v0.3+

| Parameter | Type | Description |
|---|---|---|
| csv-delimiter | String | The delimiting character (default: ,). |
| csv-newline | String | The newline sequence. Must be one of \r, \n, or \r\n (default: \n). |
| csv-header | Boolean | If true, the first row of parsed data is interpreted as field names (default: true). |
| csv-quote | String | The character used to quote fields (default: "). |
| csv-skip-empty | String or Boolean | If true, lines that are completely empty (those which evaluate to an empty string) are skipped. If set to greedy, lines that have no content (only whitespace after parsing) are also skipped (default: greedy). |

Example:

```yaml
sources:
  my-csv-files:
    provider: files
    options:
      content:
        "*.csv":
          type: csv
          csv-delimiter: ","
          csv-newline: "\n"
          csv-header: true
          csv-quote: "\""
      # ...
```
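The difference between `csv-skip-empty: true` and `greedy` is easiest to see on a small input. The sketch below approximates these options with Python's csv module; it is not Metal's parser, and `parse_csv` is a hypothetical name. One simplification: Python's csv reader recognizes \r, \n and \r\n on its own, so csv-newline has no counterpart here.

```python
import csv
import io


def parse_csv(text, delimiter=",", header=True, quote='"', skip_empty="greedy"):
    """Parse CSV text, approximating csv-delimiter/header/quote/skip-empty."""
    rows = list(csv.reader(io.StringIO(text, newline=""),
                           delimiter=delimiter, quotechar=quote))
    if skip_empty is True:
        # Only truly empty lines (csv yields [] for a blank line).
        rows = [r for r in rows if r not in ([], [""])]
    elif skip_empty == "greedy":
        # Also lines whose parsed fields contain nothing but whitespace.
        rows = [r for r in rows if "".join(r).strip() != ""]
    if not header:
        return rows
    if not rows:
        return []
    fields, *data = rows
    return [dict(zip(fields, r)) for r in data]


text = 'id,name\n1,"Smith, J"\n\n   \n2,Ann\n'
records = parse_csv(text)
```

With greedy, both the blank line and the whitespace-only line are dropped. With true, the whitespace-only line survives as a (mostly empty) record.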

xls v0.3+

ℹ️ NOTE

Only XLSX files created with Excel 2007 and later are supported.

| Parameter | Type | Description |
|---|---|---|
| xls-sheet | String | Name of the sheet to use (default: first sheet). |
| xls-starting-cell | String | Starting cell, e.g. "B2" (default: "A1"). |
| xls-default | Any | Default value for empty cells. |
| xls-parse-dates | Boolean | Parse dates from cells (default: false). |
| xls-date-format | String | Date format to use when parsing dates. |

Example:

```yaml
sources:
  my-xls-files:
    provider: files
    options:
      content:
        "*.xlsx":
          type: xls
          xls-sheet: Sheet1
          xls-starting-cell: E6
      # ...
```
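xls-starting-cell uses Excel's A1 notation: letters for the column and a 1-based number for the row. In the example above, E6 means reading starts at column E, row 6.

The sketch below converts such a reference to zero-based (row, column) indices, which is how most spreadsheet libraries address cells. `parse_cell` is a hypothetical helper, not part of Metal.

```python
import re


def parse_cell(ref):
    """Convert an A1-style reference (e.g. "E6") to zero-based (row, column)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"invalid cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Columns are bijective base-26: A=1 ... Z=26, AA=27, ...
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, column - 1
```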

xml v0.4+

| Parameter | Type | Description |
|---|---|---|
| xml-path | String | XML path to use (default: whole XML). |
| xml-ignore-attributes | Boolean | Ignore XML attributes (default: true). |
| xml-attribute-prefix | String | Prefix for XML attributes (default: @). |
| xml-remove-ns-prefix | Boolean | Remove the namespace prefix from tag and attribute names (default: true). |

Example:

```yaml
sources:
  my-xml-files:
    provider: files
    options:
      content:
        "*.xml":
          type: xml
          xml-path: data
          xml-ignore-attributes: false
          xml-attribute-prefix: "@"
          xml-remove-ns-prefix: true
```
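The sketch below approximates how the four XML options could shape the parsed data, using Python's xml.etree.ElementTree. It is not Metal's parser. Several choices are assumptions:

- xml-path is a dot-separated path starting below the root element;
- text next to attributes or children goes under a `#text` key;
- repeated elements become lists.

ElementTree does not keep namespace prefixes, so with remove_ns_prefix=False names appear as `{uri}local` rather than `prefix:local`.

```python
import xml.etree.ElementTree as ET


def _name(tag, remove_ns_prefix):
    # ElementTree reports namespaced names as "{uri}local"; the original prefix is lost.
    if remove_ns_prefix and tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag


def element_to_obj(el, ignore_attributes=True, attribute_prefix="@", remove_ns_prefix=True):
    """Convert an element into nested dicts/lists/strings (hypothetical mapping)."""
    obj = {}
    if not ignore_attributes:
        for key, value in el.attrib.items():
            obj[attribute_prefix + _name(key, remove_ns_prefix)] = value
    for child in el:
        key = _name(child.tag, remove_ns_prefix)
        value = element_to_obj(child, ignore_attributes, attribute_prefix, remove_ns_prefix)
        if key not in obj:
            obj[key] = value
        elif isinstance(obj[key], list):
            obj[key].append(value)
        else:
            obj[key] = [obj[key], value]  # repeated elements become a list
    text = (el.text or "").strip()
    if not obj:
        return text  # leaf element: just its text
    if text:
        obj["#text"] = text  # assumed key for text alongside attributes/children
    return obj


def parse_xml(text, xml_path="", **options):
    """Parse XML, then follow a dot-separated xml_path from the root element."""
    value = element_to_obj(ET.fromstring(text), **options)
    for part in filter(None, xml_path.split(".")):
        value = value[part]
    return value


SAMPLE = ('<root xmlns:m="urn:example"><data>'
          '<m:row id="1">A</m:row><m:row id="2">B</m:row></data></root>')
rows = parse_xml(SAMPLE, "data", ignore_attributes=False)
```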

WebService v0.4+

Used to connect to a web service.

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to webservice for Web Service. |
| host | String | Y | URL of the target server to connect to. |

Optional parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| type | String | Y | Type of the web service (see: Web service types). |
| endpoints | Object | Y | List of endpoint configurations for interacting with the web service. |

type

Defines the type of web service:

| Parameter | Description | Metal version |
|---|---|---|
| rest | RESTful web service | v0.4+ |
| soap | SOAP web service | v0.4+ |

endpoints

This section contains the configuration for the endpoints of the web service. Each endpoint is defined by a key (e.g. session) and an object that contains its configuration:

| Endpoint | Type | Required | Description |
|---|---|---|---|
| session | Object | N | Endpoint configuration for session or login. |
| collection-read | Object | Y | Endpoint configuration for collection read. |
| item-create | Object | N | Endpoint configuration for item creation. |
| item-update | Object | N | Endpoint configuration for item update. |
| item-delete | Object | N | Endpoint configuration for item deletion. |

session

This endpoint is used to establish a connection with the webservice and obtain any necessary authentication tokens or session IDs.

The endpoint configuration includes the HTTP method to use, the relative URL to request, and any data to send with the request. It can also include headers to be added once the login succeeds.

| Parameter | Type | Description | JS Context variable |
|---|---|---|---|
| 📜 `<method or operation>` | String | The key is the method or operation to use (e.g. get, listMovies); see: Method or Operation key. | $schema, $entity |
| 📜 data | Object | Data to send with the request. If not set, the object in the optional parameter data is passed as-is to the web service. | $schema, $entity |
| 📜 session-headers | Object | Headers to add after a successful login. | $schema, $entity, $request, $response |

Method or Operation key:

  • For RESTful webservices, the key is the method (e.g. get) and the value is the relative URL to request
  • For SOAP webservices, the key is the operation (e.g. listMovies) with empty value
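Because the method or operation is itself a key, an endpoint object mixes one "verb" key with fixed parameter keys. The hypothetical helper below separates the two. It assumes that data, response and session-headers (the parameter names in these tables) are the only non-verb keys. A SOAP operation written as `listMovies:` with no value loads from YAML as null, so it is normalized to an empty string.

```python
RESERVED_KEYS = {"data", "response", "session-headers"}  # parameters, not verbs


def split_endpoint(endpoint):
    """Return (method_or_operation, relative_url) for one endpoint config."""
    keys = [key for key in endpoint if key not in RESERVED_KEYS]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one method/operation key, got {keys}")
    key = keys[0]
    return key, endpoint[key] or ""  # SOAP operations carry an empty value
```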

Example:

In this example we log in to DummyJSON.com by sending the username and password in the body of a POST request, and store the returned access token in a header.

```yaml
rest-dummyjson: # https://dummyjson.com/docs
  provider: webservice
  host: https://dummyjson.com/
  options:
    type: rest
    endpoints:
      session:
        post: /user/login
        data:
          username: emilys
          password: emilyspass
        session-headers:
          Authorization: "Bearer ${{ $response.body.accessToken }}"
```
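Values such as `${{ $response.body.accessToken }}` and `${{ $row.id }}` are filled in from the JS Context variables listed in the tables. Metal evaluates them as JavaScript expressions. The Python sketch below handles only the simplest case, a variable followed by dotted property access, to show how a template becomes a concrete header or URL. `render` is a hypothetical helper and the context values are made up.

```python
import re

# Matches ${{ $name }} or ${{ $name.field.subfield }}; real templates are full JS expressions.
_TEMPLATE = re.compile(r"\$\{\{\s*(\$[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s*\}\}")


def render(template, context):
    """Replace each ${{ ... }} placeholder with the value looked up in context."""
    def lookup(match):
        name, *fields = match.group(1).split(".")
        value = context[name]
        for field in fields:
            value = value[field]
        return str(value)

    return _TEMPLATE.sub(lookup, template)


header = render("Bearer ${{ $response.body.accessToken }}",
                {"$response": {"body": {"accessToken": "abc123"}}})
```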

collection-read

This endpoint is used to read data from a collection. The endpoint configuration includes the HTTP method to use, the relative URL to request, and any data to be sent with the request.

| Parameter | Type | Description | JS Context variable |
|---|---|---|---|
| 📜 `<method or operation>` | String | The key is the method or operation to use (e.g. get, listMovies); see: Method or Operation key. | $schema, $entity |
| 📜 data | Object | Data to send with the request. If not set, the object in the optional parameter data is passed as-is to the web service. | $schema, $entity |
| 📜 response | String | Path in the response from which the data is read. | $schema, $entity, $request, $response |

Example:

In this example a RESTful GET request retrieves the list of all dog breeds from Dog.ceo, and the data is read from response.body.message.

```yaml
rest-dog: # https://dog.ceo/dog-api/documentation/
  provider: webservice
  host: https://dog.ceo/api
  options:
    type: rest
    endpoints:
      collection-read:
        get: /breeds/list/all
        response: message
```

item-create

This endpoint is used to create a new item. The endpoint configuration includes the HTTP method to use, the relative URL to request, and any data to be sent with the request.

| Parameter | Type | Description | JS Context variable |
|---|---|---|---|
| 📜 `<method or operation>` | String | The key is the method or operation to use (e.g. get, listMovies); see: Method or Operation key. | $schema, $entity |
| 📜 data | Object | Data to send with the request. If not set, the object in the optional parameter data is passed as-is to the web service. | $schema, $entity, $row |

Example:

In this example we create an item with a RESTful POST request.

```yaml
rest-fakerestapi: # https://fakerestapi.azurewebsites.net/index.html
  provider: webservice
  host: https://fakerestapi.azurewebsites.net/api/v1/
  options:
    type: rest
    endpoints:
      item-create:
        post: /
```

item-update

This endpoint is used to update an existing item. The endpoint configuration includes the HTTP method to use, the relative URL to request, and any data to be sent with the request.

| Parameter | Type | Description | JS Context variable |
|---|---|---|---|
| 📜 `<method or operation>` | String | The key is the method or operation to use (e.g. get, listMovies); see: Method or Operation key. | $schema, $entity |
| 📜 data | Object | Data to send with the request. If not set, the object in the optional parameter data is passed as-is to the web service. | $schema, $entity, $row |

Example:

In this example we update an item with a RESTful PUT request, using the JS Context variables $entity and $row to build the URL.

```yaml
rest-fakerestapi: # https://fakerestapi.azurewebsites.net/index.html
  provider: webservice
  host: https://fakerestapi.azurewebsites.net/api/v1/
  options:
    type: rest
    endpoints:
      item-update:
        put: /${{ $entity }}/${{ $row.id }}
```

item-delete

This endpoint is used to delete an existing item. The endpoint configuration includes the HTTP method to use, the relative URL to request, and any data to be sent with the request.

| Parameter | Type | Description | JS Context variable |
|---|---|---|---|
| 📜 `<method or operation>` | String | The key is the method or operation to use (e.g. get, listMovies); see: Method or Operation key. | $schema, $entity |
| 📜 data | Object | Data to send with the request. If not set, the object in the optional parameter data is passed as-is to the web service. | $schema, $entity, $row |

Example:

In this example we delete an item with a RESTful DELETE request, using the JS Context variables $entity and $row to build the URL.

```yaml
rest-fakerestapi: # https://fakerestapi.azurewebsites.net/index.html
  provider: webservice
  host: https://fakerestapi.azurewebsites.net/api/v1/
  options:
    type: rest
    endpoints:
      item-delete:
        delete: /${{ $entity }}/${{ $row.id }}
```

Released under the GNU v3 License.