Data Providers Configurations

These configurations define how to connect to data providers such as Microsoft SQL Server, PostgreSQL, MongoDB, or files used as data sources.

Each configuration specifies the connection parameters, such as host, port, and user credentials, along with additional options for performance and security.

Azure SQL Database/Microsoft SQL Server ^0.1

Primary parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | String | Y | Set to `mssql` for Azure SQL Database/Microsoft SQL Server. |
| host | String | Y | Server to connect to. Use `localhost\instance` for named instances. |
| port | Integer | N | Port to connect to (default: 1433). Do not set when connecting to a named instance. |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Database to connect to (default: dependent on server configuration). |

Optional parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| domain | String | Domain for domain login to SQL Server. |
| connectionTimeout | Integer | Connection timeout in milliseconds (default: 15000). |
| requestTimeout | Integer | Request timeout in milliseconds (default: 15000). |
| stream | Boolean | Stream recordsets/rows instead of returning them all at once as an argument of callback. |
| parseJSON | Boolean | Parse JSON recordsets to JS objects. |
| arrayRowMode | String | Return row results as an array instead of a keyed object. |
| pool.max | Integer | Maximum number of connections in the pool (default: 10). |
| pool.min | Integer | Minimum number of connections in the pool (default: 0). |
| pool.idleTimeoutMillis | Integer | Number of milliseconds before closing an unused connection in the pool (default: 30000). |
| options.encrypt | Boolean | Use `true` for Azure. |
| options.trustServerCertificate | Boolean | Use `true` for local dev / self-signed certs. |

Example:

```yaml
sources:
  my-sql-db:
    provider: mssql
    host: mydbserver
    port: 1433
    user: sa
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      connectionTimeout: 15000
      requestTimeout: 15000
      pool:
        max: 10
        min: 0
        idleTimeoutMillis: 30000
      options:
        encrypt: false
        trustServerCertificate: true
```
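
The parameter descriptions for this provider mention two cases that a single-server example does not cover: named instances and Azure SQL Database. The following is a minimal sketch of both, assuming that driver settings are nested under `options` in the same way as the other provider examples on this page. The source names, server names, instance name, and credentials are hypothetical placeholders.

```yaml
sources:
  # Named instance: put the instance in host and leave port unset.
  my-named-instance:
    provider: mssql
    host: localhost\SQLEXPRESS            # hypothetical instance name
    user: sa
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      options:
        encrypt: false
        trustServerCertificate: true      # local dev / self-signed certs

  # Azure SQL Database: enable encryption.
  my-azure-db:
    provider: mssql
    host: myserver.database.windows.net   # hypothetical server name
    port: 1433
    user: myuser
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      options:
        encrypt: true
```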

PostgreSQL ^0.1

Primary parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | String | Y | Set to `postgres` for PostgreSQL. |
| host | String | Y | Server to connect to. |
| port | Integer | N | Port to connect to (default: 5432). |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Database to connect to (default: dependent on server configuration). |

Optional parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| connectionString | String | Connection string. Example: `postgres://user:password@host:5432/database` |
| ssl | String | Options passed directly to `node.TLSSocket`. Supports all `tls.connect` options. |
| types | String | Custom type parsers. |
| statement_timeout | Number | Number of milliseconds before a statement in a query times out. Default is no timeout. |
| query_timeout | Number | Number of milliseconds before a query call times out. Default is no timeout. |
| application_name | String | The name of the application that created this Client instance. |
| connectionTimeoutMillis | Number | Number of milliseconds to wait for a connection. Default is no timeout. |
| idle_in_transaction_session_timeout | Number | Number of milliseconds before terminating any session with an open idle transaction. Default is no timeout. |
| idleTimeoutMillis | Number | Number of milliseconds a client must sit idle in the pool and not be checked out before it is disconnected. Default is 10000 (10 seconds). Set to 0 to disable auto-disconnection of idle clients. |
| max | Number | Maximum number of clients the pool should contain. Default is 10. |
| allowExitOnIdle | Boolean | Setting `true` allows the Node.js event loop to exit as soon as all clients in the pool are idle. Default is `true`. |

Example:

```yaml
sources:
  my-postgres-db:
    provider: postgres
    host: mydbserver
    port: 5432
    user: admin
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      connectionTimeoutMillis: 30000
      idleTimeoutMillis: 10000
      max: 10
      allowExitOnIdle: true
```
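
The timeout-related parameters listed for this provider can be combined to bound how long a slow query is allowed to run. This is a minimal sketch, assuming these parameters are placed under `options` in the same way as `connectionTimeoutMillis`. The application name and the timeout values are illustrative only.

```yaml
sources:
  my-postgres-db:
    provider: postgres
    host: mydbserver
    port: 5432
    user: admin
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      application_name: my-reporting-app          # hypothetical name
      statement_timeout: 5000                     # statement times out after 5 s
      query_timeout: 10000                        # query call times out after 10 s
      idle_in_transaction_session_timeout: 60000  # end idle transactions after 60 s
      max: 20                                     # at most 20 pooled clients
```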

Metal Server ^0.2

Connects to another instance of Metal Server via REST.

Primary parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | String | Y | Set to `metal` for Metal Server. |
| host | String | Y | URL of the target server to connect to. |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Name of the schema on the remote Metal server. |

Example:

```yaml
sources:
  my-metal-schema:
    provider: metal
    host: http://metalserver:3001
    user: myapiuser
    password: myStr@ngpa$$w0rd
    database: myschema
```

MongoDB ^0.1

Primary parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | String | Y | Set to `mongodb` for MongoDB. |
| host | String | Y | URI to connect to. |
| database | String | Y | Database to connect to (default: dependent on server configuration). |

Optional parameters:

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| connectTimeoutMS | integer | 30000 | Specifies the number of milliseconds to wait before timeout on a TCP connection. |
| directConnection | boolean | false | Specifies whether to force dispatch all operations to the host specified in the connection URI. |
| family | number | null | Specifies the version of the Internet Protocol (IP). Valid values are: 4, 6, 0, or null. 0 and null settings attempt to connect with IPv6 and fall back to IPv4 upon failure. |
| forceServerObjectId | boolean | false | Specifies whether to force the server to assign `_id` values to documents instead of the driver. |
| ignoreUndefined | boolean | false | Specifies whether the BSON serializer should ignore undefined fields. |
| keepAlive | boolean | true | Specifies whether to enable keepAlive on the TCP socket. |
| keepAliveInitialDelay | integer | 120000 | Specifies the number of milliseconds to wait before initiating keepAlive on the TCP socket. |
| logger | object | null | Specifies a custom logger for the client to use. |
| loggerLevel | string | null | Specifies the logger level used by the driver. Valid choices are: error, warn, info, and debug. |
| maxPoolSize | integer | 100 | Specifies the maximum number of connections that a connection pool may have at a given time. |
| maxIdleTimeMS | integer | | Specifies the maximum amount of time a connection can remain idle in the connection pool before being removed and closed. |
| minPoolSize | integer | 0 | Specifies the minimum number of connections that must exist at any moment in a single connection pool. |
| noDelay | boolean | true | Specifies whether to use the TCP socket no-delay option. |
| pkFactory | object | null | Specifies a primary key factory object that generates custom `_id` keys. |
| promiseLibrary | object | null | Specifies the Promise library class the application uses (e.g. Bluebird). This library must be compatible with ES6. |
| promoteBuffers | boolean | false | Specifies whether to promote Binary BSON values to native Node.js Buffer type data. |
| promoteLongs | boolean | true | Specifies whether to convert Long values to a number if they fit inside 53 bits of resolution. |
| promoteValues | boolean | true | Specifies whether to promote BSON values to Node.js native types when possible. When set to false, it uses wrapper types to present BSON values. |
| raw | boolean | false | Specifies whether to return document results as raw BSON buffers. |
| serializeFunctions | boolean | false | Specifies whether to serialize functions on any object passed to the server. |
| serverApi | string or enum | null | Specifies the API version that operations must conform to. |
| srvMaxHosts | integer | 0 | Sets the maximum number of hosts the driver can connect to when using the DNS seedlist (SRV) connection protocol, identified by the `mongodb+srv` connection string prefix. When set to 0, the driver does not limit the number of hosts. |
| srvServiceName | string | mongodb | Specifies the SRV record service name to which the driver should connect. |
| socketTimeoutMS | integer | 360000 | Specifies the number of milliseconds to wait before timeout on a TCP socket. |
| tls | boolean | false | Specifies whether to establish a Transport Layer Security (TLS) connection with the instance. This is automatically set to true when using a DNS seedlist (SRV) in the connection string. You can override this behavior by setting the value to false. |
| validateOptions | boolean | false | Specifies whether to error when the method parameters contain an unknown or incorrect option. If false, the driver produces warnings only. |
| waitQueueTimeoutMS | integer | 0 | Specifies the maximum amount of time in milliseconds that operation execution can wait for a connection to become available. |
| writeConcern | string or integer | null | Specifies the write concern. |

Example:

```yaml
sources:
  my-mongodb-db:
    provider: mongodb
    host: mongodb://my-mongodb-server:27017/
    database: my-database
    options:
      maxIdleTimeMS: 15000
      connectTimeoutMS: 5000
```
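
Several of the optional parameters listed for this provider control connection pooling and transport security. As a minimal sketch, assuming they are set under `options` like `maxIdleTimeMS`, a pooled TLS connection restricted to IPv4 might look like this. The values are illustrative, not recommendations.

```yaml
sources:
  my-mongodb-db:
    provider: mongodb
    host: mongodb://my-mongodb-server:27017/
    database: my-database
    options:
      minPoolSize: 5            # keep at least 5 connections open
      maxPoolSize: 50           # never open more than 50 connections
      waitQueueTimeoutMS: 2000  # wait up to 2 s for a free connection
      tls: true                 # require a TLS connection
      family: 4                 # connect over IPv4 only
```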

Plan ^0.2

Connects to a Metal ETL Plan.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | String | Y | Set to `plan` for the Plan provider. |
| database | String | Y | Name of the plan to connect to. |

Example:

```yaml
sources:
  my-plan-source:
    provider: plan
    database: my-plan
```

Memory ^0.2

Non-persistent in-memory database.

Primary parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | String | Y | Set to `memory` for the in-memory data provider. |

Optional parameters:

| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| autocreate | Boolean | false | N | If set to `true`, the entity will be created automatically. |

Example:

```yaml
sources:
  my-memory:
    provider: memory
    options:
      autocreate: true
```

MySQL //TODO

Files ^0.3

Files is a data provider that exposes file-based data as if it were tables. It supports several content types (JSON, CSV, and XLSX) and several storage options (local file system, Azure Blob Storage, and FTP).

Example:

```yaml
sources:
  my-files:
    provider: files
```

Primary parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | String | Y | Set to `files` for the Files data provider. |

Optional parameters:

| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| storage | String | | Y | The storage where the files are kept. See: Storage Types. |
| content | Object | | Y | Maps file patterns to content types (JSON, CSV, or XLS), with optional parameters for customizing each content type. See: Content Types. |
| autocreate | Boolean | false | N | If set to `true`, a file named after the entity is created automatically when interacting with an entity that does not exist. |

Storage Types

Storage types can be set with the parameter options.storage as shown in the example below:

```yaml
sources:
  my-files:
    provider: files
    options:
      storage: fs
```

List of managed storage types:

| Value | Description |
| --- | --- |
| az-blob | Azure Blob Storage |
| fs | Local file system |
| ftp | FTP server |

Filesystem

This refers to the local file system

Optional Parameters:

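| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| storage | String | | Y | Set to `fs` for the local file system. |
| autocreate | Boolean | false | N | If set to `true`, the entity will be created automatically. |
| fs-folder | String | . | Y | The path where files are stored. |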

Example:

```yaml
sources:
  my-local-files:
    provider: files
    options:
      storage: fs
      fs-folder: ./data/
      # ...
```

Azure Blob Storage

This stores files in an Azure Blob Storage container.

Optional Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| storage | String | Y | Set to `az-blob` for Azure Blob Storage. |
| az-blob-connection-string | String | Y | Azure Blob connection string. |
| az-blob-container | String | Y | Azure Blob container name. |
| az-blob-autocreate | String | N | If set to `true`, the container is created with the provided name (default: false). |

Example:

```yaml
sources:
  my-local-files:
    provider: files
    options:
      storage: az-blob
      az-blob-connection-string: UseDevelopmentStorage=true
      az-blob-container: datacontainer1
      az-blob-autocreate: true
      # ...
```

FTP Server

This stores files on an FTP server.

Optional Parameters:

| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| storage | String | | Y | Set to `ftp` for an FTP server. |
| autocreate | Boolean | false | N | If set to `true`, the entity will be created automatically. |
| ftp-host | String | | Y | FTP server host. |
| ftp-port | Number (1-65535) | 21 | N | FTP server port. |
| ftp-user | String | | Y | FTP server username. |
| ftp-password | String | | Y | FTP server password. |
| ftp-secure | Boolean | false | N | Enable a secure FTP connection (FTPS). |
| ftp-folder | String | / | N | Remote folder on the FTP server. |

Example:

```yaml
sources:
  my-ftp-files:
    provider: files
    options:
      storage: ftp
      ftp-host: ftp.server.com
      ftp-port: 21
      ftp-user: ftp-user
      ftp-password: ftppass
      ftp-folder: /
      # ...
```
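
The `ftp-secure` parameter is not exercised by the basic FTP example. The following is a minimal sketch of an FTPS source that relies on the default port and creates missing entities automatically. The source name, host, credentials, remote folder, and file pattern are hypothetical placeholders.

```yaml
sources:
  my-ftps-files:
    provider: files
    options:
      storage: ftp
      ftp-host: ftp.server.com
      # ftp-port is omitted, so the default (21) applies
      ftp-user: ftp-user
      ftp-password: ftppass
      ftp-secure: true        # use FTPS instead of plain FTP
      ftp-folder: /exports    # hypothetical remote folder
      autocreate: true        # create missing entities automatically
      content:
        "*.csv":
          type: csv
```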

Content Types

Content types are set with the parameter options.content, which associates a content type with a file pattern, as shown in the example below. Supported file formats are JSON, CSV, and XLS.

The file patterns also act as a filter that determines which files are processed (see: REST API Entity Listing).

Example:

```yaml
sources:
  my-files:
    provider: files
    options:
      content:
        "*.json":
          type: json
        "*.csv":
          type: csv
        "sample_*.xlsx":
          type: xls
          xls-sheet: Sheet2
        "my-other-files_*.xlsx":
          type: xls
          xls-sheet: Sheet1
```

List of managed content types:

| Value | Description |
| --- | --- |
| json | JSON files |
| csv | CSV files |
| xls | XLSX files (Excel 2007+) |
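
Because both `storage` and `content` are marked as required for the Files provider, a complete source definition combines a storage type with at least one content mapping. The following is a minimal sketch for the local file system. The source name and folder are placeholders.

```yaml
sources:
  my-local-files:
    provider: files
    options:
      storage: fs            # local file system
      fs-folder: ./data/     # folder that holds the files
      autocreate: false      # do not create missing files
      content:
        "*.json":            # every .json file is read as JSON
          type: json
        "*.csv":             # every .csv file is read as CSV
          type: csv
```

Files in the folder that match none of the patterns are not processed, since the patterns also act as a filter.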

JSON

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| json-path | String | '' (empty string) | The JSON path of the data array in the JSON file. |

Example:

```yaml
sources:
  my-json-files:
    provider: files
    options:
      content:
        "*.json":
          type: json
          json-path: rows
      # ...
```
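
To make the role of `json-path` concrete, the sketch below shows a hypothetical file layout in comments, followed by a matching configuration. It assumes that `json-path` names the property holding the data array, as the parameter description suggests. The file name and its contents are invented for illustration.

```yaml
# Hypothetical file ./data/customers.json:
#   {
#     "rows": [
#       { "id": 1, "name": "Ada" },
#       { "id": 2, "name": "Grace" }
#     ]
#   }
sources:
  my-json-files:
    provider: files
    options:
      storage: fs
      fs-folder: ./data/
      content:
        "*.json":
          type: json
          json-path: rows   # the data array sits under the "rows" property
```

With the default (empty) `json-path`, the data array would presumably be expected at the root of the file.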

CSV

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| csv-delimiter | String | `,` | The delimiting character. |
| csv-newline | String | `\n` | The newline sequence. Must be one of `\r`, `\n`, or `\r\n`. |
| csv-header | Boolean | true | If `true`, the first row of parsed data will be interpreted as field names. |
| csv-quote | String | `"` | The character used to quote fields. |
| csv-skip-empty | String or Boolean | | Accepted values: `greedy`, `true`, `false`. If `true`, lines that are completely empty (those which evaluate to an empty string) will be skipped. If set to `greedy`, lines that don't have any content (those which have only whitespace after parsing) will also be skipped. |

Example:

```yaml
sources:
  my-csv-files:
    provider: files
    options:
      content:
        "*.csv":
          type: csv
          csv-delimiter: ","
          csv-newline: "\n"
          csv-header: true
          csv-quote: "\""
      # ...
```
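
The CSV parameters can be adjusted for files that do not use the defaults. As a minimal sketch, the configuration below targets semicolon-delimited files with Windows line endings and skips whitespace-only lines. The source name and folder are placeholders.

```yaml
sources:
  my-csv-files:
    provider: files
    options:
      storage: fs
      fs-folder: ./data/
      content:
        "*.csv":
          type: csv
          csv-delimiter: ";"       # semicolon-separated fields
          csv-newline: "\r\n"      # Windows line endings
          csv-header: true         # first row holds the field names
          csv-skip-empty: greedy   # also skip whitespace-only lines
```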

XLS

ℹ️ NOTE

Only XLSX files created with Excel 2007 and later are supported.

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| xls-sheet | String | | Specify which sheet to use. Defaults to the first sheet. |
| xls-starting-cell | String | A1 | Specify the starting cell (e.g., "B2"). |
| xls-default | any | | Default value for empty cells. |
| xls-parse-dates | Boolean | false | Parse dates from cells. |
| xls-date-format | String | | Specify the date format for parsing dates. |

Example:

```yaml
sources:
  my-xls-files:
    provider: files
    options:
      content:
        "*.xlsx":
          type: xls
          xls-sheet: Sheet1
          xls-starting-cell: E6
      # ...
```
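
The remaining XLS parameters can be combined with the sheet and starting-cell settings. The following is a minimal sketch that fills empty cells with an empty string and enables date parsing. The source name, folder, and chosen values are illustrative, and `xls-date-format` is left out because its format syntax is not described here.

```yaml
sources:
  my-xls-files:
    provider: files
    options:
      storage: fs
      fs-folder: ./data/
      content:
        "*.xlsx":
          type: xls
          xls-sheet: Sheet1
          xls-starting-cell: B2    # data starts at column B, row 2
          xls-default: ""          # value used for empty cells
          xls-parse-dates: true    # parse dates from cells
```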

Released under the GNU v3 License.