
Data Provider Configurations

These configurations define how to connect to the supported data providers, such as Microsoft SQL Server, PostgreSQL, MongoDB, and files.

Each configuration is an entry under sources, identified by a name of your choice. It specifies the provider, its connection parameters (such as host, port, and user credentials), and, under options, additional settings that tune performance and security.

Azure SQL Database/Microsoft SQL Server

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to mssql for Azure SQL Database/Microsoft SQL Server. |
| host | String | Y | Server to connect to. Use localhost\instance for named instances. |
| port | Integer | N | Port to connect to (default: 1433). Do not set it when connecting to a named instance. |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Database to connect to (default: dependent on server configuration). |

Optional parameters:

| Parameter | Type | Description |
|---|---|---|
| domain | String | Domain for domain login to SQL Server. |
| connectionTimeout | Integer | Connection timeout in milliseconds (default: 15000). |
| requestTimeout | Integer | Request timeout in milliseconds (default: 15000). |
| stream | Boolean | Stream recordsets/rows instead of returning them all at once as an argument of the callback. |
| parseJSON | Boolean | Parse JSON recordsets to JS objects. |
| arrayRowMode | Boolean | Return row results as an array instead of a keyed object. |
| pool.max | Integer | Maximum number of connections in the pool (default: 10). |
| pool.min | Integer | Minimum number of connections in the pool (default: 0). |
| pool.idleTimeoutMillis | Integer | Number of milliseconds before an unused connection in the pool is closed (default: 30000). |
| options.encrypt | Boolean | Set to true for Azure. |
| options.trustServerCertificate | Boolean | Set to true for local development or self-signed certificates. |

Example:

```yaml
sources:
  my-sql-db:
    provider: mssql
    host: mydbserver
    port: 1433
    user: sa
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      connectionTimeout: 15000
      requestTimeout: 15000
      pool:
        max: 10
        min: 0
        idleTimeoutMillis: 30000
      options:
        encrypt: false
        trustServerCertificate: true
```
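
As a further sketch, the configuration below declares two sources: one for Azure SQL Database, with options.encrypt set to true, and one for a named instance on the local machine, which omits port as the table above advises. The server name, instance name (SQLEXPRESS), credentials, and source names are hypothetical placeholders, and the nesting of options follows the example above.

```yaml
sources:
  # Hypothetical Azure SQL Database source (port omitted, default 1433)
  my-azure-sql-db:
    provider: mssql
    host: myserver.database.windows.net   # placeholder server name
    user: myuser
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      options:
        encrypt: true                     # use true for Azure
        trustServerCertificate: false     # validate the server certificate
  # Hypothetical named instance: instance name in host, no port
  my-named-instance-db:
    provider: mssql
    host: localhost\SQLEXPRESS            # placeholder instance name
    user: sa
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      options:
        trustServerCertificate: true      # local dev / self-signed certificate
```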

PostgreSQL

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to postgres for PostgreSQL. |
| host | String | Y | Server to connect to. |
| port | Integer | N | Port to connect to (default: 5432). |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Database to connect to (default: dependent on server configuration). |

Optional parameters:

| Parameter | Type | Description |
|---|---|---|
| connectionString | String | Connection string, for example postgres://user:password@host:5432/database. |
| ssl | Boolean or Object | Options passed directly to node.TLSSocket. Supports all tls.connect options. |
| types | Object | Custom type parsers. |
| statement_timeout | Number | Number of milliseconds before a statement in a query times out. Default: no timeout. |
| query_timeout | Number | Number of milliseconds before a query call times out. Default: no timeout. |
| application_name | String | Name of the application that created this client instance. |
| connectionTimeoutMillis | Number | Number of milliseconds to wait for a connection. Default: no timeout. |
| idle_in_transaction_session_timeout | Number | Number of milliseconds before any session with an open idle transaction is terminated. Default: no timeout. |
| idleTimeoutMillis | Number | Number of milliseconds a client must sit idle in the pool, without being checked out, before it is disconnected. Default: 10000 (10 seconds). Set to 0 to disable auto-disconnection of idle clients. |
| max | Number | Maximum number of clients the pool should contain. Default: 10. |
| allowExitOnIdle | Boolean | Setting true allows the Node.js event loop to exit as soon as all clients in the pool are idle. Default: true. |

Example:

```yaml
sources:
  my-postgres-db:
    provider: postgres
    host: mydbserver
    port: 5432
    user: admin
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      connectionTimeoutMillis: 30000
      idleTimeoutMillis: 10000
      max: 10
      allowExitOnIdle: true
```
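
The optional parameters can be combined in the same options block. The sketch below assumes, as the table describes, that ssl accepts an object of tls.connect options; it enables server certificate verification and sets statement and query timeouts. The source name, application name, and timeout values are placeholders.

```yaml
sources:
  my-secure-postgres-db:
    provider: postgres
    host: mydbserver
    port: 5432
    user: admin
    password: myStr@ngpa$$w0rd
    database: mydatabase
    options:
      ssl:
        rejectUnauthorized: true    # tls.connect option: verify the server certificate
      statement_timeout: 60000      # statements time out after 60 s
      query_timeout: 60000          # query calls time out after 60 s
      application_name: my-metal-app
```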

Metal Server

Connects to another Metal Server instance via REST.

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to metal for Metal Server. |
| host | String | Y | URL of the target server to connect to. |
| user | String | Y | User name for authentication. |
| password | String | Y | Password for authentication. |
| database | String | Y | Name of the schema on the remote Metal Server. |

Example:

```yaml
sources:
  my-metal-schema:
    provider: metal
    host: http://metalserver:3001
    user: myapiuser
    password: myStr@ngpa$$w0rd
    database: myschema
```

MongoDB

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to mongodb for MongoDB. |
| host | String | Y | URI to connect to. |
| database | String | Y | Database to connect to (default: dependent on server configuration). |

Optional parameters:

| Parameter | Type | Default value | Description |
|---|---|---|---|
| connectTimeoutMS | Integer | 30000 | Specifies the number of milliseconds to wait before timeout on a TCP connection. |
| directConnection | Boolean | false | Specifies whether to force dispatch all operations to the host specified in the connection URI. |
| family | Number | null | Specifies the version of the Internet Protocol (IP). Valid values are: 4, 6, 0, or null. 0 and null settings attempt to connect with IPv6 and fall back to IPv4 upon failure. |
| forceServerObjectId | Boolean | false | Specifies whether to force the server to assign _id values to documents instead of the driver. |
| ignoreUndefined | Boolean | false | Specifies whether the BSON serializer should ignore undefined fields. |
| keepAlive | Boolean | true | Specifies whether to enable keepAlive on the TCP socket. |
| keepAliveInitialDelay | Integer | 120000 | Specifies the number of milliseconds to wait before initiating keepAlive on the TCP socket. |
| logger | Object | null | Specifies a custom logger for the client to use. |
| loggerLevel | String | null | Specifies the logger level used by the driver. Valid choices are: error, warn, info, and debug. |
| maxPoolSize | Integer | 100 | Specifies the maximum number of connections that a connection pool may have at a given time. |
| maxIdleTimeMS | Integer |  | Specifies the maximum amount of time, in milliseconds, a connection can remain idle in the connection pool before being removed and closed. |
| minPoolSize | Integer | 0 | Specifies the minimum number of connections that must exist at any moment in a single connection pool. |
| noDelay | Boolean | true | Specifies whether to use the TCP socket no-delay option. |
| pkFactory | Object | null | Specifies a primary key factory object that generates custom _id keys. |
| promiseLibrary | Object | null | Specifies the Promise library class the application uses (e.g. Bluebird). This library must be compatible with ES6. |
| promoteBuffers | Boolean | false | Specifies whether to promote Binary BSON values to native Node.js Buffer type data. |
| promoteLongs | Boolean | true | Specifies whether to convert Long values to a number if they fit inside 53 bits of resolution. |
| promoteValues | Boolean | true | Specifies whether to promote BSON values to Node.js native types when possible. When set to false, it uses wrapper types to present BSON values. |
| raw | Boolean | false | Specifies whether to return document results as raw BSON buffers. |
| serializeFunctions | Boolean | false | Specifies whether to serialize functions on any object passed to the server. |
| serverApi | String or enum | null | Specifies the API version that operations must conform to. |
| srvMaxHosts | Integer | 0 | Sets the maximum number of hosts the driver can connect to when using the DNS seedlist (SRV) connection protocol, identified by the mongodb+srv connection string prefix. When set to 0, the driver does not limit the number of hosts. |
| srvServiceName | String | mongodb | Specifies the SRV record service name to which the driver should connect. |
| socketTimeoutMS | Integer | 360000 | Specifies the number of milliseconds to wait before timeout on a TCP socket. |
| tls | Boolean | false | Specifies whether to establish a Transport Layer Security (TLS) connection with the instance. This is automatically set to true when using a DNS seedlist (SRV) in the connection string. You can override this behavior by setting the value to false. |
| validateOptions | Boolean | false | Specifies whether to error when the method parameters contain an unknown or incorrect option. If false, the driver produces warnings only. |
| waitQueueTimeoutMS | Integer | 0 | Specifies the maximum amount of time in milliseconds that operation execution can wait for a connection to become available. |
| writeConcern | String or Integer | null | Specifies the write concern. |

Example:

```yaml
sources:
  my-mongodb-db:
    provider: mongodb
    host: mongodb://my-mongodb-server:27017/
    database: my-database
    options:
      maxIdleTimeMS: 15000
      connectTimeoutMS: 5000
```
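
Because host takes a full MongoDB connection URI, credentials and the DNS seedlist format (mongodb+srv) can be expressed in the URI itself, assuming Metal passes the URI to the driver unchanged. In this sketch the cluster address, user, and password are placeholders; the @ in the password is percent-encoded as %40, since reserved characters in MongoDB URI credentials must be encoded. Per the table above, tls is enabled automatically for mongodb+srv URIs.

```yaml
sources:
  my-mongodb-cluster:
    provider: mongodb
    # Placeholder user, password (myStr@ngpassword, encoded) and cluster address
    host: mongodb+srv://myuser:myStr%40ngpassword@cluster0.example.net/
    database: my-database
    options:
      maxPoolSize: 50            # at most 50 connections per pool
      minPoolSize: 5             # keep at least 5 connections open
      waitQueueTimeoutMS: 10000  # wait up to 10 s for a free connection
```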

Plans

Connects to a Metal ETL Plan.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to plan for the Plans data provider. |
| database | String | Y | Name of the plan to connect to. |

Example:

```yaml
sources:
  my-plan-source:
    provider: plan
    database: my-plan
```

Memory

Stores data in memory. The data is not persisted.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to memory for the in-memory data provider. |

Example:

```yaml
sources:
  my-memory:
    provider: memory
```

Files

The Files data provider lets you work with file-based data as if it were stored in tables. It supports several content types and storage backends.

Key features include:

  • Content types: JSON, CSV
  • Storage types: local file system, Azure Blob Storage

Example:

```yaml
sources:
  my-files:
    provider: files
```

Primary parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | String | Y | Set to files for the Files data provider. |

Optional parameters:

| Parameter | Type | Default value | Required | Description |
|---|---|---|---|---|
| storageType | String |  | Y | The storage where the files are kept; see Storage Types. |
| contentType | String |  | Y | The content type of the files; see Content Types. |
| autoCreate | Boolean | false | N | If true, accessing an entity that does not exist automatically creates a file with the same name as the entity. |
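
Putting these parameters together, a minimal sketch of a Files source that reads CSV files from a local folder and creates missing files on demand could look like the following. The source name and folder path are placeholders; fsFolder is described under Filesystem.

```yaml
sources:
  my-csv-files:
    provider: files
    options:
      storageType: fileSystem   # see Storage Types
      contentType: csv          # see Content Types
      fsFolder: ./data/         # folder that holds the files
      autoCreate: true          # create a file when an entity does not exist yet
```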

Storage Types

Storage types can be set with the parameter options.storageType as shown in the example below:

```yaml
sources:
  my-files:
    provider: files
    options:
      storageType: fileSystem
```

List of managed storage types:

| Value | Description |
|---|---|
| fileSystem | Local file system |
| azureBlob | Azure Blob Storage |

Filesystem

Stores files on the local file system.

Optional parameters:

| Parameter | Type | Default value | Required | Description |
|---|---|---|---|---|
| storageType | String |  | Y | Set to fileSystem for the local file system. |
| fsFolder | String | . | Y | The path where files are stored. |

Example:

```yaml
sources:
  my-local-files:
    provider: files
    options:
      storageType: fileSystem
      fsFolder: ./data/
      # ...
```

Azure Blob Storage

Stores files in Azure Blob Storage.

Optional parameters:

| Parameter | Type | Default value | Required | Description |
|---|---|---|---|---|
| storageType | String |  | Y | Set to azureBlob for Azure Blob Storage. |
| azureBlobConnectionString | String |  | Y | Azure Blob Storage connection string. |
| azureBlobContainerName | String |  | Y | Name of the blob container. |
| azureBlobCreateContainerIfNotExists | Boolean | false | N | If true, the container is created with the given name when it does not exist. |

Example:

```yaml
sources:
  my-blob-files:
    provider: files
    options:
      storageType: azureBlob
      azureBlobConnectionString: UseDevelopmentStorage=true
      azureBlobContainerName: datacontainer1
      azureBlobCreateContainerIfNotExists: true
      # ...
```
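
Storage and content options can be combined in one options block. The sketch below keeps JSON files in a blob container and reads the records from a rows array in each file (see JSON under Content Types). The source and container names are placeholders, and UseDevelopmentStorage=true targets the local Azure Storage emulator, so a real deployment would use the storage account's connection string instead.

```yaml
sources:
  my-blob-json-files:
    provider: files
    options:
      storageType: azureBlob
      azureBlobConnectionString: UseDevelopmentStorage=true  # local emulator
      azureBlobContainerName: jsoncontainer                 # placeholder name
      azureBlobCreateContainerIfNotExists: true
      contentType: json
      jsonArrayPath: rows   # records are read from the "rows" array
```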

Content Types

Content types can be set with the parameter options.contentType as shown in the example below:

```yaml
sources:
  my-files:
    provider: files
    options:
      contentType: json
```

List of managed content types:

| Value | Description |
|---|---|
| json | JSON files |
| csv | CSV files |

Optional parameters:

JSON

| Parameter | Type | Default value | Description |
|---|---|---|---|
| jsonArrayPath | String | '' (empty string) | The JSON path of the data array in the JSON file. |

Example:

```yaml
sources:
  my-json-files:
    provider: files
    options:
      contentType: json
      jsonArrayPath: rows
      # ...
```

CSV

| Parameter | Type | Default value | Description |
|---|---|---|---|
| csvDelimiter | String | , | The delimiting character. |
| csvNewline | String | \n | The newline sequence. Must be one of \r, \n, or \r\n. |
| csvHeader | Boolean | true | If true, the first row of parsed data is interpreted as field names. |
| csvQuoteChar | String | " | The character used to quote fields. |
| csvSkipEmptyLines | Boolean or String |  | Accepted values: true, false, or greedy. If true, lines that are completely empty (those which evaluate to an empty string) are skipped. If set to greedy, lines that have no content (only whitespace after parsing) are also skipped. |

Example:

```yaml
sources:
  my-csv-files:
    provider: files
    options:
      contentType: csv
      csvDelimiter: ","
      csvNewline: "\n"
      csvHeader: true
      csvQuoteChar: '"'
      # ...
```
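
When files do not match the defaults, for example semicolon-separated exports with Windows line endings and whitespace-only lines, the CSV options can be overridden. In this sketch the source name and folder are placeholders.

```yaml
sources:
  my-semicolon-csv-files:
    provider: files
    options:
      storageType: fileSystem
      fsFolder: ./exports/
      contentType: csv
      csvDelimiter: ";"           # semicolon-separated values
      csvNewline: "\r\n"          # Windows line endings
      csvHeader: true             # first row holds the field names
      csvSkipEmptyLines: greedy   # also skip whitespace-only lines
```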

Released under the GNU v3 License.