Supported File URL Formats for Writers

The File URL attribute lets you type the file URL directly or open the URL File Dialog.

The URLs shown below can also contain placeholders: the dollar sign or the hash sign.

Important

The dollar sign and the hash sign serve different purposes.

  • The dollar sign should be used when each of multiple output files is to contain only a specified number of records, based on the Records per file attribute.

  • The hash sign should be used when each of multiple output files is to contain only the records corresponding to one value of the specified Partition key.

Note

Hash signs in URL examples in this section serve to separate a compressed file (zip, gz) from its contents. These are not placeholders!
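To illustrate the separator role of the hash sign described in this note, the following Python sketch splits an archive URL into the wrapper scheme, the wrapped URL, and the entry inside the archive. The function name split_archive_url is hypothetical and not part of CloverETL; the sketch only mirrors the URL shapes used in this section.

```python
def split_archive_url(url):
    """Split 'zip:(inner)#entry' into (scheme, inner, entry).

    Hypothetical helper for illustration only. 'entry' is None when
    the URL has no '#entry' part (as in the gzip examples).
    """
    scheme, sep, rest = url.partition(":(")
    if not sep or scheme not in ("zip", "gzip"):
        raise ValueError("not an archive URL: " + url)
    depth = 1  # we are inside the first opening parenthesis
    for i, ch in enumerate(rest):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                inner = rest[:i]
                tail = rest[i + 1:]
                # Here the hash sign is a separator, not a placeholder.
                entry = tail[1:] if tail.startswith("#") else None
                return scheme, inner, entry
    raise ValueError("unbalanced parentheses in: " + url)
```

Because the parentheses are matched by depth, a nested URL yields the inner archive URL as the second element, which can be passed to the same function again.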

Important

To ensure graph portability, forward slashes must be used when defining the path in URLs (even on Microsoft Windows).
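As a small illustration of the forward-slash rule, a local Windows path can be converted before it is placed into a File URL. This is a minimal Python sketch; the helper name to_portable_path is hypothetical, and it is meant for local file paths only, not for complete URLs with a protocol prefix.

```python
from pathlib import PureWindowsPath


def to_portable_path(path):
    """Return a local path written with forward slashes only.

    Hypothetical helper: PureWindowsPath understands both separators,
    and as_posix() renders the result with forward slashes.
    """
    return PureWindowsPath(path).as_posix()
```

For example, to_portable_path(r"C:\data\out\filename.out") gives C:/data/out/filename.out, while a path that already uses forward slashes is returned unchanged.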

The following are examples of possible URLs for Writers:

Writing to Local Files

  • /path/filename.out

    Writes the specified file to disk.

  • /path1/filename1.out;/path2/filename2.out

    Writes the two specified files to disk.

  • /path/filename$.out

    Writes some number of files on disk. The dollar sign represents one digit. Thus, the output files can have the names from filename0.out to filename9.out. The dollar sign is used when Records per file is set.

  • /path/filename$$.out

    Writes some number of files on disk. Two dollar signs represent two digits. Thus, the output files can have the names from filename00.out to filename99.out. The dollar sign is used when Records per file is set.

  • /path/filename#.out

    Writes some number of files to disk. If Partition file tag is set to Key file tag, the hash sign in the file name is replaced with the Partition key field value. Otherwise, the hash sign is replaced with a number.

  • zip:(/path/file$.zip)

    Writes some number of compressed files on disk. The dollar sign represents one digit. Thus, the compressed output files can have the names from file0.zip to file9.zip. The dollar sign is used when Records per file is set.

  • zip:(/path/file$.zip)#innerfolder/filename.out

    Writes the specified file inside each of the compressed files on disk. The dollar sign represents one digit. Thus, the compressed output files containing the specified filename.out file can have the names from file0.zip to file9.zip. The dollar sign is used when Records per file is set.

  • gzip:(/path/file$.gz)

    Writes some number of compressed files on disk. The dollar sign represents one digit. Thus, the compressed output files can have the names from file0.gz to file9.gz. The dollar sign is used when Records per file is set.

Note

Although CloverETL can read data from a .tar file, writing to a .tar file is not supported.
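The dollar-sign examples above can be sketched in Python as follows. This is an illustration of the naming pattern only, not CloverETL's implementation: the function names expand_dollar and output_file_for_record are hypothetical, record numbers are assumed to start at zero, and files are assumed to be filled one after another. What happens once the digits run out is not described in this section, so the sketch simply raises an error.

```python
import re


def expand_dollar(url_template, index):
    """Replace the first run of dollar signs with a zero-padded index.

    Hypothetical helper: one dollar sign stands for one digit, so '$'
    covers 0..9 and '$$' covers 00..99.
    """
    match = re.search(r"\$+", url_template)
    if match is None:
        raise ValueError("no dollar-sign placeholder in URL")
    width = len(match.group())
    if not 0 <= index < 10 ** width:
        # Assumption: out-of-range indexes are treated as an error here.
        raise ValueError("index does not fit into the placeholder")
    number = str(index).zfill(width)
    return url_template[:match.start()] + number + url_template[match.end():]


def output_file_for_record(url_template, record_number, records_per_file):
    """Pick the output file for a zero-based record number (assumed order)."""
    return expand_dollar(url_template, record_number // records_per_file)
```

With Records per file set to 10, for example, record number 25 would land in /path/filename2.out under these assumptions.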

Writing to Remote Files

  • ftp://user:password@server/path/filename.out

    Writes the specified filename.out file to the remote server connected via the ftp protocol using a username and password.

  • sftp://user:password@server/path/filename.out

    Writes the specified filename.out file to the remote server connected via the sftp protocol using a username and password.

    If certificate-based authentication is used, the certificates are placed in the ${PROJECT}/ssh-keys/ directory and each private key file name has the suffix .key. Only certificates without a password are currently supported. With certificate-based authentication, the URL contains no password:

    sftp://username@server/path/filename.txt

  • zip:(ftp://username:password@server/path/file.zip)#innerfolder/filename.txt

    Writes the specified filename.txt file, compressed in the file.zip file, to the remote server connected via the ftp protocol using a username and password.

  • zip:(zip:(ftp://username:password@server/path/name.zip)#innerfolder/file.zip)#innermostfolder/filename.txt

    Writes the specified filename.txt file, compressed in the file.zip file that is itself compressed in the name.zip file, to the remote server connected via the ftp protocol using a username and password.

  • gzip:(ftp://username:password@server/path/file.gz)

    Writes data compressed in the file.gz file to the remote server connected via the ftp protocol using a username and password.

  • http://username:password@server/filename.out

    Writes the specified filename.out file to the remote server connected via the WebDAV protocol using a username and password.

  • s3://access_key_id:secret_access_key@s3.amazonaws.com/bucketname/path/filename.out

    Writes to the path/filename.out object located in the Amazon S3 web storage service, in the bucket bucketname, using an access key ID and a secret access key.

    See Amazon S3 URL.

    It is recommended to connect to S3 via a region-specific S3 URL: s3://s3.eu-central-1.amazonaws.com/bucket.name/. The region-specific URL has much better performance than the generic one (s3://s3.amazonaws.com/bucket.name/).

    See recommendation on Amazon S3 URL.

    Note

    The s3:// URL protocol has been available since CloverETL 4.1. More information about the deprecated http:// S3 protocol can be found in the CloverETL 4.0 User Guide.

  • hdfs://CONN_ID/path/filename.dat

    Writes a file to the Hadoop distributed file system (HDFS). The HDFS NameNode to connect to is defined in the Hadoop connection with the ID CONN_ID. This example file URL writes a file with the absolute HDFS path /path/filename.dat.

  • smb://domain%3Buser:password@server/path/filename.txt

    Writes a file to a Windows share (Microsoft SMB/CIFS protocol). The server part may be a DNS name, an IP address or a NetBIOS name. The userinfo part of the URL (domain%3Buser:password) is not mandatory. Any URL reserved character it contains should be escaped using %-encoding, as the semicolon ; is escaped with %3B in the example (the semicolon is escaped because it collides with the default Clover file URL separator). Also note that the dollar sign $ in the URL path (e.g. when writing to an Administrative share) is reserved for the file partitioning feature, so it too needs to be escaped (with %24).

    The SMB protocol is implemented in the JCIFS library, which may be configured using Java system properties. See Setting Client Properties in the JCIFS documentation for a list of all configurable properties.
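The escaping rules for smb:// URLs described above can be sketched in Python with the standard urllib.parse.quote function. The helper name smb_url and its parameters are hypothetical; the sketch only shows how the semicolon and the dollar sign end up as %3B and %24, and it is not taken from CloverETL or JCIFS.

```python
from urllib.parse import quote


def smb_url(server, path, user=None, password=None, domain=None):
    """Build an smb:// URL with %-encoded userinfo and path.

    Hypothetical helper for illustration. The domain and user are
    joined with ';', which is then escaped to %3B together with any
    other reserved character.
    """
    userinfo = ""
    if user is not None:
        account = domain + ";" + user if domain else user
        userinfo = quote(account, safe="")
        if password is not None:
            userinfo += ":" + quote(password, safe="")
        userinfo += "@"
    # Keep '/' as the path separator; '$' is escaped to %24.
    escaped_path = quote(path.lstrip("/"), safe="/")
    return "smb://" + userinfo + server + "/" + escaped_path
```

Calling smb_url("server", "path/filename.txt", user="user", password="password", domain="domain") reproduces the example URL above, and a path such as C$/data/file.txt is rendered as C%24/data/file.txt.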

Writing to Output Port

  • port:$0.FieldName:discrete

    If this URL is used, the output port of the Writer must be connected to another component. The output metadata must contain a field FieldName of one of the following data types: string, byte or cbyte. Each data record that the Writer receives through the input port is processed according to the input metadata, sent out through the optional output port, and written as the value of the specified field of the output edge metadata. Subsequent records are processed in the same way.

Using Proxy in Writers

  • http:(direct:)//seznam.cz

    Without proxy.

  • http:(proxy://user:password@212.93.193.82:443)//seznam.cz

    Proxy setting for http protocol.

  • ftp:(proxy://user:password@proxyserver:1234)//seznam.cz

    Proxy setting for ftp protocol.

  • ftp:(proxy://proxyserver:443)//server/path/file.dat

    Proxy setting for ftp protocol.

  • sftp:(proxy://66.11.122.193:443)//user:password@server/path/file.dat

    Proxy setting for sftp protocol.
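The proxy examples above all follow one pattern: a parenthesized proxy specification is placed between the protocol name and the // that starts the server part. A minimal Python sketch of that pattern follows; the function name with_proxy is hypothetical and not part of CloverETL.

```python
def with_proxy(url, proxy=None):
    """Insert a proxy specification into a protocol://... URL.

    Hypothetical helper: 'proxy' is the proxy address, optionally with
    user:password@ in front, for example 'proxyserver:443'. When it is
    None, the '(direct:)' form (no proxy) is produced.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("URL has no protocol part: " + url)
    spec = "direct:" if proxy is None else "proxy://" + proxy
    return scheme + ":(" + spec + ")//" + rest
```

For instance, with_proxy("ftp://server/path/file.dat", "proxyserver:443") yields ftp:(proxy://proxyserver:443)//server/path/file.dat, matching the fourth example in the list.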

Writing to Dictionary

  • dict:keyName:source

    Writes data to the file URL specified in the dictionary. The target file URL is retrieved from the specified dictionary entry.

  • dict:keyName:discrete [1]

    Writes data to dictionary. Creates ArrayList<byte[]>

  • dict:keyName:stream [2]

    Writes data to dictionary. Creates WritableByteChannel

Sandbox Resource as Data Source

A sandbox resource, whether it is in a shared, local or partitioned sandbox, is specified in the graph under the fileURL attributes as a so-called sandbox URL, for example:

sandbox://data/path/to/file/file.dat

where "data" is the sandbox code and "path/to/file/file.dat" is the path to the resource from the sandbox root. The URL is evaluated by CloverETL Server during graph execution, and a component (reader or writer) obtains the opened stream from the server. This may be a stream to a local file or to some other remote resource. Thus, a graph does not have to run on the node which has local access to the resource. More than one sandbox resource may be used in the graph, and each of them may be on a different node. In such cases, CloverETL Server chooses the node with the most local resources to minimize remote streams.

The sandbox URL has a specific use for parallel data processing. When the sandbox URL with the resource in a partitioned sandbox is used, that part of the graph/phase runs in parallel, according to the node allocation specified by the list of partitioned sandbox locations. Thus, each worker has its own local sandbox resource. CloverETL Server evaluates the sandbox URL on each worker and provides an open stream to a local resource to the component.
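The structure of a sandbox URL (sandbox code followed by the path from the sandbox root) can be illustrated with the standard Python URL parser. The function name parse_sandbox_url is hypothetical; this sketch only takes the URL apart and says nothing about how CloverETL Server resolves it.

```python
from urllib.parse import urlsplit


def parse_sandbox_url(url):
    """Return (sandbox_code, path_from_sandbox_root) for a sandbox URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "sandbox":
        raise ValueError("not a sandbox URL: " + url)
    # The part right after 'sandbox://' is the sandbox code; the rest
    # is the path to the resource relative to the sandbox root.
    return parts.netloc, parts.path.lstrip("/")
```

Applied to sandbox://data/path/to/file/file.dat, it returns the sandbox code data and the path path/to/file/file.dat.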

See also

Supported File URL Formats for Readers
URL File Dialog


[1]  The discrete processing type uses a byte array for storing data.

[2]  The stream processing type uses an output stream that must be created before running a graph (from Java code).