EmailFilter filters input records according to the specified condition.
|Component||Same input metadata||Sorted inputs||Inputs||Outputs||Java||CTL||Auto-propagated metadata|
|Input||0||For input data records||Any|
|Output||0||For valid data records||Input 0|
|1||For rejected data records||Any |
Metadata cannot be propagated through this component.
Metadata on the output port 1 contains any of the input data fields plus up to two additional fields. Fields whose names are the same as those in the input metadata are filled in with the input values of these fields.
Table 62.3. Error Fields for EmailFilter
|Field number||Field name||Data type||Description|
|FieldA||the Error field attribute value||string||Error field|
|FieldB||the Status field attribute value||integer ||Status field|
 The following error codes are the most common:
|Field list||yes||List of selected input field names whose values should be verified as valid or non-valid e-mail addresses. Expressed as a sequence of field names separated by colon, semicolon, or pipe.|
|Level of inspection||Various methods used for the e-mail address verification can be specified. Each level includes and extends its predecessor(s) on the left. See Level of Inspection for more information.||SYNTAX | DOMAIN (default) | SMTP | MAIL|
|Accept empty||By default, even an empty field is accepted as a valid address. This can be switched off by setting the attribute to false.||true (default) | false|
|Error field||Name of the output field to which an error message can be written (for rejected records only).|
|Status field||Name of the output field to which an error code can be written (for rejected records only).|
|Multi delimiter||Regular expression that serves to split an individual field value to multiple e-mail addresses. If empty, each field is treated as a single e-mail address.||[,;] (default) | other|
|Accept condition||By default, a record is accepted if at least one field value is verified as a valid e-mail address. If set to STRICT, all field values must be verified as valid for the record to be accepted.||LENIENT (default) | STRICT|
|E-mail buffer size||Maximum number of records that are read into memory after which they are bulk processed. See Buffer and Cache Size for more information.||2000 (default) | 1-N|
|E-mail cache size||Maximum number of cached e-mail address verification results. See Buffer and Cache Size for more information.||2000 (default) | 0 (caching is turned off) | 1-N|
|Domain cache size||Maximum number of cached DNS query results. Ignored at the SYNTAX level.||3000 (default) | 0 (caching is turned off) | 1-N|
|Domain retry timeout (ms)||Timeout in milliseconds for each DNS query attempt. Thus, the maximum time in milliseconds spent on resolving equals Domain retry timeout multiplied by Domain retry count.||800 (default) | 1-N|
|Domain retry count||Number of retries for failed DNS queries.||2 (default) | 1-N|
|Domain query A records||By default, according to the SMTP standard, if no MX record can be found, the A record is searched. If set to false, the A record is not searched.||true (default) | false|
|SMTP connect attempts (ms,...)||Attempts for connection and HELO. Expressed as a sequence of numbers separated by commas. The numbers are delays between individual attempts to connect.||1000,2000 (default)|
|SMTP anti-greylisting attempts (s,...)||Anti-greylisting feature. Attempts and delays between individual attempts, expressed as a sequence of numbers separated by commas. If empty, anti-greylisting is turned off. See SMTP Grey-Listing Attempts for more information.||30,120,240 (default)|
|SMTP request timeout (s)||TCP timeout in seconds after which an SMTP request fails.||300 (default) | 1-N|
|SMTP concurrent limit||Maximum number of parallel tasks when anti-greylisting is on.||10 (default) | 1-N|
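|Mail From||The From field of the test message sent at the MAIL level.||CloverETL <firstname.lastname@example.org> (default) | other|
|Mail Subject||The subject of the test message sent at the MAIL level.||Hello, this is a test message (default) | other|
|Mail Body||The body of the test message sent at the MAIL level.||Hello,\nThis is CloverETL text message.\n\nPlease ignore and don't respond. Thank you, have a nice day! (default) | other|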
EmailFilter receives incoming records through its input port and verifies that the specified fields contain valid e-mail addresses. Data records that are accepted as valid are sent out through the optional first output port, if it is connected. Specified fields from the rejected records can be sent out through the optional second output port, if it is connected to another component. Metadata on the optional second output port may also contain up to two additional fields with information about the error.
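To illustrate how the Field list and Multi delimiter attributes select the values to be verified, here is a minimal Python sketch. It is not the component's implementation: the function names and the sample record are hypothetical, and only the separators (colon, semicolon, or pipe for Field list; the default regular expression `[,;]` for Multi delimiter) come from the attribute descriptions.

```python
import re

def parse_field_list(field_list):
    """Split a Field list value on colon, semicolon, or pipe."""
    return [name for name in re.split(r"[:;|]", field_list) if name]

def split_addresses(value, multi_delimiter="[,;]"):
    """Split one field value into individual e-mail address candidates.

    An empty multi_delimiter means the whole value is treated as one address.
    """
    if not multi_delimiter:
        return [value]
    return re.split(multi_delimiter, value)

# Hypothetical input record with two address fields.
record = {"work": "a@example.com;b@example.com", "home": "c@example.com"}
fields = parse_field_list("work;home")
candidates = {name: split_addresses(record[name]) for name in fields}
```

Each candidate string would then be verified at the configured level of inspection.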
Increasing E-mail buffer size avoids unnecessary repeated queries to the DNS system and SMTP servers by processing more records in a single batch. Increasing E-mail cache size may improve performance even further, since addresses stored in the cache can be verified instantly. However, both parameters require extra memory, so set them to the largest values your system can afford.
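The effect of a bounded result cache can be sketched as follows. This is an illustration only: the class name is hypothetical and the least-recently-used eviction policy is an assumption, since the documentation states only the maximum size and that 0 turns caching off.

```python
from collections import OrderedDict

class VerificationCache:
    """Bounded cache of address -> verification result (LRU eviction is assumed)."""

    def __init__(self, max_size=2000):
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, address):
        """Return the cached result, or None if the address is not cached."""
        if address in self._entries:
            self._entries.move_to_end(address)  # mark as recently used
            return self._entries[address]
        return None

    def put(self, address, result):
        """Store a result, evicting the least recently used entry when full."""
        if self.max_size == 0:  # 0 means caching is turned off
            return
        self._entries[address] = result
        self._entries.move_to_end(address)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)
```

A cache hit skips the DNS and SMTP round trips entirely, which is why a larger cache can help more than a larger buffer when the input contains many repeated addresses.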
By default, even an empty field from the input data records specified in the List of fields is considered to be a valid e-mail address, because the Accept empty attribute is set to true by default. If you want to be more strict, you can switch this attribute to false.
By default, the Accept condition attribute is set to LENIENT. In other words, at least one valid e-mail address is sufficient for the record to be accepted.
On the other hand, if Accept condition is set to STRICT, all e-mail addresses in the List of fields must be valid (either including or excluding empty values, depending on the Accept empty attribute).
Thus, be careful when setting these two attributes: Accept empty and Accept condition. If there is an empty field among the fields specified in List of fields, and all other non-empty values are verified as invalid addresses, such a record is accepted if Accept condition is set to LENIENT and Accept empty is true. In reality, however, such a record does not contain any useful, valid e-mail address; it contains only an empty string, which is enough for the record to be accepted.
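The interaction of the two attributes can be sketched in Python as follows. The function name and the input encoding are hypothetical, and treating an empty value as invalid when Accept empty is false is an interpretation of the description above, not confirmed behavior.

```python
def is_record_accepted(validity, accept_condition="LENIENT", accept_empty=True):
    """Decide whether a record is accepted.

    validity holds one entry per checked value: True (valid address),
    False (invalid address), or None (the field was empty).
    """
    # An empty value counts as valid only when Accept empty is true (assumption).
    outcomes = [accept_empty if v is None else v for v in validity]
    if accept_condition == "STRICT":
        return all(outcomes)   # every value must be valid
    return any(outcomes)       # LENIENT: one valid value is enough

# The pitfall described above: one empty field, all other values invalid.
pitfall = is_record_accepted([None, False, False])
```

With the defaults, `pitfall` is true even though the record carries no usable address; setting `accept_empty=False` rejects it.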
At the first level of validation (SYNTAX), the syntax of e-mail expressions is checked. Both non-strict conditions and international characters (except in the TLD) are allowed.
At the second level of validation (DOMAIN), which is the default, the DNS system is queried for domain validity and mail exchange server information. Four attributes can be set to balance performance against false-negative responses: Domain cache size, Domain retry timeout, Domain retry count, and Domain query A records. The number of queries sent to a DNS server is specified by the Domain retry count attribute; its default value is 2. The timeout for each query attempt is defined by Domain retry timeout in milliseconds; by default, it is set to 800 milliseconds. Thus, the maximum time spent resolving equals Domain retry count x Domain retry timeout. The results of queries can be cached. The number of cached results is defined by Domain cache size. By default, 3000 results are cached. If you set this attribute to 0, caching is turned off. You can also decide whether A records should be searched if no MX record is found (Domain query A records). By default, it is set to true, so the A record is searched if no MX record is found. You can switch this off by setting the attribute to false. This can speed up the search by up to a factor of two, although it breaks the SMTP standard.
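As a worked example of the DOMAIN-level settings, the following Python sketch computes the upper bound on resolution time and shows the MX-to-A fallback controlled by Domain query A records. The function names are hypothetical, and `lookup_mx` and `lookup_a` stand in for real DNS queries supplied by the caller; this is not the component's code.

```python
def max_resolution_time_ms(retry_count=2, retry_timeout_ms=800):
    """Upper bound on the time spent resolving one domain, in milliseconds."""
    return retry_count * retry_timeout_ms

def find_mail_hosts(domain, lookup_mx, lookup_a, query_a_records=True):
    """Return mail hosts for a domain, falling back to A records if allowed.

    lookup_mx and lookup_a are caller-supplied functions (hypothetical here)
    that take a domain name and return a possibly empty list of hosts.
    """
    hosts = lookup_mx(domain)
    if not hosts and query_a_records:
        hosts = lookup_a(domain)  # fallback used when no MX record exists
    return hosts

default_bound = max_resolution_time_ms()  # 2 x 800 = 1600 ms
```

Disabling the fallback saves the second lookup for domains without MX records, at the cost of rejecting domains that accept mail on their A record.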
At the third level of validation (SMTP), attempts are made to connect to the SMTP server. The number of attempts and the time intervals between them are defined by the SMTP connect attempts attribute, a sequence of integers separated by commas. Each number is the time (in milliseconds) between two consecutive attempts to connect to the server: the first number is the interval between the first and second attempts, the second number is the interval between the second and third attempts, and so on. The default value gives three attempts, with 1000 milliseconds between the first and second attempts and 2000 milliseconds between the second and third.
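A retry loop driven by an SMTP connect attempts value might look like the sketch below. It is illustrative only: `connect` is a hypothetical caller-supplied function that returns True on success, and the delays are treated as milliseconds, following the attribute label (ms,...).

```python
import time

def connect_with_retries(connect, attempts="1000,2000", sleep=time.sleep):
    """Try to connect, waiting between attempts as listed in `attempts`.

    N delays yield N + 1 attempts; returns True as soon as one succeeds.
    """
    delays_ms = [int(part) for part in attempts.split(",")]
    # The trailing None marks the last attempt, after which there is no wait.
    for delay in delays_ms + [None]:
        if connect():
            return True
        if delay is not None:
            sleep(delay / 1000.0)  # time.sleep expects seconds
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting.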
Additionally, the EmailFilter component can retry a failed SMTP/MAIL test a specified number of times with specified delays, which helps with servers that use greylisting. Only after the last retry fails is the address considered invalid.
At the fourth level (MAIL), a test message is sent. By default, its sender is CloverETL <email@example.com> and its subject is Hello, this is a test message. Its default body is as follows: Hello,\nThis is CloverETL test message.\n\nPlease ignore and don't respond. Thank you and have a nice day!
To turn on the anti-greylisting feature, specify the SMTP grey-listing attempts attribute. Its default value is 30,120,240. These numbers mean that up to four attempts can be made, with intervals of 30 seconds (between the first and the second), 120 seconds (between the second and the third), and 240 seconds (between the third and the fourth). You can replace the default values with any other comma-separated sequence of integers. The maximum number of parallel tasks performed when anti-greylisting is turned on is specified by the SMTP concurrent limit attribute. Its default value is 10.
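To make the timing concrete, the following Python sketch converts an attempts value into the offsets, in seconds from the first attempt, at which each attempt would take place. The function name is hypothetical and the sketch ignores the time the tests themselves take.

```python
from itertools import accumulate

def greylisting_attempt_times(attempts="30,120,240"):
    """Return the start offset in seconds of every attempt, the first at 0."""
    if not attempts.strip():
        return [0]  # anti-greylisting is off: only the initial attempt
    delays = [int(part) for part in attempts.split(",")]
    return [0] + list(accumulate(delays))

schedule = greylisting_attempt_times()  # [0, 30, 150, 390]
```

With the default value, an address on a greylisting server may therefore take up to 390 seconds of waiting before it is finally rejected, which is worth keeping in mind when choosing the SMTP concurrent limit.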