Fast Way to Build Test Data Files

Although modern solutions rely on XML and databases for data access and exchange, flat (plain text) files remain important. Supporting and evolving legacy systems requires test files in plain text format. Flat files are also fast to process because they need no complex parsing of the kind XML requires.

A test text file can be created or modified in any popular editor, since it is the simplest file format. On the other hand, you cannot manually create a million data rows with acceptable uniqueness and quality, so data creation requires some kind of automation.

DTM Flat File Generator: import structure options

There are two ways to automate this: write custom scripts or use software designed for test data generation. This article focuses on DTM Flat File Generator.
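For comparison, the custom-script route can be as small as the following Python sketch. It is only an illustration of the first option, not part of DTM Flat File Generator; the function name write_test_file, the column names and the list of country codes are all hypothetical.

```python
import csv
import random

# Hypothetical sample values; a real script would use a fuller list.
COUNTRY_CODES = ["US", "DE", "JP", "BR", "IN"]

def write_test_file(path, row_count, seed=0):
    """Write a CSV test file with a header row and row_count data rows."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "quantity", "country"])
        for row_id in range(1, row_count + 1):
            # The sequential id guarantees uniqueness; other columns are random.
            writer.writerow([row_id, rng.randint(100, 250), rng.choice(COUNTRY_CODES)])
```

Calling write_test_file("customers.csv", 1000000) would produce a million unique rows, but every new column type or dependency has to be coded by hand, which is the gap a dedicated generator is meant to close.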

Define Structure of Test File

The first problem a user of a test data generator faces is defining the file structure. DTM Flat File Generator offers three ways to do that:

  1. Use an existing data file as an example. The program imports the column structure, separators and other information about the existing data. This is the preferred way if you already have a test file and need to generate more rows.
  2. Import the structure from a text file with column definitions: name, data type and size. Data type and size are optional and can be used for fixed-width files.
  3. Enter the structure manually.
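To give a feel for what importing a structure from an existing file involves, here is a minimal Python sketch. It is not how DTM Flat File Generator does it; it assumes the first line is a header, that the separator is one of four common candidates, and the name infer_structure is hypothetical.

```python
import csv
import io

# Assumption: the separator is one of these common characters.
CANDIDATE_DELIMITERS = [",", "\t", "|", ";"]

def infer_structure(sample_text):
    """Guess the delimiter, column names and maximum column widths."""
    header = sample_text.splitlines()[0]
    # Pick the candidate that occurs most often in the header line.
    delimiter = max(CANDIDATE_DELIMITERS, key=header.count)
    rows = list(csv.reader(io.StringIO(sample_text), delimiter=delimiter))
    names = rows[0]
    # Widest value seen per column (header included); short rows are skipped.
    widths = [
        max(len(row[i]) for row in rows if i < len(row))
        for i in range(len(names))
    ]
    return {"delimiter": delimiter, "columns": list(zip(names, widths))}
```

The observed widths are a natural starting point for a fixed-width layout, although a real tool would also need to detect data types and files without a header row.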

The software offers a complete set of operations on the file structure: adding a new column at the required position, modifying or deleting a column, and changing the column order.

DTM Flat File Generator: predefined and custom data generators

Built-in Data Generators Menu

The next step is assigning a data generator to each column. A generator is a description of the data to be generated for the column. Examples are "Random integer between 100 and 250" and "Country code".
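One way to picture a generator is as a small function attached to a column. The Python sketch below models the two examples above; it is an illustration only, and the helper names (random_integer, choice_of, generate_row) and the country list are hypothetical rather than anything the product exposes.

```python
import random

def random_integer(low, high, rng):
    """Generator: random integer between low and high, inclusive."""
    return lambda: rng.randint(low, high)

def choice_of(values, rng):
    """Generator: one value picked at random from a fixed list."""
    return lambda: rng.choice(values)

rng = random.Random(42)  # seeded so repeated runs give the same data

# One generator per column, keyed by column name.
columns = {
    "quantity": random_integer(100, 250, rng),
    "country": choice_of(["US", "DE", "JP"], rng),  # stand-in for "Country code"
}

def generate_row(columns):
    """Build one data row by calling each column's generator once."""
    return {name: generate() for name, generate in columns.items()}
```

Each call to generate_row(columns) yields one row, so producing a file is a matter of calling it once per required row.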

At the moment, the program has 23 built-in generators in four groups: general or random data (integer, string, date, time), personal data such as name or e-mail, geographic data such as city or ZIP code, and business data such as company name or web site address.

The program also has three configurable general-purpose generators: by regular expression, by example, and a custom generator that uses the pattern engine. The pattern engine is a built-in data generation language that helps users describe complex data with relationships or references. It also lets users specify additional test data properties and access external data sources.

With the custom generator and the $Table / $Query functions, the file can be populated with data extracted from a database (a table or an SQL statement). The default database connection feature makes calls to these functions simpler: $Table(Customers,CustomerName).

A Microsoft Excel spreadsheet, a Microsoft Access database or another text file can also serve as a source.
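The idea behind a call such as $Table(Customers,CustomerName) can be sketched in Python with the standard sqlite3 module. This is not the pattern engine itself, only an assumed equivalent: the function table_values, the in-memory database and the sample customer names are made up for the example.

```python
import random
import sqlite3

def table_values(conn, table, column):
    """Return all values of one column, roughly what $Table(table,column) reads."""
    # Identifiers cannot be bound as SQL parameters, so validate them first.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    cursor = conn.execute(f'SELECT "{column}" FROM "{table}"')
    return [row[0] for row in cursor]

# Hypothetical source data standing in for the default database connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?)",
    [("Acme Ltd",), ("Globex",), ("Initech",)],
)

names = table_values(conn, "Customers", "CustomerName")
rng = random.Random(1)
# Five test rows whose second column is drawn from the database values.
sample_rows = [[row_id, rng.choice(names)] for row_id in range(1, 6)]
```

Drawing values from a real table keeps generated files consistent with existing reference data, which matters when the test file is later loaded next to that database.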

Output Options of the Test File Generator

The program can add empty data rows to the output file. An empty row has no structure and contains only the CR (carriage return) and LF (line feed) symbols. The frequency of empty rows can be specified in the product settings.
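A rough Python sketch of this behavior follows. How the product interprets the frequency setting is not described here, so the sketch assumes it is a probability applied after each data row; the name with_empty_rows and the empty_ratio parameter are hypothetical.

```python
import random

def with_empty_rows(lines, empty_ratio, rng):
    """Yield each line with CR LF, sometimes followed by an empty row.

    Assumption: empty_ratio is the chance (0.0 to 1.0) that an empty row
    follows any given data row.
    """
    for line in lines:
        yield line + "\r\n"
        if rng.random() < empty_ratio:
            yield "\r\n"  # an empty row is just the CR and LF symbols
```

Feeding such output to a parser is a quick way to check that it skips blank lines instead of failing on them.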

DTM Flat File Generator: output options

DTM Flat File Generator offers four output text formats: tab-delimited, CSV (comma-separated values), fixed width and custom delimited. The output file can contain the list of column names as its first row. For the fixed width format, the user must provide the length of each column. The "append" mode lets the user add new rows to an existing data file, which is a convenient way to expand a test data set.

The custom separator format is intended for solution-specific test files, such as pipe-separated ones (Name|Phone|Address).
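The four formats, the optional header row and the append mode can be illustrated with a short Python sketch. It is an assumed reimplementation for explanation only: format_row, write_rows and their parameters are hypothetical names, and the product's actual quoting and padding rules may differ.

```python
import csv
import io

def format_row(values, style, widths=None, delimiter="|"):
    """Format one row as "tab", "csv", "fixed" or "custom" delimited text."""
    values = [str(v) for v in values]
    if style == "fixed":
        # Truncate or pad every value to its declared column length.
        return "".join(v[:w].ljust(w) for v, w in zip(values, widths))
    # The csv module requires a single-character separator.
    separator = {"tab": "\t", "csv": ",", "custom": delimiter}[style]
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=separator, lineterminator="\n").writerow(values)
    return buffer.getvalue()[:-1]  # drop the line terminator

def write_rows(path, rows, style, header=None, append=False, **options):
    """Write rows to path; append=True adds them to an existing file."""
    with open(path, "a" if append else "w", newline="") as f:
        if header and not append:  # column names only at the top of a new file
            f.write(format_row(header, style, **options) + "\n")
        for row in rows:
            f.write(format_row(row, style, **options) + "\n")
```

For example, format_row(["Ann", "555-0100", "1 Main St"], "custom", delimiter="|") gives Ann|555-0100|1 Main St, matching the pipe-separated layout mentioned above.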

In closing, a few words about performance. It depends on the number of columns, column-to-column dependencies, and the complexity of the column-level data generators. For a file with 4 to 8 columns and simple generators, you can expect 25 to 150 thousand rows per second, depending on disk system performance and CPU speed.