How to populate a MongoDB database with DTM Data Generator for JSON

The "No SQL" databases are very popular in recent years. This article describes how to populate this kind of data storage using DTM Data Generator for JSON. As an example, we'll use MongoDB because it is one of the most popular NoSQL solutions.

DTM Data Generator for JSON is a member of the DTM Data Generator product family designed to generate JSON documents in bulk. As you know, NoSQL databases have no "flat" tables; they use complex, hierarchical structures instead. The JSON format is a natural way to define data sets compatible with this type of structure.

DTM Data Generator for JSON allows users to define the document structure manually or import it from an existing JSON file. Any node of the hierarchy can be associated with one of the predefined test data generators. The most useful generators are random values (numbers, dates, times, etc.), personal data (names, phone numbers, addresses, etc.), and business data (company names, departments, positions, etc.).
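
For instance, a hierarchical document for an orders collection might look like the sample below. This is only an illustrative sketch: the field names and values are hypothetical, not produced by the tool. The short PowerShell snippet parses the sample to confirm it is well-formed JSON:

# A hypothetical order document with a nested object and an array (illustrative only).
$sample = @'
{
  "orderId": 1042,
  "customer": { "name": "John Smith", "phone": "555-0134" },
  "items": [ { "product": "Widget", "qty": 3, "price": 9.99 } ]
}
'@
# ConvertFrom-Json reports an error if the document is not valid JSON.
$sample | ConvertFrom-Json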

Data Generation and Sources

The software also offers a built-in language for complex cases and for defining data dependencies and relationships: the Pattern Engine Language, intended for advanced users and developers who need to create complex values instead of predefined ones.

The user can provide a "repeater" for each node. That means the data generation tool will generate the node several times instead of once (the default). This helps to build huge data arrays with only a few clicks.

DTM Data Generator for JSON enables the user to extract data from external XML, Excel, and text files, as well as load it from any database through a unified interface such as ODBC or OLE DB. To prepare the source database state, the user can define a prologue SQL script. An epilogue script can be defined for execution environment cleanup, such as resetting counters or removing temporary tables.

The test data generation solution is designed to create many JSON files, one per document copy, for each execution.

Now we have the structure and the assigned generators. After the project execution, we have a set of JSON files ready to be loaded into the database.

Data Loading

We'll use the import utility (mongoimport) for loading. This tool can import content from the JSON files we generated, as well as a few other sources. By default, it works with a local MongoDB instance at port 27017. An alternate server can be specified with the -h (host) command-line switch. If the database requires authentication, the -u and -p parameters let the user provide a login name and password. To specify the database and collection, we'll use the -d and -c command-line switches.
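
For example, a command line for a remote, password-protected server could look like the one below (the host name and credentials are placeholders, not values from our project):

mongoimport -h db.example.com:27017 -u testUser -p secretPwd -d testDb -c testOrders --file d:\results\1.json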

The --file <file name> option allows us to specify the input file. DTM Data Generator for JSON stores all generated files in a single folder under the names 1.json, 2.json, ... NNN.json. The file format can also be stated explicitly with the "--type json" parameter, although JSON is mongoimport's default input type.

Here is the simplest command line for our case:

mongoimport -d testDb -c testOrders --file d:\results\1.json

This command loads the 1.json file into the "testOrders" collection of the "testDb" database. However, we have hundreds or even thousands of test data files and can't run this command for each one manually. Instead, we'll use a simple PowerShell loop:

for($i=1; $i -le 1000; $i++)
  { mongoimport -d testDb -c testOrders --file "d:\results\$i.json"; }

It builds the file name and runs the import once for each of the 1000 generated JSON documents. Of course, this is slow because the import tool is restarted for every file. If your case allows it, collect all the data in a single JSON file instead.
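
One possible way to do that is sketched below. It rests on an assumption the article doesn't state: each generated file holds a single document on one line, so concatenating the files yields newline-delimited JSON, which mongoimport accepts in a single run. The combined file gets a different extension so the *.json wildcard doesn't pick it up:

# Concatenate every generated document into one newline-delimited file
# (assumes one single-line JSON document per source file).
Get-ChildItem "d:\results\*.json" |
  ForEach-Object { (Get-Content $_ -Raw).Trim() } |
  Set-Content "d:\results\all.ndjson"
# Load everything with a single mongoimport run.
mongoimport -d testDb -c testOrders --file "d:\results\all.ndjson"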

See Also