Virtual customers: producing demo data

When using applications that require or offer the possibility to connect to a database, let’s say for identifying a customer, we’re often confronted with the following problem: we simply don’t have enough sample data available. Either our customer can’t give them out for obvious reasons, or we want to measure our systems performance before we contact possible customers.

I was confronted with the same problem in a customers project, so I decided to put an end to that misery by building a demo data machine. In my case I had to build up a customer database, so I wanted to he following information for my fictional customers:

  • A first and a last name,
  • A date of birth,
  • A virtual address.

The date of birth seemed easy: I’d just pick a random date within a given range, let’s say between 1940 and 2005. The more tricky part were the names, as I’d need databases for them to at least sound reasonable. Of course I could call my persons A and B, but that did not make me happy. So I checked out Statistik Austria and found a file containing all first names which have at least been registered 5 times. That would work perfectly. I also discovered a file containing last names, apparently from the U.S.; so my customers would have a combination of a European first name, and an American last name. Close enough for demo data.

For the virtual address I used the ZIP directory of the Austrian Post. For street names, I used the street register provided by Statistik Austria. I only used Viennese street names for the demo.

I decided to build an application in C#. Note that R would work perfectly fine here as well, also Stored Procedures and SQL could be used to do the job. But I felt like coding in C# again.

The concept so far: I’d use source text files containing my master data. For each virtual customer, I’d pick a random first and last name, as well as a random street and a valid ZIP-city combination. Then I’d enter the customer to a table within a SQL Server 2008 (Express) database.

Bildschirmfoto 2013-02-27 um 12.42.14 Bildschirmfoto 2013-02-27 um 12.42.00 Bildschirmfoto 2013-02-27 um 12.41.31 Bildschirmfoto 2013-02-27 um 12.41.44

After that, I designed a table for my customers in the SQL Server Management Studio:

Bildschirmfoto 2013-02-27 um 12.44.20

With Visual Studio 2010, it is pretty simple to connect to the database. The Entity Data Model deals with the connection and also the mapping of database objects, such as my table. I’d also be able to input new rows (insert) directly within my code without too many magic.

First, I prepared my databases:


string[] allLinesFirstName = File.ReadAllLines(path + "firstNamesDE.csv");
string[] allLinesLastName = File.ReadAllLines(path + "lastNamesEN.csv");
string[] allLinesStreetName = File.ReadAllLines(path + "streetnamesViennaAT.csv");
string[] allLinesZIPCity = File.ReadAllLines(path + "zipCityAT.csv");

 Then I used the Entity Data Model to connect to my database and imported the table:

Entity Data Model

After that, the rest is pretty simple as my object model was already created by .NET automatically. I just looped for a desired amount of time over my customer creation routine, which basically does the following:


// create the same line id for ZIP and city (as they reside in the same line)
int zipcityLineId = rand.Next(allLinesZIPCity.Length-1);
var entity = new person
{
id = Guid.NewGuid(),
dateofbirth = Helpers.GetRandomDate(new DateTime(1940, 01, 01), new DateTime(2005, 01, 01)),
firstname = allLinesFirstName[rand.Next(allLinesFirstName.Length - 1)],
lastname = allLinesLastName[rand.Next(allLinesLastName.Length - 1)],
address = allLinesStreetName[rand.Next(allLinesStreetName.Length - 1)] + " " + rand.Next(1, 100),
ZIP = allLinesZIPCity[zipcityLineId].Substring(0, 4),
city = allLinesZIPCity[zipcityLineId].Substring(5)
};
// add the new person to my collection
myPersons.Add(entity);

Then, after a certain amount of customers have been created, I would enter them into the database. I did that for every 1.000 entries, but you may want to change that number. Just make sure not to enter each and every entry directly into the database, as this will be quite slow.


foreach (var p in myPersons)
{
mydb.AddTopeople(p);
}
mydb.SaveChanges();

The  result itself is satisfying:

Sample Customer Data