One of the interesting challenges I’ve faced in the last year was creating a flat file schema for a well-established custom flat file format in use at one of my clients. What made this format different from most I had dealt with in the past was that it contained repeating delimited records, with the tag for each record found in the middle of the record rather than at the start.

My responsibility on this project was to output flat files in this format. That wouldn’t have been a big deal, since all the records had the same number of delimited elements and I could just make a generic repeating record in my schema with the right number of elements. However, I couldn’t help but think that one day it might be a requirement to read in files of this format as well, and that designing the schema correctly now could save a lot of rework later (it turns out that this was a good call, since one of my colleagues has been tasked with exactly this requirement). To future-proof the schema, it must have knowledge of the tags and be able to read in instance files and apply the correct record type to each line. The obvious problem is that you can only apply a tag at the record level, while the tags in this format sit in the middle of each record.

Let’s illustrate this with an example. Say we have a file that contains details about animals, where the details supplied differ depending on whether the animal is a cat or a dog. Regardless of the type of animal, some common elements are shared between the two, and these appear before the tag that identifies whether the record in question describes a dog or a cat. After the identifier the data elements might differ (there could even be a different number of elements supplied, but I won’t touch on that).

Example file

A very generic approach to creating this schema would be to have a generic repeating record structure as below which can easily be created using the flat file schema generation wizard.

AnimalGenericSchema

The problem with this is that, while it is easy to write output files with this schema by flattening your dog and cat structures from source documents into the generic structure in a map, it is more problematic when you are picking up files of this type and mapping them to a typed structure. It is far from impossible, but it is difficult and harder to understand, especially as the number of record types grows, and it forces you to keep more logic in your map than you really should have to. Validating an instance file gives us rather ill-defined XML.

AnimalGenericOutput
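To make the drawback concrete outside of BizTalk, here is a minimal Python sketch of what a generic repeating record amounts to. The sample lines, the comma delimiter and the `Field1`, `Field2`, ... names are all assumptions made up for illustration; they are not the client's format or the wizard's actual output.

```python
# Hypothetical sample: two common elements, then the tag, then type-specific elements.
SAMPLE = "Rex,4,DOG,Labrador,Fetch\nTom,2,CAT,Indoor,Tuna"


def parse_generic(text, delimiter=","):
    """Split every line into positional fields, much like a generic repeating record."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(delimiter)
        # Every record gets the same anonymous field names, whatever its tag says.
        records.append({f"Field{i + 1}": value for i, value in enumerate(fields)})
    return records


generic_records = parse_generic(SAMPLE)
```

Each record comes back as `Field1` to `Field5`, so anything consuming it has to know that `Field3` is the tag and that the meaning of `Field4` and `Field5` depends on it. That is the logic that ends up living in the map.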

The way I got around this problem was to have a generic Animal repeating record which contains all the common elements as well as a choice record segment (see this MSDN article for some more details on choice group nodes) called DiscreteAnimalType. Within this choice record I created a record for each of the different animal types with their respective elements, and I applied a tag identifier at the record level. Note that I started with the generic schema structure created by the flat file schema generation wizard and then adjusted it manually in the schema designer.

AnimalTypedSchema

Validating the same instance file now gives us a much better-defined XML structure.

AnimalTypedOutput
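The same idea can be sketched in Python as an analogy: read the common elements first, then use the tag to choose which typed record the rest of the line belongs to. This is not how the BizTalk flat file disassembler is implemented; the class names, field names, tag values and tag position below are hypothetical and only mirror the Animal / DiscreteAnimalType shape described above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Dog:
    breed: str            # hypothetical dog-specific element
    favourite_game: str   # hypothetical dog-specific element


@dataclass
class Cat:
    temperament: str      # hypothetical cat-specific element
    favourite_food: str   # hypothetical cat-specific element


@dataclass
class Animal:
    name: str                               # common element (assumed)
    age: int                                # common element (assumed)
    discrete_animal_type: Union[Dog, Cat]   # plays the role of the choice record


TAG_POSITION = 2  # assumption: the tag follows two common elements
RECORD_TYPES = {"DOG": Dog, "CAT": Cat}


def parse_animal(line, delimiter=","):
    """Parse one delimited line into a typed Animal, chosen by its mid-record tag."""
    fields = line.split(delimiter)
    common = fields[:TAG_POSITION]
    tag = fields[TAG_POSITION]
    rest = fields[TAG_POSITION + 1:]
    record_type = RECORD_TYPES.get(tag)
    if record_type is None:
        raise ValueError(f"Unknown animal tag: {tag!r}")
    name, age = common
    # A wrong number of type-specific elements raises TypeError here.
    return Animal(name=name, age=int(age), discrete_animal_type=record_type(*rest))


animals = [parse_animal(line) for line in ["Rex,4,DOG,Labrador,Fetch", "Tom,2,CAT,Indoor,Tuna"]]
```

Supporting a new animal type then means adding one class and one entry in `RECORD_TYPES`, which is roughly the benefit of adding another tagged record under the choice group instead of more branching in a map.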

Does anyone else have a better approach to this problem? (I have of course simplified it a lot for the sake of this blog post.) I hope this makes people think about future-proofing their flat file schemas a bit more.
