Master data, metadata, reference data...
Data is becoming an increasingly important source of business value as techniques and tools improve and, not least, as computing capacity grows.
But what is data?
According to Nationalencyklopedin (the Swedish National Encyclopedia):
data (Latin, plural of daʹtum 'given', 'gift'; in plural also: 'expenses'), representation of facts, concepts or instructions in a form suitable for transmission, interpretation or processing by humans or machines.
Nationalencyklopedin, data. http://www.ne.se/uppslagsverk/encyklopedi/lång/data
Data is thus a representation of information, held either as physical documents or in IT systems in various formats. Whether analog or digital, data can be more or less structured. Information captured on a paper form with predefined fields is more structured than an article or a letter, which usually has a freer format.
In digital form, relational databases dominate; their predefined keys and attributes make everything clearly structured. The image below, taken from Danette McGilvray's book "Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information™", shows an order (Sales Order) and the data it contains.
Despite the poor resolution of the image, we can see that the order contains text and numeric fields, some linked to the customer and some to the product that the order refers to.
With this image, McGilvray shows that an order contains different types of data, and she gives clear definitions of what she refers to as "Master data", "Metadata", "Reference Data" and "Transactional Record".
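To make the distinction concrete, the following minimal Python sketch tags the fields of a hypothetical sales order with the category each would typically fall under. The field names, the values and the category assignments are illustrative assumptions, not taken from McGilvray's figure.

```python
from collections import defaultdict
from enum import Enum


class Category(Enum):
    MASTER = "Master data"
    REFERENCE = "Reference data"
    TRANSACTIONAL = "Transactional data"


# Hypothetical order fields: name -> (value, assumed category).
# Neither the names nor the assignments come from the book.
SALES_ORDER = {
    "order_number": ("SO-1001", Category.TRANSACTIONAL),
    "order_date": ("2024-03-01", Category.TRANSACTIONAL),
    "quantity": (3, Category.TRANSACTIONAL),
    "customer_name": ("Example AB", Category.MASTER),
    "product_name": ("Widget", Category.MASTER),
    "country_code": ("SE", Category.REFERENCE),
    "currency_code": ("SEK", Category.REFERENCE),
}

# Metadata describes the fields rather than the order itself,
# here simply the data type of each field.
FIELD_METADATA = {
    name: type(value).__name__ for name, (value, _) in SALES_ORDER.items()
}


def fields_by_category(order):
    """Group field names by their assigned category."""
    grouped = defaultdict(list)
    for name, (_, category) in order.items():
        grouped[category].append(name)
    return dict(grouped)


if __name__ == "__main__":
    for category, names in fields_by_category(SALES_ORDER).items():
        print(f"{category.value}: {', '.join(names)}")
    print("Metadata (field types):", FIELD_METADATA)
```

The point of the sketch is that a single transactional record mixes values it owns (order number, date, quantity) with values that point to master data and reference data maintained elsewhere.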
Here is an excerpt of the definitions from the book:
The image above shows one of the sources that Prime Arch uses to define data categories. Another source, available online, is Semarchy's article "Back to Basics: Transactional, Golden, Reference and Other Master Data Types Explained". It includes the following image, which shows the most common data categories:
Combining these two sources and supplementing them with examples from various other sources gives the following gross (unfiltered) list of top-level data categories:
- Reporting data
- Transactional data
- Master data
- Reference data
- Metadata
- Log data
- Unstructured data
- Big data
Categorization of data according to Prime Arch
In the Data dimension of Prime Arch, D11 Data Domain and D21 Data Group are used to categorize data at a high level.
An example of a Level 1 categorization of data according to Prime Arch looks like this:
We have chosen Wikipedia as the source for most of the categories because its definitions are clearly described and generally applicable, and we have checked that they agree well with, for example, McGilvray and Semarchy.
These six data domains can be used to classify all data that an organization creates, stores and uses, and the list can be supplemented with, for example, log data from the gross list above.
We also want to give examples of how data can be categorized at Level 2, in Data Groups. Each Data Domain can be broken down into Data Groups according to the following map:
This list is not complete, but we hope it gives a good picture of what each domain contains and can serve as a starting point for a data categorization.
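The two-level structure described above, in which each Data Domain is broken down into Data Groups, could be captured in a simple mapping. The sketch below shows one possible representation in Python. The domain names are taken from the gross list, while the Data Group names and the helper functions are hypothetical placeholders and do not reproduce Prime Arch's actual map.

```python
# Level 1 (Data Domain) -> Level 2 (Data Groups).
# Domain names come from the gross list in the article;
# the group names are hypothetical examples only.
DATA_CATEGORIES = {
    "Master data": ["Customer", "Product"],
    "Reference data": ["Country codes", "Currency codes"],
    "Transactional data": ["Sales orders"],
    "Log data": [],  # a domain can be listed before its groups are defined
}


def domain_of(group, categories=DATA_CATEGORIES):
    """Return the Data Domain that contains the given Data Group, or None."""
    for domain, groups in categories.items():
        if group in groups:
            return domain
    return None


def domains_without_groups(categories=DATA_CATEGORIES):
    """List the domains that have no Data Groups assigned yet."""
    return [domain for domain, groups in categories.items() if not groups]
```

With this mapping, `domain_of("Customer")` returns `"Master data"`, and `domains_without_groups()` points to the domains where the Level 2 breakdown still needs to be done.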