We’re taking the first steps towards refactoring our Metadata Objects for Linking Environmental Sciences (MOLES) schema to make it easier to understand and implement, and to support (if not conform with) the new Observations and Measurements OGC specification. In doing so it became obvious to me that I need to think about the relationship between MOLES entities and ISO discovery metadata.

ISO19115 specifies that

...a dataset (DS_DataSet) must have one or more related Metadata entity sets (MD_Metadata). Metadata may optionally relate to a Feature, Feature Attribute, Feature Type, Feature Property Type (a Metaclass instantiated by Feature association role, Feature attribute type, and Feature operation), and aggregations of datasets (DS_Aggregate). Dataset aggregations may be specified (subclassed) as a general association (DS_OtherAggregate), a dataset series (DS_Series), or a special activity (DS_Initiative). MD_Metadata also applies to other classes of information and services not shown in this diagram (see MD_ScopeCode, B.5.25).

Let’s have a look at the MD_ScopeCode, which is the value of the MD_Metadata attribute hierarchyLevel:

  MD_ScopeCode    CodeS    Definition: class of information to which the referencing entity applies  
  attribute    001    information applies to the attribute class  
  attributeType    002    information applies to the characteristic of a feature  
  collectionHardware    003    information applies to the collection hardware class  
  collectionSession    004    information applies to the collection session  
  dataset    005    information applies to the dataset  
  series    006    information applies to the series  
  nonGeographicDataset    007    information applies to non-geographic data  
  dimensionGroup    008    information applies to a dimension group  
  feature    009    information applies to a feature  
  featureType    010    information applies to a feature type  
  propertyType    011    information applies to a property type  
  fieldSession    012    information applies to a field session  
  software    013    information applies to a computer program or routine  
  service    014    information applies to a ... service ...  
  model    015    information applies to a copy or imitation of an existing or hypothetical object  
  tile    016    information applies to a tile, a spatial subset of geographic data  
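As a rough sketch of how that codelist might be carried around in code, the table above can be transcribed into a Python enumeration. The class name `ScopeCode` and the two helper functions are my own illustrative names, not anything defined by ISO 19115; only the member names and numbers come from the table.

```python
from enum import Enum


class ScopeCode(Enum):
    """MD_ScopeCode names and numeric codes, transcribed from the table above."""
    attribute = 1
    attributeType = 2
    collectionHardware = 3
    collectionSession = 4
    dataset = 5
    series = 6
    nonGeographicDataset = 7
    dimensionGroup = 8
    feature = 9
    featureType = 10
    propertyType = 11
    fieldSession = 12
    software = 13
    service = 14
    model = 15
    tile = 16


def code_string(scope: ScopeCode) -> str:
    """Format the numeric code with three digits, as printed in the table."""
    return f"{scope.value:03d}"


def parse_hierarchy_level(text: str) -> ScopeCode:
    """Look up a hierarchyLevel value by name (hypothetical helper).

    Raises KeyError for any name that is not in the table.
    """
    return ScopeCode[text]
```

A fixed enumeration like this deliberately rejects anything outside the sixteen listed values, which is exactly the limitation that motivates extending the codelist.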

I’m not convinced I understand all of those, particularly the model type, but also the collectionHardware, collectionSession and dimensionGroup types. Anyone who can shed some light on those would be welcome to comment below or email me …

Obviously metadata should also apply to other entities. In particular, within the NDG we consider that the observation station (we had difficulty finding an appropriate noun that includes simulation hardware but also covers a ship, a physical location, a field trip, etc.), the data production tool (DPT, aka instrument, but inclusive of simulation software, also known as models), and activities are also first class citizens of metadata. Perhaps collectionHardware and collectionSession might be relevant for the DPT and some activities. I don’t know.

We also consider the deployment to be an important entity: a deployment links one or more data entities to a (DPT, activity, observation station) triplet, and may itself have properties. It’s worth noting that in the Observations and Measurements (O&M) framework, the concept of Observation binds a value of a property determined by a procedure to a feature. In the NDG world, data features live within data entities, and some part of what O&M calls a “procedure” is an attribute of a data production tool, but most of a “procedure” is in my mind synonymous with a deployment[^1]. Values and properties live within data entities too (a data entity is described by an application schema of GML, which can include elements from the O&M namespace).
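To make the shape of a deployment concrete, here is a minimal Python sketch. It is not the MOLES data model: the class names, field names and the example values are all hypothetical, chosen only to mirror the description above (one or more data entities linked to a DPT, activity and observation station, plus optional properties of the deployment itself).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    """A first class entity, identified here by a kind and a name only."""
    kind: str  # e.g. "dataEntity", "productionTool", "activity", "observationStation"
    name: str


@dataclass
class Deployment:
    """Links one or more data entities to a (DPT, activity, station) triplet."""
    data_entities: List[Entity]
    production_tool: Entity
    activity: Entity
    observation_station: Entity
    properties: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The text requires at least one data entity per deployment.
        if not self.data_entities:
            raise ValueError("a deployment must link at least one data entity")


# Illustrative values only; none of these names come from a real catalogue.
example = Deployment(
    data_entities=[Entity("dataEntity", "example temperature record")],
    production_tool=Entity("productionTool", "example thermometer"),
    activity=Entity("activity", "example field campaign"),
    observation_station=Entity("observationStation", "example ship"),
    properties={"note": "settings specific to this deployment"},
)
```

In this sketch the deployment's own `properties` are kept apart from the attributes of the production tool, which is where the deployment-specific part of an O&M “procedure” would sit.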

Leaving aside a resolution of a formal data model for MOLES, the first class entities will need to support metadata, and so there needs to be a scope code for the appropriate first class entities.

John Hockaday, on the metadata list, initially suggested extending the scope codes to cover:

  profile    there are many community profiles being developed  
  document    a general "grab bag" type for documents.  
  repository    ... suitable for something like a RDBMS  
  codeList    there are many codeLists in ISO 19115, ISO 19119 and ISO 19139. These codeLists are extensible.  
  modelRun or modelSession    to distinguish from model (but see below)  
  applicationSchema    information about GML application schema themselves  
  portrayalCatalogue    for finding OGC Symbology Encoding or Styled Layer Descriptors for OGC Web Services.  

Eventually he implemented some of those in a new codelist:

  modelSession    information applies to a model session or model run for a particular model  
  document    information applies to a document such as a publication, report, record etc.  
  profile    information applies to a profile of an ISO TC 211 standard or specification  
  dataRepository    information applies to a data repository such as a Catalogue Service, Relational Database, WebRegistry  
  codeList    information applies to a code list according to the CT_CodelistCatalogue format  
  project    information applies to a project or programme  

Actually, even with this definition of modelSession to augment model (which he thought might be used for things like metadata about UML descriptions), I still have problems. Within NDG and NumSim, we have the concept of model code bases and experiments, and I think these need to be kept separate but linked.

Personally I don’t like the dataRepository one … but I can live with it.

Project is ok, but we would prefer activity, because we decided that activities should be able to contain other activities, and the parents may well be projects … but not always … (e.g. campaigns within a formal project may themselves have sub-campaigns etc).
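The nesting described here can be sketched as an activity that optionally points at a parent activity. This is only an illustration under my own naming (`Activity`, `parent`, `ancestry` are hypothetical), not a definition taken from MOLES.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Activity:
    """An activity that may sit inside another activity."""
    name: str
    kind: str  # free text such as "project" or "campaign"
    parent: Optional["Activity"] = None


def ancestry(activity: Activity) -> List[str]:
    """Return the activity's name followed by the names of its ancestors."""
    chain = []
    node: Optional[Activity] = activity
    while node is not None:
        chain.append(node.name)
        node = node.parent
    return chain


# Illustrative hierarchy: a project containing a campaign and a sub-campaign.
project = Activity("example project", "project")
campaign = Activity("example campaign", "campaign", parent=project)
sub_campaign = Activity("example sub-campaign", "campaign", parent=campaign)
```

Because the parent is just another activity, a top-level parent need not be a project, which is the reason for preferring the more general term.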

At this point I might consider a slightly different extension set (which is of course the point of having extensible codelists). Given I’m not sure about these collection thingies, and given a tilt towards O&M, I might want to have

  document    as above  
  profile    as above  
  codeList    as above  
  dataRepository    as above  
  activity    information applying to a project, programme or other activity  
  productionTool    information about an instrument or algorithm  
  observationStation    information about the characteristics, location and/or platform which carried out, or is capable of, an observation or simulation.  
  deployment    information linking a data entity, activity, productionTool and platform in a procedure  
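A minimal sketch of how such an extension set might sit alongside the ISO codelist follows. The variable and function names are hypothetical, and the definitions for the “as above” entries are copied from John Hockaday's list; the check simply guards against an extension reusing an existing ISO name.

```python
# Names from the ISO 19115 MD_ScopeCode table quoted earlier in this post.
ISO_SCOPE_CODES = frozenset({
    "attribute", "attributeType", "collectionHardware", "collectionSession",
    "dataset", "series", "nonGeographicDataset", "dimensionGroup",
    "feature", "featureType", "propertyType", "fieldSession",
    "software", "service", "model", "tile",
})

# The extension set proposed above, with its definitions.
EXTENSION_SCOPE_CODES = {
    "document": "information applies to a document such as a publication, report, record etc.",
    "profile": "information applies to a profile of an ISO TC 211 standard or specification",
    "codeList": "information applies to a code list according to the CT_CodelistCatalogue format",
    "dataRepository": "information applies to a data repository such as a Catalogue Service, Relational Database, WebRegistry",
    "activity": "information applying to a project, programme or other activity",
    "productionTool": "information about an instrument or algorithm",
    "observationStation": "information about the characteristics, location and/or platform which carried out, or is capable of, an observation or simulation",
    "deployment": "information linking a data entity, activity, productionTool and platform in a procedure",
}


def clashing_names() -> set:
    """Return extension names that collide with an existing ISO name."""
    return set(EXTENSION_SCOPE_CODES) & ISO_SCOPE_CODES


def is_known_scope(name: str) -> bool:
    """True if the name is in either the ISO list or the extension set."""
    return name in ISO_SCOPE_CODES or name in EXTENSION_SCOPE_CODES
```

With this set, `is_known_scope("deployment")` holds while `is_known_scope("project")` does not, since project is folded into activity.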

Now I can have an algorithm (computer model) described in a productionTool metadata document and the particular data entity it produces is a data entity (of course), with the particular switches, initial conditions etc, described in a deployment (although I suspect there should and will be ambiguity as to whether the attributes of a productionTool could inherit most if not all of the characteristics of a deployment).

A deployment most closely corresponds to an O&M observation, in that we deploy a tool in or at a particular station for an activity to make a measurement. I’d love a better (compound) noun than ObservationStation …

[^1]: Actually, the overlap between an observation and a procedure is significant, something that is pointed out in the O&M spec itself.

comments (2)

Stephen M Richard (on Friday 23 October, 2009)

See a metadata resource type compilation for Earth Sciences at http://spreadsheets.google.com/pub?key=rwesR0kcs37P-CjV4sY_iMA&single=true&gid=0&output=html

John Hockaday (on Wednesday 03 March, 2010)

My understanding, using examples, of ‘collectionHardware’ is an instrument or platform for collecting data or information. For example, camera, satellite, survey ship, sensor, thermometer, barometer, etc.

My understanding of a ‘collectionSession’ is data collected via the ‘collectionHardware’ during a particular period. For example, photos taken at a party, maximum and minimum temperatures of a given day, week or month, a satellite scene, a period where the survey ship is between ports, etc. etc.

I agree, I don’t understand ‘dimensionGroup’. I think it may relate to a classification of some sort. For example, rainfall is classified into 0-10mm, 11-20mm, 21-30mm, 31-40mm, > 41mm.

I believe ‘model’ can relate to two types: an abstract model such as a UML model, and an algorithmic model such as the prediction of tsunami damage to coastal areas.

Using this understanding I would lump productionTool and observationStation under ‘collectionHardware’.

I’m disappointed that I didn’t see this wiki before deciding on the additional terms. I like the ‘activity’ term. It covers ‘programmes’, ‘projects’ but this may also be considered to apply to ‘fieldSession’ or ‘surveySession’.