We’re taking the first steps towards refactoring our Metadata Objects for Linking Environmental Sciences (MOLES) schema to make it easier to understand and implement, and to support (if not conform with) the new Observations and Measurements OGC specification. In doing so it became obvious to me that I need to think about the relationship between MOLES entities and ISO discovery metadata.
ISO19115 specifies that
...a dataset (DS_DataSet) must have one or more related Metadata entity sets (MD_Metadata). Metadata may optionally relate to a Feature, Feature Attribute, Feature Type, Feature Property Type (a Metaclass instantiated by Feature association role, Feature attribute type, and Feature operation), and aggregations of datasets (DS_Aggregate). Dataset aggregations may be specified (subclassed) as a general association (DS_OtherAggregate), a dataset series (DS_Series), or a special activity (DS_Initiative). MD_Metadata also applies to other classes of information and services not shown in this diagram (see MD_ScopeCode, B.5.25).
Let’s have a look at the MD_ScopeCode, which is the value of the MD_Metadata attribute hierarchyLevel:
|MD_ScopeCode||CodeS||Definition: class of information to which the referencing entity applies|
|attribute||001||information applies to the attribute class|
|attributeType||002||information applies to the characteristic of a feature|
|collectionHardware||003||information applies to the collection hardware class|
|collectionSession||004||information applies to the collection session|
|dataset||005||information applies to the dataset|
|series||006||information applies to the series|
|nonGeographicDataset||007||information applies to non-geographic data|
|dimensionGroup||008||information applies to a dimension group|
|feature||009||information applies to a feature|
|featureType||010||information applies to a feature type|
|propertyType||011||information applies to a property type|
|fieldSession||012||information applies to a field session|
|software||013||information applies to a computer program or routine|
|service||014||information applies to a ... service ...|
|model||015||information applies to a copy or imitation of an existing or hypothetical object|
|tile||016||information applies to a tile, a spatial subset of geographic data|
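As a rough illustration only, the code list above can be held as a simple enumeration so that a hierarchyLevel value can be looked up by name or by code. This is a sketch in Python, not anything defined by ISO 19115 or MOLES; the `MDScopeCode` class name and the `describe` helper are assumptions made for the example, and only the names and codes come from the table.

```python
from enum import Enum

# Names and three-digit codes transcribed from the MD_ScopeCode table above.
_SCOPE_CODES = {
    "attribute": "001",
    "attributeType": "002",
    "collectionHardware": "003",
    "collectionSession": "004",
    "dataset": "005",
    "series": "006",
    "nonGeographicDataset": "007",
    "dimensionGroup": "008",
    "feature": "009",
    "featureType": "010",
    "propertyType": "011",
    "fieldSession": "012",
    "software": "013",
    "service": "014",
    "model": "015",
    "tile": "016",
}

# Functional Enum API: member names are the code-list names, values the codes.
MDScopeCode = Enum("MDScopeCode", _SCOPE_CODES)


def describe(level):
    """Hypothetical helper: render a scope code as 'name (code)'."""
    return f"{level.name} ({level.value})"


# Look up by name (as hierarchyLevel would carry it) or by numeric code.
hierarchy_level = MDScopeCode["dataset"]
print(describe(hierarchy_level))      # dataset (005)
print(describe(MDScopeCode("015")))   # model (015)
```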
I’m not convinced I understand all of those, particularly the model type, but also the collectionHardware, collectionSession and dimensionGroup types. Anyone who can shed some light on those would be welcome to comment below or email me …
Obviously metadata should also apply to other entities. In particular, within the NDG we consider that the observation station (we had difficulty finding an appropriate noun that includes simulation hardware while also covering a ship, a physical location, a field trip, etc.), the data production tool (DPT, aka instrument, but inclusive of simulation software, also known as models), and activities are also first class citizens of metadata. Perhaps collectionHardware and collectionSession might be relevant for the DPT and some activities. I don’t know.
We also consider the deployment to be an important entity: a deployment links one or more data entities to a (DPT, activity, observation station) triplet, and may itself have properties. It’s worth noting that in the observations and measurements framework, the concept of Observation binds a value of a property determined by a procedure to a feature. In the NDG world, data features live within data entities, and some part of what O&M calls a “procedure” is an attribute of a data production tool, but most of a “procedure” is in my mind synonymous with a deployment[^1]. Values and properties live within data entities too (a data entity is described by an application schema of GML, which can include elements from the O&M namespace).
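To make the shape of a deployment concrete, here is a minimal sketch in Python of the linkage described above: one or more data entities tied to a (DPT, activity, observation station) triplet, plus optional properties. It is not the MOLES schema; the class name, field names and all identifiers in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Deployment:
    """Hypothetical sketch: links data entities to a (DPT, activity, station) triplet."""

    data_entities: List[str]          # identifiers of one or more data entities
    production_tool: str              # the DPT (instrument or simulation software)
    activity: str                     # project, campaign or other activity
    observation_station: str          # ship, location, field trip, simulation hardware ...
    properties: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The text says "one or more data entities", so reject an empty list.
        if not self.data_entities:
            raise ValueError("a deployment must link at least one data entity")

    def triplet(self) -> Tuple[str, str, str]:
        return (self.production_tool, self.activity, self.observation_station)


# Made-up identifiers, purely for illustration.
deployment = Deployment(
    data_entities=["example-data-entity-1"],
    production_tool="example-model-code",
    activity="example-campaign",
    observation_station="example-compute-platform",
    properties={"note": "switches, initial conditions etc. could live here"},
)
print(deployment.triplet())
```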
Leaving aside a resolution of a formal data model for MOLES, the first class entities will need to support metadata, and so there needs to be a scope code for the appropriate first class entities.
John Hockaday, on the metadata list, initially suggested extending the scope codes to cover:
|profile||there are many community profiles being developed|
|document||a general "grab bag" type for documents.|
|repository||... suitable for something like a RDBMS|
|codeList||there are many codeLists in ISO 19115, ISO 19119 and ISO 19139. These codeLists are extensible.|
|modelRun or modelSession||to distinguish from model (but see below)|
|applicationSchema||information about GML application schema themselves|
|portrayalCatalogue||for finding OGC Symbology Encoding or Styled Layer Descriptors for OGC Web Services.|
Eventually he implemented some of those in a new codelist:
|modelSession||information applies to a model session or model run for a particular model|
|document||information applies to a document such as a publication, report, record etc.|
|profile||information applies to a profile of an ISO TC 211 standard or specification|
|dataRepository||information applies to a data repository such as a Catalogue Service, Relational Database, WebRegistry|
|codeList||information applies to a code list according to the CT_CodelistCatalogue format|
|project||information applies to a project or programme|
Actually, even with this definition of modelSession to augment model (which he thought might be used for things like metadata about UML descriptions), I still have problems. Within NDG and NumSim, we have the concept of model code bases and experiments, and I think these need to be kept separate but linked.
Personally I don’t like the dataRepository one … but I can live with it.
Project is OK, but we would prefer activity, because we decided that activities should include activities, and the parents may well be projects … but not always … (e.g. campaigns within a formal project may themselves have sub-campaigns etc).
At this point I might consider a slightly different extension set (which is of course the point of having extensible codelists). Given I’m not sure about these collection thingies, and given a tilt towards O&M, I might want to have
|activity||information applying to a project, programme or other activity|
|productionTool||information about an instrument or algorithm|
|observationStation||information about the characteristics, location and/or platform which carried out, or is capable of, an observation or simulation.|
|deployment||information linking a data entity, activity, productionTool and platform in a procedure|
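One way to picture an extensible codelist is as the standard names plus a community-specific set, with a check that accepts either. The sketch below is in Python and is only an assumption about how such a check might look; the function name `is_valid_hierarchy_level` is hypothetical, and the definitions are copied from the tables in this post.

```python
# The sixteen standard MD_ScopeCode names listed earlier in this post.
ISO_SCOPE_CODES = frozenset({
    "attribute", "attributeType", "collectionHardware", "collectionSession",
    "dataset", "series", "nonGeographicDataset", "dimensionGroup",
    "feature", "featureType", "propertyType", "fieldSession",
    "software", "service", "model", "tile",
})

# The four extension values proposed above, with their definitions.
PROPOSED_EXTENSIONS = {
    "activity": "information applying to a project, programme or other activity",
    "productionTool": "information about an instrument or algorithm",
    "observationStation": (
        "information about the characteristics, location and/or platform which "
        "carried out, or is capable of, an observation or simulation."
    ),
    "deployment": (
        "information linking a data entity, activity, productionTool and "
        "platform in a procedure"
    ),
}

# An extension should add names, not silently redefine standard ones.
assert not ISO_SCOPE_CODES & PROPOSED_EXTENSIONS.keys()


def is_valid_hierarchy_level(value, extensions=PROPOSED_EXTENSIONS):
    """Hypothetical check: accept a standard scope code or an extension value."""
    return value in ISO_SCOPE_CODES or value in extensions


print(is_valid_hierarchy_level("deployment"))  # True
print(is_valid_hierarchy_level("project"))     # False: not in this extension set
```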
Now I can have an algorithm (computer model) described in a productionTool metadata document and the particular data entity it produces is a data entity (of course), with the particular switches, initial conditions etc, described in a deployment (although I suspect there should and will be ambiguity as to whether the attributes of a productionTool could inherit most if not all of the characteristics of a deployment).
A deployment most closely corresponds to an O&M observation, in that we deploy a tool in or at a particular station for an activity to make a measurement. I’d love a better (compound) noun than ObservationStation …

[^1]: Actually, the overlap between an observation and a procedure is significant, something that is pointed out in the O&M spec itself.
Stephen M Richard (on Friday 23 October, 2009)
See a metadata resource type compilation for Earth Sciences at http://spreadsheets.google.com/pub?key=rwesR0kcs37P-CjV4sY_iMA&single=true&gid=0&output=html
John Hockaday (on Wednesday 03 March, 2010)
My understanding, using examples, of ‘collectionHardware’ is an instrument or platform for collecting data or information. For example, camera, satellite, survey ship, sensor, thermometer, barometer, etc. etc.
My understanding of a ‘collectionSession’ is data collected via the ‘collectionHardware’ during a particular period. For example, photos taken at a party, maximum and minimum temperatures of a given day, week or month, a satellite scene, a period where the survey ship is between ports, etc. etc.
I agree, I don’t understand ‘dimensionGroup’. I think it may relate to a classification of some sort. For example, rainfall is classified into 0-10mm, 11-20mm, 21-30mm, 31-40mm, > 41mm.
I believe ‘model’ can relate to two types: an abstract model such as a UML model, and an algorithmic model such as the prediction of tsunami damage to coastal areas.
Using this understanding I would lump productionTool and observationStation under ‘collectionHardware’.
I’m disappointed that I didn’t see this wiki before deciding on the additional terms. I like the ‘activity’ term. It covers ‘programmes’, ‘projects’ but this may also be considered to apply to ‘fieldSession’ or ‘surveySession’.