The Scope of ISO19115

We’re taking the first steps towards refactoring our Metadata Objects for Linking Environmental Sciences (MOLES) schema, to make it easier to understand and implement and to support (if not conform to) the new OGC Observations and Measurements (O&M) specification. In doing so, it became obvious to me that I need to think about the relationship between MOLES entities and ISO discovery metadata.

ISO19115 specifies that

...a dataset (DS_DataSet) must have one or more related Metadata entity sets (MD_Metadata). Metadata may optionally relate to a Feature, Feature Attribute, Feature Type, Feature Property Type (a Metaclass instantiated by Feature association role, Feature attribute type, and Feature operation), and aggregations of datasets (DS_Aggregate). Dataset aggregations may be specified (subclassed) as a general association (DS_OtherAggregate), a dataset series (DS_Series), or a special activity (DS_Initiative). MD_Metadata also applies to other classes of information and services not shown in this diagram (see MD_ScopeCode, B.5.25).

Let’s look at MD_ScopeCode, the codelist that supplies the value of the MD_Metadata attribute hierarchyLevel:

MD_ScopeCode	Code	Definition: class of information to which the referencing entity applies
attribute	001	information applies to the attribute class
attributeType	002	information applies to the characteristic of a feature
collectionHardware	003	information applies to the collection hardware class
collectionSession	004	information applies to the collection session
dataset	005	information applies to the dataset
series	006	information applies to the series
nonGeographicDataset	007	information applies to non-geographic data
dimensionGroup	008	information applies to a dimension group
feature	009	information applies to a feature
featureType	010	information applies to a feature type
propertyType	011	information applies to a property type
fieldSession	012	information applies to a field session
software	013	information applies to a computer program or routine
service	014	information applies to a ... service ...
model	015	information applies to a copy or imitation of an existing or hypothetical object
tile	016	information applies to a tile, a spatial subset of geographic data
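To make the role of hierarchyLevel concrete, here is a minimal sketch of how a scope code might be written into an XML record. It assumes the ISO 19139 XML encoding (the `gmd` namespace) and the commonly used gmxCodelists.xml codelist location; the class `MDScopeCode` and the helper `hierarchy_level_element` are hypothetical names for illustration, not part of any library.

```python
import xml.etree.ElementTree as ET
from enum import Enum

# Namespace of the ISO 19139 gmd schema (assumed encoding for this sketch).
GMD = "http://www.isotc211.org/2005/gmd"
ET.register_namespace("gmd", GMD)

# Codelist catalogue location often cited in ISO 19139 records (an assumption here).
CODELIST_URL = (
    "http://standards.iso.org/ittf/PubliclyAvailableStandards/"
    "ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#MD_ScopeCode"
)

class MDScopeCode(Enum):
    """Hypothetical enum of the MD_ScopeCode values and domain codes listed above."""
    attribute = "001"
    attributeType = "002"
    collectionHardware = "003"
    collectionSession = "004"
    dataset = "005"
    series = "006"
    nonGeographicDataset = "007"
    dimensionGroup = "008"
    feature = "009"
    featureType = "010"
    propertyType = "011"
    fieldSession = "012"
    software = "013"
    service = "014"
    model = "015"
    tile = "016"

def hierarchy_level_element(scope: MDScopeCode) -> ET.Element:
    """Build a gmd:hierarchyLevel element carrying a single scope code."""
    level = ET.Element(f"{{{GMD}}}hierarchyLevel")
    code = ET.SubElement(level, f"{{{GMD}}}MD_ScopeCode")
    code.set("codeList", CODELIST_URL)
    code.set("codeListValue", scope.name)  # the codelist value is the name, not "005"
    code.text = scope.name
    return level

print(ET.tostring(hierarchy_level_element(MDScopeCode.dataset), encoding="unicode"))
```

Note that the record carries the name (e.g. `dataset`), while the three-digit domain code from the table is only kept in the enum for reference.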

I’m not sure I understand all of these, particularly model, but also collectionHardware, collectionSession and dimensionGroup. Anyone who can shed light on them is welcome to comment below or email me …

Metadata should obviously also apply to other entities. In particular, within the NDG we treat the following as first-class citizens of metadata: the observation station (we had difficulty finding a noun that includes simulation hardware while also covering a ship, a physical location, a field trip, etc.), the data production tool (DPT, aka instrument, but also including simulation software, i.e. models), and activities. Perhaps collectionHardware and collectionSession are relevant to the DPT and to some activities. I don’t know.

We also consider the deployment to be an important entity: a deployment links one or more data entities to a (DPT, activity, observation station) triplet, and may itself have properties. It’s worth noting that in the O&M framework, an Observation binds to a feature the value of a property determined by a procedure. In the NDG world, data features live within data entities, and part of what O&M calls a “procedure” is an attribute of a data production tool, but most of a “procedure” is, in my mind, synonymous with a deployment[^1]. Values and properties live within data entities too (a data entity is described by a GML application schema, which can include elements from the O&M namespace).
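The deployment idea can be sketched as a small data structure. This is a simplified illustration under stated assumptions: the class and field names (`DataProductionTool`, `Deployment.procedure`, and so on) are hypothetical and are not the MOLES or O&M schemas. It shows a deployment tying at least one data entity to a (DPT, activity, station) triplet, plus a rough "procedure" view that takes a little from the tool and most from the deployment's own properties.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified entities for illustration only.

@dataclass(frozen=True)
class ObservationStation:
    name: str  # e.g. a ship, a fixed site, a field trip, or simulation hardware

@dataclass(frozen=True)
class DataProductionTool:
    name: str  # an instrument or a simulation model

@dataclass(frozen=True)
class Activity:
    name: str

@dataclass
class Deployment:
    """Links one or more data entities to a (DPT, activity, station) triplet."""
    tool: DataProductionTool
    activity: Activity
    station: ObservationStation
    data_entities: list[str]  # identifiers of the linked data entities
    properties: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.data_entities:
            raise ValueError("a deployment must link at least one data entity")

    def triplet(self) -> tuple[DataProductionTool, Activity, ObservationStation]:
        return (self.tool, self.activity, self.station)

    def procedure(self) -> dict[str, str]:
        """Rough O&M-style procedure: partly the tool, mostly the deployment."""
        return {"tool": self.tool.name, **self.properties}

dep = Deployment(
    tool=DataProductionTool("radiosonde"),
    activity=Activity("example campaign"),
    station=ObservationStation("research ship"),
    data_entities=["profiles-001"],
    properties={"launch_interval": "6h"},
)
print(dep.procedure())
```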

Leaving aside the question of a formal data model for MOLES, the first-class entities will need to support metadata, so there needs to be a scope code for each of them.

John Hockaday, on the metadata list, initially suggested extending the scope codes to cover:

profile	there are many community profiles being developed
document	a general "grab bag" type for documents.
repository	... suitable for something like an RDBMS
codeList	there are many codeLists in ISO 19115, ISO 19119 and ISO 19139. These codeLists are extensible.
modelRun or modelSession	to distinguish from model (but see below)
applicationSchema	information about GML application schema themselves
portrayalCatalogue	for finding OGC Symbology Encoding or Styled Layer Descriptors for OGC Web Services.

Eventually he implemented some of these in a new codelist:

modelSession	information applies to a model session or model run for a particular model
document	information applies to a document such as a publication, report, record etc.
profile	information applies to a profile of an ISO TC 211 standard or specification
dataRepository	information applies to a data repository such as a Catalogue Service, Relational Database, WebRegistry
codeList	information applies to a code list according to the CT_CodelistCatalogue format
project	information applies to a project or programme

Even with this definition of modelSession to complement model (which he thought might be used for things like metadata about UML descriptions), I still have problems. Within NDG and NumSim we have the concepts of model code bases and experiments, and I think these need to be kept separate but linked.

Personally I don’t like the dataRepository one … but I can live with it.

Project is OK, but we would prefer activity, because we decided that activities can contain other activities, and the parents may well be projects … but not always (e.g. campaigns within a formal project may themselves have sub-campaigns).
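The reason for preferring activity is that it nests. A minimal sketch, assuming a simple tree in which an activity's kind (project, campaign, and so on) is free text rather than a controlled vocabulary; the class and method names are hypothetical. It shows a project containing a campaign that itself contains a sub-campaign.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Activity:
    """Hypothetical activity node; any activity may contain further activities."""
    name: str
    kind: str  # e.g. "project", "programme", "campaign" (free text in this sketch)
    children: list[Activity] = field(default_factory=list)

    def add(self, child: Activity) -> Activity:
        """Attach a sub-activity and return it so it can be extended further."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[tuple[int, Activity]]:
        """Depth-first traversal yielding (depth, activity) pairs."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

project = Activity("example project", "project")
campaign = project.add(Activity("campaign A", "campaign"))
campaign.add(Activity("sub-campaign A1", "campaign"))

for depth, act in project.walk():
    print("  " * depth + f"{act.kind}: {act.name}")
```

This prints the project, then the campaign indented once, then the sub-campaign indented twice. Nothing forces the root to be a project, so a standalone campaign is just as valid a parent.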

At this point I might consider a slightly different extension set (which is, of course, the point of having extensible codelists). Given that I’m not sure about these collection thingies, and given a tilt towards O&M, I might want to have:

document	as above
profile	as above
codeList	as above
dataRepository	as above
activity	information applying to a project, programme or other activity
productionTool	information about an instrument or algorithm
observationStation	information about the characteristics, location and/or platform which carried out, or is capable of, an observation or simulation.
deployment	information linking a data entity, activity, productionTool and platform in a procedure
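Because the codelist is extensible, a local profile can simply add values to the ISO set. The sketch below assumes a plain in-memory registry rather than the CT_CodelistCatalogue format, and the names `ISO_SCOPE_CODES`, `NDG_EXTENSION` and `extend_codelist` are hypothetical. It merges the extension proposed above with the ISO values and refuses any extension that would silently redefine an existing code.

```python
# The sixteen ISO19115 MD_ScopeCode values.
ISO_SCOPE_CODES = frozenset({
    "attribute", "attributeType", "collectionHardware", "collectionSession",
    "dataset", "series", "nonGeographicDataset", "dimensionGroup", "feature",
    "featureType", "propertyType", "fieldSession", "software", "service",
    "model", "tile",
})

# The extension set proposed above, with its definitions.
NDG_EXTENSION = {
    "document": "information applies to a document such as a publication, report, record etc.",
    "profile": "information applies to a profile of an ISO TC 211 standard or specification",
    "codeList": "information applies to a code list according to the CT_CodelistCatalogue format",
    "dataRepository": "information applies to a data repository such as a Catalogue Service, Relational Database, WebRegistry",
    "activity": "information applying to a project, programme or other activity",
    "productionTool": "information about an instrument or algorithm",
    "observationStation": "information about the characteristics, location and/or platform "
                          "which carried out, or is capable of, an observation or simulation",
    "deployment": "information linking a data entity, activity, productionTool and platform in a procedure",
}

def extend_codelist(base: frozenset[str], extension: dict[str, str]) -> frozenset[str]:
    """Return base plus the extension's codes, rejecting any redefinition of base codes."""
    clashes = base.intersection(extension)
    if clashes:
        raise ValueError(f"extension redefines existing codes: {sorted(clashes)}")
    return base.union(extension)

scope_codes = extend_codelist(ISO_SCOPE_CODES, NDG_EXTENSION)
print(sorted(scope_codes - ISO_SCOPE_CODES))
```

Rejecting clashes is a design choice: it keeps records written with the extended list interpretable by tools that only know the ISO values.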

Now I can describe an algorithm (computer model) in a productionTool metadata document; the output it produces is (of course) a data entity, and the particular switches, initial conditions, etc. are described in a deployment (although I suspect there should, and will, be ambiguity as to whether the attributes of a productionTool could inherit most if not all of the characteristics of a deployment).
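One way to picture the inheritance question is a layered lookup: the deployment (the experiment) states its own switches and initial conditions and falls back on defaults recorded with the productionTool (the model code base). This is a minimal sketch using Python's `collections.ChainMap`; the parameter names and values are hypothetical, and the sketch makes precedence explicit without settling which record should own a given attribute.

```python
from collections import ChainMap

# Hypothetical defaults recorded in a productionTool document for a model code base.
tool_defaults = {
    "resolution": "1 degree",
    "ocean_coupling": "off",
    "timestep_minutes": "30",
}

# Hypothetical settings recorded in the deployment for one experiment.
deployment_settings = {
    "ocean_coupling": "on",
    "initial_conditions": "restart file from a control run",
}

def effective_configuration(deployment: dict[str, str], tool: dict[str, str]) -> ChainMap:
    """Deployment values take precedence; anything unspecified falls back to the tool."""
    return ChainMap(deployment, tool)

config = effective_configuration(deployment_settings, tool_defaults)
for key in sorted(config):
    source = "deployment" if key in deployment_settings else "productionTool"
    print(f"{key} = {config[key]} (from {source})")
```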

A deployment most closely corresponds to an O&M observation, in that we deploy a tool in or at a particular station, for an activity, to make a measurement. I’d love a better (compound) noun than ObservationStation …

[^1]: Actually, the overlap between an observation and a procedure is significant, something that is pointed out in the O&M spec itself.

comments (2)

Stephen M Richard (on Friday 23 October, 2009)

See a metadata resource type compilation for Earth Sciences at http://spreadsheets.google.com/pub?key=rwesR0kcs37P-CjV4sY_iMA&single=true&gid=0&output=html

John Hockaday (on Wednesday 03 March, 2010)

My understanding, using examples, of ‘collectionHardware’ is an instrument or platform for collecting data or information. For example, camera, satellite, survey ship, sensor, thermometer, barometer, etc. etc.

My understanding of a ‘collectionSession’ is data collected via the ‘collectionHardware’ during a particular period. For example, photos taken at a party, maximum and minimum temperatures of a given day, week or month, a satellite scene, a period where the survey ship is between ports, etc. etc.

I agree; I don’t understand ‘dimensionGroup’. I think it may relate to a classification of some sort. For example, rainfall is classified into 0-10mm, 11-20mm, 21-30mm, 31-40mm, > 41mm.

I believe ‘model’ can relate to two types: an abstract model, such as a UML model, and an algorithmic model, such as the prediction of tsunami damage to coastal areas.

Using this understanding I would lump productionTool and observationStation under ‘collectionHardware’.

I’m disappointed that I didn’t see this wiki before deciding on the additional terms. I like the ‘activity’ term. It covers ‘programmes’ and ‘projects’, but it may also be considered to apply to ‘fieldSession’ or ‘surveySession’.

John