Schema Enforcer¶
Many GATE applications are developed through a process that starts with experts manually annotating documents. These annotations help the application developer understand what is required, and can later be used for testing and evaluation. The annotation is usually done either in GATE Teamware or within GATE Developer using the Schema Annotation Editor.
Either approach requires that each annotation type being created is described by an XML-based Annotation Schema. The Schema Enforcer (part of the Schema_Tools plugin) uses these same schemas to create an annotation set whose contents strictly match the provided schemas.
Need for Schema Enforcer¶
● When creating an application, you often end up with lots of annotations and features which are not needed in the final output.
● If you are pushing the output into a MIMIR index, or if storage space is an issue, it is particularly important to get rid of these.
● You can tidy up the output using the AnnotationSetTransfer PR to move selected annotation types to a new set, but that still leaves the problem of unwanted features.
● The Schema Enforcer PR ensures that annotations and features only appear in the final output set if they adhere strictly to the annotation schemas used.
● It is straightforward to use: load the Schema Tools plugin and list the schemas to be used in the runtime parameters (they must already be loaded in GATE).
Matching Conditions¶
The Schema Enforcer will copy an annotation if and only if:
● the type of the annotation matches one of the supplied schemas
● all required features are present and valid (i.e. meet the requirements for being copied to the ’clean’ annotation)
Each feature of an annotation is copied to the new annotation if and only if:
● the feature name matches a feature in the schema describing the annotation
● the value of the feature is of the same type as specified in the schema
● if the feature is defined in the schema as an enumerated type, the value matches one of the permitted values
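The matching conditions above can be sketched in plain Java. This is only an illustrative model, not the plugin's implementation: the `FeatureSpec`, `Schema` and `Annotation` records and the `enforce` method are hypothetical names that do not come from the GATE API. The `useDefaults` flag mirrors the runtime parameter of the same name, and the sketch assumes that a default may replace a required feature that is either missing or invalid.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative model only: every type here is hypothetical and NOT part of the GATE API.
final class SchemaEnforcerSketch {

    // One feature declared in a schema. An empty 'permitted' set means "not enumerated".
    record FeatureSpec(Class<?> type, boolean required, Set<Object> permitted, Object defaultValue) {}

    // A schema: the annotation type it describes plus its declared features.
    record Schema(String annotationType, Map<String, FeatureSpec> features) {}

    // A bare-bones annotation: a type name and a feature map.
    record Annotation(String type, Map<String, Object> features) {}

    // A value is valid if it has the declared type and, for enumerations, a permitted value.
    static boolean isValid(FeatureSpec spec, Object value) {
        return value != null
            && spec.type().isInstance(value)
            && (spec.permitted().isEmpty() || spec.permitted().contains(value));
    }

    // Returns the cleaned copy, or empty if the annotation must not be copied.
    static Optional<Annotation> enforce(Annotation in, List<Schema> schemas, boolean useDefaults) {
        Schema schema = schemas.stream()
            .filter(s -> s.annotationType().equals(in.type()))
            .findFirst()
            .orElse(null);
        if (schema == null) {
            return Optional.empty(); // the type matches none of the supplied schemas
        }

        Map<String, Object> clean = new LinkedHashMap<>();
        // Only features named in the schema are considered, so all others are dropped.
        for (Map.Entry<String, FeatureSpec> entry : schema.features().entrySet()) {
            FeatureSpec spec = entry.getValue();
            Object value = in.features().get(entry.getKey());
            if (isValid(spec, value)) {
                clean.put(entry.getKey(), value);
            } else if (spec.required()) {
                // Assumption: a default can stand in for a missing or invalid required feature.
                if (useDefaults && spec.defaultValue() != null) {
                    clean.put(entry.getKey(), spec.defaultValue());
                } else {
                    return Optional.empty(); // required feature missing or invalid
                }
            }
            // An invalid optional feature is simply left out of the clean copy.
        }
        return Optional.of(new Annotation(in.type(), clean));
    }

    public static void main(String[] args) {
        Schema address = new Schema("Address", Map.of(
            "kind", new FeatureSpec(String.class, true, Set.of("email", "url"), "email")));
        Annotation noisy = new Annotation("Address", Map.of("kind", "email", "rule", "AddressRule1"));
        // The "rule" feature is not in the schema, so it does not survive the copy.
        System.out.println(enforce(noisy, List.of(address), false));
    }
}
```

Iterating over the schema's features rather than the annotation's keeps the two rule sets in one loop: unknown feature names are never visited, and a failure on any required feature rejects the whole annotation.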
The Schema Enforcer has no initialization parameters and is configured via the following runtime parameters:
Runtime parameters¶
● inputASName: this defines the annotation set from which annotations will be copied. If nothing is specified, the default annotation set will be used.
● outputASName: this defines the annotation set to which the annotations will be transferred. This must be an empty or non-existent annotation set.
● schemas: a list of schemas that will be enforced when duplicating the input annotation set.
● useDefaults: if true, the default value for a required feature (specified using the value attribute in the XML schema) will be used to help complete an otherwise invalid annotation. Defaults to false.

In this example we pass only the three schemas we need, each specifying the features it permits. A new annotation set named "set" is then created, containing only the annotations and features described by those schemas.
