Advanced JAPE¶
Debugging JAPE Grammars¶
Read the error messages, they are helpful!
- Line numbers etc. refer to the original JAPE files.
- The description usually highlights the exact problem.
When trying to understand how annotations were created by a grammar, try the new enableDebugging option among the JAPE Transducer's runtime parameters. With it enabled, each annotation generated by the JAPE Transducer carries the following features:
- addedByPR: the name of the JAPE PR running the grammar that produced the annotation
- addedByPhase: the name of the phase (usually the filename) in which the annotation was created
- addedByRule: the name of the rule responsible for creating the annotation
Using Java in JAPE¶
Beyond Simple Actions
It’s often useful to do more complex operations on the RHS than simply adding annotations, e.g.
- Set a new feature on one of the matched annotations.
- Delete annotations from the input.
- More complex feature value mappings, e.g. concatenate several LHS features to make a single RHS feature.
- Collect statistics, e.g. count the number of matched annotations and store the count as a document feature.
JAPE has no special syntax for these operations, but allows blocks of arbitrary Java code on the RHS.
GATE API¶
GATE Feature Maps¶
Feature Maps:
- are simply Java Maps, with added support for firing events;
- are used to provide parameter values when creating and configuring resources;
- are used to store metadata on many GATE objects.
All GATE resources are feature bearers (they implement gate.util.FeatureBearer), so each one exposes getFeatures() and setFeatures(FeatureMap features).
Creating a new FeatureMap
FeatureMap fm = Factory.newFeatureMap();
GATE Documents¶
A GATE Document comprises:
- a DocumentContent object;
- a default annotation set (which has no name);
- zero or more named annotation sets.
A Document is also a type of Resource, so it also has:
- a name;
- features.
Main Document API Calls¶

Annotation Sets¶
GATE Annotation Sets:
- maintain a set of Node objects (which are associated with offsets in the document content) and a set of annotations (which have a start and an end node);
- implement the gate.AnnotationSet interface, which extends Set<Annotation>;
- implement several get() methods for obtaining the included annotations according to various constraints;
- are created, deleted, and managed by the Document they belong to.
Main AnnotationSet API Calls¶
Nodes
// Get the node with the smallest offset.
public Node firstNode();
// Get the node with the largest offset.
public Node lastNode();
Creating new Annotations
// Create (and add) a new annotation from start and end offsets.
public Integer add(Long start, Long end, String type, FeatureMap features);
// Create (and add) a new annotation from start and end nodes.
public Integer add(Node start, Node end, String type, FeatureMap features);
Getting Annotations by ID, or type¶
Getting Annotations by position¶

Combined get methods¶

Annotations¶
GATE Annotations:
- are metadata associated with a document segment;
- have a type (String);
- have a start and an end Node (gate.Node);
- have features;
- are created, deleted and managed by annotation sets.
Note
Always use an annotation set to create a new annotation! Do not use the constructor.
Annotation API¶
Main Annotation methods:
public String getType();
public Node getStartNode();
public Node getEndNode();
public FeatureMap getFeatures();
gate.Node
public Long getOffset();
JAPE With Java RHS Template¶
Internally, every JAPE grammar is converted into a Java class. The phase name is used as the Java class name, and the class contains a doit() method that takes the parameters doc, bindings, inputAS, outputAS and ontology (these are discussed in the next section). All the RHS code written in a rule goes into this method.
Java Block Variables¶
The variables available to Java RHS blocks are:
- doc: The document currently being processed.
- inputAS: The AnnotationSet specified by the inputASName runtime parameter to the JAPE transducer PR. Read or delete annotations from here.
- outputAS: The AnnotationSet specified by the outputASName runtime parameter to the JAPE transducer PR. Create new annotations in here.
- ontology: The ontology (if any) provided as a runtime parameter to the JAPE transducer PR.
- bindings: The bindings map, described in the next section.
Bindings¶
- bindings is a Map from String to AnnotationSet.
- Keys are labels from the LHS.
- Values are the annotations matched by the label.
For example, for a rule that binds the label uniTown to a single Lookup and the label orgName to two Tokens followed by that Lookup:
- bindings.get("uniTown") contains one annotation (the Lookup);
- bindings.get("orgName") contains three annotations (two Tokens plus the Lookup).
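A rule of the shape those two bullets describe could look like the sketch below. The phase name, rule name, token strings and the Lookup constraint are assumptions chosen for illustration; only the labels uniTown and orgName come from the text above.

```jape
Phase: University
Input: Token Lookup
Options: control = appelt

// Hypothetical rule: matches text such as "University of Sheffield"
Rule: UniversityOfTown
(
  {Token.string == "University"}
  {Token.string == "of"}
  ({Lookup.minorType == city}):uniTown   // inner label: just the Lookup
):orgName                                 // outer label: both Tokens plus the Lookup
-->
:orgName.Organisation = {rule = "UniversityOfTown"}
```

Because the label uniTown is nested inside orgName, the Lookup it matches appears in both binding sets.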
A Simple Example¶
This is a simple example of a Java RHS that prints the type and features of each annotation it matches.
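A minimal sketch of such a rule follows. The phase name, rule name and the choice of matching every Token are assumptions made for illustration; the Java block uses only the bindings variable and the Annotation methods listed earlier.

```jape
Phase: PrintMatches
Input: Token
Options: control = appelt

// Hypothetical rule: matches every Token and prints it
Rule: PrintToken
({Token}):tok
-->
{
  // Look up the annotations bound to the LHS label "tok"
  AnnotationSet matched = bindings.get("tok");
  for (Annotation ann : matched) {
    System.out.println("Type: " + ann.getType());
    System.out.println("Features: " + ann.getFeatures());
  }
}
```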
Named Java Blocks¶
- You can label a Java block with a label from the LHS.
- The block will only be called if there is at least one annotation bound to the label.
- Within the Java block there is a variable labelAnnots referring to the AnnotationSet bound to the label, i.e. for the label xy: AnnotationSet xyAnnots = bindings.get("xy")
- You can have any number of :bind.Type = {} assignment expressions and blocks of Java code, separated by commas.
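The points above can be sketched as a rule that combines a named Java block with an ordinary assignment expression. The phase name, rule name, the Number annotation type and the seenBy feature are all hypothetical; the kind feature is assumed to be set by the tokeniser.

```jape
Phase: NamedBlocks
Input: Token
Options: control = appelt

// Hypothetical rule: matches a numeric Token under the label "xy"
Rule: NamedBlock
({Token.kind == number}):xy
-->
:xy {
  // xyAnnots is provided by JAPE; it is the same set as bindings.get("xy")
  for (Annotation ann : xyAnnots) {
    ann.getFeatures().put("seenBy", "NamedBlock");
  }
},
:xy.Number = {rule = "NamedBlock"}
```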
Common Idioms for Java RHS¶
1. Setting a new feature on one of the matched annotations
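As a sketch of this idiom, the rule below stores a lower-cased copy of each Token's string feature in a new feature. The phase name, rule name and the lcString feature name are assumptions for illustration.

```jape
Phase: LowerCase
Input: Token
Options: control = appelt

// Hypothetical rule: adds a lower-cased copy of the Token string
Rule: LcString
({Token}):tok
-->
:tok {
  for (Annotation a : tokAnnots) {
    // Features live in the annotation's FeatureMap
    FeatureMap fm = a.getFeatures();
    Object str = fm.get("string");
    if (str != null) {
      fm.put("lcString", str.toString().toLowerCase());
    }
  }
}
```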
2. Modify the Java RHS block to add a generalCategory feature to the matched Token annotation holding the first two characters of the POS tag (the category feature).
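One possible solution is sketched below, assuming the Tokens have already been POS-tagged so that the category feature is present. The phase and rule names are hypothetical. Tags shorter than two characters are copied unchanged to avoid an out-of-range substring.

```jape
Phase: GeneralCategory
Input: Token
Options: control = appelt

// Hypothetical rule: derives a coarse POS category from the full tag
Rule: GeneralCategory
({Token}):tok
-->
:tok {
  for (Annotation a : tokAnnots) {
    FeatureMap fm = a.getFeatures();
    Object category = fm.get("category");
    if (category != null) {
      String pos = category.toString();
      // Keep the first two characters, or the whole tag if it is shorter
      fm.put("generalCategory", pos.length() > 2 ? pos.substring(0, 2) : pos);
    }
  }
}
```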
3. Removing matched annotations from the input
This can be useful to stop later phases matching the same annotations again.
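A sketch of this idiom: the rule creates a Location annotation from a gazetteer Lookup and then deletes the Lookup from the input set. The phase name, rule name and the gazetteer majorType value are assumptions.

```jape
Phase: Locations
Input: Lookup
Options: control = appelt

// Hypothetical rule: promotes a location Lookup to a Location annotation
Rule: Location
({Lookup.majorType == location}):loc
-->
:loc.Location = {kind = :loc.Lookup.minorType, rule = "Location"},
:loc {
  // AnnotationSet is a Set<Annotation>, so removeAll deletes the matched Lookups
  inputAS.removeAll(locAnnots);
}
```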
4. Accessing the string covered by a match
The gate.Utils class provides these methods for the purpose:
| Type | MethodName(Parameters) | Description |
|---|---|---|
| static String | stringFor(Document doc, AnnotationSet anns) | Returns the document text covered by the given annotation set. |
| static String | stringFor(Document doc, Long start, Long end) | Returns the document text between the provided offsets. |
| static String | stringFor(Document doc, SimpleAnnotation ann) | Returns the document text corresponding to the annotation. |
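As an illustration, the sketch below copies the matched text into a feature on a new annotation, using stringFor together with the addAnn helper from gate.Utils. The phase name, rule name, annotation type and feature name are assumptions.

```jape
Phase: MatchedText
Input: Lookup
Options: control = appelt

// Hypothetical rule: records the text covered by the match
Rule: MatchedText
({Lookup.majorType == location}):loc
-->
:loc {
  // Text covered by everything bound to the label "loc"
  String text = gate.Utils.stringFor(doc, locAnnots);
  FeatureMap fm = Factory.newFeatureMap();
  fm.put("string", text);
  // New annotation spanning the same region as the match
  gate.Utils.addAnn(outputAS, locAnnots, "Location", fm);
}
```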
Contained Annotations¶
- To get annotations contained within the span of the match
Here the rule collects the Tokens contained in each matched NounChunk and stores their POS tags (the category feature) on the NounChunk as a feature whose value is a list.
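A sketch of such a rule is given below. It assumes NounChunk annotations and POS-tagged Tokens are present in the input annotation set; the phase name, rule name and the posTags feature name are hypothetical.

```jape
Phase: ChunkTags
Input: NounChunk
Options: control = appelt

// Hypothetical rule: lists the POS tags of the Tokens inside a NounChunk
Rule: NounChunkPos
({NounChunk}):nc
-->
:nc {
  // Tokens lying within the span of the match
  AnnotationSet tokens =
      gate.Utils.getContainedAnnotations(inputAS, ncAnnots, "Token");
  List<String> posTags = new ArrayList<String>();
  // Iterate in document order so the list follows the text
  for (Annotation token : gate.Utils.inDocumentOrder(tokens)) {
    Object category = token.getFeatures().get("category");
    if (category != null) {
      posTags.add(category.toString());
    }
  }
  for (Annotation chunk : ncAnnots) {
    chunk.getFeatures().put("posTags", posTags);
  }
}
```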
- Modify the Java RHS block to count the number of proper nouns in the matched Sentence and add this count as a feature on the Sentence annotation.
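One possible solution is sketched here. It assumes Sentence annotations exist and that Tokens carry Penn Treebank style POS tags in the category feature, so proper nouns are tagged NNP or NNPS; the phase name, rule name and the properNounCount feature name are hypothetical.

```jape
Phase: ProperNouns
Input: Sentence
Options: control = appelt

// Hypothetical rule: counts proper-noun Tokens in each Sentence
Rule: CountProperNouns
({Sentence}):sent
-->
:sent {
  AnnotationSet tokens =
      gate.Utils.getContainedAnnotations(inputAS, sentAnnots, "Token");
  int count = 0;
  for (Annotation token : tokens) {
    Object category = token.getFeatures().get("category");
    // Assumption: NNP and NNPS mark proper nouns in the tag set in use
    if (category != null && category.toString().startsWith("NNP")) {
      count++;
    }
  }
  for (Annotation sentence : sentAnnots) {
    sentence.getFeatures().put("properNounCount", count);
  }
}
```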
Example Scenario¶
- Load a document into GATE;
- find out how many named annotation sets it has;
- find out how many annotations each set contains;
- for each annotation set, for each annotation type, find out how many annotations are present.
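The steps above could be carried out from embedded Java roughly as follows. This is a sketch that assumes GATE is on the classpath and that the document URL is passed as the first command-line argument; the class name AnnotationSetReport is hypothetical.

```java
import gate.AnnotationSet;
import gate.Document;
import gate.Factory;
import gate.Gate;

import java.net.URL;
import java.util.Map;

public class AnnotationSetReport {
    public static void main(String[] args) throws Exception {
        // GATE must be initialised before any resource is created
        Gate.init();
        Document doc = Factory.newDocument(new URL(args[0]));
        try {
            // Named sets only; the default set is reached via doc.getAnnotations()
            Map<String, AnnotationSet> namedSets = doc.getNamedAnnotationSets();
            System.out.println("Named annotation sets: " + namedSets.size());
            for (Map.Entry<String, AnnotationSet> entry : namedSets.entrySet()) {
                AnnotationSet set = entry.getValue();
                System.out.println(entry.getKey() + ": " + set.size() + " annotations");
                // Per-type counts within this set
                for (String type : set.getAllTypes()) {
                    System.out.println("  " + type + ": " + set.get(type).size());
                }
            }
        } finally {
            // Release the document when finished
            Factory.deleteResource(doc);
        }
    }
}
```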
- Use the document from the scenario above;
- use the annotation set Original markups and obtain annotations of type a (anchor);
- iterate over each annotation, obtain its features and print the value of the href feature.
TIP: Before printing the value of the href feature, use the new URL(URL context, String spec) constructor so that the value is parsed within the context of the document's source URL.
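The URL resolution step from the tip can be tried on its own with plain java.net, without GATE. In the sketch below the class name HrefResolver and the example URLs are assumptions; in the exercise itself the context would be the document's source URL and the spec would be each href feature value.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class HrefResolver {
    /**
     * Resolves an href value against the URL of the document it came from.
     * Relative hrefs are completed from the context; absolute ones are kept.
     */
    public static String resolve(String documentUrl, String href) {
        try {
            URL context = new URL(documentUrl);
            // Two-argument constructor: parse href in the context of the document URL
            return new URL(context, href).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + href, e);
        }
    }

    public static void main(String[] args) {
        String source = "http://example.org/docs/index.html";
        System.out.println(resolve(source, "page2.html"));
        System.out.println(resolve(source, "/about"));
        System.out.println(resolve(source, "http://other.example.com/x"));
    }
}
```

A relative href such as page2.html is completed from the directory of the source URL, while an href that is already absolute is returned unchanged.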
Some more methods in the gate.Utils class¶
| Modifier and Type | Method | Description |
|---|---|---|
| static List | inDocumentOrder(AnnotationSet as) | Return a List containing the annotations in the given annotation set, in document order (i.e. increasing order of start offset). |
| static Integer | addAnn(AnnotationSet outSet, AnnotationSet spanSet, String type, FeatureMap fm) | Add a new annotation to the output annotation set outSet, spanning the same region as spanSet, and having the given type and feature map. |
| static Integer | addAnn(AnnotationSet outSet, Annotation spanAnn, String type, FeatureMap fm) | Add a new annotation to the output annotation set outSet, covering the same region as the annotation spanAnn, and having the given type and feature map. |
| static Integer | addAnn(AnnotationSet outSet, long startOffset, long endOffset, String type, FeatureMap fm) | Add a new annotation to the output annotation set outSet, spanning the given offset range, and having the given type and feature map. |
| static String | cleanStringFor(Document doc, AnnotationSet anns) | Return the cleaned document text as a String covered by the given annotation set. |
| static String | cleanStringFor(Document doc, Long start, Long end) | Return the cleaned document text between the provided offsets. |
| static String | cleanStringFor(Document doc, SimpleAnnotation ann) | Return the cleaned document text as a String corresponding to the annotation. |
| static Long | end(AnnotationSet as) | Get the end offset of an annotation set. |
| static Long | end(SimpleAnnotation a) | Get the end offset of an annotation. |
| static Long | end(SimpleDocument d) | Get the end offset of a document. |
| static Long | start(AnnotationSet as) | Get the start offset of an annotation set. |
| static Long | start(SimpleAnnotation a) | Get the start offset of an annotation. |
| static Long | start(SimpleDocument d) | Get the start offset of a document. |
Methods on covering, overlapping, coextensive and contained annotations¶
| Modifier and Type | Method | Description |
|---|---|---|
| static AnnotationSet | getCoextensiveAnnotations(AnnotationSet source, Annotation coextAnn) | Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation. |
| static AnnotationSet | getCoextensiveAnnotations(AnnotationSet source, AnnotationSet coextSet) | Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation set. |
| static AnnotationSet | getCoextensiveAnnotations(AnnotationSet source, AnnotationSet coextSet, String type) | Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation set and are of the specified type. |
| static AnnotationSet | getCoextensiveAnnotations(AnnotationSet source, Annotation coextAnn, String type) | Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation and have the specified type. |
| static AnnotationSet | getContainedAnnotations(AnnotationSet sourceAnnotationSet, Annotation containingAnnotation) | Get all the annotations from the source annotation set that lie within the range of the containing annotation. |
| static AnnotationSet | getContainedAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet containingAnnotationSet) | Get all the annotations from the source annotation set that lie within the range of the containing annotation set, i.e. within the offset range between the start of the first annotation in the containing set and the end of the last annotation in that set. |
| static AnnotationSet | getContainedAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet containingAnnotationSet, String targetType) | Get all the annotations from the source annotation set with a type equal to targetType that lie within the range of the containing annotation set, i.e. within the offset range between the start of the first annotation in the containing set and the end of the last annotation in that set. |
| static AnnotationSet | getContainedAnnotations(AnnotationSet sourceAnnotationSet, Annotation containingAnnotation, String targetType) | Get all the annotations of type targetType from the source annotation set that lie within the range of the containing annotation. |
| static AnnotationSet | getCoveringAnnotations(AnnotationSet sourceAnnotationSet, Annotation coveredAnnotation) | Get all the annotations from the source annotation set that cover the range of the specified annotation. |
| static AnnotationSet | getCoveringAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet coveredAnnotationSet) | Get all the annotations from the source annotation set that cover the range of the specified annotation set. |
| static AnnotationSet | getCoveringAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet coveredAnnotationSet, String targetType) | Get all the annotations from the source annotation set with a type equal to targetType that cover the range of the specified annotation set. |
| static AnnotationSet | getCoveringAnnotations(AnnotationSet sourceAnnotationSet, Annotation coveredAnnotation, String targetType) | Get all the annotations of type targetType from the source annotation set that cover the range of the specified annotation. |
| static AnnotationSet | getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, Annotation overlappedAnnotation) | Get all the annotations from the source annotation set that partly or totally overlap the range of the specified annotation. |
| static AnnotationSet | getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet overlappedAnnotationSet) | Get all the annotations from the source annotation set that overlap the range of the specified annotation set. |
| static AnnotationSet | getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet overlappedAnnotationSet, String targetType) | Get all the annotations from the source annotation set with a type equal to targetType that partly or completely overlap the range of the specified annotation set. |
| static AnnotationSet | getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, Annotation overlappedAnnotation, String targetType) | Get all the annotations of type targetType from the source annotation set that partly or totally overlap the range of the specified annotation. |