
Advanced JAPE

Debugging JAPE Grammars

Read the error messages: they are helpful!

    • Line numbers etc. refer to the original JAPE files.
    • The description usually highlights the exact problem.
file:/home/gate/plugins/ANNIE/resources/NE/name.jape:
Encountered " <kleeneOp> "? "" at line 1580, column 10.
Was expecting one of:
"\"" ...
<ident> ...
"|" ...
"{" ...
"(" ...
")" ...

When trying to understand how annotations were created by a grammar, try the enableDebugging option among the JAPE Transducer's runtime parameters. When it is enabled, every annotation generated by the JAPE Transducer carries these extra features:

    • addedByPR: the name of the JAPE PR running the grammar that produced the annotation
    • addedByPhase: the name of the phase (usually the filename) in which the annotation was created
    • addedByRule: the name of the rule responsible for creating the annotation

Using Java in JAPE

Beyond Simple Actions

It’s often useful to do more complex operations on the RHS than simply adding annotations, e.g.

    • Set a new feature on one of the matched annotations
    • Delete annotations from the input
    • More complex feature value mappings, e.g. concatenate several LHS features to make one RHS one.
    • Collect statistics, e.g. count the number of matched annotations and store the count as a document feature.

JAPE has no special syntax for these operations, but allows blocks of arbitrary Java code on the RHS.

GATE API

GATE Feature Maps

Feature Maps. . .

• are simply Java Maps, with added support for firing events.
• are used to provide parameter values when creating and configuring resources.
• are used to store metadata on many GATE objects.

All GATE resources are feature bearers (they implement gate.util.FeatureBearer):

public interface FeatureBearer {
  public FeatureMap getFeatures();

  public void setFeatures(FeatureMap features);
}

Creating a new FeatureMap

FeatureMap fm=Factory.newFeatureMap();
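
Because a FeatureMap is an ordinary Java Map, the usual put and get calls apply to it. The sketch below is only an illustration: it uses a plain HashMap as a stand-in so that it runs without GATE on the classpath, and the class name FeatureMapSketch and the feature names are assumptions made for the example. In real GATE code the map would come from Factory.newFeatureMap().

```java
import java.util.HashMap;
import java.util.Map;

final class FeatureMapSketch {
    // Stand-in for Factory.newFeatureMap(); a plain HashMap is used here
    // only so the example runs without GATE.
    static Map<Object, Object> newFeatures() {
        return new HashMap<>();
    }

    public static void main(String[] args) {
        Map<Object, Object> fm = newFeatures();
        // Feature names and values are illustrative.
        fm.put("kind", "city");
        fm.put("rule", "Location");

        // Values are stored as Object, so cast to the expected type on read.
        String kind = (String) fm.get("kind");
        System.out.println("kind = " + kind);

        // A feature that was never set comes back as null.
        System.out.println("missing = " + fm.get("minorType"));
    }
}
```

The cast on read is the same pattern the JAPE examples in this tutorial use when they fetch the string or category feature of a Token.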

GATE Documents

A GATE Document comprises:

    • a DocumentContent object;
    • a Default annotation set (which has no name);
    • zero or more named annotation sets;


A Document is also a type of Resource, so it also has:

    • a name;
    • features.

Main Document API Calls

[Figure: main Document API calls]

Annotation Sets

GATE Annotation Sets. . .

• maintain a set of Node objects (which are associated with offsets in the document content) and a set of annotations (which have a start and an end node);

• implement the gate.AnnotationSet interface, which extends Set<Annotation>;

• implement several get() methods for obtaining the included annotations according to various constraints.

• are created, deleted, and managed by the Document they belong to.

Main AnnotationSet API Calls

Nodes

// Get the node with the smallest offset.
public Node firstNode();
// Get the node with the largest offset.
public Node lastNode();

Creating new Annotations

// Create (and add) a new annotation between two offsets
public Integer add(Long start, Long end, String type, FeatureMap features);
// Create (and add) a new annotation between two existing nodes
public Integer add(Node start, Node end, String type, FeatureMap features);

Getting Annotations by ID, or type

[Figure: AnnotationSet API, getting annotations by ID or type]

Getting Annotations by position

[Figure: AnnotationSet API, getting annotations by position]

Combined get methods

[Figure: AnnotationSet API, combined get methods]

Annotations

GATE Annotations. . .

• are metadata associated with a document segment;
• have a type (String);
• have a start and an end Node (gate.Node);
• have features;
• are created, deleted and managed by annotation sets.


Note

Always use an annotation set to create a new annotation! Do not use the constructor.


Annotation API

Main Annotation methods:
public String getType();
public Node getStartNode();
public Node getEndNode();
public FeatureMap getFeatures();

gate.Node
public Long getOffset();

JAPE With Java RHS Template

Imports: { import static gate.Utils.*; }

Phase: Example
Input: Token // and any other input Annotations
Options: control = appelt

Rule: Example1
(
// Normal JAPE LHS goes here
):label
-->
{
// java code goes in here
}

Internally, the RHS of each JAPE rule is compiled into a Java class that implements gate.jape.RhsAction. The class name is built from the phase name and the rule name, and the class has a doit() method that takes the parameters doc, bindings, inputAS, outputAS and ontology (these are discussed in the next section). All the RHS code we write in the rule goes into this method. The class generated for a rule named Sentenceconatin in a phase named Sentenceconatined looks like this:

import java.util.*;
import gate.util.*;
import gate.jape.*;
import gate.creole.ontology.*;
// JAPE Source: file:/home/kpmd/Documents/test/SentenceContained.jape:1

import static gate.Utils.*;
import java.util.ArrayList;
import java.util.List;

public class SentenceconatinedSentenceconatinActionClass225
    implements java.io.Serializable, gate.jape.RhsAction {
  private gate.jape.ActionContext ctx;
  public java.lang.String ruleName() { return "Sentenceconatin"; }
  public java.lang.String phaseName() { return "Sentenceconatined"; }
  public void setActionContext(gate.jape.ActionContext ac) { ctx = ac; }
  public gate.jape.ActionContext getActionContext() { return ctx; }
  public void doit(gate.Document doc,
                   java.util.Map<java.lang.String, gate.AnnotationSet> bindings,
                   gate.AnnotationSet inputAS, gate.AnnotationSet outputAS,
                   gate.creole.ontology.Ontology ontology) throws gate.jape.JapeException {
    gate.AnnotationSet sentAnnots = bindings.get("sent");
    if(sentAnnots != null && sentAnnots.size() != 0) {
      // the code written on the RHS of the rule starts here
      AnnotationSet s = inputAS.get("Sentence");
      for(Annotation a : s) {
        AnnotationSet set = inputAS.getContained(start(a), end(a));
        Set<String> Annotationname = set.getAllTypes();
        System.out.println(Annotationname);
        a.getFeatures().put("annotationname", Annotationname);
      }
    }
  }
}

Java Block Variables

The variables available to Java RHS blocks are:

    • doc: The document currently being processed.
    • inputAS: The AnnotationSet specified by the inputASName runtime parameter to the JAPE transducer PR. Read or delete annotations from here.
    • outputAS: The AnnotationSet specified by the outputASName runtime parameter to the JAPE transducer PR. Create new annotations in here.
    • ontology: The ontology (if any) provided as a runtime parameter to the JAPE transducer PR.
    • bindings: The bindings map, described under Bindings.

Bindings

• bindings is a Map from string to AnnotationSet
• Keys are labels from the LHS.
• Values are the annotations matched by the label.

(
  {Token.string == "University"}
  {Token.string == "of"}
  ({Lookup.minorType == city}):uniTown
):orgName

• bindings.get("uniTown") contains one annotation (the Lookup)
• bindings.get("orgName") contains three annotations (two Tokens plus the Lookup)

A Simple Example

This is a simple example of a Java RHS that prints the type and features of each annotation it matches.

Rule: ListEntities
({Person}|{Organization}|{Location}):ent
-->
{
  // get the annotations matched
  AnnotationSet ents = bindings.get("ent");
  for(Annotation e : ents) {
    // display the type and features for each
    System.out.println("Type: " + e.getType());
    System.out.println("Features: " + e.getFeatures());
  }
}

Named Java Blocks

-->
:uniTown{
    uniTownAnnots.iterator().next().getFeatures()
        .put("hasUniversity", Boolean.TRUE);
}

• You can label a Java block with a label from the LHS
• The block will only be called if there is at least one annotation bound to the label
• Within the Java block there is a variable labelAnnots (the label name followed by Annots) referring to the AnnotationSet bound to the label, i.e. for a label xy: AnnotationSet xyAnnots = bindings.get("xy")
• You can have any number of :bind.Type = {} assignment expressions and blocks of Java code, separated by commas.

Common Idioms for Java RHS

1. Setting a new feature on one of the matched annotations

Rule: LcString
({Token}):tok
-->
:tok {
  for(Annotation a : tokAnnots) {
    // get the FeatureMap for the annotation
    FeatureMap fm = a.getFeatures();
    // get the "string" feature
    String str = (String)fm.get("string");
    // convert it to lower case and store
    fm.put("lcString", str.toLowerCase());
  }
}

2. Modify the Java RHS block to add a generalCategory feature to the matched Token annotation holding the first two characters of the POS tag (the category feature).

Imports: {
  import static gate.Utils.*;
}
Phase: GeneralPos
Input: Token
Options: control = appelt

Rule: GeneralizePOSTag
({Token}):tok
-->
:tok {
  for(Annotation a : tokAnnots) {
    // the POS tag is stored in the "category" feature
    String pos = (String)a.getFeatures().get("category");
    // skip tokens that have no POS tag
    if(pos != null) {
      // keep the first two characters, e.g. "NNP" becomes "NN";
      // tags shorter than that (such as ",") are stored unchanged
      String general = pos.length() > 2 ? pos.substring(0, 2) : pos;
      a.getFeatures().put("generalCategory", general);
    }
  }
}
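
The string handling in this exercise is easy to get wrong, since substring takes a start index (inclusive, counted from 0) and an end index (exclusive). The following stand-alone sketch isolates just that step so it can be tried without GATE; the class name GeneralCategory and the choice to return short tags unchanged are assumptions made for the example.

```java
final class GeneralCategory {
    /**
     * Returns the first two characters of a POS tag.
     * Tags of two characters or fewer are returned unchanged,
     * and a missing tag (null) yields null.
     */
    static String of(String posTag) {
        if (posTag == null) {
            return null;
        }
        // substring(0, 2) covers indices 0 and 1, i.e. the first two characters.
        return posTag.length() > 2 ? posTag.substring(0, 2) : posTag;
    }

    public static void main(String[] args) {
        String[] tags = {"NNP", "VBZ", "JJ", ","};
        for (String tag : tags) {
            System.out.println(tag + " -> " + of(tag));
        }
    }
}
```

Checking for null and for short tags up front avoids the NullPointerException and StringIndexOutOfBoundsException that would otherwise need a try/catch.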

3. Removing matched annotations from the input

Rule: Location
({Lookup.majorType == "location"}):loc
-->
:loc.Location = { kind = :loc.Lookup.minorType, rule = "Location"},
:loc {
  inputAS.removeAll(locAnnots);
}

This can be useful to stop later phases matching the same annotations again.

4. Accessing the string covered by a match

Rule: Location
({Lookup.majorType == "location"}):loc
-->
:loc {
  String str = stringFor(doc, locAnnots);
}

The stringFor methods of gate.Utils:

static String stringFor(Document doc, AnnotationSet anns)
Return the document text as a String covered by the given annotation set.
static String stringFor(Document doc, Long start, Long end)
Return the document text between the provided offsets.
static String stringFor(Document doc, SimpleAnnotation ann)
Return the document text as a String corresponding to the annotation.
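
Conceptually, the offset-based variant returns the slice of the document text between a start and an end offset. The sketch below imitates that on a plain String so it runs without GATE; it is not the GATE implementation, and the class name StringForSketch, the method textBetween and the sample sentence are all assumptions made for the example.

```java
final class StringForSketch {
    /**
     * Hypothetical stand-in for stringFor(doc, start, end):
     * returns the characters from start (inclusive) to end (exclusive).
     */
    static String textBetween(String content, long start, long end) {
        if (start < 0 || end > content.length() || start > end) {
            throw new IllegalArgumentException(
                "Invalid offsets " + start + ".." + end);
        }
        return content.substring((int) start, (int) end);
    }

    public static void main(String[] args) {
        // Illustrative document content.
        String content = "University of Sheffield";
        // Offsets 14..23 cover the last word of the sample text.
        System.out.println(textBetween(content, 14, 23));
        // Offsets 0..10 cover the first word.
        System.out.println(textBetween(content, 0, 10));
    }
}
```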

Contained Annotations

  1. To get annotations contained within the span of the match

Imports: {
  import static gate.Utils.*;
}
Phase: contained
Input: NounChunk
Options: control = appelt

Rule: NPTokens
({NounChunk}):np
-->
:np {
  List<String> posTags = new ArrayList<String>();
  AnnotationSet nounchunktokens =
      getContainedAnnotations(inputAS, npAnnots, "Token");
  for(Annotation tok : nounchunktokens) {
    posTags.add((String)tok.getFeatures().get("category"));
  }
  FeatureMap fm = npAnnots.iterator().next().getFeatures();
  fm.put("posTags", posTags);
}

This rule collects the Token annotations contained within each matched NounChunk and stores their POS tags (the category feature) as a list in a posTags feature on the NounChunk.

  2. Modify the Java RHS block to count the number of proper nouns in the matched Sentence and add this count as a feature on the Sentence annotation.

Imports: {
  import static gate.Utils.*;
}

Phase: Num_Nouns
Input: Sentence
Options: control = appelt

Rule: NumNouns
(
  {Sentence}
):sent
-->
:sent {
  for(Annotation a : sentAnnots) {
    int count = 0;
    // all annotations lying within the span of the sentence
    AnnotationSet set = inputAS.getContained(start(a), end(a));
    for(Annotation tok : set.get("Token")) {
      String s = (String)tok.getFeatures().get("category");
      // "NNP" is the POS tag for a singular proper noun;
      // comparing this way round is safe when the feature is missing
      if("NNP".equals(s)) {
        count = count + 1;
      }
    }
    // store the count once per sentence, even when it is zero
    a.getFeatures().put("count", count);
  }
}
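
The counting step can be checked in isolation from GATE. The following sketch takes a list of POS tags, as would be read from the category feature of each Token, and counts the entries equal to NNP; the class name ProperNounCount and the sample tags are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

final class ProperNounCount {
    /** Counts the tags equal to "NNP"; null entries (missing tags) are ignored. */
    static int count(List<String> posTags) {
        int n = 0;
        for (String tag : posTags) {
            // Constant-first equals avoids a NullPointerException on null tags.
            if ("NNP".equals(tag)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Illustrative tags for a short sentence.
        List<String> tags = Arrays.asList("NNP", "VBD", "TO", "NNP", null, ".");
        System.out.println("Proper nouns: " + count(tags));
    }
}
```

Returning 0 for a sentence without proper nouns mirrors the choice of always setting the count feature, so later phases can rely on it being present.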

Example Scenario

• Load a document into GATE;
• find out how many named annotation sets it has;
• find out how many annotations each set contains;
• for each annotation set, for each annotation type, find out how many annotations are present.

// obtain a map of all named annotation sets
Map<String, AnnotationSet> namedASes =
    doc.getNamedAnnotationSets();
System.out.println("No. of named Annotation Sets:"
    + namedASes.size());

// no of annotations each set contains
for (String setName : namedASes.keySet()) {
  // annotation set
  AnnotationSet aSet = namedASes.get(setName);
  // no of annotations
  System.out.println("No. of Annotations for " +
      setName + ":" + aSet.size());
  // all annotation types
  Set<String> annotTypes = aSet.getAllTypes();
  for(String aType : annotTypes) {
    System.out.println(" " + aType + ": "
        + aSet.get(aType).size());
  }
}
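
The per-type tally amounts to grouping annotations by their type and counting each group. As a rough sketch that runs without GATE, the example below does the same over a plain list of type names; the class name TypeCounts and the sample types are assumptions, and in GATE the equivalent information comes from getAllTypes() and get(type).size().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TypeCounts {
    /** Returns how often each type name occurs, sorted by type name. */
    static Map<String, Integer> countByType(List<String> types) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String type : types) {
            // merge() inserts 1 for a new key or adds 1 to the existing count.
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative annotation types, one entry per annotation.
        List<String> types = Arrays.asList("Token", "Token", "Person", "Token", "Sentence");
        for (Map.Entry<String, Integer> e : countByType(types).entrySet()) {
            System.out.println(" " + e.getKey() + ": " + e.getValue());
        }
    }
}
```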

• Use the document in the above Scenario;
• Use the annotation set Original markups and obtain annotations of type a (anchor).
• Iterate over each annotation, obtain its features and print the value of href feature.
TIP: Before printing the value of the href feature, use the new URL(URL context, String spec) constructor so that the value of the href feature is parsed within the context of the document's source URL.

// obtain the Original markups annotation set
AnnotationSet origMarkupsSet =
    doc.getAnnotations("Original markups");

// obtain annotations of type 'a'
AnnotationSet anchorSet = origMarkupsSet.get("a");

// iterate over each annotation,
// obtain its features and print the value of the href feature
for (Annotation anchor : anchorSet) {
  String href = (String) anchor.getFeatures().get("href");
  if(href != null) {
    try {
      // resolve the href value against the document's URL
      System.out.println(new URL(doc.getSourceUrl(), href));
    } catch (java.net.MalformedURLException e) {
      // the URL constructor throws a checked exception for unparseable values
      System.out.println("Skipping malformed href: " + href);
    }
  }
}
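
To see what the two-argument URL constructor does with different kinds of href values, the following stand-alone sketch resolves a few of them against a base URL. The class name HrefResolver and the example.org addresses are assumptions made for the example; in the GATE code the base would be doc.getSourceUrl().

```java
import java.net.MalformedURLException;
import java.net.URL;

final class HrefResolver {
    /** Resolves href against the given document URL and returns the result as text. */
    static String resolve(String documentUrl, String href) {
        try {
            URL context = new URL(documentUrl);
            // Relative values are interpreted relative to the context URL;
            // absolute values replace it.
            return new URL(context, href).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve: " + href, e);
        }
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/index.html";
        System.out.println(resolve(base, "about.html"));
        System.out.println(resolve(base, "/img/logo.png"));
        System.out.println(resolve(base, "http://other.example/page"));
    }
}
```

A relative href is resolved against the directory of the base document, a leading slash resolves against the host, and an absolute href is kept as it is.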

Some more methods in the gate.Utils class

Modifier and Type Method and Description
static List<Annotation> inDocumentOrder(AnnotationSet as)
Return a List containing the annotations in the given annotation set, in document order (i.e. increasing order of start offset).
static Integer addAnn(AnnotationSet outSet, AnnotationSet spanSet, String type, FeatureMap fm)
Add a new annotation to the output annotation set outSet, spanning the same region as spanSet, and having the given type and feature map.
static Integer addAnn(AnnotationSet outSet, Annotation spanAnn, String type, FeatureMap fm)
Add a new annotation to the output annotation set outSet, covering the same region as the annotation spanAnn, and having the given type and feature map.
static Integer addAnn(AnnotationSet outSet, long startOffset, long endOffset, String type, FeatureMap fm)
Add a new annotation to the output annotation set outSet, spanning the given offset range, and having the given type and feature map.
static String cleanStringFor(Document doc, AnnotationSet anns)
Return the cleaned document text as a String covered by the given annotation set.
static String cleanStringFor(Document doc, Long start, Long end)
Return the cleaned document text between the provided offsets.
static String cleanStringFor(Document doc, SimpleAnnotation ann)
Return the cleaned document text as a String corresponding to the annotation.
static Long end(AnnotationSet as)
Get the end offset of an annotation set.
static Long end(SimpleAnnotation a)
Get the end offset of an annotation.
static Long end(SimpleDocument d)
Get the end offset of a document.
static Long start(AnnotationSet as)
Get the start offset of an annotation set.
static Long start(SimpleAnnotation a)
Get the start offset of an annotation.
static Long start(SimpleDocument d)
Get the start offset of a document.
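
Document order matters because an AnnotationSet is a set, so iterating over it gives no guarantee about position. The sketch below illustrates the idea behind inDocumentOrder by sorting simple spans on their start offset. It runs without GATE and is not the library's implementation: the Span class, the class name DocumentOrderSketch and the sample offsets are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class DocumentOrderSketch {
    /** Hypothetical stand-in for an annotation: a type plus start and end offsets. */
    static final class Span {
        final String type;
        final long start;
        final long end;

        Span(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns a new list with the spans in increasing order of start offset. */
    static List<Span> inDocumentOrder(Collection<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingLong((Span s) -> s.start));
        return sorted;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
            new Span("Lookup", 14, 23),
            new Span("Token", 0, 10),
            new Span("Token", 11, 13));
        for (Span s : inDocumentOrder(spans)) {
            System.out.println(s.start + "-" + s.end + " " + s.type);
        }
    }
}
```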

Methods on covering, overlapping, coextensive and contained Annotations

Modifier and Type Method and Description
static AnnotationSet getCoextensiveAnnotations(AnnotationSet source, Annotation coextAnn)
Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation.
static AnnotationSet getCoextensiveAnnotations(AnnotationSet source, AnnotationSet coextSet)
Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation set.
static AnnotationSet getCoextensiveAnnotations(AnnotationSet source, AnnotationSet coextSet, String type)
Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation set and are of the specified type.
static AnnotationSet getCoextensiveAnnotations(AnnotationSet source, Annotation coextAnn, String type)
Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation and have the specified type.
static AnnotationSet getContainedAnnotations(AnnotationSet sourceAnnotationSet, Annotation containingAnnotation)
Get all the annotations from the source annotation set that lie within the range of the containing annotation.
static AnnotationSet getContainedAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet containingAnnotationSet)
Get all the annotations from the source annotation set that lie within the range of the containing annotation set, i.e. within the offset range between the start of the first annotation in the containing set and the end of the last annotation in the annotation set.
static AnnotationSet getContainedAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet containingAnnotationSet, String targetType)
Get all the annotations from the source annotation set with a type equal to targetType that lie within the range of the containing annotation set, i.e. within the offset range between the start of the first annotation in the containing set and the end of the last annotation in the annotation set.
static AnnotationSet getContainedAnnotations(AnnotationSet sourceAnnotationSet, Annotation containingAnnotation, String targetType)
Get all the annotations of type targetType from the source annotation set that lie within the range of the containing annotation.
static AnnotationSet getCoveringAnnotations(AnnotationSet sourceAnnotationSet, Annotation coveredAnnotation)
Get all the annotations from the source annotation set that cover the range of the specified annotation.
static AnnotationSet getCoveringAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet coveredAnnotationSet)
Get all the annotations from the source annotation set that cover the range of the specified annotation set.
static AnnotationSet getCoveringAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet coveredAnnotationSet, String targetType)
Get all the annotations from the source annotation set with a type equal to targetType that cover the range of the specified annotation set.
static AnnotationSet getCoveringAnnotations(AnnotationSet sourceAnnotationSet, Annotation coveredAnnotation, String targetType)
Get all the annotations of type targetType from the source annotation set that cover the range of the specified annotation.
static AnnotationSet getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, Annotation overlappedAnnotation)
Get all the annotations from the source annotation set that partly or totally overlap the range of the specified annotation.
static AnnotationSet getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet overlappedAnnotationSet)
Get all the annotations from the source annotation set that overlap the range of the specified annotation set.
static AnnotationSet getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet overlappedAnnotationSet, String targetType)
Get all the annotations from the source annotation set with a type equal to targetType that partly or completely overlap the range of the specified annotation set.
static AnnotationSet getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, Annotation overlappedAnnotation, String targetType)
Get all the annotations of type targetType from the source annotation set that partly or totally overlap the range of the specified annotation.
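
The four relations in this table can be pictured as comparisons between start and end offsets. The sketch below expresses each one as a predicate over a candidate span and a reference span. It follows the descriptions above rather than GATE's source code, so the class name SpanRelations, the method names, and the handling of spans that merely touch are assumptions.

```java
final class SpanRelations {
    /** Candidate starts and ends at exactly the same offsets as the reference. */
    static boolean coextensive(long candStart, long candEnd, long refStart, long refEnd) {
        return candStart == refStart && candEnd == refEnd;
    }

    /** Candidate lies entirely within the range of the reference. */
    static boolean contained(long candStart, long candEnd, long refStart, long refEnd) {
        return candStart >= refStart && candEnd <= refEnd;
    }

    /** Candidate covers the whole range of the reference. */
    static boolean covering(long candStart, long candEnd, long refStart, long refEnd) {
        return candStart <= refStart && candEnd >= refEnd;
    }

    /**
     * Candidate partly or totally overlaps the reference.
     * Assumption: spans that only touch at a boundary do not overlap.
     */
    static boolean overlapping(long candStart, long candEnd, long refStart, long refEnd) {
        return candStart < refEnd && candEnd > refStart;
    }

    public static void main(String[] args) {
        // Reference span 0..5, e.g. a matched annotation.
        System.out.println("contained(2,4):   " + contained(2, 4, 0, 5));
        System.out.println("covering(0,10):   " + covering(0, 10, 0, 5));
        System.out.println("overlapping(3,8): " + overlapping(3, 8, 0, 5));
        System.out.println("coextensive(0,5): " + coextensive(0, 5, 0, 5));
    }
}
```

Under these definitions a coextensive span is also contained, covering and overlapping, which is why the more specific method is usually the better choice when an exact match is wanted.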