Advanced JAPE¶

Debugging JAPE Grammars¶

Read the error messages, they are helpful!

• line numbers etc. refer to the original JAPE files

• description usually highlights the exact problem

file:/home/gate/plugins/ANNIE/resources/NE/name.jape:
Encountered " <kleeneOp> "? "" at line 1580, column 10.
Was expecting one of:
"\"" ...
<ident> ...
"|" ...
"{" ...
"(" ...
")" ...

When trying to understand how annotations were created by a grammar, try the enableDebugging runtime parameter of the JAPE Transducer. When it is enabled, the annotations generated by the transducer carry these extra features:

• addedByPR: the name of the JAPE PR running the grammar that produced the annotation
• addedByPhase: the name of the phase (usually the filename) in which the annotation was created
• addedByRule: the name of the rule responsible for creating the annotation

Using Java in JAPE¶

Beyond Simple Actions

It’s often useful to do more complex operations on the RHS than simply adding annotations, e.g.

• Set a new feature on one of the matched annotations

• Delete annotations from the input

• More complex feature value mappings, e.g. concatenate several LHS features to make one RHS one.

• Collect statistics, e.g. count the number of matched annotations and store the count as a document feature.

JAPE has no special syntax for these operations, but allows blocks of arbitrary Java code on the RHS.

GATE API¶

GATE Feature Maps¶

Feature Maps...

• are simply Java Maps, with added support for firing events.
• are used to provide parameter values when creating and configuring resources.
• are used to store metadata on many GATE objects.

All GATE resources are feature bearers (they implement gate.util.FeatureBearer):

public interface FeatureBearer {
    public FeatureMap getFeatures();

    public void setFeatures(FeatureMap features);
}

Creating a new FeatureMap

FeatureMap fm=Factory.newFeatureMap();
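
Because a FeatureMap is a Java Map, the usual put/get calls apply. The sketch below is only an illustration: a plain HashMap stands in for gate.FeatureMap so that it runs without GATE on the classpath, and the class name and feature values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// A HashMap stands in for gate.FeatureMap (an assumption for illustration);
// in GATE code the map would come from Factory.newFeatureMap().
class FeatureMapSketch {

    // Build the kind of features a Token annotation might carry.
    static Map<Object, Object> tokenFeatures() {
        Map<Object, Object> fm = new HashMap<>();
        fm.put("string", "Sheffield");
        fm.put("category", "NNP");
        fm.put("length", 9);
        return fm;
    }

    public static void main(String[] args) {
        Map<Object, Object> fm = tokenFeatures();
        // Values are stored as Object, so reading them back needs a cast.
        String category = (String) fm.get("category");
        System.out.println("category = " + category);
        // A feature that was never set simply comes back as null.
        System.out.println("kind = " + fm.get("kind"));
    }
}
```

The cast on get() is the part that carries over to real feature maps, since feature values are untyped.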

GATE Documents¶

A GATE Document comprises:

• a DocumentContent object;

• a Default annotation set (which has no name);

• zero or more named annotation sets;

A Document is also a type of Resource, so it also has:

• a name;

• features.

Main Document API Calls¶

[Figure: main Document API calls]

Annotation Sets¶

GATE Annotation Sets...

• maintain a set of Node objects (which are associated with offsets in the document content);

• and a set of annotations (which have a start and an end node).

• implement the gate.AnnotationSet interface, which extends Set<Annotation>;

• implement several get() methods for obtaining the included annotations according to various constraints.

• are created, deleted, and managed by the Document they belong to.

Main AnnotationSet API Calls¶

Nodes

// Get the node with the smallest offset.
public Node firstNode();
// Get the node with the largest offset.
public Node lastNode();

Creating new Annotations

// Create (and add) a new annotation between two offsets
public Integer add(Long start, Long end, String type, FeatureMap features);
// Create (and add) a new annotation between two existing nodes
public Integer add(Node start, Node end, String type, FeatureMap features);

Getting Annotations by ID, or type¶

[Figure: AnnotationSet API calls for getting annotations by ID or type]

Getting Annotations by position¶

[Figure: AnnotationSet API calls for getting annotations by position]

Combined get methods¶

[Figure: AnnotationSet API combined get methods]

Annotations¶

GATE Annotations...

• are metadata associated with a document segment;
• have a type (String);
• have a start and an end Node (gate.Node);
• have features;
• are created, deleted and managed by annotation sets.

Note

Always use an annotation set to create a new annotation! Do not use the constructor.

Annotation API¶

Main Annotation methods:
public String getType();
public Node getStartNode();
public Node getEndNode();
public FeatureMap getFeatures();

gate.Node
public Long getOffset();

JAPE With Java RHS Template¶

Imports: { import static gate.Utils.*; }

Phase: Example
Input: Token // and any other input Annotations
Options: control = appelt

Rule: Example1
(
// Normal JAPE LHS goes here
):label
-->
{
// java code goes in here
}

Internally, the RHS of each JAPE rule is converted into a Java class that implements gate.jape.RhsAction. The class name is built from the phase name and the rule name, as the generated example below shows. The class has a doit() method that takes the parameters doc, bindings, inputAS, outputAS and ontology (these are discussed in the next section). All the RHS code written in the rule goes into this method.

import java.util.*;
import gate.*;
import gate.util.*;
import gate.jape.*;
import gate.creole.ontology.*;
// JAPE Source: file:/home/kpmd/Documents/test/SentenceContained.jape:1

import static gate.Utils.*;
import java.util.ArrayList;
import java.util.List;

public class SentenceconatinedSentenceconatinActionClass225
    implements java.io.Serializable, gate.jape.RhsAction {
  private gate.jape.ActionContext ctx;
  public java.lang.String ruleName() { return "Sentenceconatin"; }
  public java.lang.String phaseName() { return "Sentenceconatined"; }
  public void setActionContext(gate.jape.ActionContext ac) { ctx = ac; }
  public gate.jape.ActionContext getActionContext() { return ctx; }
  public void doit(gate.Document doc,
                   java.util.Map<java.lang.String, gate.AnnotationSet> bindings,
                   gate.AnnotationSet inputAS, gate.AnnotationSet outputAS,
                   gate.creole.ontology.Ontology ontology) throws gate.jape.JapeException {
    gate.AnnotationSet sentAnnots = bindings.get("sent");
    if(sentAnnots != null && sentAnnots.size() != 0) {
      // the Java code written in the rule's RHS starts here
      AnnotationSet s = inputAS.get("Sentence");
      for(Annotation a : s) {
        // collect the types of all annotations inside the sentence
        AnnotationSet set = inputAS.getContained(start(a), end(a));
        Set<String> Annotationname = set.getAllTypes();
        System.out.println(Annotationname);
        a.getFeatures().put("annotationname", Annotationname);
      }
    }
  }
}

Java Block Variables¶

The variables available to Java RHS blocks are:

doc

inputAS

outputAS

ontology

bindings

Bindings¶

• bindings is a Map from string to AnnotationSet
• Keys are labels from the LHS.
• Values are the annotations matched by the label.

(
{Token.string == "University"}
{Token.string == "of"}
({Lookup.minorType == city}):uniTown
):orgName

• bindings.get("uniTown") contains one annotation (the Lookup)
• bindings.get("orgName") contains three annotations (two Tokens plus the Lookup)
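
To make the shape of bindings concrete, here is a minimal plain-Java sketch that mimics it with standard collections. The lists of strings are hypothetical stand-ins for GATE's AnnotationSet so that the snippet runs without GATE on the classpath. In a real RHS the values are gate.AnnotationSet objects.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the JAPE bindings map: LHS label -> matched annotations.
// Strings replace real gate.Annotation objects (an assumption for illustration).
class BindingsSketch {

    // Build the bindings that the "University of <city>" pattern would produce.
    static Map<String, List<String>> exampleBindings() {
        Map<String, List<String>> bindings = new LinkedHashMap<>();
        // The inner label covers only the Lookup.
        bindings.put("uniTown", List.of("Lookup(city)"));
        // The outer label covers everything matched inside its parentheses.
        bindings.put("orgName",
                List.of("Token(University)", "Token(of)", "Lookup(city)"));
        return bindings;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<String>> e : exampleBindings().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size()
                    + " annotation(s): " + e.getValue());
        }
    }
}
```

Note that a label which was never bound is simply absent from the map, so get() returns null for it.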

A Simple Example¶

This is a simple example of a Java RHS that prints the type and features of each annotation it matches.

Rule: ListEntities
({Person}|{Organization}|{Location}):ent
-->
{
  // get the annotations matched
  AnnotationSet ents = bindings.get("ent");
  for(Annotation e : ents) {
    // display the type and features for each
    System.out.println("Type: " + e.getType());
    System.out.println("Features: " + e.getFeatures());
  }
}

Named Java Blocks¶

-->
:uniTown{
    uniTownAnnots.iterator().next().getFeatures()
        .put("hasUniversity", Boolean.TRUE);
}

• You can label a Java block with a label from the LHS.
• The block is only called if at least one annotation is bound to the label.
• Within the Java block, a variable named <label>Annots refers to the AnnotationSet bound to the label. For a label xy this is equivalent to AnnotationSet xyAnnots = bindings.get("xy").
• The RHS can contain any number of :bind.Type = {} assignment expressions and blocks of Java code, separated by commas.

Common Idioms for Java RHS¶

1. Setting a new feature on one of the matched annotations

Rule: LcString
({Token}):tok
-->
:tok {
  for(Annotation a : tokAnnots) {
    // get the FeatureMap for the annotation
    FeatureMap fm = a.getFeatures();
    // get the "string" feature
    String str = (String)fm.get("string");
    // convert it to lower case and store
    fm.put("lcString", str.toLowerCase());
  }
}

2. Modify the Java RHS block to add a generalCategory feature to the matched Token annotation, holding the first two characters of the POS tag (the category feature).

Imports: {
  import static gate.Utils.*;
}
Phase: GeneralPos
Input: Token
Options: control = appelt

Rule: GeneralizePOSTag
({Token}):tok
-->
:tok {
  for(Annotation a : tokAnnots) {
    // the POS tag is stored in the "category" feature
    String pos = (String)a.getFeatures().get("category");
    // skip tokens with no POS tag or a tag shorter than two characters
    if(pos != null && pos.length() >= 2) {
      // substring(0, 2) takes the first two characters
      a.getFeatures().put("generalCategory", pos.substring(0, 2));
    }
  }
}
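
The tag-shortening step can be tried outside GATE. The sketch below is a plain-Java helper (the class and method names are hypothetical) that returns the first two characters of a POS tag. Java's substring(begin, end) is zero-based with an exclusive end index, so the first two characters are substring(0, 2).

```java
// Plain-Java sketch of the POS generalisation; no GATE classes are needed.
class GeneralCategorySketch {

    // Return the first two characters of a POS tag, or null when the tag is
    // missing or shorter than two characters (no feature would be set).
    static String generalCategory(String posTag) {
        if (posTag == null || posTag.length() < 2) {
            return null;
        }
        return posTag.substring(0, 2);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NNP", "NNS", "VBZ", "JJ", "."}) {
            System.out.println(tag + " -> " + generalCategory(tag));
        }
    }
}
```

With this mapping NNP and NNS both become NN, and VBZ becomes VB, which is the point of the generalisation.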

3. Removing matched annotations from the input

Rule: Location
({Lookup.majorType == "location"}):loc
-->
:loc.Location = { kind = :loc.Lookup.minorType, rule = "Location"},
:loc {
inputAS.removeAll(locAnnots);
}

This can be useful to stop later phases matching the same annotations again.

4. Accessing the string covered by a match

Rule: Location
({Lookup.majorType == "location"}):loc
-->
:loc {
  String str = stringFor(doc, locAnnots);
}

The stringFor methods of gate.Utils:

Modifier and Type	Method and Description
static String	stringFor(Document doc, AnnotationSet anns)
	Return the document text as a String covered by the given annotation set.
static String	stringFor(Document doc, Long start, Long end)
	Return the document text between the provided offsets.
static String	stringFor(Document doc, SimpleAnnotation ann)
	Return the document text as a String corresponding to the annotation.

Contained Annotations¶

To get the annotations contained within the span of the match:

Imports: {
  import static gate.Utils.*;
}
Phase: contained
Input: NounChunk
Options: control = appelt

Rule: NPTokens
({NounChunk}):np
-->
:np {
  List<String> posTags = new ArrayList<String>();
  // Tokens lying within the span of the matched NounChunk
  AnnotationSet nounchunktokens =
      getContainedAnnotations(inputAS, npAnnots, "Token");
  for(Annotation tok : nounchunktokens) {
    posTags.add((String)tok.getFeatures().get("category"));
  }
  // store the list of POS tags on the NounChunk itself
  FeatureMap fm = npAnnots.iterator().next().getFeatures();
  fm.put("posTags", posTags);
}

This rule collects the Tokens contained in each NounChunk and stores their POS tags (the category feature) as a list in a posTags feature on the NounChunk.

Modify the Java RHS block to count the number of proper nouns in the matched Sentence and add this count as a feature on the Sentence annotation.

Imports: {
  import static gate.Utils.*;
}

Phase: Num_Nouns
Input: Sentence
Options: control = appelt

Rule: NumNouns
(
  {Sentence}
):sent
-->
:sent {
  for(Annotation a : sentAnnots) {
    int count = 0;
    // all annotations lying within the sentence span
    AnnotationSet contained = inputAS.getContained(start(a), end(a));
    for(Annotation tok : contained.get("Token")) {
      String pos = (String)tok.getFeatures().get("category");
      // constant-first equals() is safe when a Token has no category
      if("NNP".equals(pos)) {
        count++;
      }
    }
    // store the total once, so sentences without proper nouns get 0
    a.getFeatures().put("count", count);
  }
}
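
The counting logic can be checked on its own before wiring it into a grammar. The sketch below is plain Java with hypothetical names and sample tags: a list of POS tags stands in for the category features of the Tokens in one sentence.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the proper-noun count; a list of POS tags stands in
// for the Token annotations of one sentence (an assumption for illustration).
class ProperNounCountSketch {

    // Count tags equal to "NNP"; null tags (tokens without a category) are skipped.
    static int countProperNouns(List<String> posTags) {
        int count = 0;
        for (String tag : posTags) {
            if ("NNP".equals(tag)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical tags for a five-token sentence.
        List<String> tags = Arrays.asList("NNP", "VBD", "NNP", "NN", ".");
        System.out.println("Proper nouns: " + countProperNouns(tags));
    }
}
```

Only the tag NNP is counted here. Whether plural proper nouns (NNPS) should count as well is a choice left to the grammar writer.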

Example Scenario¶

• Load a document into GATE;
• find out how many named annotation sets it has;
• find out how many annotations each set contains;
• for each annotation set, for each annotation type, find out how many annotations are present.

// obtain a map of all named annotation sets
Map<String, AnnotationSet> namedASes =
    doc.getNamedAnnotationSets();
System.out.println("No. of named Annotation Sets:"
    + namedASes.size());

// no of annotations each set contains
for (String setName : namedASes.keySet()) {
  // annotation set
  AnnotationSet aSet = namedASes.get(setName);
  // no of annotations
  System.out.println("No. of Annotations for " +
      setName + ":" + aSet.size());
  // all annotation types
  Set<String> annotTypes = aSet.getAllTypes();
  for(String aType : annotTypes) {
    System.out.println(" " + aType + ": "
        + aSet.get(aType).size());
  }
}

• Use the document from the scenario above;
• Use the annotation set Original markups and obtain annotations of type a (anchor);
• Iterate over each annotation, obtain its features and print the value of the href feature.

TIP: Before printing the value of the href feature, pass it through the new URL(URL context, String spec) constructor so that it is parsed within the context of the document's source URL.

// obtain the Original markups annotation set
AnnotationSet origMarkupsSet =
    doc.getAnnotations("Original markups");

// obtain annotations of type 'a'
AnnotationSet anchorSet = origMarkupsSet.get("a");

// iterate over each annotation
// obtain its features and print the value of href feature
for (Annotation anchor : anchorSet) {
  String href = (String) anchor.getFeatures().get("href");
  if(href != null) {
    try {
      // resolve the href value against the document's URL
      System.out.println(new URL(doc.getSourceUrl(), href));
    } catch(java.net.MalformedURLException e) {
      // the URL constructor throws a checked exception on unparsable input
      System.out.println("Skipping malformed href: " + href);
    }
  }
}
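
The effect of resolving an href against a context URL can be seen with the JDK alone. The sketch below uses hypothetical URLs and a hypothetical helper name. It relies only on the java.net.URL constructors named in the tip, which recent JDKs may flag as deprecated.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Sketch of href resolution with the two-argument URL constructor.
class HrefResolutionSketch {

    // Resolve an href against the URL of the document it appears in.
    static String resolve(String documentUrl, String href) {
        try {
            URL context = new URL(documentUrl);
            return new URL(context, href).toString();
        } catch (MalformedURLException e) {
            // Rethrow unchecked so callers in this sketch stay simple.
            throw new IllegalArgumentException("Cannot resolve " + href, e);
        }
    }

    public static void main(String[] args) {
        String docUrl = "http://example.org/docs/index.html"; // hypothetical
        // A relative href is resolved against the document's directory.
        System.out.println(resolve(docUrl, "page2.html"));
        // Parent-directory segments are applied to the context path.
        System.out.println(resolve(docUrl, "../img/logo.png"));
        // An absolute href ignores the context.
        System.out.println(resolve(docUrl, "http://other.example/x.html"));
    }
}
```

Without the context URL, relative href values such as page2.html cannot be turned into a usable address at all.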

Some more methods in gate.Utils Class¶

Modifier and Type	Method and Description
static List<Annotation>	inDocumentOrder(AnnotationSet as)
	Return a List containing the annotations in the given annotation set, in document order (i.e. increasing order of start offset).
static Integer	addAnn(AnnotationSet outSet, AnnotationSet spanSet, String type, FeatureMap fm)
	Add a new annotation to the output annotation set outSet, spanning the same region as spanSet, and having the given type and feature map.
static Integer	addAnn(AnnotationSet outSet, Annotation spanAnn, String type, FeatureMap fm)
	Add a new annotation to the output annotation set outSet, covering the same region as the annotation spanAnn, and having the given type and feature map.
static Integer	addAnn(AnnotationSet outSet, long startOffset, long endOffset, String type, FeatureMap fm)
	Add a new annotation to the output annotation set outSet, spanning the given offset range, and having the given type and feature map.
static String	cleanStringFor(Document doc, AnnotationSet anns)
	Return the cleaned document text as a String covered by the given annotation set.
static String	cleanStringFor(Document doc, Long start, Long end)
	Return the cleaned document text between the provided offsets.
static String	cleanStringFor(Document doc, SimpleAnnotation ann)
	Return the cleaned document text as a String corresponding to the annotation.
static Long	end(AnnotationSet as)
	Get the end offset of an annotation set.
static Long	end(SimpleAnnotation a)
	Get the end offset of an annotation.
static Long	end(SimpleDocument d)
	Get the end offset of a document.
static Long	start(AnnotationSet as)
	Get the start offset of an annotation set.
static Long	start(SimpleAnnotation a)
	Get the start offset of an annotation.
static Long	start(SimpleDocument d)
	Get the start offset of a document.

Methods on covering, overlapping, coextensive and contained Annotations¶

Modifier and Type	Method and Description
static AnnotationSet	getCoextensiveAnnotations(AnnotationSet source, Annotation coextAnn)
	Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation.
static AnnotationSet	getCoextensiveAnnotations(AnnotationSet source, AnnotationSet coextSet)
	Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation set.
static AnnotationSet	getCoextensiveAnnotations(AnnotationSet source, AnnotationSet coextSet, String type)
	Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation set and are of the specified type.
static AnnotationSet	getCoextensiveAnnotations(AnnotationSet source, Annotation coextAnn, String type)
	Get all the annotations from the source annotation set that start and end at exactly the same offsets as the given annotation and have the specified type.
static AnnotationSet	getContainedAnnotations(AnnotationSet sourceAnnotationSet, Annotation containingAnnotation)
	Get all the annotations from the source annotation set that lie within the range of the containing annotation.
static AnnotationSet	getContainedAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet containingAnnotationSet)
	Get all the annotations from the source annotation set that lie within the range of the containing annotation set, i.e. within the offset range between the start of the first annotation in the containing set and the end of the last annotation in the annotation set.
static AnnotationSet	getContainedAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet containingAnnotationSet, String targetType)
	Get all the annotations from the source annotation set with a type equal to targetType that lie within the range of the containing annotation set, i.e. within the offset range between the start of the first annotation in the containing set and the end of the last annotation in the annotation set.
static AnnotationSet	getContainedAnnotations(AnnotationSet sourceAnnotationSet, Annotation containingAnnotation, String targetType)
	Get all the annotations of type targetType from the source annotation set that lie within the range of the containing annotation.
static AnnotationSet	getCoveringAnnotations(AnnotationSet sourceAnnotationSet, Annotation coveredAnnotation)
	Get all the annotations from the source annotation set that cover the range of the specified annotation.
static AnnotationSet	getCoveringAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet coveredAnnotationSet)
	Get all the annotations from the source annotation set that cover the range of the specified annotation set.
static AnnotationSet	getCoveringAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet coveredAnnotationSet, String targetType)
	Get all the annotations from the source annotation set with a type equal to targetType that cover the range of the specified annotation set.
static AnnotationSet	getCoveringAnnotations(AnnotationSet sourceAnnotationSet, Annotation coveredAnnotation, String targetType)
	Get all the annotations of type targetType from the source annotation set that cover the range of the specified annotation.
static AnnotationSet	getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, Annotation overlappedAnnotation)
	Get all the annotations from the source annotation set that partly or totally overlap the range of the specified annotation.
static AnnotationSet	getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet overlappedAnnotationSet)
	Get all the annotations from the source annotation set that overlap the range of the specified annotation set.
static AnnotationSet	getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, AnnotationSet overlappedAnnotationSet, String targetType)
	Get all the annotations from the source annotation set with a type equal to targetType that partly or completely overlap the range of the specified annotation set.
static AnnotationSet	getOverlappingAnnotations(AnnotationSet sourceAnnotationSet, Annotation overlappedAnnotation, String targetType)
	Get all the annotations of type targetType from the source annotation set that partly or totally overlap the range of the specified annotation.
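
The four relations are easiest to tell apart as conditions on start and end offsets. The sketch below is a simplified plain-Java reading of the descriptions above, not GATE's implementation. The class name, method names and sample offsets are hypothetical, and GATE's treatment of edge cases such as zero-length annotations may differ.

```java
// Interval relations between two spans a and b, each given by start and end offsets.
// A simplified reading of the method descriptions, not GATE's implementation.
class SpanRelationsSketch {

    // True when span a starts and ends at exactly the same offsets as span b.
    static boolean coextensive(long aStart, long aEnd, long bStart, long bEnd) {
        return aStart == bStart && aEnd == bEnd;
    }

    // True when span a lies within the range of span b.
    static boolean contained(long aStart, long aEnd, long bStart, long bEnd) {
        return aStart >= bStart && aEnd <= bEnd;
    }

    // True when span a covers the whole range of span b.
    static boolean covering(long aStart, long aEnd, long bStart, long bEnd) {
        return aStart <= bStart && aEnd >= bEnd;
    }

    // True when span a partly or totally overlaps span b.
    static boolean overlapping(long aStart, long aEnd, long bStart, long bEnd) {
        return aStart < bEnd && aEnd > bStart;
    }

    public static void main(String[] args) {
        // Hypothetical offsets: a Sentence at 0..40 and a Token at 10..15.
        System.out.println("Token contained in Sentence: " + contained(10, 15, 0, 40));
        System.out.println("Sentence covering Token: " + covering(0, 40, 10, 15));
        System.out.println("Span 35..50 overlapping Sentence: " + overlapping(35, 50, 0, 40));
        System.out.println("Span 40..50 overlapping Sentence: " + overlapping(40, 50, 0, 40));
    }
}
```

Under these definitions, contained and covering are mirror images of each other, and coextensive spans satisfy all four relations.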