Annotation Set Transfer

• This PR enables copying or moving annotations from one set to another
• As with the Segment Processing PR, you can specify a covering annotation to delimit the section you're interested in
• One use for this is to change annotation set names or to move results into a new set, without rerunning the application
• For example, you might want to move all the gold standard annotations from the Default annotation set to the Key annotation set
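The difference between moving and copying can be sketched with a toy model. This is not the GATE API: the `Annotation` class, the `transfer_all` function and the dictionary of sets are hypothetical stand-ins used only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for an annotation: a type plus character offsets."""
    type: str
    start: int
    end: int


def transfer_all(sets, source, target, copy=False):
    """Transfer every annotation from sets[source] into sets[target].

    With copy=False the originals are removed from the source set (a move);
    with copy=True they are kept. Returns the number transferred.
    """
    transferred = list(sets.get(source, []))
    sets.setdefault(target, []).extend(transferred)
    if not copy:
        sets[source] = []
    return len(transferred)


# Gold standard annotations currently sitting in the Default set.
sets = {"Default": [Annotation("Person", 0, 5), Annotation("Location", 10, 16)]}
count = transfer_all(sets, "Default", "Key")
```

After the call, the Key set holds both annotations and Default is empty; the annotations themselves are unchanged.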

Transferring annotations

[Figure: transferring annotations] The annotations remain the same; they're just stored in a different set.

Hands-on Exercise

Objective: move all the annotations from the Default set into the Key set
• Clear GATE of all previous documents, corpora, applications and PRs
• Load document self-shearing-sheep-marked.xml from hands-on material and create an instance of an AST (you may need to load the Tools plugin)
• Have a look at the annotations in the document
• Add the AST to a new application and set the parameters to move all annotations from Default to Key
• Make sure you don't leave the originals in Default!
• Run the application and check the results

Delimiting a section of text

• Another use is to delimit a section of text over which to run further PRs
• Unlike with the Segment Processing PR, if we are dealing with multiple sections within a document, these will not be processed independently
• So co-references will still hold between different sections
• Also, PRs that do not take specific annotations as input (e.g. the tokeniser and gazetteer) will run over the whole document regardless

Processing a single section

[Figure: AST on a single section]

Transferring title annotations

• The rest of the document remains tokenised
• The Tokens outside the title remain in the Default set because they weren't moved
[Figure: AST transferring title annotations]

Setting the parameters

• Let's assume we want to process only those annotations covered by the HTML “body” annotation (i.e. we don't want to process the headers etc.).
• We'll put our final annotations in the “Result” set.
• We need to specify the following parameters:

    textTagName: the name of the covering annotation: “body”
    tagASName: the annotation set where we find this: “Original markups”
    annotationTypes: which annotations we want to transfer
    inputASName: which annotation set we want to transfer them from: “Default”
    outputASName: which annotation set we want to transfer them into: “Result”

Additional options

There are two additional options you can choose:
copyAnnotations: whether to copy or move the annotations (i.e. keep the originals or delete them)
transferAllUnlessFound: if the covering annotation is not found, just transfer all annotations. This is a useful option if you just want to transfer all annotations in a document without worrying about a covering annotation.
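How these parameters and options interact can be sketched as follows. This is a toy model under stated assumptions, not the actual PR implementation: `Annotation` and `run_transfer` are hypothetical, the snake_case arguments merely mirror the parameter names above, and treating an empty `annotation_types` list as "all types" is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for an annotation: a type plus character offsets."""
    type: str
    start: int
    end: int


def run_transfer(sets, text_tag_name, tag_as_name, annotation_types,
                 input_as_name, output_as_name,
                 copy_annotations=False, transfer_all_unless_found=False):
    """Transfer annotations lying inside a covering annotation.

    Assumption for this sketch: an empty annotation_types means "all types".
    """
    covers = [a for a in sets.get(tag_as_name, []) if a.type == text_tag_name]
    source = sets.get(input_as_name, [])
    wanted = [a for a in source
              if not annotation_types or a.type in annotation_types]
    if covers:
        # Keep only annotations fully inside some covering annotation.
        selected = [a for a in wanted
                    if any(c.start <= a.start and a.end <= c.end for c in covers)]
    elif transfer_all_unless_found:
        selected = wanted   # no covering annotation: transfer everything
    else:
        selected = []       # no covering annotation: do nothing
    sets.setdefault(output_as_name, []).extend(selected)
    if not copy_annotations:
        # Moving: delete the originals from the input set.
        sets[input_as_name] = [a for a in source if a not in selected]
    return selected


sets = {
    "Original markups": [Annotation("body", 20, 100)],
    "Default": [Annotation("Token", 0, 5),
                Annotation("Token", 25, 30),
                Annotation("Lookup", 40, 50)],
}
moved = run_transfer(sets, "body", "Original markups", [], "Default", "Result")
```

In this example the Token at offsets 0 to 5 lies outside the “body” annotation, so it stays in Default while the other two annotations move to Result.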

Parameter settings

[Figure: AST parameter settings]

• Move all annotations contained within the “body” annotation (found in the Original markups set), from the Default set to the Result set.
• If no “body” annotation is found, do nothing.

Using it within an application

• We want to run ANNIE over only the text contained within the “title” annotation
• Apart from the tokeniser and gazetteer, the other ANNIE PRs all rely on previous annotations (Token, Lookup, Sentence, etc.)
• We run the tokeniser and gazetteer first on the whole document
• Then use the AST to transfer all relevant Token and Lookup annotations into the new set
• Then we can run the rest of the ANNIE PRs on these annotations
• To do this, we set the inputAS and outputAS of those PRs to the name of the new set, “Result”
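The ordering of these steps can be sketched with a toy pipeline. Nothing here is GATE code: the whitespace tokeniser, the `transfer_within` helper and the `capitalised_tokens` step (standing in for the later ANNIE PRs) are all hypothetical, and the sample text and offsets are invented for illustration.

```python
import re


def tokenise(text):
    """Toy tokeniser: works on the raw text, so it covers the whole document."""
    return [("Token", m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def transfer_within(annots, cover):
    """Split annotations into those inside the covering span and the rest."""
    _, c_start, c_end = cover
    inside = [a for a in annots if c_start <= a[1] and a[2] <= c_end]
    outside = [a for a in annots if a not in inside]
    return inside, outside


def capitalised_tokens(text, annots):
    """Stand-in for a later PR: it only sees the annotations it is given."""
    return [text[s:e] for _, s, e in annots if text[s].isupper()]


text = "Cricket news Full match report follows"
title = ("title", 0, 12)  # hypothetical covering annotation over "Cricket news"

# 1. Tokenise the whole document into the Default set.
sets = {"Original markups": [title], "Default": tokenise(text)}
# 2. Move the Tokens covered by the title into the Result set.
sets["Result"], sets["Default"] = transfer_within(sets["Default"], title)
# 3. Run the later step with Result as its input set.
found = capitalised_tokens(text, sets["Result"])
```

Because the later step reads only from Result, the capitalised token "Full" outside the title is never considered, even though it was tokenised and still sits in Default.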

Application architecture

[Figure: AST application architecture]

Hands-on: processing a document section

● We will modify ANNIE to only run over the title of the document
● Load the document cricket.html and create a corpus with it
● Load ANNIE
● Add an AST immediately after the tokeniser and gazetteer
● Set the AST parameters to move all annotations contained within the “title” annotation (found in the Original markups set), from the Default set to the Result set.
● If you get stuck, check the slide “Setting the parameters”
● Modify the Input and Output set of all following PRs to “Result”
● Run on the corpus and inspect the results

[Figure: AST output]