Machine Learning through GATE

GateNLP - Learning Framework Plugin:

The GATE LearningFramework plugin is a plugin for the GATE NLP platform. It supports a number of machine learning tasks relevant to NLP (classification, chunking, regression) and provides a consistent way to use a broad range of machine learning algorithms from several libraries to perform those tasks.

Preparing Data for ML:

Whatever the machine learning task, the first step is to prepare the data, because the quality of a model depends heavily on the data it is trained on. Here we divide the data into two sets:

  • Training set
  • Testing set

For example, given a whole data set of 1000 documents for building a model, take 800 documents for training the model and hold back the remaining 200 documents (never seen by the model during training) for testing it. Evaluating on unseen documents lets us assess the model fairly and gives us feedback for improving it.
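The 800/200 split described above can be sketched in a few lines of plain Python. This is only an illustration, not part of GATE or the LearningFramework plugin: the function name `split_corpus`, the file names, and the fixed random seed are all assumptions made for the example.

```python
import random


def split_corpus(doc_names, train_fraction=0.8, seed=42):
    """Shuffle document names and split them into training and testing lists.

    Hypothetical helper for illustration; it only handles file names and
    does not touch GATE documents or annotations.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(doc_names)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Assumed file names standing in for a corpus of 1000 documents.
docs = [f"doc_{i:04d}.xml" for i in range(1000)]
train_docs, test_docs = split_corpus(docs)
print(len(train_docs), len(test_docs))  # 800 200
```

Shuffling before cutting avoids a biased split when the documents are stored in some meaningful order, for example by date or by source.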

    Training set:

    Every document in the training set must contain the required class annotation (the desired output) in the default annotation set, along with basic annotations such as Token and Sentence, or any other annotations on which the desired annotation depends. These basic annotations are used to create the ML features and sometimes act as the instance type for the ML algorithm.

    Testing set:

    Every document in the testing set must contain the same basic and dependent annotations that were used in the training set. The required annotation is stored in the key annotation set, so that it can be compared with the annotations that the model produces in the Learning Framework set when it is run.
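To make the comparison between the key set and the model output concrete, here is a minimal sketch in plain Python. It does not use the GATE API or GATE's own evaluation tools: annotations are represented as hypothetical `(start, end, type)` tuples, the function name `evaluate` is an assumption, and only exact matches on offsets and type count as correct.

```python
def evaluate(key_spans, response_spans):
    """Return (precision, recall, f1) using strict span-and-type matching.

    key_spans: annotations from the key set (the expected output).
    response_spans: annotations produced by the model.
    Both are iterables of (start, end, type) tuples; this representation
    is an assumption for illustration only.
    """
    key = set(key_spans)
    response = set(response_spans)
    correct = len(key & response)
    precision = correct / len(response) if response else 0.0
    recall = correct / len(key) if key else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up annotations for a single test document.
key = [(0, 5, "Person"), (10, 16, "Location"), (20, 27, "Person")]
response = [
    (0, 5, "Person"),
    (10, 16, "Person"),    # right span, wrong type: counted as an error
    (20, 27, "Person"),
    (30, 34, "Location"),  # spurious annotation not present in the key set
]

precision, recall, f1 = evaluate(key, response)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Strict matching is the simplest choice. A lenient variant that also credits partially overlapping spans would give higher scores on the same data.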