I’m going to show you how to predict the part-of-speech label of each word in a sentence using a bit of math and C#.
Project Snoop: it’s all about the words. Hah, it’s an NLP engine I’ve been working on for a long time now. In my attempt to get into machine learning, I didn’t want to just install Python, download TensorFlow, and have all the magic happen behind the scenes while I sat here obliviously.
So I decided to write an NLP library from scratch, and believe me, it took time for this developer without a high-school education to pull this out of my own ass.
The project isn’t open-source just yet; I want to complete it a bit more before I’m ready for the world to judge it. That being said, I thought my CRF implementation was rather clever, so I wanted to share it. CRFs had me baffled for a long time, but when the math finally started to click and I got the code written, it was unbelievably simple. It just looks scary.

I created a class called GCRF (Generic Conditional Random Fields). The fields aren’t actually generic and the implementation is a linear-chain CRF, but I stuck with the name anyway.
The GCRF class has a Sequence function we can hook up to, to tokenize the input:

PartOfSpeechCRF = new GCRF<string, Word>(LanguageModel.PosModel.Labels)
{
    Sequence = (string s) =>
    {
        return Text.Tokenize(s).ToArray();
    }
};

Additionally, there is a FeatureFunction we can hook up:

PartOfSpeechCRF.FeatureFunction = (Word[] words, int index, string label) => {
    // One vector holding the features of the previous, current and next word.
    Vector vec = new(words[index].Feature.Size * 3);
    Word[] window = new Word[] {
        index > 0 ? words[index - 1] : new Word(),                // previous word (empty at sentence start)
        words[index],                                             // current word
        index < words.Length - 1 ? words[index + 1] : new Word()  // next word (empty at sentence end)
    };
    for (int w = 0; w < window.Length; w++)
    {
        vec.Append(window[w].Feature);
    }
    return vec;
};

The feature function creates a large vector for the given word that contains information about the word itself and the words adjoining it. word.Feature is a float array of information such as how many letters the word has, how many are upper-case, whether the first character is upper-case, and whether there are any foreign characters or numbers in the word. It does this three times, once each for the left, center and right word, and returns a vector of all that information.
This information will be used later when we feed it into a logistic regression model to predict the probability of the word being labeled a verb, noun, adjective or any other POS tag.
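As a rough illustration of what such per-word surface features could look like (the real Word.Feature implementation isn’t shown in this post, so the class and feature choices below are illustrative stand-ins):

```csharp
using System.Linq;

// Hypothetical sketch of per-word surface features, similar in spirit
// to the description above: letter count, upper-case count, whether the
// first character is upper-case, digits, and non-ASCII characters.
static class WordFeatures
{
    public static float[] Extract(string word)
    {
        return new float[]
        {
            word.Length,                                      // how many letters
            word.Count(char.IsUpper),                         // how many are upper-case
            word.Length > 0 && char.IsUpper(word[0]) ? 1 : 0, // first character upper-case?
            word.Any(char.IsDigit) ? 1 : 0,                   // any numbers?
            word.Any(c => c > 127) ? 1 : 0                    // any foreign (non-ASCII) characters?
        };
    }
}
```

For example, `WordFeatures.Extract("Dog1")` yields `[4, 1, 1, 1, 0]`.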

To score a label for a sentence, the CRF iterates over every feature function at every position and sums the weighted values:

$$score(l | s) = \sum_{j = 1}^m \sum_{i = 1}^n \lambda_j f_j(w, i, w_l)$$

The feature function takes in:

  • the sentence $w$ (the word array)
  • the position $i$ of the word in the sentence
  • the label $w_l$ of the current word
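Concretely, the double sum is just two nested loops; here is a minimal sketch with stand-in weights and feature functions (the names and signatures below are illustrative, not the library’s API):

```csharp
using System;

// Minimal sketch of the CRF scoring sum: for a candidate label,
// add up lambda[j] * f[j](words, i, label) over every feature
// function j and every position i in the sentence.
static class CrfScore
{
    public static double Score(
        string[] words, string label,
        double[] lambda, Func<string[], int, string, double>[] f)
    {
        double score = 0;
        for (int j = 0; j < f.Length; j++)          // each feature function j = 1..m
            for (int i = 0; i < words.Length; i++)  // each position i = 1..n
                score += lambda[j] * f[j](words, i, label);
        return score;
    }
}
```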

Once each label has been scored, we turn the scores into probabilities by exponentiating and normalizing them:

$$p(l | s) = \frac{\exp[score(l|s)]}{\sum_{l'} \exp[score(l'|s)]} = \frac{\exp[\sum_{j = 1}^m \sum_{i = 1}^n \lambda_j f_j(w, i, w_l)]}{\sum_{l'} \exp[\sum_{j = 1}^m \sum_{i = 1}^n \lambda_j f_j(w, i, w_{l'})]}$$

This way we can train our model on sequences of labeled data: words and their accompanying POS tags.
Once trained, we can use the Predict function, which gives us the most likely label for each word in the query.

public class GCRF<T, T2> where T : class
{

    public string[] Labels { get; private set; }

    private LogisticRegression LogRegression { get; set; }

    public GCRF(string[] labels)
    {
        Labels = labels;
        LogRegression = new LogisticRegression();
    }

    // The feature function to use
    public Func<T2[], int, string, Vector> FeatureFunction;

    // How to split the data in to a sequence of smaller pieces
    public Func<T, T2[]> Sequence;

    // Train the CRF using logistic regression
    public void Fit(Matrix Training, Matrix Validation, double lr = 0.01, int epoch = 10000)
    {
        LogRegression.Train(Training, Validation, lr, epoch, ActivationFunc.Sigmoid);
    }

    private double ScoreEvaluation(Vector data, string label)
    {
        var _output = LogRegression.ComputeOutput(data, ActivationFunc.Sigmoid);
        var e = Array.IndexOf(Labels, label);
        return _output[e];
    }

    // Score the probability of the given label for each sequenced element
    private Vector Score(T2[] data, string label)
    {
        Vector w = new(data.Length);
        for (int j = 0; j < data.Length; j++)
        {
            w[j] = ScoreEvaluation(FeatureFunction(data, j, label), label);
        }
        return w;
    }

    // Get a matrix of label probabilities, one row per label.
    public Matrix PredictMatrix(T data, string[] labels)
    {
        if (Sequence == null)
            throw new InvalidOperationException("Sequence not set.");
        T2[] sequence = Sequence(data);
        Matrix p = new(labels.Length, 0);
        for (int i = 0; i < labels.Length; i++)
        {
            p[i] = Score(sequence, labels[i]);
        }
        return p;
    }

    // Label each element in the data with its corresponding probability
    public (T2 element, string label, double prob)[] Predict(T data)
    {
        if (Sequence == null)
            throw new InvalidOperationException("Sequence not set.");
        var sequence = Sequence(data);
        Matrix p = PredictMatrix(data, Labels);

        (T2 element, string label, double prob)[] max = new (T2 element, string label, double prob)[sequence.Length];
        for (int i = 0; i < sequence.Length; i++)
        {
            // Column i of p: the probability of each label for element i.
            Vector v = new(p.SelectMany(x => x.Skip(i).Take(1)).ToArray());
            int maxIndex = v.MaxIndex;
            double maxProb = v[maxIndex];
            string maxLabel = Labels[maxIndex];
            max[i] = (sequence[i], maxLabel, maxProb);
        }
        return max;
    }

}

Of course, I’ve included the code for the GCRF class above, however useful it may be on its own, given that it references classes not shown here, such as LogisticRegression, Vector and Matrix,
but I hopefully demonstrated that CRFs are not that scary.