A while back now, the “Kinja Product Team” rolled out a new feature where the editor can suggest tags, based on the body of the post. Maybe you even saw this barn-burner of a post from Ernie the Kinja Tech here?
Well, today I’m here, however many months later, to go through how this feature works so that maybe someone can take it and put it in Arc or Chorus or whatever.
Before I get to that though, it’s probably useful to add a little bit of backstory. Here at Kinja, like many other sites, you can tag posts, and then in turn view posts with a certain tag. This is useful for categorization and content discovery, as people can quickly see similar stories that they may be interested in.
More to the point for this feature, though, adding tags manually can be an annoying and tedious task. So, by adding an easy way to get tags based on the actual content of a given article, we hoped to lessen that burden and end up with a more consistent set of tags.
At a high level, this feature is actually extremely simple. The basic flow is that, in our editor code, when you click the “Suggest Tags” button, we send an API call to our Kinja Autotagging microservice, which is built specifically around the Google Cloud Natural Language API.
This gives us a response that looks like this:
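The original embedded sample is gone, but an entity-analysis result from the Cloud Natural Language API has roughly this shape. The field names below (`name`, `type`, `salience`) mirror the public API; the exact payload our microservice returns, and all the values, are illustrative assumptions:

```typescript
// Hypothetical autotagging response — illustrative only. Field names
// follow Cloud Natural Language entity analysis; the real microservice
// payload may differ.
interface SuggestedTag {
  name: string;
  type: string;
  salience: number; // 0..1: how central the entity is to the post body
}

const suggested: SuggestedTag[] = [
  { name: "kinja", type: "ORGANIZATION", salience: 0.42 },
  { name: "tags", type: "OTHER", salience: 0.21 },
  { name: "react", type: "OTHER", salience: 0.08 },
];
```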
To make this feature a little more user friendly, we then take these tags and send them to our API to get the “count” associated with each tag, that is, the number of times a given tag has been used over a certain period of time. The more a tag gets used, the higher the count.
This response looks like:
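That sample is also missing; a plausible shape for the counts response, with assumed field names and made-up values, would be:

```typescript
// Hypothetical counts response — illustrative only; the endpoint and
// field names are assumptions, not the real Kinja API.
interface TagCount {
  name: string;
  count: number; // uses of this tag over the lookback window
}

const counts: TagCount[] = [
  { name: "kinja", count: 152 },
  { name: "react", count: 37 },
];
```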
We then take the response with the counts and do some quick matching against the names from the previous response, which gives us a final data structure that looks like:
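The “quick matching on names” step can be sketched as a join on the tag name; everything here (types, field names, the count-of-zero fallback) is an assumption about how such a merge would look, not the actual Kinja code:

```typescript
interface SuggestedTag { name: string; salience: number; }
interface TagCount { name: string; count: number; }
interface RankedTag { name: string; salience: number; count: number; }

// Join suggestions with usage counts by tag name; a tag with no usage
// history falls back to a count of 0.
function mergeTagCounts(
  suggested: SuggestedTag[],
  counts: TagCount[],
): RankedTag[] {
  const byName = new Map(counts.map((c) => [c.name, c.count]));
  return suggested.map((t) => ({
    name: t.name,
    salience: t.salience,
    count: byName.get(t.name) ?? 0,
  }));
}
```

So a suggestion for “kinja” with a count of 152 and a never-used “tags” suggestion would come out as one array, with the unused tag carrying a count of 0.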
We then take this data and feed it into an extremely simple React component, which you can see in action here:
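The demo image didn’t make it into this copy, but the component is little more than a list of clickable tag chips. A framework-free sketch of the display logic, with the sort order and label format as assumptions:

```typescript
interface RankedTag { name: string; count: number; }

// Produce one "chip" label per suggestion, most-used tags first —
// roughly what the React component renders as clickable buttons.
function tagLabels(tags: RankedTag[]): string[] {
  return [...tags]
    .sort((a, b) => b.count - a.count)
    .map((t) => `${t.name} (${t.count})`);
}
```

Clicking a chip would then just append that tag name to the post’s tag field.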
So that’s it, really! It’s a pretty simple flow for a feature that hopefully makes articles a little easier to categorize and organize. A great future improvement for this project would be the Google AutoML Natural Language service, to which you can feed a corpus of documents and train a custom model for better results, but the current setup seems to work well enough for our use case.