Chinese Hate-speech Classification using Machine Learning (in progress…)

Inspired by a class discussion on automatic hate speech detection, I was curious how easy it would be to find hateful Chinese-language content on Twitter, so I typed a few racial slurs into its search engine. Oh, how I wish I could unsee what it returned.

Automatic hate speech detection is definitely not an easy task, especially when the text doesn’t contain explicit slurs. But how difficult is it to detect hate speech that does contain slurs? I think we can do better than this.

So I decided to do something about it. I collected 10,000 potentially hateful Tweets using a keyword-based search and had them annotated by three annotators from Hong Kong. Now I am working on training a classifier to identify Chinese hate speech automatically.

Check out this GitHub repository for details of pre-processing, feature selection, and modelling for the baseline ML model.
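To give a rough idea of the kind of steps involved, here is a minimal sketch and not the code from the repository. It assumes binary labels (1 = hateful, 0 = not hateful), combines the three annotations by majority vote, and uses character n-grams as features; none of these choices is confirmed above. The keyword list and all function names are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical placeholder keywords; the real search terms are not listed here.
KEYWORDS = ["keyword_a", "keyword_b"]


def matches_keyword(text, keywords=KEYWORDS):
    """Return True if the text contains at least one search keyword."""
    return any(keyword in text for keyword in keywords)


def majority_label(labels):
    """Return the label most annotators chose.

    Assumption: an odd number of annotators and binary labels,
    so a strict majority always exists.
    """
    label, _count = Counter(labels).most_common(1)[0]
    return label


def char_ngrams(text, n=2):
    """Count overlapping character n-grams.

    Character n-grams avoid the need for word segmentation,
    which is why they are a common baseline feature for Chinese text.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Example: three annotators labelled one Tweet as 1, 0 and 1.
final_label = majority_label([1, 0, 1])
features = char_ngrams("abab")
```

The resulting n-gram counts could then be fed into any standard classifier as a baseline.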

Voice Skill: DMV Metro

One day, while I was putting on my make-up and in a huge rush, I really needed to know when the next train would come without typing into Google Maps. I thought it would be great if I could simply ask Siri or Alexa, hands-free. But then I realised they didn’t have a function for my specific need: telling me when the next train will arrive at the station near my home in D.C. So some researchers and conversation designers from the Human Language Technology group at Georgetown University and I decided to build a voice skill called DMV Metro on Alexa and Google Actions to fill this gap.

Check out this GitHub repository to see how to build intents and slot filling, and how to handle user responses and API integrations on both the front end and the back end.
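For readers new to voice skills, the idea behind intents and slot filling can be sketched in a platform-neutral way. This is only an illustration, not the skill's actual code: the intent name `NextTrainIntent`, the `station` slot, the response dictionary, and the `fetch_minutes` callback (standing in for the transit API call) are all hypothetical.

```python
def handle_next_train(slots, fetch_minutes):
    """Answer a 'next train' request, asking for the station if it is missing."""
    station = slots.get("station")
    if not station:
        # Slot filling: the required slot is empty, so prompt the user for it
        # and keep the session open.
        return {
            "speech": "Which station are you leaving from?",
            "elicit_slot": "station",
            "end_session": False,
        }

    # fetch_minutes is a stand-in for the back-end API call; it is assumed
    # to return a list of minutes until each upcoming train.
    minutes = fetch_minutes(station)
    if not minutes:
        speech = f"I couldn't find any upcoming trains at {station}."
    else:
        unit = "minute" if minutes[0] == 1 else "minutes"
        speech = f"The next train at {station} arrives in {minutes[0]} {unit}."
    return {"speech": speech, "elicit_slot": None, "end_session": True}


def route(intent_name, slots, fetch_minutes):
    """Dispatch a recognised intent to its handler."""
    if intent_name == "NextTrainIntent":
        return handle_next_train(slots, fetch_minutes)
    return {
        "speech": "Sorry, I can only help with train times.",
        "elicit_slot": None,
        "end_session": True,
    }


# Example with a stubbed API that always reports trains in 3 and 8 minutes.
reply = route("NextTrainIntent", {"station": "Rosslyn"}, lambda station: [3, 8])
```

Passing the API call in as a function keeps the dialogue logic easy to test without network access.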