Sentiment Analysis of Tweets for Hate Speech Detection Using Binary Classification Algorithms and BERT
Abstract
Social media has considerable influence in the modern world. Twitter in particular gives people a platform to express opinions on everything from mundane everyday life to politics, race, and religion, and it has often come under scrutiny for the unchecked propagation of hate speech. This project applies natural language processing techniques to a corpus of tweets to detect hate speech. A total of 3538 unique tokens are identified that appear only in tweets classified as hate speech. Data visualization techniques such as word clouds and frequency distribution plots show that sexist, homophobic, and racist slurs are the most frequent terms in hate tweets. This suggests that women, the LGBTQ+ community, and people of color are the most targeted groups in this corpus.
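As a rough illustration of how tokens that appear only in tweets classified as hate speech might be isolated, the sketch below takes the set difference between the vocabulary of one class and the vocabulary of all other tweets. It is a minimal sketch under stated assumptions, not the project's actual pipeline: the regex tokenizer, the function names `tokenize` and `tokens_unique_to_label`, the label strings, and the toy corpus are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a tweet and split it into word tokens.

    Assumption: a simple regex tokenizer stands in for whatever
    preprocessing the project actually used.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def tokens_unique_to_label(tweets, labels, target_label):
    """Return the tokens found in tweets with target_label and in no other tweet."""
    target_vocab, other_vocab = set(), set()
    for text, label in zip(tweets, labels):
        vocab = target_vocab if label == target_label else other_vocab
        vocab.update(tokenize(text))
    # Set difference keeps only tokens never seen outside the target class.
    return target_vocab - other_vocab


# Hypothetical toy corpus, for illustration only.
tweets = ["you are awful trash", "what a lovely day", "awful weather today"]
labels = ["hate", "other", "other"]
unique = tokens_unique_to_label(tweets, labels, "hate")
```

On a real corpus, the size of the returned set corresponds to the kind of count reported in the abstract. The exact figure depends heavily on tokenization and cleaning choices such as the handling of hashtags, mentions, and stop words.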