Characterizing Communal Microblogs During Disaster Events

Rise in hateful tweets on Social Media


Social media platforms, from WhatsApp to Instagram, have seen an unprecedented spike in the number of users. Many users use anodyne language and still reach audiences across the globe. A few obstinate users make gratuitous remarks that are callous at best and outrageous at worst. News consumption through conventional channels is slowly fading as these sites emerge, so it is more important than ever to limit hate speech while still letting users speak their minds. There are many types of hate speech, each with its own challenges, targets and motives. We focus primarily on an especially widespread and potentially dangerous type called communal hatred. Simply put, these are posts, tweets and messages targeted at religious or racial communities during a crisis, which create a volatile environment and must therefore be stopped or contained. Many communities are targeted, such as “Hindu,” “Muslims,” and “Sikhs.” Such communal tweets spread fear and create a sense of mistrust and despair at a time when hope and unity are the need of the hour; they subsequently erode communal well-being, not to mention law and order. It is therefore important to ensure that such messages are contained, so that the venom does not spread through the community and put further pressure on administrations. An interesting pattern has been observed among the spreaders of these tweets: while many of them are nefarious bots, bots are not the only problem.

Effects of hateful tweets and Countering them


Many users who post such gratuitous remarks are in fact popular influencers or personalities with large followings, which amplifies the already distorted reality being projected. We have tried to provide an automated system that can analyze, understand and help counter the offensive narrative in real time. Such messages put strain on governments and agencies, which must divert valuable resources to quelling fear and calming communities instead of helping the needy and the vulnerable. While such acts of pure bigotry are horrifying enough, what makes things worse in countries like India is that these posts crop up at the worst possible time, i.e., during natural disasters, exacerbating already simmering tensions. Such messaging also tends to flare up during tragic events such as terrorist attacks, where the ethnic background of the perpetrator is used as an excuse to target people of that background. This makes it difficult to ask communities to unite in solidarity. We have developed a system that categorizes tweets and the users who post them, and that tries to counter these tweets through facts and positive messaging.

Using Big Data concepts for effective decision making


The emergence of many internet-connected devices, coupled with soaring consumer demand for them across the globe, has led to billions of devices accessing the internet and producing terabytes of data every minute. This data can be analyzed to identify everything from the emergence of diseases and traffic patterns to the TV shows to recommend to a particular user based on their viewing history. This enormous, continuously growing data is called Big Data. Big Data in itself is not very valuable unless it is selectively sourced, cleaned, processed and then transformed. The transformed data is then evaluated and visualized. This process of extracting useful information to support decision making at the executive level, where cognitive ability is of greater importance, is called \ac{KDD}. We have used classification to distinguish comments that are hateful from comments that may seem hateful but are not. This is achieved by using grammatical interrelations to detect the othering in tweets. The approach is rule based. The exercise was used to identify malicious tweets with a clear motive of creating hatred, dividing, and discriminating against the so-called others. Differentiating on the basis of grammatical relations is extremely important because keywords alone may not constitute harm: a post may use keywords that count as communal without itself being a virulent attack. Without such classification, the system would direct at innocent users the scrutiny that should be reserved for the perpetrators of such tweets.
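The distinction between a tweet that merely contains communal keywords and one that attacks a community can be illustrated with a minimal sketch. It assumes a token window as a crude stand-in for grammatical relations (a real system would use a parser), and all word lists and function names here are hypothetical, not the lexicon or rules used in this work.

```python
import re

# Hypothetical, illustrative word lists; the actual lexicon is not given here.
COMMUNITY_TERMS = {"hindu", "hindus", "muslim", "muslims", "sikh", "sikhs"}
OTHERING_TERMS = {"they", "them", "those", "these"}
HOSTILE_TERMS = {"blame", "punish", "expel", "traitors"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_communal_attack(text, window=4):
    """Flag a tweet only when a community term has both an othering term
    and a hostile term within `window` tokens of it. The window is an
    assumed approximation of grammatical relations between the words."""
    tokens = tokenize(text)
    for i, tok in enumerate(tokens):
        if tok not in COMMUNITY_TERMS:
            continue
        nearby = tokens[max(0, i - window): i + window + 1]
        has_othering = any(t in OTHERING_TERMS for t in nearby)
        has_hostile = any(t in HOSTILE_TERMS for t in nearby)
        if has_othering and has_hostile:
            return True
    return False
```

Under these rules, a tweet that names a community while reporting relief work is not flagged, because the keyword appears without othering or hostile context.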
Classifiers are used almost everywhere, since they form the basis of Big Data applications and subsequently also help in \ac{ML}. Classifiers will be used to identify tweets and then to train a machine to spot and analyze them. This is an important aspect of the tool we built. To fully exploit the potential of a classifier, we need to choose between supervised and unsupervised learning. Supervised learning is not desirable unless the data is so sporadic and complex that human intervention is the only way to fully comprehend it. It has been observed that more than 50% of such tweets flood social media sites after a major event, so it is imperative for the authorities concerned to identify them quickly. Twitter’s own repository and API provide enough tweets to build databases. Sites such as Chorus Analytic and Follow the Hashtag are extremely useful for collecting data and improving the database and, by extension, the classifier.
A rule-based approach and probabilistic logic were applied to each classifier individually; this is important to make the most of the databases without clogging the entire system. Furthermore, a single table or dictionary cannot store all the keywords, so keywords were stored separately for each classifier. Notably, the results were uniform across all the sample sets that were tested. This suggests that the model should behave consistently when it is scaled up to meet growing real-time demand across different types of hate-filled cyber crimes.
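One way to picture separate keyword tables combined with a probabilistic rule is the sketch below. The noisy-OR combination, the category names, the keywords and their probabilities are all assumptions chosen for illustration; the text does not specify which probabilistic logic was used.

```python
# Hypothetical per-category keyword tables. Each value is an assumed
# probability that the keyword signals the category.
CATEGORY_TABLES = {
    "religious": {"temple": 0.2, "mosque": 0.2, "infidel": 0.8},
    "racial": {"outsiders": 0.6, "migrants": 0.3},
}


def category_score(tokens, table):
    """Combine keyword probabilities with a noisy-OR: 1 - prod(1 - p)."""
    miss = 1.0
    for tok in tokens:
        miss *= 1.0 - table.get(tok, 0.0)
    return 1.0 - miss


def classify(text, threshold=0.5):
    """Score the text against every table, then apply a threshold rule."""
    tokens = text.lower().split()
    scores = {
        name: category_score(tokens, table)
        for name, table in CATEGORY_TABLES.items()
    }
    flagged = [name for name, score in scores.items() if score >= threshold]
    return scores, flagged
```

Keeping one table per category means a new category can be added without touching the others, which is the practical point of not storing every keyword in a single dictionary.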

Determine targets of hostile tweets

It is no secret that social media sites are one of the most important ways not only to communicate but also to get information about the world. This matters because, unlike conventional media, where news is more or less vetted and the channel still has to adhere to some journalistic integrity and the law, online social media sites are subject to no such checks.
Social media sites allow like-minded individuals to interact about their shared interests. While this seems an extremely safe and tranquil setup, the flip side is that it gives individuals with nefarious motives the opportunity to be belligerent with their fear mongering and to amplify their tweets. Because the sites appear friendly and intimate on account of the shared ideas, echo chambers form and take hold very easily.
It is therefore no surprise that many of these sites harbor groups that share among themselves inflammatory information with no factual basis.
There has not been enough effort to weed out the people who spread these inaccurate, wild and offensive tweets. Despite the best efforts of lawmakers, there remains a wide gap in defining what is actually an offensive tweet and what is not. It is imperative to understand that while some tweets may not cause any physical harm, they may nevertheless cause pain to people.
The understanding is that a particular tweet, post, or series of them should be treated as offensive if it can cause such problems for, or otherwise harm, an individual, a community or an organization. Twitter feeds can be used to identify novel keywords. Creating and organizing databases are important steps in defining novel keywords that help us evaluate newer forms of abuse.
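A minimal sketch of how novel keywords might be surfaced from a feed is given below. It assumes that candidate keywords are tokens that co-occur with terms already in the glossary; the function name, the whitespace tokenization and the `min_count` threshold are hypothetical choices, not the procedure used in this work.

```python
from collections import Counter


def novel_keyword_candidates(tweets, glossary, stopwords, min_count=2):
    """Return (token, count) pairs for tokens that co-occur with glossary
    terms but are not yet in the glossary, most frequent first."""
    counts = Counter()
    for tweet in tweets:
        tokens = set(tweet.lower().split())
        if tokens & glossary:  # the tweet already matches a known keyword
            counts.update(tokens - glossary - stopwords)
    return [(tok, n) for tok, n in counts.most_common() if n >= min_count]
```

The candidates would still need manual review before being added to a database, since co-occurrence alone does not show that a word is abusive.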

Understand the hate against communities

As has already been noted, social media sites serve as an independent platform and are an important tool for keeping freedom of speech within the reach of even those for whom an independent and free voice can lead to more than mere inconvenience: a life in jail or worse. This is why we have to walk a fine line, one that keeps shifting between overly cautious and overly careless, because blocking too much will be criticized as suppressing free speech, and blocking too little will be criticized as letting hate mongers spew venom and run amok with their words. Our classifier was used to categorize tweets as communal or non-communal. This is because it has been observed that, while a disaster may bring out the best in us, it also allows sections of the population, political and otherwise, to further their agendas, build their campaigns on such themes, and disrupt the collective harmony, solidarity and peace that are so fragile but still sustainable.
Communal tweets are especially dangerous given that every community desires reforms that will help its members live peacefully. Future work would need to determine how often the glossary needs to be updated. Hashtags may also be used to expand the glossary’s coverage further. It would also require a collective effort to understand how communal conversations are spun and the wider conversations they are part of. Twitter users from across the globe and from various ethnic backgrounds will converge, making this research more useful.

Sentiment analysis on the Web

This is a research-based topic. Almost every website allows some form of commenting, which hate mongers see as a free ride to post messages that target a particular community, portray it in a bad light, or falsely accuse it of conspiring against others. Such comments on web forums and websites are very easy to shrug off as the work of malicious miscreants trying to create a disturbance. These posts nevertheless pose a real danger. Because the sites are mainly opinion or news outlets, and therefore appear friendly and intimate on account of their importance and relevance, echo chambers arise very easily and harden over time to the point where they no longer host a debate of opinions but rather a shouting match to rouse like-minded users.
To remove or otherwise contain such hate-filled comments, an anti-dictionary is proposed. It would be a glossary of words and sentences that are used to demean a particular community. While this may suffice for web pages, websites or even blogs, which are not as dynamic or open as platforms such as Twitter, it may not fully overcome the challenges posed by a rapidly evolving environment.
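The anti-dictionary idea can be sketched as a lookup over words and whole phrases. The entries below are placeholders and the function names are hypothetical; a real anti-dictionary would be curated and is not reproduced here.

```python
import re

# Placeholder entries standing in for a curated anti-dictionary.
ANTI_DICTIONARY = ["slur_a", "slur_b", "go back to where you came from"]


def build_pattern(entries):
    """Compile one case-insensitive pattern. Longer entries come first so
    that a full phrase is preferred over a word it contains."""
    ordered = sorted(entries, key=len, reverse=True)
    body = "|".join(re.escape(entry) for entry in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


PATTERN = build_pattern(ANTI_DICTIONARY)


def find_matches(comment):
    """Return the anti-dictionary entries found in a comment, lowercased."""
    return [m.group(0).lower() for m in PATTERN.finditer(comment)]


def redact(comment, mask="[removed]"):
    """Replace every matched entry with a mask string."""
    return PATTERN.sub(mask, comment)
```

A static list like this fits slowly changing sites; on a fast-moving platform the entries would need frequent updates, which is the limitation noted above.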
It is imperative to separate the words, also known as the lexicon, from the sentences that have been widely used by users on Twitter. Once the keywords and their semantic dependencies are understood, the words are separated and an entire dictionary is developed, to be engaged whenever and wherever required.
There are many algorithms that can be employed to do sentiment analysis.
One such algorithm is \ac{SVM}. \ac{SVM} is a supervised \ac{ML} algorithm. SVM is used for regression and classification problems. However, its use for classification is more pronounced simply because it works very well, especially in sentiment analysis. SVM represents data points in a feature space and separates the classes with a hyperplane, which is a line in 2-D and a plane in 3-D. N-grams could be used as features, but they do not yield the desired results. SVM uses what are known as kernels to classify; these are mathematical functions, and there are various types of them, \ac{eg}, linear, polynomial, sigmoid and RBF, of which the last three are non-linear.
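For reference, the kernels named above can be written as plain functions of two feature vectors. This is a sketch using the standard textbook definitions; the parameter names (`gamma`, `coef0`, `degree`) and their default values are assumptions for illustration only.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return dot(x, y)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree"""
    return (dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """K(x, y) = tanh(gamma * x . y + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)
```

Each function returns a similarity score for a pair of points; the non-linear kernels let the classifier separate data that no straight line or flat plane could split in the original feature space.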
There are big obstacles when trying to do sentiment analysis; primary among them is that, from a linguistic point of view, opinions and attitudes are subject dependent. This means that negative words such as “never” or “lie” would tilt the attitude. Similarly, if a word such as “skeptical” is added in front of a positive word, the entire meaning of the sentence changes. This long-distance dependency between words is the very reason that n-grams, which are contiguous sequences of n words, are unpopular or undesirable to use.
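The long-distance problem can be made concrete with a short sketch. It builds contiguous n-grams and, as one commonly used workaround that is not claimed to be the method of this work, marks every token that follows a negator until the next punctuation mark. The negator list and function names are illustrative assumptions.

```python
import re

NEGATORS = {"never", "not", "no"}  # illustrative list
PUNCTUATION = set(".,;!?")


def ngrams(tokens, n):
    """Return all contiguous n-word tuples in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mark_negation(text):
    """Prefix tokens after a negator with NOT_ until the next punctuation
    mark. Punctuation itself is dropped from the output."""
    out, negated = [], False
    for tok in re.findall(r"[a-z']+|[.,;!?]", text.lower()):
        if tok in PUNCTUATION:
            negated = False
        elif tok in NEGATORS:
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out
```

In a sentence such as "I never thought it was helpful", the bigram ("was", "helpful") looks positive because "never" lies several words away, whereas the marked token NOT_helpful keeps the negation attached to the word it modifies.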
To use SVM models effectively, suitable data is first gathered. The data is then vectorized and the model is built.
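The gather, vectorize and build steps can be sketched end to end with the standard library alone. This is a toy illustration under stated assumptions: a bag-of-words vectorizer and a linear SVM trained by subgradient descent on the hinge loss stand in for whatever vectorizer and solver were actually used, and the texts, labels, hyperparameters and function names are all hypothetical.

```python
def build_vocab(texts):
    """Map each distinct token to a column index."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Turn a text into a bag-of-words count vector."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + hinge loss by stochastic subgradient
    descent. Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1
            for j in range(len(w)):
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w, b


def predict(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, made-up data: +1 for supportive text, -1 for hostile text.
texts = [
    "great helpful rescue team",
    "helpful volunteers great work",
    "awful hateful message",
    "hateful awful rumour",
]
labels = [1, 1, -1, -1]
vocab = build_vocab(texts)
X = [vectorize(t, vocab) for t in texts]
w, b = train_linear_svm(X, labels)
```

With real data the same three steps apply, but the vocabulary would be far larger and a tested library implementation would normally replace the hand-written training loop.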
It is important to select the right hyperplane, since the hyperplane determines the margin, that is, the distance from the hyperplane to the nearest data points. Narrower margins tend to produce more classification errors. The selection of the margin thus remains a problem.
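The relation between the hyperplane and the margin can be stated in the standard notation, which is assumed here and is not taken from this work: $\mathbf{w}$ is the weight vector, $b$ the bias, $\mathbf{x}_i$ a data point with label $y_i \in \{-1, +1\}$, and $C$ the penalty on margin violations.

```latex
\begin{align}
  \text{hyperplane:}\quad
    & \mathbf{w}^\top \mathbf{x} + b = 0 \\
  \text{distance of } \mathbf{x}_i \text{ to the hyperplane:}\quad
    & d_i = \frac{\lvert \mathbf{w}^\top \mathbf{x}_i + b \rvert}{\lVert \mathbf{w} \rVert} \\
  \text{margin width:}\quad
    & \frac{2}{\lVert \mathbf{w} \rVert} \\
  \text{soft-margin objective:}\quad
    & \min_{\mathbf{w},\, b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
      + C \sum_i \max\bigl(0,\ 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)\bigr)
\end{align}
```

A large $C$ penalizes violations heavily and yields a narrow margin, while a small $C$ tolerates some violations in exchange for a wider margin, which is why the choice of margin is a tuning problem rather than a fixed quantity.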
Some research on SVM has indicated that high-degree words are fruitless in classification, while other research has indicated that they do help.
There are suggestions to instead use weighted models, so that the models do not overfit, and then use SVM to fully classify the data. Still, more research is needed before SVM can give a near-perfect response.
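The text does not say which weighting is meant. One common reading, assumed here purely for illustration, is term weighting such as TF-IDF, which down-weights words that appear in every document before the vectors are passed to the SVM. The function name and the exact formula variant (term frequency times log(N / df)) are choices made for this sketch.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, where
    weight = (count / doc length) * log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights
```

A term that occurs in every document receives a weight of zero, so it cannot dominate the classifier, which is one way such weighting may reduce overfitting to common words.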