"Abusive and Threatening language detection in Urdu"

CICLing 2021 UrduAbusive & UrduThreat track @ FIRE 2021

Task Description

With the emergence of the internet and technology, the usage of social media platforms has grown substantially. Social media platforms allow users to share their thoughts and serve as a prominent channel of communication. However, some people use these platforms to threaten others and to post abusive language. A threat is characterized as an expression of a clear intent to cause bodily or other harm. People use language to highlight, depict, elicit, instruct, and urge in order to convey the nuances of themselves and their worlds; some use it for a good cause, and some use it for destructive purposes. For example, consider "shut your mouth, or you will see red." In this context, the word "red" might be perceived as a threat to cause injury to someone or, in the worst-case scenario, bloodshed/killing. Therefore, such references might be considered a vile aspersion. The impacts of abusive language are detrimental, ranging from short-term emotional reactions (anger, fear, self-blame, etc.) to long-term psychological effects (low self-esteem, depression, etc.), causing mental and physical health issues (sleep disorders, headaches, eating disorders, etc.).

The Urdu language has more than 230 million speakers worldwide, with vast representation on social networks and digital media. Therefore, it is essential to propose methods that can automatically detect threatening and abusive language to avoid violence and outrageous consequences. To the best of our knowledge, this is the first shared task on abusive and threatening language detection in Urdu.

We divide the task into two sub-tasks:

  1. Threatening Language Detection in the Urdu Language

  2. Abusive Language Detection in the Urdu Language

This task aims at identifying threatening and abusive language in Urdu on Twitter. Given a tweet, binary classification systems have to perform two tasks:

  • classify it as either a threatening or a non-threatening tweet,

  • classify it as either an abusive or a non-abusive tweet.
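To make the expected system shape concrete, the sketch below shows a minimal baseline for one of the binary sub-tasks: character n-gram TF-IDF features with logistic regression, a common starting point for text classification in morphologically rich, low-resource languages. This is an illustration only, not an official baseline of the track; the tweets and labels here are neutral placeholder examples with arbitrary labels, and participants would train on the actual shared-task data.

```python
# Hypothetical baseline sketch for one binary sub-task (e.g. abusive vs. non-abusive).
# Character n-grams (analyzer="char_wb") are script-agnostic, so they work on Urdu
# text without requiring language-specific tokenization.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Placeholder tweets with arbitrary 0/1 labels, for illustration only.
# In practice these would come from the shared-task training set.
train_texts = [
    "یہ بہت اچھا دن ہے",      # neutral Urdu sentence ("this is a very good day")
    "آج موسم خوشگوار ہے",     # neutral Urdu sentence ("the weather is pleasant today")
    "مجھے یہ کتاب پسند ہے",    # neutral Urdu sentence ("I like this book")
    "ہم کل ملیں گے",           # neutral Urdu sentence ("we will meet tomorrow")
]
train_labels = [1, 0, 1, 0]  # arbitrary placeholder labels

clf = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(train_texts, train_labels)

# Predict a 0/1 label for each unseen tweet.
preds = clf.predict(["یہ ایک نیا جملہ ہے"])
print(preds)
```

The second sub-task would use an identical pipeline trained on the threatening/non-threatening labels; stronger systems would likely replace the TF-IDF features with pretrained multilingual or Urdu-specific embeddings.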