Dataset

Data

The primary objective of this shared task is to detect abusive and threatening language in Urdu.

Participants in this year’s shared task may take part in either one or both sub-tasks.

Tasks

Sub-task A:-

Sub-task A focuses on detecting abusive language in Urdu tweets. This is a binary classification task in which participating systems are required to classify tweets into two classes: Abusive and Non-Abusive.

  • Abusive - This Twitter post contains abusive content.

  • Non-Abusive - This Twitter post does not contain any abusive or profane content.

Sub-task B:-

Sub-task B focuses on detecting threatening language in Urdu tweets. This is a binary classification task in which participating systems are required to classify tweets into two classes: Threatening and Non-Threatening.

  • Threatening - This Twitter post contains threatening content.

  • Non-Threatening - This Twitter post does not contain any threatening or profane content.

  1. Abusive

We followed Twitter’s definition of abusive comments: comments directed at an individual or group in order to harass, intimidate, or silence someone else’s voice.

  2. Threatening

We followed Twitter’s definition of threatening posts: posts directed at an individual or group that threaten violent acts, threaten to kill or inflict serious physical harm, intimidate, or use violent language.
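As a rough illustration of the two binary label schemes described above, the sketch below maps the class names to 0/1 and scores a list of predictions. The integer encoding, the helper names `encode_label` and `binary_f1`, and the choice of F1 on the positive class are all assumptions made for this example; this page does not specify the file format or the official evaluation metric.

```python
# Class names come from the task description; treating the second name
# in each pair as the positive class (1) is an assumption.
SUBTASK_LABELS = {
    "A": ("Non-Abusive", "Abusive"),
    "B": ("Non-Threatening", "Threatening"),
}


def encode_label(subtask: str, label: str) -> int:
    """Map a class name to 0 (negative) or 1 (positive) for a sub-task."""
    labels = SUBTASK_LABELS[subtask]
    if label not in labels:
        raise ValueError(f"{label!r} is not a class of sub-task {subtask}")
    return labels.index(label)


def binary_f1(gold: list[int], pred: list[int]) -> float:
    """F1 of the positive class (illustrative metric, not the official one)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A system taking part in both sub-tasks would produce one such 0/1 prediction per tweet for each sub-task independently, since the two label sets are defined separately.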

Training set: The link to download the corpus will be provided soon. (Training Dataset)

Test set: The link to download the testing dataset will be shared soon.