Abusive and Threatening Language Detection Task in Urdu
CICLing 2021 track @ FIRE 2021 co-hosted with ODS SoC 2021
Dear Participants and Interested Community,
The leaderboard phase of the competition closed on Aug 29, 2021. However, even if you have not participated yet, you can still take part.
We encourage EVERYONE (ODS SoC leaderboard participants as well as those who missed the leaderboard deadline) to submit their technical reports, to be published in the [FIRE 2021] Hate Speech and Offensive Content Identification track, before the deadline on September 25, 2021.
Please see the instructions on submissions.
For your convenience, we have made the datasets available, along with the ground-truth annotations ("correct labels") for the test sets of both subtasks:
When writing a paper or producing a software application, tool, or interface based on the datasets or baseline systems provided on this website, you must properly cite the source. The citation for the corresponding paper is given below:
Maaz Amjad, Alisa Zhila, Grigori Sidorov, Andrey Labunets, Sabur Butt, Hamza Imam Amjad, Oxana Vitman, and Alexander Gelbukh (2021). Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021. In CEUR Workshop Proceedings.
We encourage participants to propose methods that can automatically detect threats and abuse in the Urdu language, helping to prevent violence and its dire consequences.
To the best of our knowledge, this is the first shared task on abusive language detection in Urdu.
NOTE: All dates are End-of-Day Pacific Time Zone
The Task is divided into two Subtasks
Participants in this year’s shared task can choose to participate in either one or both subtasks.
Sub-task A focuses on detecting abusive language in tweets written in Urdu. This is a binary classification task in which participating systems are required to classify tweets into two classes, namely: Abusive and Non-Abusive.
Abusive - This Twitter post contains abusive content.
Non-Abusive - This Twitter post does not contain any abusive or profane content.
Sub-task B focuses on detecting threatening language in tweets written in Urdu. This is likewise a binary classification task with two classes: Threatening and Non-Threatening.
Threatening - This Twitter post contains threatening content.
Non-Threatening - This Twitter post does not contain any threatening content.
We followed Twitter's definition of threatening posts: posts directed at an individual or group that threaten violent acts, threaten to kill or inflict serious physical harm, intimidate, or use violent language.
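Both subtasks are standard binary text classification problems. As a minimal illustrative sketch (not an official baseline from this task), one could use a TF-IDF + logistic regression pipeline; the toy tweets and labels below are placeholders, not taken from the actual Urdu datasets:

```python
# Minimal baseline sketch for either binary subtask (assumption: this is
# illustrative only, not the organizers' baseline system).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: 1 = Abusive (or Threatening), 0 = Non-Abusive.
train_texts = [
    "toy abusive tweet",
    "toy harmless tweet",
    "another abusive toy tweet",
    "another harmless toy tweet",
]
train_labels = [1, 0, 1, 0]

# Character n-grams are a common choice for Urdu user-generated text,
# which has rich morphology and inconsistent spacing.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

preds = model.predict(train_texts)
print(list(preds))
```

A real system would of course train on the released Urdu tweet datasets and tune the vectorizer and classifier on a validation split.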
Dataset for Subtask B
Task Results Submission
Participants may join in teams of up to 7 people, including all co-authors of the paper (if they intend to submit a technical report).
Task link and result submission instructions TBD on July 19, 2021
The format and limit of result submissions TBD
When using the provided datasets, please cite our paper as follows:
Maaz Amjad, Alisa Zhila, Grigori Sidorov, Andrey Labunets, Sabur Butt, Hamza Imam Amjad, Oxana Vitman, and Alexander Gelbukh (2021). Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021. In CEUR Workshop Proceedings.
Technical Report Submission
After the result submission deadline, participants are invited to submit an abstract and a technical report with a brief description of their approach and experiments for publication in the FIRE 2021 proceedings. All working notes will be published in the CEUR Workshop Proceedings.
For teams wishing to submit their technical reports, please also register here:
Note: Even if you are not sure about your plans to submit a report, it is better to register early. You can withdraw later.
Technical reports MUST be accompanied by the system code.
The code of the final system, an abstract, and the technical report should be submitted in a zip file to UrduAbs2021 (at) CICLing.org (please name the folder after your team).
Teams participating in more than one track can submit either one joint technical report or two separate reports, depending on whether their systems for each sub-task are sufficiently different.
The technical report should be between 5 and 9 pages long; we encourage longer papers (up to 9 pages) as long as the content justifies the length.
All technical report submissions should be in single-column CEUR format. Authors should use one of the CEUR templates below:
Word and LaTeX: http://ceur-ws.org/Vol-XXX/CEURART.zip
Choose the right track to submit the paper:
We will use Accuracy and F1 to rank the results.
ROC-AUC will also be reported for teams that provide confidence scores, but it will not affect the final ranking.
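The three metrics above can be sketched in pure Python as follows. This is an illustration of how scoring might work, assuming binary labels (1 = Abusive/Threatening, 0 = Non-Abusive/Non-Threatening); the function names and sample data are illustrative, and the official evaluation script is not part of this page:

```python
# Illustrative implementations of the ranking metrics (assumption: the
# official scorer may differ in tie-breaking and averaging details).

def accuracy(y_true, y_pred):
    # Fraction of tweets whose predicted class matches the gold class.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred):
    # F1 of the positive (Abusive/Threatening) class.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def roc_auc(y_true, y_score):
    # Fraction of (positive, negative) pairs ranked correctly by the
    # confidence score, counting ties as half a correct pair.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
    return sum(pairs) / len(pairs)

y_true = [1, 0, 1, 1, 0, 0]               # hypothetical ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1]               # hypothetical system predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6]  # optional confidence scores

print(accuracy(y_true, y_pred), f1(y_true, y_pred), roc_auc(y_true, y_score))
```

Note that ROC-AUC requires real-valued confidence scores rather than hard labels, which is why it is only reported for teams that submit them.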
Results and Ranking
oksana (dot) vittmann (at) gmail (dot) com
Master's student, Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN), Mexico.
Hamza Imam Amjad
Master's student, Moscow Institute of Physics and Technology, Russia.
Contact the program committee via email: