NLPCC 2026 Shared Task 6: The Second Shared Task on LLM-Generated Text Detection
Macau, China
November 3-5, 2026
Registration

To register, please fill out the Shared Task Registration Form: https://alidocs.dingtalk.com/notable/share/form/v018K4nyeZ1wGK0YnLb_dv19yqvsgs3oebp3pcjys_bUfwzOn

Should you have any questions, please feel free to contact us at: nlp2ct.junchao@gmail.com

Latest News

  • [ 2026.05.08 ] 🔄 We have updated the dataset to unify the label "MGT" to "LGT" to match the documentation. No other changes were made. The original MGT in the training set is equivalent to LGT (label=2). We apologize for any inconvenience and confusion caused.
  • [ 2026.04.15 ] 💡 We have released the detailed task guidelines and training data.
  • [ 2026.03.20 ] 🔥 Registration for NLPCC 2026 Task 6 is now open. Welcome to join us!

Introduction

The rapid development of large language models (LLMs) has given rise to a series of challenges, including the generation of disinformation, the spread of harmful content, and various forms of misuse. Against this backdrop, efficiently distinguishing LLM-generated text from human-written text has become an urgent research problem in natural language processing (NLP). Although remarkable progress has been made, existing research has largely focused on English, and systematic research and technical exploration for Chinese remain scarce. This shared task aims to fill this gap, build more robust Chinese LLM-generated text detectors, and advance research and real-world applications of this field in Chinese-language settings.

Following the success of the 1st Shared Task on LLM-Generated Text Detection (NLPCC 2025), the 2nd Shared Task on LLM-Generated Text Detection in 2026 features a significant upgrade: the task formulation has been expanded from binary to ternary classification. In addition to distinguishing human-written text from LLM-generated text, a new category, LLM-refined text, has been introduced, which better reflects how LLMs are used in practice. Participating teams are required to design and implement text detection algorithms based on the provided training data to classify texts accurately; their detectors will undergo rigorous stress testing.

Dataset

The training set for this task is primarily sampled and adapted from the CUDRT (TIST 2026) dataset (Chinese subset; "Complete" setting with a 25% ratio, i.e., continuation from the first 25% of tokens). The test set will be released in the later stages of the competition so that teams can verify their methods and evaluate final performance. The evaluation dataset is constructed by extending the DetectRL benchmark (NeurIPS 2024) framework and covers multiple generation models and domains to keep the evaluation scenarios realistic and challenging.

Training Set

The training set contains data from four LLMs and two domains: the data sources are news and academic writing, and the generation models are GPT-4, Qwen, ChatGLM, and Baichuan. It contains a total of 20,370 samples. Each sample has four fields: "ID", "HWT", "LGT", and "HLT". "ID" is the sample number, while "HWT", "LGT", and "HLT" hold the human-written text, the LLM-generated text, and the LLM-refined text, respectively.
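As an illustration of how these grouped samples could be turned into per-text training rows, here is a minimal sketch. It assumes each record is a dictionary with the four fields named above and reuses the label mapping given in the submission rules (HWT: 0, LGT: 1, HLT: 2); the function name `flatten_records` and the `source` key are hypothetical, and the actual file layout of the released data may differ.

```python
from typing import Dict, Iterable, List

# Label mapping from the submission rules (HWT: 0, LGT: 1, HLT: 2).
LABELS: Dict[str, int] = {"HWT": 0, "LGT": 1, "HLT": 2}


def flatten_records(records: Iterable[dict]) -> List[dict]:
    """Turn each {"ID", "HWT", "LGT", "HLT"} record into up to three labeled rows."""
    rows = []
    for record in records:
        for field, label in LABELS.items():
            text = record.get(field)
            # Skip missing or empty texts instead of emitting blank rows.
            if isinstance(text, str) and text.strip():
                rows.append(
                    {"ID": record["ID"], "source": field, "text": text, "label": label}
                )
    return rows


if __name__ == "__main__":
    demo = [{"ID": 1, "HWT": "人类写作的文本。", "LGT": "模型生成的文本。", "HLT": "模型润色的文本。"}]
    for row in flatten_records(demo):
        print(row["ID"], row["source"], row["label"])
```

Keeping the original "ID" on every row makes it possible to split train and validation data by sample, so that the three texts derived from the same source never end up on both sides of the split.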

Data Download

The training data can be found at the following links:

Data Restrictions

To support the development of detection systems, participants may process and augment the provided training data, but no external datasets may be introduced.

Evaluation Metrics

The official evaluation metric for this task is the macro-averaged F1-Score.
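For local validation, macro-averaged F1 can be computed as the unweighted mean of the per-class F1 scores. The sketch below is a plain-Python version for the three classes; the function name `macro_f1` is illustrative, and the handling of a class with no true and no predicted samples (scored as 0 here) is an assumption, since the official scorer's behavior in that case is not specified.

```python
from typing import Sequence


def macro_f1(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    labels: Sequence[int] = (0, 1, 2),
) -> float:
    """Unweighted mean of per-class F1 over the given labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class that never appears in y_true or y_pred scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(round(macro_f1([0, 1, 2, 2], [0, 1, 2, 1]), 4))
```

Because every class contributes equally to the average, a detector that performs poorly on one category (for example, LLM-refined text) is penalized even if that category is rare in the evaluation data.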

Submission and Evaluation

The submission platform for this task (tentatively Alibaba Cloud Tianchi) will be launched one week before the test data is released. Evaluation proceeds in two phases: Phase 1 (open evaluation, leaderboard verification), followed by Phase 2 (closed evaluation, final ranking). The submission requirements are as follows:

1. Test Result File

  • Your test result file must be a JSON file containing all samples.
  • Please ensure that the text and ID fields remain unchanged.
  • Each sample in the JSON file should contain the following fields: "ID", "text", and "label" (HWT: 0, LGT: 1, HLT: 2).
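The requirements above could be met with a small helper like the following sketch. It assumes the released test file provides samples with "ID" and "text" fields and that the result file is a JSON array of objects; neither detail is confirmed here, and the names `build_submission` and `save_submission` are hypothetical.

```python
import json
from typing import List

VALID_LABELS = {0, 1, 2}  # HWT: 0, LGT: 1, HLT: 2


def build_submission(test_samples: List[dict], predictions: List[int]) -> List[dict]:
    """Attach one predicted label to each test sample, leaving ID and text untouched."""
    if len(test_samples) != len(predictions):
        raise ValueError("need exactly one prediction per test sample")
    submission = []
    for sample, label in zip(test_samples, predictions):
        if label not in VALID_LABELS:
            raise ValueError(f"invalid label {label!r} for ID {sample['ID']!r}")
        submission.append({"ID": sample["ID"], "text": sample["text"], "label": int(label)})
    return submission


def save_submission(submission: List[dict], path: str) -> None:
    """Write the result file as UTF-8 JSON without escaping Chinese characters."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(submission, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    samples = [{"ID": "t1", "text": "示例文本。"}, {"ID": "t2", "text": "另一段文本。"}]
    print(build_submission(samples, [0, 2]))
```

Passing `ensure_ascii=False` keeps the Chinese text readable in the output file; copying "ID" and "text" directly from the test samples avoids accidentally altering them during preprocessing.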

2. Code and Data

  • The code folder should contain all the code required for data augmentation, data processing, model training, and model inference, as well as the complete dataset used to train the detector.
  • Please include a simple README.md and an environment configuration file (e.g., requirements.txt).

3. Technical Report

The technical report should describe in detail the methods used to solve the task. All submitted solutions will be kept confidential and deleted after the NLPCC 2026 conference.

Tentative Schedule

March 20, 2026: Shared task announcement and call for participation
March 20, 2026: Registration opens
April 15, 2026: Release of detailed task guidelines and training data
May 25, 2026: Registration deadline
June 11, 2026: Test data release
June 20, 2026: Deadline for participants to submit results
June 30, 2026: Evaluation results released and call for system reports and conference papers

Organizers

This shared task is jointly organized by the Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory (NLP2CT) at the University of Macau, Central China Normal University, and Alibaba Cloud.

  • Junchao Wu | University of Macau | Contact: nlp2ct.junchao@gmail.com
  • Derek Fai Wong | University of Macau
  • Runzhe Zhan | University of Macau
  • Zeyu Wu | University of Macau
  • Zhiwen Xie | University of Macau / Central China Normal University
  • Yichao Du | Alibaba Cloud
  • Longyue Wang | Alibaba Cloud

FAQ

Q: Where can I register for this shared task? Can team registration be updated?

A: Fill out the Shared Task Registration Form listed under Registration. Before the registration deadline, you can resubmit the form to update your team's registration information.

Q: Is there any restriction on the number of model parameters or data augmentation methods? If an ensemble approach is used, are there any constraints on the total number of parameters?

A: NLPCC 2026 Shared Task 6 places no restrictions on the number of parameters of a single model, nor on the total number of parameters of an ensemble. There are also no restrictions on the data augmentation methods or models used. You may use open-source LLMs or closed-source APIs for paraphrasing, rewriting, and other data augmentation operations; these are fully allowed under the competition rules.

Q: The training set is sourced from news and academic writing, using four generation models. Will the subsequent test data also consist of these two sources and these four models, or will it be expanded? What is the generation method for test data? Will it contain formatting artifacts (e.g., "\n") or incomplete texts?

A: As stated in the Dataset section of the README, the test set will be constructed from out-of-distribution data based on another dataset (DetectRL benchmark, NeurIPS 2024). It will no longer be limited to the news and academic domains or the four generation models covered by the training set. The test data may include texts from unknown domains and unknown generation models, as well as different data generation schemes, in order to stress-test detectors along multiple dimensions and comprehensively evaluate their performance in real-world application scenarios. LGT texts in the test set will not necessarily be produced by the continuation-from-the-first-25%-of-tokens scheme used for the training set; even where continuation is used, the prefix tokens will be cleaned, so the overall quality of the test set will be higher. The test set undergoes strict preprocessing and will not retain redundant formatting symbols such as "\n". It will also largely avoid abrupt text truncation, although texts are not forced to end with a period: natural forms such as bylines at the end of news articles are preserved to test real-world robustness.

Q: Training data quality issues: some samples contain pure English text labeled as HLT; some end with "123" with preceding text identical to the human text; and the README mentions LGT while the dataset uses MGT. How should these be understood?

A: The pure English samples labeled as HLT and the ~2,000 samples ending with "123" (with preceding text identical to the human text) are both native noise from the CUDRT dataset; no additional cleaning was performed. Teams are expected to design their own data-cleaning strategies. Regarding the label inconsistency: in the current training set, MGT is equivalent to LGT (label=2). We will update the dataset in a subsequent release to uniformly correct MGT to LGT, along with an official correction notice.
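One possible cleaning strategy for the noise described above is sketched below. It assumes records are dictionaries keyed by "HWT", "LGT" (or the older "MGT"), and "HLT"; it treats a text as pure English when it contains no CJK ideographs, and it treats a text as a "123" copy when stripping the trailing "123" leaves exactly the human text. These heuristics and the names `has_chinese`, `is_copy_with_123`, and `clean_records` are illustrative assumptions, not an official procedure.

```python
import re
from typing import Iterable, List

# Basic CJK Unified Ideographs range; enough to tell Chinese from pure English text.
CJK = re.compile(r"[\u4e00-\u9fff]")


def has_chinese(text: str) -> bool:
    return bool(CJK.search(text))


def is_copy_with_123(candidate: str, human: str) -> bool:
    """True if candidate is the human text followed by a trailing "123"."""
    candidate = candidate.strip()
    return candidate.endswith("123") and candidate[:-3].strip() == human.strip()


def clean_records(records: Iterable[dict]) -> List[dict]:
    cleaned = []
    for record in records:
        record = dict(record)  # work on a copy, leave the input untouched
        # Assumption: older releases used "MGT" as the field name for LGT.
        if "MGT" in record and "LGT" not in record:
            record["LGT"] = record.pop("MGT")
        human = record.get("HWT") or ""
        for field in ("LGT", "HLT"):
            text = record.get(field)
            if not isinstance(text, str):
                continue
            if not has_chinese(text) or is_copy_with_123(text, human):
                record[field] = None  # drop only the noisy text, keep the rest
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    noisy = [{"ID": 1, "HWT": "今天天气很好。", "MGT": "今天天气很好。123", "HLT": "The weather is nice."}]
    print(clean_records(noisy))
```

Setting a noisy field to `None` rather than discarding the whole record keeps the remaining clean texts of that sample available for training.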

Q: Can we use pseudo-labeling on the test set data (i.e., running the test data through our model to obtain pseudo-labels and using them as expanded training data)?

A: No, using test set data for pseudo-labeling is NOT allowed. You may not add test set samples with predicted pseudo-labels to your training data for model optimization. This practice does not carry over to real-world application scenarios, so it is not permitted in the competition. The official test set is strictly for final-submission evaluation only. Any use of test set data for training, fine-tuning, pseudo-label construction, or data augmentation is considered a violation of the competition rules. You may only use the officially provided training set. The training set may be expanded only through the permitted data augmentation methods, and no new data sources may be introduced.

References

If your research uses relevant datasets, or if this task is helpful to you, please consider citing the following literature:

DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios
Wu, Junchao; Zhan, Runzhe; Wong, Derek F; Yang, Shu; Yang, Xinyi; Yuan, Yulin; Chao, Lidia S · Advances in Neural Information Processing Systems · 2024 · pp. 100369–100401
@article{wu2024detectrl,
  title={Detectrl: Benchmarking llm-generated text detection in real-world scenarios},
  author={Wu, Junchao and Zhan, Runzhe and Wong, Derek F and Yang, Shu and Yang, Xinyi and Yuan, Yulin and Chao, Lidia S},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={100369--100401},
  year={2024}
}
Overview of the NLPCC 2025 Shared Task 1: LLM-Generated Text Detection
Wu, Junchao; Zhan, Runzhe; Wang, Qianli; Yuan, Yulin; Chao, Lidia S; Wong, Derek F · CCF International Conference on Natural Language Processing and Chinese Computing · 2025 · pp. 263–274
@inproceedings{wu2025overview,
  title={Overview of the NLPCC 2025 Shared Task 1: LLM-Generated Text Detection},
  author={Wu, Junchao and Zhan, Runzhe and Wang, Qianli and Yuan, Yulin and Chao, Lidia S and Wong, Derek F},
  booktitle={CCF International Conference on Natural Language Processing and Chinese Computing},
  pages={263--274},
  year={2025},
  organization={Springer}
}
Toward Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT
Tao, Zhen; Chen, Yanfang; Xi, Dinghao; Li, Zhiyu; Xu, Wei · ACM Transactions on Intelligent Systems and Technology · 2026 · Vol. 17, No. 2 · pp. 1–35
@article{tao2026toward,
  title={Toward Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT},
  author={Tao, Zhen and Chen, Yanfang and Xi, Dinghao and Li, Zhiyu and Xu, Wei},
  journal={ACM Transactions on Intelligent Systems and Technology},
  volume={17},
  number={2},
  pages={1--35},
  year={2026}
}