NLPCC 2026 - Shared Task 6 Official Website

NLPCC 2026 Shared Task 6: The Second Shared Task on LLM-Generated Text Detection

Macau, China

November 3-5, 2026

To register, please fill out the Shared Task Registration Form: https://alidocs.dingtalk.com/notable/share/form/v018K4nyeZ1wGK0YnLb_dv19yqvsgs3oebp3pcjys_bUfwzOn

Should you have any questions, please feel free to contact us at: nlp2ct.junchao@gmail.com

Latest News

[ 2026.06.22 ] 📢 Final leaderboard released! See the Results section. The call for system papers is now open; see Call for Papers for details.
[ 2026.06.16 ] 🏆 Phase 2 (closed evaluation) is now open! The test data is available. Submission deadline: June 20, 2026, 23:59 (GMT+8).
[ 2026.06.12 ] 🏆 Phase 2 will open on June 16. The Phase 1 submission deadline has been extended to June 15, 2026, 23:59 (GMT+8).
[ 2026.06.05 ] 🚀 Phase 1 (open evaluation) is now open! The test data and Codabench evaluation platform are available.
[ 2026.05.08 ] 🔄 We have updated the dataset, renaming the label "MGT" to "LGT" to match the documentation. No other changes were made. The original MGT in the training set is equivalent to LGT (label=2). We apologize for any inconvenience and confusion caused.
[ 2026.04.15 ] 💡 We have released the detailed task guidelines and training data.
[ 2026.03.20 ] 🔥 Registration for NLPCC 2026 Task 6 is now open. Welcome to join us!

Introduction

The rapid development of large language models (LLMs) has given rise to a series of challenges, including the generation of disinformation, the spread of harmful content, and various forms of misuse. Against this backdrop, efficiently distinguishing LLM-generated text from human-written text has become an urgent and critical research issue in natural language processing (NLP). While remarkable progress has been made, research has largely focused on English, and systematic technical exploration for Chinese remains scarce. This shared task aims to fill this gap, build more robust Chinese LLM-generated text detectors, and advance research and real-world applications of this field in the Chinese-language context.

Following the success of the 1st Shared Task on LLM-Generated Text Detection (NLPCC 2025), the 2nd Shared Task on LLM-Generated Text Detection in 2026 features a significant upgrade: the task has been expanded from binary to ternary classification. In addition to distinguishing human-written text from LLM-generated text, a new category for identifying LLM-refined text has been introduced, which better reflects how LLMs are used in real-world applications. Participating teams must design and implement text detection algorithms based on the provided training data to classify texts accurately, and their systems will undergo rigorous stress testing.

Dataset

The training set for this task is primarily sampled and adapted from the CUDRT (TIST 2026) dataset (Chinese subset: Complete 25 ratio, i.e., continuation from the first 25% of tokens). The Phase 1 and Phase 2 test sets are derived from the Chinese split of the DetectRL-X benchmark (ACL 2026). The evaluation dataset is extended from the DetectRL benchmark (NeurIPS 2024) framework and covers multiple generation models and domains, keeping the evaluation scenarios realistic and challenging.

Training Set

The training set contains data from 4 LLMs and 2 domains. Specifically, the data sources are news and academic writing, and the generation models are GPT-4, Qwen, ChatGLM, and Baichuan. The training set contains a total of 20,370 sample groups. Each group contains four fields: "ID", "HWT", "LGT", and "HLT". "ID" identifies the sample number, while "HWT", "LGT", and "HLT" hold the "Human-written text", "LLM-generated text", and "LLM-refined text", respectively.
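If you want to train a classifier, you can flatten the grouped records into one labeled example per text. The sketch below makes two assumptions. First, it assumes the training file is a single JSON array of objects with the "ID", "HWT", "LGT", and "HLT" fields described above; the real file layout may differ, for example JSON Lines. Second, it reuses the label mapping from the submission format (HWT → 0, LGT → 1, HLT → 2). The helper names are illustrative and are not part of the official release.

```python
import json
from typing import Iterable

# Label mapping taken from the submission format: HWT -> 0, LGT -> 1, HLT -> 2.
LABELS = {"HWT": 0, "LGT": 1, "HLT": 2}


def flatten_groups(groups: Iterable[dict]) -> list[tuple[str, str, int]]:
    """Turn each {"ID", "HWT", "LGT", "HLT"} record into up to three (id, text, label) rows."""
    rows = []
    for group in groups:
        for field, label in LABELS.items():
            text = group.get(field)
            # Skip missing or empty texts rather than emitting blank examples.
            if text:
                rows.append((f"{group['ID']}_{field}", text, label))
    return rows


def load_training_set(path: str) -> list[tuple[str, str, int]]:
    # Assumption: the file is one JSON array of records (not JSON Lines).
    with open(path, encoding="utf-8") as f:
        return flatten_groups(json.load(f))
```

Keeping the source ID in each row makes it possible to split training and validation data by group, so that no validation text shares a group with a training text.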

Data Download

The training and test data can be downloaded from the following links:

GitHub: Data Folder
Phase 1 Test Data: testp1.json
Phase 2 Test Data: testp2.json

The fully labeled test data (Phase 1 & Phase 2) and official scoring scripts are now publicly available for result analysis, ablation experiments, and baseline reproduction.

Data Restrictions

To support the development of detection systems, participants may process and augment the provided training data, but no new external datasets may be introduced.

Evaluation Metrics

The official evaluation metric for this task is the macro-averaged F1-Score.
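Macro-averaged F1 works in two steps. It computes F1 separately for each of the three classes, then takes their unweighted mean, so each class counts equally regardless of how many texts it has. The function below is a minimal sketch for local validation; it is not the official scoring script. It should agree with scikit-learn's `f1_score(..., average="macro")` whenever every class appears in the data. Giving a score of 0 to a class with no true or predicted instances is an assumption of this sketch.

```python
from collections import Counter
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1, 2)) -> float:
    """Unweighted mean of per-class F1 over HWT (0), LGT (1), and HLT (2)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the text was not p
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); a class that never occurs scores 0 here.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(labels)
```

For example, `macro_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])` averages per-class F1 scores of 0.5, 0.8, and 2/3.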

Submission and Evaluation

The submission platform for this evaluation task is Codabench. Phase 2 (closed evaluation, final ranking) is now open, and the Phase 1 open-evaluation channel is closed during this period. The submission deadline is June 20, 2026, 23:59 (GMT+8). Each team may submit up to 100 times. The final ranking is based exclusively on the last submission, which Codabench enforces through its Force Last setting. Scores are not visible during the submission period; the final leaderboard will be publicly released by June 22, 2026.

Each team must submit the following three items packaged together. Submissions missing any component will be considered invalid. To facilitate automated compliance checks, the three items must be named exactly as follows:

1. Test Results File

Must be named prediction.json. A JSON file covering all test samples, keeping id unchanged.
Each sample must include id and label (HWT → 0, LGT → 1, HLT → 2).

2. Code and Data

Must be named code_and_data.zip. All source code for data augmentation, processing, model training, and inference, along with the complete training dataset.
Include a README.md and an environment configuration file (e.g., requirements.txt).

3. Technical Report

Must be named technical_report.pdf. A detailed description of your methodology. There is no language restriction; you may use either Chinese or English. All reports will be kept strictly confidential and deleted after NLPCC 2026 (not archived).
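Below is a minimal sketch of writing prediction.json. It assumes that each test sample carries an "id" field and that the file is a JSON array of `{"id": ..., "label": ...}` objects. Check the exact expected layout against the Codabench submission guidelines. `write_predictions` is a hypothetical helper. It also rejects a sample that has no prediction or a label outside {0, 1, 2}.

```python
import json

VALID_LABELS = {0, 1, 2}  # HWT -> 0, LGT -> 1, HLT -> 2


def write_predictions(test_samples: list[dict], predictions: dict, path: str = "prediction.json") -> None:
    """Write one {"id", "label"} record per test sample, keeping each id unchanged."""
    records = []
    for sample in test_samples:
        sid = sample["id"]
        label = predictions[sid]  # raises KeyError if a sample was never predicted
        if label not in VALID_LABELS:
            raise ValueError(f"invalid label {label!r} for id {sid!r}")
        records.append({"id": sid, "label": label})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

The helper iterates over the test samples rather than over the predictions. As a result, every test id must be covered and the output keeps the original sample order.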

Account Policy

Each team may use only one account, and the account name must match the System Name (ID) registered during sign-up. If your Codabench account name does not match your registered System Name (ID), simply name the outermost ZIP archive with your registered System Name (ID) and ensure you submit using the exact same email address you provided during registration. Our team will verify your submission by cross-referencing your registered email and the ZIP archive name.
For packaging and format details, refer to the Submission guidelines under Get Started on Codabench.
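You can script the packaging step so that the archive name always matches the registered System Name (ID). This sketch assumes the three required files sit at the top level of the outer ZIP archive. `package_submission` and the example name "MyTeam" are hypothetical. The authoritative layout is the one given in the Codabench Submission guidelines.

```python
import zipfile
from pathlib import Path

REQUIRED = ("prediction.json", "code_and_data.zip", "technical_report.pdf")


def package_submission(system_name: str, src_dir: str = ".", out_dir: str = ".") -> Path:
    """Bundle the three required files into <system_name>.zip at the archive root."""
    src = Path(src_dir)
    missing = [name for name in REQUIRED if not (src / name).is_file()]
    if missing:
        # A submission missing any component is invalid, so fail before uploading.
        raise FileNotFoundError(f"missing required files: {missing}")
    out_path = Path(out_dir) / f"{system_name}.zip"
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in REQUIRED:
            # arcname keeps each file at the top level instead of under src_dir.
            zf.write(src / name, arcname=name)
    return out_path
```

For example, `package_submission("MyTeam")` would produce MyTeam.zip in the current directory.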

Competition Rules

The following rules are strictly enforced. Violations will result in disqualification.

No use of test set data for development — Using Phase 1 or Phase 2 test set samples in any form to develop or optimize your detector is strictly prohibited. This includes, but is not limited to, pseudo-labeling and unsupervised methods applied to the test set. Only the officially provided training set may be used, extended exclusively through legitimate data augmentation methods based on the training set. Introducing new external data sources is not allowed.

Single-text prediction only — Each prediction must take a single text as input and output its label. Batch inference across the entire dataset is not permitted; the approach must reflect real-world applicability.
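One way to respect this rule is to write the detector as a function from a single text to a single label and call it once per sample. No prediction can then depend on other test texts. The sketch below assumes each test sample has "id" and "text" fields. `toy_detector` is a hypothetical placeholder rule, not a recommended method.

```python
from typing import Callable, Iterable

# A detector maps ONE text to one label (0 = HWT, 1 = LGT, 2 = HLT).
Detector = Callable[[str], int]


def predict_all(samples: Iterable[dict], detect: Detector) -> dict:
    """Label each sample independently; nothing is shared across the test set."""
    return {s["id"]: detect(s["text"]) for s in samples}


def toy_detector(text: str) -> int:
    # Hypothetical placeholder rule for illustration only; a real system would call a trained model.
    return 1 if len(text) > 200 else 0
```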

No feature extraction on test set — Rule-based methods may extract features from the training set only. Teams using such approaches must provide a statistical script demonstrating that the features are statistically observable in the training set. If rules cannot be verified on the training set, the submission will be disqualified.
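A statistical script of the kind this rule asks for can be as simple as reporting, for each label, how often a feature fires on the training set. In the sketch below, the raw-newline feature is purely illustrative. The FAQ notes that the test set does not retain "\n", so such a feature would not transfer. The function names are hypothetical, and each row is assumed to be a (text, label) pair using the submission label mapping.

```python
from collections import defaultdict
from typing import Callable, Iterable

NAMES = {0: "HWT", 1: "LGT", 2: "HLT"}


def feature_rate_by_label(rows: Iterable[tuple[str, int]], feature: Callable[[str], bool]) -> dict:
    """Fraction of training texts, per label, for which `feature` fires."""
    hits, totals = defaultdict(int), defaultdict(int)
    for text, label in rows:
        totals[label] += 1
        hits[label] += bool(feature(text))
    return {NAMES[lbl]: hits[lbl] / totals[lbl] for lbl in sorted(totals)}


def has_newline(text: str) -> bool:
    # Example feature: the text contains a raw newline character.
    return "\n" in text
```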

No external data or web access — Using retrieval-augmented generation (RAG), web search, or any form of external data source is prohibited. Calling built-in LLM APIs for text classification is permitted, provided no external information is introduced.

LLM API logging — If your approach involves calling LLM APIs for detection, you must retain complete Q&A logs for potential audit.
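A lightweight way to keep complete Q&A logs is to route every LLM call through a wrapper that appends the prompt and response to a JSON Lines file. In this sketch, `llm` stands for any function that sends a prompt to your chosen API client and returns its text reply. The wrapper, the file name, and the record fields are all assumptions; this is not a required log format.

```python
import json
import time
from typing import Callable


def logged_call(llm: Callable[[str], str], prompt: str, log_path: str = "llm_qa_log.jsonl") -> str:
    """Call an LLM and append the full prompt/response pair to a JSON Lines audit log."""
    response = llm(prompt)
    record = {"timestamp": time.time(), "prompt": prompt, "response": response}
    with open(log_path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the log file.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return response
```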

Tentative Schedule

March 20, 2026	Shared task announcement and call for participation
March 20, 2026	Registration opens
April 15, 2026	Release of detailed task guidelines and training data
May 25, 2026	Registration deadline
June 5, 2026	Phase 1 test data release (open evaluation)
June 15, 2026	Phase 1 submission deadline (extended)
June 16, 2026	Phase 2 test data release (closed evaluation)
June 20, 2026	Phase 2 submission deadline
June 22, 2026	Final leaderboard released and call for system reports and conference papers
July 17, 2026	System paper submission deadline

Results

This shared task received registrations from 41 teams, making it the most highly subscribed of all NLPCC 2026 shared tasks. Of these, 29 teams completed registration on the Codabench platform, 23 submitted valid results in Phase 1, and 13 participated in Phase 2.

Rank	Team	Affiliation	Macro F1
1	Evil-Detect	Institute of Information Engineering, CAS	0.8888
2	Tele_StrarAGI	Star General AI Lab, China Telecom AI Company	0.8393
3	Xiwangdoudui	Zhengzhou University	0.7745
4	ZZUNLP_Zhao	Zhengzhou University	0.7475
5	CEDAR	Beijing Institute of Technology	0.7299

Full rankings are available on Codabench.

Call for Papers

Submissions for system and conference papers are now open. This task has a total quota of 7 invited paper slots for publication in the NLPCC 2026 conference proceedings (Springer LNAI series).

Invited Submissions: The top 5 teams are formally invited to submit system papers describing their methodologies.

Open Submissions: 2 additional paper slots are open to all eligible registered teams for original methodological or analytical papers.

Eligibility: Only teams that submitted valid results in Phase 1 or Phase 2 qualify.

Paper Guidelines

Language: English only.
Format: LNCS format (LaTeX or Word templates).
Page limit: 12 pages including references and appendices.
File format: PDF.
Submission portal: OpenReview.
Deadline: July 17, 2026, 03:59 PM UTC-0 (please confirm with the official NLPCC announcement).
Official guidelines: http://tcci.ccf.org.cn/conference/2026/shared-tasks/.

Important: An invitation does not guarantee acceptance. The organizing committee reserves the right to replace invited papers with high-quality open submissions if quality standards are not met.

Organizers

This shared task is jointly organized by the Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory (NLP2CT) at the University of Macau, Central China Normal University, and Alibaba Cloud.

Junchao Wu | University of Macau (Contact) nlp2ct.junchao@gmail.com | Homepage
Derek Fai Wong | University of Macau | Homepage
Runzhe Zhan | University of Macau | Homepage
Zeyu Wu | University of Macau
Zhiwen Xie | University of Macau / Central China Normal University
Yichao Du | Alibaba Cloud
Longyue Wang | Alibaba Cloud | Homepage

FAQ

Q: Where can I register for this shared task? Can team registration be updated?

A: Register by filling out the Shared Task Registration Form linked at the top of this page. Before the registration deadline, you can submit the form again as many times as needed to update your team's registration information.

Q: Is there any restriction on the number of model parameters or data augmentation methods? If an ensemble approach is used, are there any constraints on the total number of parameters?

A: NLPCC 2026 Shared Task 6 places no restrictions on the number of parameters for a single model, nor on the total number of parameters after ensembling. There are also no restrictions on data augmentation methods or models used. You may use open-source LLMs or closed-source APIs for text paraphrasing, rewriting, and other data-augmentation operations; these are fully allowed under the competition rules.

Q: The training set is sourced from news and academic writing, using four generation models. Will the subsequent test data also consist of these two sources and these four models, or will it be expanded? What is the generation method for test data? Will it contain formatting artifacts (e.g., "\n") or incomplete texts?

A: As stated in the README under the Dataset section, the test set will be constructed from out-of-distribution data from another dataset (DetectRL benchmark, NeurIPS 2024). It will no longer be limited to the news and academic domains or the four generation models covered in the training set. The test data may include texts from unknown domains and unknown generation models, produced with different data generation schemes. This allows multi-dimensional stress testing that comprehensively evaluates detectors' performance in real-world application scenarios. LGT texts in the test set do not necessarily use the training set's continuation from the first 25% of tokens. Even where continuation is used, the prefix tokens are cleaned, resulting in higher overall quality. The test set undergoes strict preprocessing and will not retain redundant formatting symbols such as "\n". It also avoids abrupt truncation, though not every text is forced to end with a period; natural forms such as bylines are preserved to test real-world robustness.
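The test set is preprocessed to remove redundant formatting symbols such as "\n". One option is therefore to normalize training texts in the same way to reduce the train/test mismatch. The sketch below is an assumed approximation, not the organizers' preprocessing. It deletes line breaks without inserting a space, which suits Chinese text but would merge English words, and it collapses runs of spaces or tabs.

```python
import re


def normalize(text: str) -> str:
    """Remove line breaks and collapse runs of spaces/tabs, approximating a cleaned test set."""
    # Assumption: joining lines without a separator suits Chinese, which has no spaces between words.
    text = re.sub(r"\s*\n\s*", "", text)
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```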

Q: Training data quality issues: some samples contain pure English text labeled as HLT; some end with "123" with preceding text identical to the human text; and the README mentions LGT while the dataset uses MGT. How should these be understood?

A: The pure English samples labeled as HLT and the ~2,000 samples ending with "123" (with preceding text identical to the human text) are both native noise from the CUDRT dataset; no additional cleaning was performed. Teams are expected to design their own data-cleaning strategies. Regarding the label inconsistency: in the current training set, MGT is equivalent to LGT (label=2). We will update the dataset in a subsequent release to uniformly correct MGT to LGT, along with an official correction notice.
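Cleaning strategies are left to each team. As one possible starting point, the sketch below uses simple heuristics to flag the two kinds of noise described above. A text containing no CJK ideographs is treated as pure English. A text that ends with "123" and otherwise equals the human text is treated as a copy. Several choices here are assumptions of this sketch: which fields to check, the CJK range used, and blanking a flagged text rather than dropping the whole group.

```python
import re

CJK = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs (basic block only)


def is_pure_english(text: str) -> bool:
    # Heuristic: the text contains no CJK ideographs at all.
    return CJK.search(text) is None


def is_copy_with_123(candidate: str, human: str) -> bool:
    # Heuristic: ends in "123" and the remainder repeats the human text.
    return candidate.endswith("123") and candidate[:-3].strip() == human.strip()


def clean_group(group: dict) -> dict:
    """Return a copy of one {"ID", "HWT", "LGT", "HLT"} record with suspect texts blanked out."""
    cleaned = dict(group)
    for field in ("LGT", "HLT"):
        text = cleaned.get(field) or ""
        if is_pure_english(text) or is_copy_with_123(text, group.get("HWT", "")):
            cleaned[field] = ""  # drop only this text and keep the rest of the group
    return cleaned
```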

Q: Can we use pseudo-labeling on the test set data (i.e., running the test data through our model to obtain pseudo-labels and using them as expanded training data)?

A: No, it is NOT allowed to use test set data for pseudo-labeling. You cannot add any test set samples with predicted pseudo-labels into your training data for model optimization. This practice is not applicable in real-world application scenarios, so we do not permit such operations in the competition. The official test set is strictly for final-submission evaluation only. Any form of using test set data for training, fine-tuning, pseudo-label construction, or data augmentation is considered a violation of the competition rules. You may only use the officially provided training set. Methods to expand the training set are limited to the announced legal data augmentation methods, and no new data sources may be explicitly introduced.

Q: For prediction, can we input the entire dataset at once to output labels, or must we input single texts individually?

A: Prediction must be performed by inputting single texts and outputting labels individually. This requirement ensures that the model is truly usable in real-world application scenarios.

Q: Can we use uncleaned features (e.g., formatting artifacts) for rule-based classification?

A: Rule-based classification using features extracted from the training set is allowed, but extracting features from the test set is prohibited. Teams that adopt such rule-based approaches must provide a statistical script to demonstrate that the features can be statistically observed in the training set. If the rules cannot be verified on the training set, the results will be disqualified — use this approach with caution.

Q: Can we use built-in LLM APIs with internet-connected search (e.g., RAG) for classification?

A: Using built-in LLM APIs for text classification is permitted. However, using RAG (Retrieval-Augmented Generation) or any form of internet-connected querying is prohibited, as it violates the rule against introducing external data sources.

References

If your research uses relevant datasets, or if this task is helpful to you, please consider citing the following literature:

DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios

Wu, Junchao; Zhan, Runzhe; Wong, Derek F; Yang, Shu; Yang, Xinyi; Yuan, Yulin; Chao, Lidia S · Advances in Neural Information Processing Systems · 2024 · pp. 100369–100401

@article{wu2024detectrl,
  title={{DetectRL}: Benchmarking {LLM}-Generated Text Detection in Real-World Scenarios},
  author={Wu, Junchao and Zhan, Runzhe and Wong, Derek F and Yang, Shu and Yang, Xinyi and Yuan, Yulin and Chao, Lidia S},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={100369--100401},
  year={2024}
}

DetectRL-X: Benchmarking LLM-Generated Text Detection in Cross-Domain, Cross-Lingual, and Cross-Model Scenarios

Wu, Junchao; Zhan, Runzhe; Wong, Derek F.; Yang, Shu; Yang, Xinyi; Yuan, Yulin; Chao, Lidia S. · Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics · 2026 · ACL 2026

@inproceedings{wu2026detectrlx,
  title={DetectRL-X: Benchmarking LLM-Generated Text Detection in Cross-Domain, Cross-Lingual, and Cross-Model Scenarios},
  author={Wu, Junchao and Zhan, Runzhe and Wong, Derek F. and Yang, Shu and Yang, Xinyi and Yuan, Yulin and Chao, Lidia S.},
  booktitle={Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
  year={2026},
  note={ACL 2026}
}

Overview of the NLPCC 2025 Shared Task 1: LLM-Generated Text Detection

Wu, Junchao; Zhan, Runzhe; Wang, Qianli; Yuan, Yulin; Chao, Lidia S; Wong, Derek F · CCF International Conference on Natural Language Processing and Chinese Computing · 2025 · pp. 263–274

@inproceedings{wu2025overview,
  title={Overview of the NLPCC 2025 Shared Task 1: LLM-Generated Text Detection},
  author={Wu, Junchao and Zhan, Runzhe and Wang, Qianli and Yuan, Yulin and Chao, Lidia S and Wong, Derek F},
  booktitle={CCF International Conference on Natural Language Processing and Chinese Computing},
  pages={263--274},
  year={2025},
  organization={Springer}
}

Toward Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT

Tao, Zhen; Chen, Yanfang; Xi, Dinghao; Li, Zhiyu; Xu, Wei · ACM Transactions on Intelligent Systems and Technology · 2026 · Vol. 17, No. 2 · pp. 1–35

@article{tao2026toward,
  title={Toward Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT},
  author={Tao, Zhen and Chen, Yanfang and Xi, Dinghao and Li, Zhiyu and Xu, Wei},
  journal={ACM Transactions on Intelligent Systems and Technology},
  volume={17},
  number={2},
  pages={1--35},
  year={2026}
}

NLPCC 2026 Shared Task 6: The Second Evaluation of LLM-Generated Text Detection

Macau, China

November 3-5, 2026

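Registration

To register, please fill out the Shared Task Registration Form: https://alidocs.dingtalk.com/notable/share/form/v018K4nyeZ1wGK0YnLb_dv19yqvsgs3oebp3pcjys_bUfwzOn Should you have any questions, please feel free to contact: nlp2ct.junchao@gmail.com

Introduction

The rapid development of large language models (LLMs) has brought a series of serious challenges, including the generation of disinformation, the spread of harmful content, and misuse. Against this backdrop, efficiently distinguishing LLM-generated text from human-written text has become an important and pressing problem in natural language processing. Although LLM-generated text detection has made significant progress, existing research has focused mostly on English, and systematic research and technical exploration for Chinese remain scarce. This shared task aims to fill this gap, build stronger Chinese LLM-generated text detectors, and advance research and real-world applications of this field in the Chinese-language context.

Following the success of the 1st Shared Task on LLM-Generated Text Detection (NLPCC 2025, Xinjiang), the 2nd Shared Task on LLM-Generated Text Detection in 2026 brings an important upgrade: the task has been expanded from binary to ternary classification. In addition to distinguishing human-written text from LLM-generated text, it adds the identification of LLM-refined text, which better matches how LLMs are actually used. Participating teams must design and implement text detection algorithms based on the original training data provided by the organizers. Their algorithms must accurately discriminate texts from different sources and will undergo stress testing.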

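Dataset

The training set for this task is mainly sampled and adapted from the public CUDRT (TIST 2026) dataset (Chinese subset: continuation from the first 25% of tokens). Both the Phase 1 and Phase 2 test sets are drawn from the Chinese data of DetectRL-X, a paper accepted at ACL 2026. The evaluation dataset is extended from the DetectRL benchmark (NeurIPS 2024) framework and covers multiple generation models and domains, keeping the evaluation scenarios realistic and challenging.

Training Set

The training set contains data from 4 LLMs and 2 domains. Specifically, the data sources are news and academic writing, and the generation models are GPT-4, Qwen, ChatGLM, and Baichuan. The training set contains a total of 20,370 sample groups. Each group has four fields: "ID", "HWT", "LGT", and "HLT". "ID" is the sample number, and "HWT", "LGT", and "HLT" denote "human-written text", "LLM-generated text", and "LLM-refined text", respectively.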

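Data Download

The training and test data can be downloaded from the following links:

GitHub: Data Folder
Phase 1 Test Data: testp1.json
Phase 2 Test Data: testp2.json

The complete labeled test data (Phase 1 and Phase 2) and the official scoring scripts have been open-sourced and can be used for result analysis, controlled experiments, and baseline reproduction.

Data Restrictions

To support the development of detection systems, participants may process and augment the provided training data, but may not introduce additional new external datasets.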

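Evaluation Metrics

The official evaluation metric for this task is the macro-averaged F1-Score.

Submission and Evaluation

The submission platform for this evaluation task is Codabench. Phase 2 (closed evaluation, final ranking) is now open. The submission deadline is June 20, 2026, 23:59 (GMT+8). Each team may submit up to 100 times. The final ranking is based on the last submission; the platform's Force Last setting always takes the last submission. Scores cannot be viewed during the submission period; the final leaderboard will be released by June 22, 2026. The Phase 1 open-evaluation channel is closed during this period.

Each team must package and submit the following three items; a submission missing any of them will be considered invalid. To let the scoring script check compliance automatically, the three items must be named exactly as follows: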

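1. Test Results File

Must be named prediction.json. A JSON file containing all samples, with the id field kept unchanged.
Each sample must include the two fields id and label (HWT → 0, LGT → 1, HLT → 2).

2. Code and Data

Must be named code_and_data.zip. All source code for data augmentation, data processing, model training, and inference, along with the complete training dataset.
Include a README.md and an environment configuration file (e.g., requirements.txt).

3. Technical Report

Must be named technical_report.pdf. A detailed description of the method used. There is no language restriction; either Chinese or English is acceptable. All solutions will be kept strictly confidential and deleted after NLPCC 2026 (not archived).

Account Requirements

Each team may submit with only one account, and the account name must match the System Name (ID) given at registration. If your Codabench account name does not match your registered System Name (ID), name the outermost ZIP archive of your submission with your registered System Name (ID). Then submit using the same email address you registered with. We will verify results by cross-checking the registered email against the archive name.
For packaging format and other details, see the Codabench Get Started → Submission guidelines.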

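Competition Rules

The following rules are strictly enforced; violations will result in disqualification.

No use of the test set for development: Phase 1 or Phase 2 test set samples may not be used in any form to develop or optimize the detector. This includes, but is not limited to, pseudo-labeling and applying unsupervised methods to the test set. Only the official training set may be used, and it may be extended only through legitimate data augmentation based on the training set; new external data sources may not be introduced.

Single-text prediction: Prediction must take a single text as input and output its label. Batch processing over the entire dataset is not allowed, so that the method remains practically usable.

No feature extraction on the test set: Rule-based methods may extract features only from the training set. Teams using such approaches must provide a statistical script showing that the features are statistically observable in the training set. If this cannot be verified on the training set, the results will be disqualified.

No external data or web access: RAG (retrieval-augmented generation), web search, and any other form of external data source are prohibited. Calling LLM APIs directly for text classification is allowed, but no external information may be introduced.

LLM API log retention: If you call LLM APIs for detection, you must keep complete Q&A logs for possible audits.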

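Tentative Schedule

March 20, 2026	Shared task announcement and call for participation
March 20, 2026	Registration opens
April 15, 2026	Release of detailed task guidelines and training data
May 25, 2026	Registration deadline
June 5, 2026	Phase 1 test data release (open evaluation)
June 15, 2026	Phase 1 submission deadline (extended)
June 16, 2026	Phase 2 test data release (closed evaluation)
June 20, 2026	Phase 2 submission deadline
June 22, 2026	Final leaderboard released; call for system reports and conference papers
July 17, 2026	System paper submission deadline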

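Results

This shared task received registrations from 41 teams, the most of any NLPCC 2026 shared task. Of these, 29 teams registered on the Codabench platform, 23 submitted valid results in the open-evaluation phase, and 13 took part in the Phase 2 closed evaluation.

Rank	Team	Affiliation	Macro F1
1	Evil-Detect	Institute of Information Engineering, Chinese Academy of Sciences	0.8888
2	Tele_StrarAGI	Star General AI Lab, China Telecom AI Company	0.8393
3	Xiwangdoudui (希望都队)	Zhengzhou University	0.7745
4	ZZUNLP_Zhao	Zhengzhou University	0.7475
5	CEDAR	Beijing Institute of Technology	0.7299

Full rankings are available on Codabench.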

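Call for Papers

The call for system reports and conference papers is now officially open. This task has a total of 7 invited paper slots; accepted papers will be published in the English proceedings of NLPCC 2026 (Springer LNAI series).

Invited Submissions: The top 5 teams are formally invited to submit system papers.

Open Submissions: 2 additional paper slots are open to all eligible registered teams.

Eligibility: Only teams that submitted valid results in the open-evaluation or closed-evaluation phase are eligible to submit.

Paper Guidelines

Language: English only.
Format: LNCS format (LaTeX or Word templates supported).
Page limit: 12 pages maximum, including references and appendices.
File format: PDF.
Submission portal: OpenReview.
Deadline: July 17, 2026, 03:59 PM UTC-0 (the official NLPCC announcement prevails).
Official guidelines: http://tcci.ccf.org.cn/conference/2026/shared-tasks/.

Special note: An invitation does not guarantee acceptance. If an invited paper does not meet quality standards, the organizing committee may replace it with the best open submission.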

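Organizers

This shared task is jointly organized by the Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory (NLP2CT) at the University of Macau, Central China Normal University, and Alibaba Cloud.

Junchao Wu (吴俊潮) | University of Macau (Contact) nlp2ct.junchao@gmail.com | Homepage
Derek Fai Wong (黄辉) | University of Macau | Homepage
Runzhe Zhan (詹润哲) | University of Macau | Homepage
Zeyu Wu (吴泽宇) | University of Macau
Zhiwen Xie (谢志文) | University of Macau / Central China Normal University
Yichao Du (杜逸超) | Alibaba Cloud
Longyue Wang (王龙跃) | Alibaba Cloud | Homepage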

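FAQ

Q: Where can I register for this shared task? Can team registration information be updated?

A: Please register via the Shared Task Registration Form linked in the Registration section. Before the registration deadline, you may submit the form again to update your team's registration information.

Q: Are there any restrictions on the number of model parameters, the data augmentation methods, or the models used? If models are ensembled, is there a requirement on the total number of parameters?

A: NLPCC 2026 Shared Task 6 places no restriction on the number of parameters of a single model, and if an ensemble is used, there is no constraint on its total parameter count. Open-source LLMs and closed-source APIs may be used for data augmentation such as paraphrasing and rewriting; such operations fully comply with the competition rules.

Q: The training data comes from news and academic writing and uses four generation models. Will the subsequent test data also use these two sources and four models, or will they be expanded? Is the test data generated with the same continuation method as the training set? Will it contain formatting symbols or incomplete texts?

A: As described in the Dataset section of the README, the test set will be built by extending out-of-distribution data from another dataset (the DetectRL benchmark, NeurIPS 2024). It is no longer limited to the news and academic domains or the four generation models in the training set. The test data may include texts from unknown domains and unknown generation models, produced with different data generation schemes. This supports multi-dimensional stress testing that comprehensively and objectively evaluates detector performance in real-world scenarios. LGT texts in the test set do not necessarily use the training set's continuation from the first 25% of tokens. Even where continuation is used, the prefix tokens are cleaned, so the test set is of higher overall quality. The test set undergoes strict preprocessing and will not retain redundant formatting symbols such as "\n". Abrupt truncation will be rare, but texts are not forced to end with a period; naturally occurring forms such as bylines at the end of news articles are kept to test real-world robustness.

Q: The training set contains pure English texts labeled HLT and about 2,000 samples that end with "123" and whose preceding text is identical to the human text. Also, the README says LGT while the dataset actually uses MGT. How should these be understood?

A: The pure English texts labeled HLT and the roughly 2,000 samples ending with "123" whose preceding text duplicates the human text are native noise from the CUDRT dataset; we did not perform any additional cleaning. Each team should design its own data-cleaning strategy. Regarding the label inconsistency: MGT in the training set is LGT (label 2). We will update the dataset to replace MGT with LGT throughout and issue an official correction notice.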

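Q: Is pseudo-labeling allowed in this task? That is, can we run our model on the test set data and add the predictions to the training set as pseudo-labels to expand the training data?

A: Pseudo-labeling with test set data is not allowed in this task. Test set samples and their predicted pseudo-labels may not be added to the training data for model optimization in any form. Pseudo-labeling on test data is not possible in real-world application scenarios, so it is not allowed in this competition either. The official test set is used only for evaluating final submissions. Any use of test set data for training, fine-tuning, pseudo-label construction, or data augmentation will be considered a violation of the competition rules. Teams may use only the officially provided training set. It may be extended only through the announced legitimate data augmentation methods, and no new data sources may be explicitly introduced.

Q: For prediction, can we output labels by processing the entire dataset at once, or must we input single texts and output their labels?

A: Prediction must take single texts as input and output their labels. This requirement ensures that the model is genuinely usable in real-world application scenarios.

Q: Can we use features that were not cleaned out (e.g., formatting symbols) for rule-based classification?

A: Rule-based classification with features extracted from the training set is allowed, but extracting features from the test set is prohibited. Teams adopting such rule-based approaches must provide a statistical script proving that the features are statistically observable in the training set. If the rules cannot be verified on the training set, the results will be disqualified, so please use this approach with caution.

Q: Can we use built-in LLM APIs with web search (e.g., RAG) to classify texts?

A: Built-in LLM APIs may be called for text classification. However, using RAG (retrieval-augmented generation) or any form of web querying is a violation, because it breaks the competition rule against introducing external data sources.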

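References

If your research uses the relevant datasets, or if this task has been helpful to you, please consider citing the following literature: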