Data processing code for the BART pre-training task

︶ㄣ演戲ㄣ / 2023-05-11 / Original article

Data collator used for BART denoising language modeling. The code is largely copied from
`<https://github.com/morganmcg1/rotobart/blob/main/data_collator.py#L223>`__.
For more information on how BART denoising language modeling works, one can take a look
at the `official paper <https://arxiv.org/pdf/1910.13461.pdf>`__
or the `official code for preprocessing <https://github.com/facebookresearch/fairseq/blob/main/fairseq/data/denoising_dataset.py>`__.
  • Official code: https://github.com/facebookresearch/fairseq/blob/main/fairseq/data/denoising_dataset.py
  • Transformers example code: https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_bart_dlm_flax.py
  • Other example code: https://github.com/morganmcg1/rotobart/blob/main/data_collator.py#L223
  • Example of data processing code for the BERT pre-training task: https://blog.csdn.net/Finks_Chen/article/details/119334214
  • Another data processing toolkit: https://github.com/prajdabre/yanmtt
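
To give a rough idea of what such a denoising collator does, here is a minimal sketch of the two noising steps described in the BART paper: text infilling (span lengths drawn from a Poisson distribution with λ = 3, each span replaced by a single mask token) and sentence permutation. This is not the code from any of the repositories linked above. The function names `text_infilling` and `permute_sentences`, the parameters `mask_ratio` and `poisson_lambda`, and the `<mask>` string are hypothetical and chosen only for illustration. As a simplification, zero-length spans (which the paper treats as inserting a mask token) are skipped, and the sketch works on plain token lists rather than padded batches.

```python
import numpy as np


def text_infilling(tokens, mask_token="<mask>", mask_ratio=0.3,
                   poisson_lambda=3.0, rng=None):
    """Replace random token spans with one mask token each (simplified sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(tokens)
    num_to_mask = int(round(n * mask_ratio))  # target number of tokens to hide
    covered = np.zeros(n, dtype=bool)         # True for tokens inside a masked span
    starts = set()                            # first index of every masked span
    attempts = 0
    # Keep sampling spans until enough tokens are covered (bounded number of tries).
    while covered.sum() < num_to_mask and attempts < 10 * n:
        attempts += 1
        length = int(rng.poisson(poisson_lambda))
        if length == 0:
            continue  # assumption: skip zero-length spans instead of inserting a mask
        start = int(rng.integers(0, n))
        end = min(start + length, n)
        if covered[start:end].any():
            continue  # do not overlap an existing span
        covered[start:end] = True
        starts.add(start)
    # Rebuild the sequence: one mask per span, untouched tokens copied through.
    noised = []
    for i, tok in enumerate(tokens):
        if i in starts:
            noised.append(mask_token)
        elif not covered[i]:
            noised.append(tok)
    return noised


def permute_sentences(sentences, rng=None):
    """Return the sentences in a random order (sentence permutation)."""
    if rng is None:
        rng = np.random.default_rng()
    order = rng.permutation(len(sentences))
    return [sentences[i] for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sentences = [["the", "cat", "sat", "."], ["it", "was", "warm", "."]]
    shuffled = permute_sentences(sentences, rng)
    flat = [tok for sent in shuffled for tok in sent]
    print(text_infilling(flat, rng=rng))
```

In a real collator the unmodified sequence serves as the decoder target while the noised sequence is the encoder input. The linked fairseq and transformers implementations handle details this sketch leaves out, such as special tokens, whole-word boundaries, padding, and zero-length span insertion.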