Trapper: all your Transformer models in one place! Trapper (Transformers wrapper) is an NLP library that aims to make it easier to train transformer models on downstream tasks. The library provides transformer model implementations and training machinery, and it defines abstractions with base classes for the common tasks encountered when working with transformer models. In addition, it offers a dependency-injection mechanism and lets you define training and evaluation experiments through configuration files. That way you can experiment with different models, optimizers, and so on simply by changing their values in a configuration file, without writing any new code or modifying existing code. These features promote code reuse and reduce code duplication. Installation is simple: just run pip install trapper.
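To illustrate the idea of config-driven experiments (a hypothetical sketch only, not Trapper's actual configuration schema — every field name here is invented for illustration), swapping an optimizer could be a one-line change in a file like:

```json
{
    "model": {"type": "bert-base-uncased"},
    "optimizer": {"type": "adamw", "lr": 3e-5},
    "trainer": {"num_epochs": 3, "batch_size": 16}
}
```

Changing "adamw" to another registered optimizer type would then switch the experiment without touching any code.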
You can use a pretrained model directly for tasks such as sentiment analysis and masked-word filling:

>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
>>> unmasker("Hello I'm a [MASK] model.")
[{'sequence': "[CLS] hello i'm a fashion model. [SEP]", 'score': 0.1073106899857521, 'token': 4827, 'token_str': 'fashion'},
 {'sequence': "[CLS] hello i'm a role model. [SEP]", 'score': 0.08774490654468536, 'token': 2535, 'token_str': 'role'},
 {'sequence': "[CLS] hello i'm a new model. [SEP]", 'score': 0.05338378623127937, 'token': 2047, 'token_str': 'new'},
 {'sequence': "[CLS] hello i'm a super model. [SEP]", 'score': 0.04667217284440994, 'token': 3565, 'token_str': 'super'},
 {'sequence': "[CLS] hello i'm a fine model. [SEP]", 'score': 0.027095865458250046, 'token': 2986, 'token_str': 'fine'}]
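The pipeline returns a list of candidate fills with scores. As a small sketch of post-processing that list (the output shown above is copied in as literal data, so this runs without transformers installed):

```python
# The first two candidates from the fill-mask output above, as literal data.
results = [
    {'sequence': "[CLS] hello i'm a fashion model. [SEP]",
     'score': 0.1073106899857521, 'token': 4827, 'token_str': 'fashion'},
    {'sequence': "[CLS] hello i'm a role model. [SEP]",
     'score': 0.08774490654468536, 'token': 2535, 'token_str': 'role'},
]

# Pick the highest-scoring candidate for the [MASK] position.
best = max(results, key=lambda r: r['score'])
print(best['token_str'])  # → fashion
```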
Fine-tuning and prediction are also easy. The following uses the PyTorch API; Keras is supported as well:

# Load the data
from datasets import load_dataset
raw_datasets = load_dataset('imdb')
# Preprocess
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

def tokenize_function(examples):
    return tokenizer(examples['text'], padding='max_length', truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets['train'].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets['test'].shuffle(seed=42).select(range(1000))
full_train_dataset = tokenized_datasets['train']
full_eval_dataset = tokenized_datasets['test']
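With batched=True, map hands the tokenize function a batch of examples at a time (a dict of columns, each value a list), rather than one example. A minimal sketch of that contract in plain Python, with a toy whitespace tokenizer standing in for the real one:

```python
# Toy stand-in for a tokenizer: split on whitespace, truncate, and pad.
def toy_tokenize(texts, max_length=4):
    batch = []
    for text in texts:
        tokens = text.split()[:max_length]                  # truncation
        tokens += ['[PAD]'] * (max_length - len(tokens))    # padding to max_length
        batch.append(tokens)
    return {'input_tokens': batch}

# What `batched=True` means: the function sees whole columns at once.
examples = {'text': ['a great movie', 'terrible']}
out = toy_tokenize(examples['text'])
print(out['input_tokens'])
# → [['a', 'great', 'movie', '[PAD]'], ['terrible', '[PAD]', '[PAD]', '[PAD]']]
```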
# Define the model
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments
from transformers import Trainer

model = AutoModelForSequenceClassification.from_pretrained('bert-base-cased', num_labels=2)
training_args = TrainingArguments('test_trainer')
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
)
trainer.train()
# Evaluate
import numpy as np
from datasets import load_metric

metric = load_metric('accuracy')

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Pass this function to the Trainer (compute_metrics=compute_metrics)
# so that trainer.evaluate() reports accuracy.
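The metric itself is just an argmax over the logits followed by an exact-match rate. A self-contained numpy sketch of the same computation, runnable without datasets or transformers:

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    # Predicted class = index of the largest logit in each row.
    predictions = np.argmax(logits, axis=-1)
    # Accuracy = fraction of predictions matching the reference labels.
    return float(np.mean(predictions == labels))

# Toy batch: 3 examples, 2 classes.
logits = np.array([[2.0, 0.1],   # predicts class 0
                   [0.3, 1.5],   # predicts class 1
                   [0.9, 0.2]])  # predicts class 0
labels = np.array([0, 1, 1])     # the last prediction is wrong
print(accuracy_from_logits(logits, labels))  # → 0.6666666666666666
```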
The transformers library supports more than 80 transformer model architectures! https://github.com/huggingface/transformers/blob/master/README_zh-hans.md