5. What are BERT's inputs and outputs?

1. BERT inputs

In most cases, you simply tokenize the text with the tokenizer and feed the result into the model.

from transformers import AutoTokenizer, AutoModel, AutoConfig

model_name = "hfl/chinese-roberta-wwm-ext"
text = "已有研究表明了过度参数化的模型其实是位于一个低的内在维度上，所以作者假设在模型适应过程中的权重变化也具有较低的“内在等级”。"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Calling the tokenizer directly is the current equivalent of encode_plus.
inputs = tokenizer(text,
                   return_tensors='pt',      # return PyTorch tensors
                   add_special_tokens=True,  # add [CLS] and [SEP]
                   max_length=45,
                   padding='max_length',     # replaces the deprecated pad_to_max_length=True
                   truncation=True)
# inputs

With return_offsets_mapping=True additionally passed to the tokenizer, inputs looks like this:

 {'input_ids': tensor([[ 101, 2347, 3300, 4777, 4955, 6134, 3209,  749, 6814, 2428, 1346, 3144,
         1265, 4638, 3563, 1798, 1071, 2141, 3221,  855,  754,  671,  702,  856,
         4638, 1079, 1762, 5335, 2428,  677, 8024, 2792,  809,  868, 5442,  969,
         6392, 1762, 3563, 1798, 6844, 2418, 6814, 4923,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]), 'offset_mapping': tensor([[[0,  0],
         [0,  1],
         [1,  2],
           ...
         [42, 43],
         [0,  0]]])}

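input_ids: the id of each token in the vocabulary. Shape: (batch_size, sequence_length).
token_type_ids: the sentence id of each token, either 0 or 1 (0 means the token belongs to the first sentence, 1 means it belongs to the second). Shape: (batch_size, sequence_length).
attention_mask: optional. Each element is 0 or 1 and is used to avoid computing attention on padding tokens (1 means the token is attended to, 0 means it is masked out). Shape: (batch_size, sequence_length).
offset_mapping: returned only when return_offsets_mapping=True is passed to a fast tokenizer. It holds one (start, end) pair per token, giving the position of that token in the original text. Note that the interval is half-open: [start, end). It is not a model input, so remove it from inputs before calling model(**inputs).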
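The layout of the first three fields is easiest to see on a padded sentence pair. The following is a minimal sketch in plain Python lists, not the tokenizer's actual implementation. The helper encode_pair is hypothetical; the ids 101 and 102 for [CLS] and [SEP] and the character ids are taken from the dump above, while 0 for [PAD] is an assumption about this vocabulary.

```python
# Toy illustration: lays out ids the way BERT expects a sentence pair.
# PAD_ID = 0 is an assumption; CLS_ID and SEP_ID match the dump above.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102


def encode_pair(ids_a, ids_b, max_length):
    """Build input_ids, token_type_ids and attention_mask for two id lists."""
    first = [CLS_ID] + ids_a + [SEP_ID]   # segment 0: [CLS] A [SEP]
    second = ids_b + [SEP_ID]             # segment 1: B [SEP]
    input_ids = first + second
    token_type_ids = [0] * len(first) + [1] * len(second)
    attention_mask = [1] * len(input_ids)  # 1 = real token

    n_pad = max_length - len(input_ids)
    if n_pad < 0:
        raise ValueError("this sketch does not implement truncation")

    return {
        "input_ids": input_ids + [PAD_ID] * n_pad,
        "token_type_ids": token_type_ids + [0] * n_pad,  # padding uses segment 0
        "attention_mask": attention_mask + [0] * n_pad,  # 0 = padding, ignored
    }


# "已有" as sentence A and "模型适" as sentence B, using ids from the dump above.
encoded = encode_pair([2347, 3300], [3563, 1798, 6844], max_length=10)
for name, values in encoded.items():
    print(name, values)
```

With a real tokenizer the same layout is produced by passing two texts, for example tokenizer(text_a, text_b, padding='max_length', max_length=10).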

To look closer, convert the ids back to tokens with print(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])). The output should be:

['[CLS]', '已', '有', '研', '究', '表', '明', '了', '过', '度', '参', '数', '化', '的', '模', '型', '其', '实', '是', '位', '于', '一', '个', '低', '的', '内', '在', '维', '度', '上', '，', '所', '以', '作', '者', '假', '设', '在', '模', '型', '适', '应', '过', '程', '[SEP]']

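Here, the offset_mapping entry for '[CLS]' is [0, 0], because that token does not exist in the original text. '已' is the first character of the original text, so its entry is [0, 1] (indices start at 0). This field is usually not needed; use it only when you have to map tokens back to character indices.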
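Mapping a token span back to characters can be sketched as follows. This is a plain-Python illustration that assumes one token per character (which is what the token list above shows for this text); the helper span_text is hypothetical and not part of transformers.

```python
text = "已有研究表明"

# One token per character, plus the two special tokens.
tokens = ["[CLS]"] + list(text) + ["[SEP]"]
# Special tokens get (0, 0); every other token gets a half-open [start, end) span.
offsets = [(0, 0)] + [(i, i + 1) for i in range(len(text))] + [(0, 0)]


def span_text(text, offsets, first_token, last_token):
    """Return the original substring covered by tokens first_token..last_token."""
    start = offsets[first_token][0]
    end = offsets[last_token][1]
    return text[start:end]


# Tokens 3 and 4 are '研' and '究'.
recovered = span_text(text, offsets, 3, 4)
print(recovered)  # 研究
```

This is the typical use in tasks such as entity extraction, where the model predicts token positions but the answer has to be reported as a character span of the raw text.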

2. BERT outputs

output = model(**inputs)
# output
BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[ 0.0031,  0.4094,  1.0830,  ..., -0.6175, -0.5913, -0.0447],
         [0.0204,  0.7668,  0.8072,  ..., -0.2872, -0.3725, -0.8013],
         [-0.3821,  0.4607, -0.1044,  ..., -0.0211, -1.0745,  0.0551],
         ...,
         [0.0031,  0.4094,  1.0830,  ..., -0.6175, -0.5913, -0.0447]]],
       grad_fn=<NativeLayerNormBackward0>),
       pooler_output=tensor([[ 0.9944,  0.8845,  0.9474,  0.7460,  0.5017,  0.3897, -0.9517,  0.9616,
        ...,
         -0.9466,  0.9946,  0.8685, -0.4311,  0.5076, -0.9549, -0.9607,  0.5031]],
       grad_fn=<TanhBackward0>), hidden_states=None, past_key_values=None, attentions=None, cross_attentions=None)

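The first field is output[0] / output['last_hidden_state'], the hidden states of all tokens in the sequence, with shape (batch_size, seq_len, hidden_dim). The second is output[1] / output['pooler_output'], which is derived from the last-layer hidden state of [CLS] and has shape (batch_size, hidden_dim).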

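Common notation explained:

last_hidden_state holds the last-layer hidden states of all tokens and can be viewed as the contextual embeddings of all tokens. The first position is [CLS], which is why BERT application code often writes output[0][:, 0, :] to take the hidden state of the first token. Other layers can then be stacked on top of it for all kinds of classification tasks.
pooler_output is the last-layer hidden state of [CLS] after further processing by a linear layer and a Tanh activation.
last_hidden_state[:, 0] is the last-layer hidden state of [CLS].
output[1] (that is, pooler_output) is last_hidden_state[:, 0] passed through that linear layer and Tanh. The official BertPooler source shows this: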

from torch import nn

class BertPooler(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output
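To make the pooling step concrete, here is a small worked example of the same computation with plain Python lists instead of tensors. It is only a sketch: pool_first_token is a hypothetical helper, and the weight, bias and hidden states are made-up toy values, not real model parameters.

```python
import math


def pool_first_token(hidden_states, weight, bias):
    """tanh(W x + b) applied to the first token of every sequence in the batch."""
    pooled = []
    for sequence in hidden_states:          # loop over the batch dimension
        first = sequence[0]                 # hidden state of token 0, i.e. [CLS]
        row = []
        for w_row, b in zip(weight, bias):  # one output feature per weight row
            z = sum(w * x for w, x in zip(w_row, first)) + b
            row.append(math.tanh(z))
        pooled.append(row)
    return pooled


# Toy input of shape (batch_size=1, seq_len=3, hidden_dim=2).
hidden_states = [[[1.0, -1.0], [5.0, 5.0], [9.0, 9.0]]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, chosen for readability
bias = [0.0, 0.0]

pooled = pool_first_token(hidden_states, weight, bias)
print(pooled)  # shape (batch_size=1, hidden_dim=2)
```

Only the first token contributes to the result, and because of the Tanh every value lies strictly between -1 and 1, which matches the range of the pooler_output values in the dump above.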

output[0].shape and output[1].shape are (torch.Size([1, 45, 768]), torch.Size([1, 768])).

You can of course also output the hidden states of all layers.

model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
output = model(**inputs)
print(output)

output now has an extra hidden_states field, accessible as output[2] / output['hidden_states']. It is a tuple of 13 tensors (the embedding-layer output plus the outputs of the 12 encoder layers), each of size (1, seq_len, hidden_dim). For example, print(output[2][0].shape) gives torch.Size([1, 45, 768]).
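Why there are 13 entries rather than 12 can be sketched with a toy encoder. This is plain Python with made-up "layers" (each one just adds 1.0), not the real model; run_encoder is a hypothetical helper that only mirrors how the states are collected.

```python
def run_encoder(embedding_output, layers):
    """Apply the layers in order and keep every intermediate state."""
    hidden_states = [embedding_output]  # index 0: output of the embedding layer
    current = embedding_output
    for layer in layers:
        current = layer(current)
        hidden_states.append(current)   # index i: output of encoder layer i
    return current, tuple(hidden_states)


# 12 toy layers standing in for the 12 encoder layers.
layers = [lambda h: [x + 1.0 for x in h] for _ in range(12)]
last_hidden_state, all_hidden_states = run_encoder([0.0, 0.5], layers)

print(len(all_hidden_states))                       # 13
print(all_hidden_states[-1] == last_hidden_state)   # True
```

The same indexing applies to the real output: index 0 is the embedding output and the last entry is the output of the final encoder layer.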

Posted in: NLP

2023-11-26

Reposting: unless otherwise stated, articles on this site are published under the CC-4.0 license. To repost, please contact tensortimes@gmail.com.
