dot-concatenate_qkv_weights: Concatenate Attention Weights

.concatenate_qkv_weights {torchtransformers} R Documentation

Concatenate Attention Weights

Description

Concatenate weights to format attention parameters appropriately for loading into BERT models. The torch attention module stores the weight and bias values for the query, key, and value tensors in a single tensor rather than in three separate ones, so we concatenate them before loading them into our models.

Usage

.concatenate_qkv_weights(state_dict)

Arguments

state_dict

A state_dict of pretrained weights, typically loaded from a file.

Value

The state_dict with the query, key, and value weights concatenated into a single tensor per attention layer.
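The concatenation step can be sketched in a language-neutral way. The following Python sketch uses plain nested lists in place of tensors and illustrative key names (`.query.weight`, `.key.weight`, `.value.weight`, `.in_proj_weight` are assumptions, not the package's actual naming); the real function operates on torch tensors, where row-wise list concatenation corresponds to concatenating along dim 0.

```python
def concatenate_qkv_weights(state_dict):
    """Merge separate query/key/value weights into one combined tensor.

    torch-style attention modules expect a single projection weight of
    shape (3 * hidden, hidden), stacked in query, key, value order.
    Tensors are represented as nested lists purely for illustration.
    """
    out = dict(state_dict)
    # Find each attention layer by looking for a query weight (assumed naming).
    prefixes = {name.rsplit(".query.weight", 1)[0]
                for name in state_dict if name.endswith(".query.weight")}
    for p in sorted(prefixes):
        q = out.pop(f"{p}.query.weight")
        k = out.pop(f"{p}.key.weight")
        v = out.pop(f"{p}.value.weight")
        # Row-wise concatenation: the list analogue of torch.cat(..., dim = 0).
        out[f"{p}.in_proj_weight"] = q + k + v
    return out
```

For example, a state dict with one attention layer whose 1x1 query, key, and value weights are `[[1.0]]`, `[[2.0]]`, and `[[3.0]]` would come back with a single 3x1 entry `[[1.0], [2.0], [3.0]]` and the three separate keys removed.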


macmillancontentscience/torchtransformers documentation built on Aug. 6, 2023, 5:35 a.m.