prompting.validators.reward.diversity#

Module Contents#

Classes#

DiversityRewardEvent

DiversityRewardModel

Functions#

mean_pooling(model_output, attention_mask)

Applies mean pooling to the token embeddings generated by the model.

prompting.validators.reward.diversity.mean_pooling(model_output, attention_mask)#

Applies mean pooling to the token embeddings generated by the model.

Parameters:

  • model_output (torch.Tensor) – Embedding model output, where the first element contains the token embeddings.

  • attention_mask (torch.Tensor) – Attention mask to indicate valid tokens.

Returns:

Mean-pooled representation of the token embeddings.

Return type:

torch.Tensor

Notes

  • The function calculates the mean-pooled representation using the attention mask for valid tokens.

  • input_mask_expanded is created by expanding the attention mask to match the size of the token embeddings.

  • The result is obtained by summing the element-wise multiplication of the embeddings and input_mask_expanded, and dividing it by the sum of input_mask_expanded after clamping its values to a minimum of 1e-9 (see the sketch below).
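A minimal sketch of that computation, assuming standard PyTorch tensors; the name mean_pooling_sketch is illustrative, not the module's implementation:

import torch

def mean_pooling_sketch(model_output, attention_mask):
    # The first element of the model output holds the per-token embeddings.
    token_embeddings = model_output[0]
    # Expand the attention mask so it matches the shape of the token
    # embeddings and can be multiplied with them element-wise.
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    # Sum the masked embeddings over the sequence dimension and divide by the
    # number of valid tokens, clamped to 1e-9 to avoid division by zero.
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )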

class prompting.validators.reward.diversity.DiversityRewardEvent#

Bases: prompting.validators.reward.reward.BaseRewardEvent

historic: float#
batch: float#
class prompting.validators.reward.diversity.DiversityRewardModel(device)#

Bases: prompting.validators.reward.reward.BaseRewardModel

Parameters:

device (str) –

property name: str#
Return type:

str

diversity_model_path = 'sentence-transformers/all-mpnet-base-v2'#
get_embeddings(sentences)#

Runs a forward pass through the model.

Parameters:

sentences (List[str]) – Text messages to be encoded.

Returns:

Embeddings for the messages.

Return type:

torch.FloatTensor
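A hedged usage sketch of such a forward pass, assuming the Hugging Face transformers AutoTokenizer/AutoModel API and the checkpoint named by diversity_model_path; the final L2 normalization is an assumption, not something stated on this page:

from typing import List

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

from prompting.validators.reward.diversity import mean_pooling

def get_embeddings_sketch(sentences: List[str], device: str = "cpu") -> torch.FloatTensor:
    # Load the checkpoint referenced by diversity_model_path.
    tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
    model = AutoModel.from_pretrained("sentence-transformers/all-mpnet-base-v2").to(device)

    # Tokenize the batch of sentences and run a forward pass without gradients.
    encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model(**encoded)

    # Mean-pool the token embeddings (see mean_pooling above), then
    # L2-normalize them (the normalization step is an assumption).
    embeddings = mean_pooling(output, encoded["attention_mask"])
    return F.normalize(embeddings, p=2, dim=1)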

update_historic_embeddings(embeddings)#
Parameters:

embeddings (torch.FloatTensor) –

get_historic_rewards(embeddings)#
Parameters:

embeddings (torch.FloatTensor) –

Return type:

torch.FloatTensor

get_batch_rewards(embeddings)#
Parameters:

embeddings (torch.FloatTensor) –

Return type:

torch.FloatTensor

get_rewards(prompt, completions, name)#
Parameters:
  • prompt (str) –

  • completions (List[str]) –

  • name (str) –

Return type:

List[DiversityRewardEvent]
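An illustrative call sequence based only on the signatures documented above; the prompt, completions, and name values are placeholders:

from prompting.validators.reward.diversity import DiversityRewardModel

# Hypothetical usage: score how diverse a set of completions is.
model = DiversityRewardModel(device="cpu")
events = model.get_rewards(
    prompt="Example prompt",
    completions=["First completion ...", "Second completion ..."],
    name="example",
)
# Each DiversityRewardEvent carries a batch and a historic diversity score.
for event in events:
    print(event.batch, event.historic)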

normalize_rewards(raw_rewards)#

This method normalizes the given rewards by updating the moving mean and variance statistics. The rewards are first standardized, and then scaled to the 0-1 range using a cumulative distribution function (CDF) so that they are in a comparable range across different environments.

Parameters:

raw_rewards (torch.FloatTensor) – The reward values to be normalized.

Returns:

The normalized reward values.

Return type:

torch.FloatTensor

Notes

  • This function uses Welford’s online algorithm to update the mean and variance.

  • It standardizes the reward values using the updated mean and variance.

  • It then scales the standardized values to the 0-1 range using the error function (erf) as a CDF; a sketch of this procedure follows below.
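A minimal sketch of the normalization described in the notes above, keeping the running count, mean, and variance as module-level state; the state-keeping details and initial values are assumptions:

import math

import torch

# Hypothetical running statistics; the real model maintains its own state.
count = 0
mean = 0.0
var = 1.0

def normalize_rewards_sketch(raw_rewards: torch.FloatTensor) -> torch.FloatTensor:
    global count, mean, var

    # Welford-style update of the moving mean and variance from the new batch.
    new_count = raw_rewards.numel()
    new_mean = raw_rewards.mean().item()
    new_var = raw_rewards.var(unbiased=False).item()

    total = count + new_count
    delta = new_mean - mean
    mean = mean + delta * new_count / total
    var = (count * var + new_count * new_var + delta ** 2 * count * new_count / total) / total
    count = total

    # Standardize with the updated statistics, then map to the 0-1 range
    # using the normal CDF expressed through the error function (erf).
    standardized = (raw_rewards - mean) / math.sqrt(var + 1e-9)
    return 0.5 * (1.0 + torch.erf(standardized / math.sqrt(2.0)))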