gluonnlp.loss

GluonNLP Toolkit provides tools for easily setting up task-specific loss functions.

Masked Loss

MaskedSoftmaxCrossEntropyLoss

Wrapper of SoftmaxCELoss that supports valid_length as an input (alias: MaskedSoftmaxCELoss).

Label Smoothing

LabelSmoothing

Applies label smoothing.

Activation Regularizers

Activation regularization and temporal activation regularization, as defined in the following work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}}

ActivationRegularizationLoss

Computes Activation Regularization Loss.

TemporalActivationRegularizationLoss

Computes Temporal Activation Regularization Loss.

API Reference

NLP loss.

class gluonnlp.loss.ActivationRegularizationLoss(alpha=0, weight=None, batch_axis=None, **kwargs)[source]

Computes Activation Regularization Loss. (alias: AR)

The formulation is as below:

\[L = \alpha L_2(h_t)\]

where \(L_2(\cdot) = {\lVert \cdot \rVert}_2\) is the L2 norm, \(h_t\) is the output of the RNN at timestep \(t\), and \(\alpha\) is the scaling coefficient.

The implementation follows the work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
Parameters
  • alpha (float, default 0) – The scaling coefficient of the regularization.

  • weight (float or None) – Global scalar weight for loss.

  • batch_axis (int, default 0) – The axis that represents mini-batch.

hybrid_forward(F, *states)[source]
Parameters

states (list) – the stacked outputs from the RNN, consisting of the output at each time step, in (T, N, C) layout.

Returns

loss – loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.

Return type

NDArray
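
A minimal usage sketch (the RNN outputs below are random placeholders; in practice states would be the per-layer outputs of a language model, in (T, N, C) layout):

  import mxnet as mx
  import gluonnlp as nlp

  ar_loss = nlp.loss.ActivationRegularizationLoss(alpha=2)
  # Two stacked RNN layers, each emitting a (T, N, C) output tensor.
  states = [mx.nd.random.uniform(shape=(5, 4, 10)) for _ in range(2)]
  penalty = ar_loss(*states)  # AR penalty, added to the task loss during training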

class gluonnlp.loss.TemporalActivationRegularizationLoss(beta=0, weight=None, batch_axis=None, **kwargs)[source]

Computes Temporal Activation Regularization Loss. (alias: TAR)

The formulation is as below:

\[L = \beta L_2(h_t-h_{t+1})\]

where \(L_2(\cdot) = {\lVert \cdot \rVert}_2\) is the L2 norm, \(h_t\) and \(h_{t+1}\) are the outputs of the RNN at timesteps \(t\) and \(t+1\), and \(\beta\) is the scaling coefficient.

The implementation follows the work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
Parameters
  • beta (float, default 0) – The scaling coefficient of the regularization.

  • weight (float or None) – Global scalar weight for loss.

  • batch_axis (int, default 0) – The axis that represents mini-batch.

hybrid_forward(F, *states)[source]
Parameters

states (list) – the stacked outputs from the RNN, consisting of the output at each time step, in (T, N, C) layout.

Returns

loss – loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.

Return type

NDArray
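
A minimal usage sketch, analogous to the AR example above (random placeholder outputs; beta=1 is illustrative):

  import mxnet as mx
  import gluonnlp as nlp

  tar_loss = nlp.loss.TemporalActivationRegularizationLoss(beta=1)
  # Stacked RNN outputs in (T, N, C) layout; TAR penalizes large changes
  # in the hidden state between consecutive time steps.
  states = [mx.nd.random.uniform(shape=(5, 4, 10)) for _ in range(2)]
  penalty = tar_loss(*states)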

class gluonnlp.loss.MaskedSoftmaxCrossEntropyLoss(sparse_label=True, from_logits=False, weight=None, **kwargs)[source]

Wrapper of SoftmaxCELoss that supports valid_length as an input (alias: MaskedSoftmaxCELoss).

If sparse_label is True (default), label should contain integer category indicators:

\[p = \operatorname{softmax}({pred})\]
\[L = -\sum_i \log p_{i, {label}_i}\]

label’s shape should be pred’s shape with the channel dimension removed, e.g., for pred with shape (1,2,3), label’s shape should be (1,2).

If sparse_label is False, label should contain a probability distribution, and label’s shape should be the same as pred’s:

\[p = \operatorname{softmax}({pred})\]
\[L = -\sum_i \sum_j {label}_{ij} \log p_{ij}\]
Parameters
  • sparse_label (bool, default True) – Whether label is an integer array instead of a probability distribution.

  • from_logits (bool, default False) – Whether the input is a log probability (usually from log_softmax) instead of unnormalized numbers.

  • weight (float or None) – Global scalar weight for loss.

  • Inputs

    • pred: the prediction tensor, shape should be (N, T, C)

    • label: the truth tensor. When sparse_label is True, label’s shape should be pred’s shape with the channel dimension C removed, e.g., for pred with shape (1,2,3), label’s shape should be (1,2) with integer values between 0 and 2. If sparse_label is False, label’s shape must be the same as pred’s, with float values in the range [0, 1].

    • valid_length: the valid length of each sequence, of shape (batch_size,). Prediction elements beyond a sequence’s valid_length are masked out.

  • Outputs

    • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.

hybrid_forward(F, pred, label, valid_length)[source]

Overridden to construct the symbolic graph for this Block.

Parameters
  • pred (Symbol or NDArray) – the prediction tensor of shape (N, T, C).

  • label (Symbol or NDArray) – the truth tensor (see Inputs above for the expected shape).

  • valid_length (Symbol or NDArray) – the valid length of each sequence, of shape (batch_size,).

gluonnlp.loss.MaskedSoftmaxCELoss

alias of gluonnlp.loss.loss.MaskedSoftmaxCrossEntropyLoss
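
A minimal usage sketch using the alias (all shapes are illustrative): pred is (N, T, C), label is (N, T), and the second sequence is only valid for its first 3 steps:

  import mxnet as mx
  import gluonnlp as nlp

  loss_fn = nlp.loss.MaskedSoftmaxCELoss()
  pred = mx.nd.random.uniform(shape=(2, 4, 5))  # (N=2, T=4, C=5)
  label = mx.nd.array([[0, 1, 2, 3],
                       [1, 2, 0, 0]])           # (N, T) integer class indices
  valid_length = mx.nd.array([4, 3])            # steps beyond valid_length are masked out
  loss = loss_fn(pred, label, valid_length)     # shape (batch_size,)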

class gluonnlp.loss.LabelSmoothing(axis=-1, epsilon=0.1, units=None, sparse_label=True, prefix=None, params=None)[source]

Applies label smoothing. See https://arxiv.org/abs/1512.00567.

It smooths the target distribution, assigning probability (1 - epsilon) to the true class and epsilon / (num_classes - 1) to each of the remaining classes.

Parameters
  • axis (int, default -1) – The axis to smooth.

  • epsilon (float, default 0.1) – The epsilon parameter in label smoothing.

  • sparse_label (bool, default True) – Whether the input is an integer array rather than a one-hot array.

  • units (int or None) – Vocabulary size. If units is not given, it will be inferred from the input.

  • prefix (str) – Prefix for the name of the Block (and the name of the weight if params is None).

  • params (Parameter or None) – Container for weight sharing between cells. Created if None.

hybrid_forward(F, inputs, units=None)[source]
Parameters
  • inputs (Symbol or NDArray) – Shape (batch_size, length) if sparse_label is True, otherwise (batch_size, length, V).

  • units (int or None) – Vocabulary size V. If None, the value provided at construction time (or inferred from the input) is used.

Returns

smoothed_label – Shape (batch_size, length, V)

Return type

Symbol or NDArray
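
A minimal usage sketch (units=4 is an illustrative vocabulary size). The smoothed output can then be fed to a cross-entropy loss constructed with sparse_label=False:

  import mxnet as mx
  import gluonnlp as nlp

  smoother = nlp.loss.LabelSmoothing(epsilon=0.1, units=4)
  labels = mx.nd.array([[0, 2]])   # (batch_size=1, length=2) integer labels
  smoothed = smoother(labels)      # (1, 2, 4): 0.9 on the true class and
                                   # 0.1 / 3 on each of the other classes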