gluonnlp.loss

GluonNLP Toolkit provides tools for easily setting up task-specific loss functions.

Masked Loss

MaskedSoftmaxCrossEntropyLoss

Wrapper of SoftmaxCELoss that supports valid_length as an input (alias: MaskedSoftmaxCELoss).

Label Smoothing

LabelSmoothing

Applies label smoothing.

Activation Regularizers

Activation regularization and temporal activation regularization, as defined in the following work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}}

ActivationRegularizationLoss

Computes Activation Regularization Loss.

TemporalActivationRegularizationLoss

Computes Temporal Activation Regularization Loss.

API Reference

NLP loss.

class gluonnlp.loss.ActivationRegularizationLoss(alpha=0, weight=None, batch_axis=None, **kwargs)[source]

Computes Activation Regularization Loss. (alias: AR)

The formulation is as below:

\[L = \alpha L_2(h_t)\]

where \(L_2(\cdot) = {\lVert \cdot \rVert}_2\) is the L2 norm, \(h_t\) is the output of the RNN at timestep \(t\), and \(\alpha\) is the scaling coefficient.

The implementation follows the work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
Parameters
  • alpha (float, default 0) – The scaling coefficient of the regularization.

  • weight (float or None) – Global scalar weight for loss.

  • batch_axis (int, default 0) – The axis that represents mini-batch.

hybrid_forward(F, *states)[source]
Parameters

states (list) – the stacked outputs from the RNN, consisting of the output at each time step, in (T, N, C) layout.

Returns

loss – loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.

Return type

NDArray
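
A minimal usage sketch (the RNN outputs below are random placeholders; in practice states would be the per-layer outputs of a language model, in (T, N, C) layout):

  import mxnet as mx
  import gluonnlp as nlp

  ar_loss = nlp.loss.ActivationRegularizationLoss(alpha=2)
  # Two stacked RNN layers, each emitting a (T, N, C) output tensor.
  states = [mx.nd.random.uniform(shape=(5, 4, 10)) for _ in range(2)]
  penalty = ar_loss(*states)  # AR penalty, added to the task loss during training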

class gluonnlp.loss.TemporalActivationRegularizationLoss(beta=0, weight=None, batch_axis=None, **kwargs)[source]

Computes Temporal Activation Regularization Loss. (alias: TAR)

The formulation is as below:

\[L = \beta L_2(h_t-h_{t+1})\]

where \(L_2(\cdot) = {\lVert \cdot \rVert}_2\) is the L2 norm, \(h_t\) and \(h_{t+1}\) are the outputs of the RNN at timesteps \(t\) and \(t+1\), and \(\beta\) is the scaling coefficient.

The implementation follows the work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
Parameters
  • beta (float, default 0) – The scaling coefficient of the regularization.

  • weight (float or None) – Global scalar weight for loss.

  • batch_axis (int, default 0) – The axis that represents mini-batch.

hybrid_forward(F, *states)[source]
Parameters

states (list) – the stacked outputs from the RNN, consisting of the output at each time step, in (T, N, C) layout.

Returns

loss – loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.

Return type

NDArray
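
A minimal usage sketch, analogous to the AR example above (random placeholder outputs; beta=1 is illustrative):

  import mxnet as mx
  import gluonnlp as nlp

  tar_loss = nlp.loss.TemporalActivationRegularizationLoss(beta=1)
  # Stacked RNN outputs in (T, N, C) layout; TAR penalizes large changes
  # in the hidden state between consecutive time steps.
  states = [mx.nd.random.uniform(shape=(5, 4, 10)) for _ in range(2)]
  penalty = tar_loss(*states)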

class gluonnlp.loss.MaskedSoftmaxCrossEntropyLoss(sparse_label=True, from_logits=False, weight=None, **kwargs)[source]

Wrapper of SoftmaxCELoss that supports valid_length as an input (alias: MaskedSoftmaxCELoss).

If sparse_label is True (default), label should contain integer category indicators:

\[p = \operatorname{softmax}({pred})\]
\[L = -\sum_i \log p_{i, {label}_i}\]

label’s shape should be pred’s shape with the channel dimension removed, e.g., for pred with shape (1,2,3), label’s shape should be (1,2).

If sparse_label is False, label should contain a probability distribution, and label’s shape should be the same as pred’s:

\[p = \operatorname{softmax}({pred})\]
\[L = -\sum_i \sum_j {label}_{ij} \log p_{ij}\]
Parameters
  • sparse_label (bool, default True) – Whether label is an integer array instead of a probability distribution.

  • from_logits (bool, default False) – Whether the input is a log probability (usually from log_softmax) instead of unnormalized numbers.

  • weight (float or None) – Global scalar weight for loss.

  • Inputs

    • pred: the prediction tensor, shape should be (N, T, C)

    • label: the truth tensor. When sparse_label is True, label’s shape should be pred’s shape with the channel dimension C removed, e.g., for pred with shape (1,2,3), label’s shape should be (1,2) with integer values between 0 and 2. If sparse_label is False, label’s shape must be the same as pred’s, with float values in the range [0, 1].

    • valid_length: the valid length of each sequence, of shape (batch_size,). Prediction elements beyond a sequence’s valid_length are masked out.

  • Outputs

    • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.

hybrid_forward(F, pred, label, valid_length)[source]

Overridden to construct the symbolic graph for this Block.

Parameters
  • pred (Symbol or NDArray) – the prediction tensor of shape (N, T, C).

  • label (Symbol or NDArray) – the truth tensor (see Inputs above for the expected shape).

  • valid_length (Symbol or NDArray) – the valid length of each sequence, of shape (batch_size,).

gluonnlp.loss.MaskedSoftmaxCELoss

alias of gluonnlp.loss.loss.MaskedSoftmaxCrossEntropyLoss
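
A minimal usage sketch using the alias (all shapes are illustrative): pred is (N, T, C), label is (N, T), and the second sequence is only valid for its first 3 steps:

  import mxnet as mx
  import gluonnlp as nlp

  loss_fn = nlp.loss.MaskedSoftmaxCELoss()
  pred = mx.nd.random.uniform(shape=(2, 4, 5))  # (N=2, T=4, C=5)
  label = mx.nd.array([[0, 1, 2, 3],
                       [1, 2, 0, 0]])           # (N, T) integer class indices
  valid_length = mx.nd.array([4, 3])            # steps beyond valid_length are masked out
  loss = loss_fn(pred, label, valid_length)     # shape (batch_size,)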

class gluonnlp.loss.LabelSmoothing(axis=-1, epsilon=0.1, units=None, sparse_label=True, prefix=None, params=None)[source]

Applies label smoothing. See https://arxiv.org/abs/1512.00567.

It smooths the target distribution, assigning probability (1 - epsilon) to the true class and epsilon / (num_classes - 1) to each of the remaining classes.

Parameters
  • axis (int, default -1) – The axis to smooth.

  • epsilon (float, default 0.1) – The epsilon parameter in label smoothing.

  • sparse_label (bool, default True) – Whether the input is an integer array rather than a one-hot array.

  • units (int or None) – Vocabulary size. If units is not given, it will be inferred from the input.

  • prefix (str) – Prefix for the name of the Block (and the name of the weight if params is None).

  • params (Parameter or None) – Container for weight sharing between cells. Created if None.

hybrid_forward(F, inputs, units=None)[source]
Parameters
  • inputs (Symbol or NDArray) – Shape (batch_size, length) if sparse_label is True, otherwise (batch_size, length, V).

  • units (int or None) – Vocabulary size V. If None, the value provided at construction time (or inferred from the input) is used.

Returns

smoothed_label – Shape (batch_size, length, V)

Return type

Symbol or NDArray
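
A minimal usage sketch (units=4 is an illustrative vocabulary size). The smoothed output can then be fed to a cross-entropy loss constructed with sparse_label=False:

  import mxnet as mx
  import gluonnlp as nlp

  smoother = nlp.loss.LabelSmoothing(epsilon=0.1, units=4)
  labels = mx.nd.array([[0, 2]])   # (batch_size=1, length=2) integer labels
  smoothed = smoother(labels)      # (1, 2, 4): 0.9 on the true class and
                                   # 0.1 / 3 on each of the other classes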