gluonnlp.loss
The GluonNLP toolkit provides tools for easily setting up task-specific losses.
Masked Loss

MaskedSoftmaxCrossEntropyLoss
    Wrapper of the SoftmaxCELoss that supports valid_length as the input. (alias: MaskedSoftmaxCELoss)
Label Smoothing

LabelSmoothing
    Applies label smoothing.
Activation Regularizers
Activation regularization and temporal activation regularization defined in the following work:
@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
ActivationRegularizationLoss
    Computes Activation Regularization Loss.

TemporalActivationRegularizationLoss
    Computes Temporal Activation Regularization Loss.
API Reference
NLP loss.
class gluonnlp.loss.ActivationRegularizationLoss(alpha=0, weight=None, batch_axis=None, **kwargs)

Computes Activation Regularization Loss. (alias: AR)
The formulation is as below:

\[L = \alpha\, L_2(h_t)\]

where \(L_2(\cdot) = \|\cdot\|_2\), \(h_t\) is the output of the RNN at timestep \(t\), and \(\alpha\) is the scaling coefficient.
The implementation follows the work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
Parameters
alpha (float, default 0) – The scaling coefficient \(\alpha\) of the regularization.
weight (float or None) – Global scalar weight for the loss.
batch_axis (int or None, default None) – The axis that represents the mini-batch.
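A minimal usage sketch (shapes are hypothetical; it assumes the loss is called directly on the RNN output activations, as in GluonNLP's AWD-LSTM language model training scripts):

>>> import mxnet as mx
>>> import gluonnlp
>>> ar_loss = gluonnlp.loss.ActivationRegularizationLoss(alpha=2)
>>> # hypothetical RNN outputs with layout (T, N, C)
>>> states = mx.nd.random.normal(shape=(35, 16, 200))
>>> l = ar_loss(states)  # penalty proportional to the activation magnitudes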
class gluonnlp.loss.TemporalActivationRegularizationLoss(beta=0, weight=None, batch_axis=None, **kwargs)

Computes Temporal Activation Regularization Loss. (alias: TAR)
The formulation is as below:

\[L = \beta\, L_2(h_t - h_{t+1})\]

where \(L_2(\cdot) = \|\cdot\|_2\), \(h_t\) is the output of the RNN at timestep \(t\), \(h_{t+1}\) is the output of the RNN at timestep \(t+1\), and \(\beta\) is the scaling coefficient.
The implementation follows the work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
Parameters
beta (float, default 0) – The scaling coefficient \(\beta\) of the regularization.
weight (float or None) – Global scalar weight for the loss.
batch_axis (int or None, default None) – The axis that represents the mini-batch.
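A matching sketch for the temporal variant (shapes hypothetical; it assumes the time axis is the first axis of the state tensor, so consecutive rows are \(h_t\) and \(h_{t+1}\)):

>>> import mxnet as mx
>>> import gluonnlp
>>> tar_loss = gluonnlp.loss.TemporalActivationRegularizationLoss(beta=1)
>>> # hypothetical RNN outputs with layout (T, N, C)
>>> states = mx.nd.random.normal(shape=(35, 16, 200))
>>> l = tar_loss(states)  # penalizes large changes between h_t and h_{t+1}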
class gluonnlp.loss.MaskedSoftmaxCrossEntropyLoss(sparse_label=True, from_logits=False, weight=None, **kwargs)

Wrapper of the SoftmaxCELoss that supports valid_length as the input. (alias: MaskedSoftmaxCELoss)
If sparse_label is True (default), label should contain integer category indicators:
\[p = \operatorname{softmax}(pred)\]
\[L = -\sum_i \log p_{i, label_i}\]

label's shape should be pred's shape with the channel dimension removed, i.e. for pred with shape (1, 2, 3), label's shape should be (1, 2).
If sparse_label is False, label should contain a probability distribution, and label's shape should be the same as pred's:
\[p = \operatorname{softmax}(pred)\]
\[L = -\sum_i \sum_j label_{ij} \log p_{ij}\]

Parameters
sparse_label (bool, default True) – Whether label is an integer array instead of a probability distribution.
from_logits (bool, default False) – Whether input is a log probability (usually from log_softmax) instead of unnormalized numbers.
Inputs:
pred: the prediction tensor; shape should be (N, T, C).
label: the truth tensor. When sparse_label is True, label's shape should be pred's shape with the channel dimension C removed, i.e. for pred with shape (1, 2, 3), label's shape should be (1, 2) and values should be integers between 0 and 2. If sparse_label is False, label's shape must be the same as pred and values should be floats in the range [0, 1].
valid_length: valid length of each sequence, of shape (batch_size,). Prediction elements beyond their valid_length are masked out.
Outputs:
loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
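A short sketch built directly from the input/output shapes above (values are hypothetical):

>>> import mxnet as mx
>>> import gluonnlp
>>> loss_fn = gluonnlp.loss.MaskedSoftmaxCELoss()
>>> pred = mx.nd.random.uniform(shape=(2, 4, 5))       # (N, T, C)
>>> label = mx.nd.array([[1, 0, 2, 3], [4, 2, 1, 0]])  # (N, T), integers in [0, C)
>>> valid_length = mx.nd.array([3, 2])                 # steps beyond these are masked
>>> loss = loss_fn(pred, label, valid_length)          # shape (N,)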
gluonnlp.loss.MaskedSoftmaxCELoss
    alias of gluonnlp.loss.loss.MaskedSoftmaxCrossEntropyLoss
class gluonnlp.loss.LabelSmoothing(axis=-1, epsilon=0.1, units=None, sparse_label=True, prefix=None, params=None)

Applies label smoothing. See https://arxiv.org/abs/1512.00567.
It sets the target probability to 1 - epsilon for the true class and to epsilon / (num_classes - 1) for every other class.
Parameters
axis (int, default -1) – The axis to smooth.
epsilon (float, default 0.1) – The epsilon parameter in label smoothing.
sparse_label (bool, default True) – Whether input is an integer array instead of a one-hot array.
units (int or None) – Vocabulary size. If units is not given, it will be inferred from the input.
prefix (str) – Prefix for name of Blocks (and name of weight if params is None).
params (Parameter or None) – Container for weight sharing between cells. Created if None.
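A minimal sketch (values hypothetical; it assumes the block is called on the labels and returns the smoothed distribution, which can then be fed to a softmax cross-entropy loss with sparse_label=False):

>>> import mxnet as mx
>>> import gluonnlp
>>> smoother = gluonnlp.loss.LabelSmoothing(epsilon=0.1, units=5)
>>> label = mx.nd.array([[1, 3, 0]])  # (N, T) integer class indices
>>> smoothed = smoother(label)        # (N, T, units): 0.9 on the true class, 0.025 elsewhere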