gluonnlp.initializer

This page describes initializers that are useful for multiple NLP model architectures.

Highway Bias Initializer

The Highway bias initializer follows the definition given in the following work.

@inproceedings{srivastava2015training,
     title={Training very deep networks},
     author={Srivastava, Rupesh K and Greff, Klaus and Schmidhuber, J{\"u}rgen},
     booktitle={Advances in neural information processing systems},
     pages={2377--2385},
     year={2015}}

HighwayBias

Initialize all biases of a Highway layer by setting the biases of the nonlinear transform and the transform gate to different values.
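For context, a Highway layer as defined in the cited paper combines a nonlinear transform \(H\) with a transform gate \(T\). The formulation below is a brief restatement; its symbols follow the paper and are not GluonNLP identifiers.

```latex
y = H(x, W_H) \cdot T(x, W_T) + x \cdot \bigl(1 - T(x, W_T)\bigr),
\qquad
T(x, W_T) = \sigma\!\left(W_T^{\top} x + b_T\right)
```

With a negative gate bias such as \(b_T = -2\) and small initial weights, \(\sigma(-2) \approx 0.12\), so the layer starts out mostly carrying its input through unchanged.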

API Reference

NLP initializer.

class gluonnlp.initializer.HighwayBias(nonlinear_transform_bias=0.0, transform_gate_bias=-2.0, **kwargs)[source]

Initialize all biases of a Highway layer by setting the biases of the nonlinear transform and the transform gate to different values. The two sets of biases have the same dimension, equal to \(arr.shape[0]/2\), where \(arr\) is the bias tensor.

The definition of the biases follows the work:

@inproceedings{srivastava2015training,
 title={Training very deep networks},
 author={Srivastava, Rupesh K and Greff, Klaus and Schmidhuber, J{\"u}rgen},
 booktitle={Advances in neural information processing systems},
 pages={2377--2385},
 year={2015}
}

Parameters
  • nonlinear_transform_bias (float, default 0.0) – Bias for the nonlinear transform. The default follows the original work cited above.

  • transform_gate_bias (float, default -2.0) – Bias for the transform gate. The default follows the original work cited above.
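The two-halves layout described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name `highway_bias` is hypothetical, and the ordering (nonlinear transform biases first, transform gate biases second) is an assumption this page does not state.

```python
def highway_bias(size, nonlinear_transform_bias=0.0, transform_gate_bias=-2.0):
    """Return a bias vector of length `size`, filled in two equal halves.

    Hypothetical helper for illustration; not part of gluonnlp.
    """
    if size % 2 != 0:
        raise ValueError("bias length must be even so it splits into two halves")
    half = size // 2
    # Assumption: the first half belongs to the nonlinear transform,
    # the second half to the transform gate.
    return [nonlinear_transform_bias] * half + [transform_gate_bias] * half


bias = highway_bias(6)
print(bias)  # [0.0, 0.0, 0.0, -2.0, -2.0, -2.0]
```

Each half has length `size / 2`, matching the \(arr.shape[0]/2\) dimension given in the class description.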

class gluonnlp.initializer.TruncNorm(mean=0, stdev=0.01, **kwargs)[source]

Initialize the weights by drawing samples from a truncated normal distribution with the given mean and standard deviation. Values that lie more than 2 standard deviations from the mean are dropped and redrawn.

Parameters
  • mean (float, default 0) – Mean of the underlying normal distribution

  • stdev (float, default 0.01) – Standard deviation of the underlying normal distribution

  • **kwargs (dict) – Additional parameters for base Initializer.
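The drop-and-redraw rule can be illustrated with a minimal rejection-sampling sketch using only the Python standard library. The function names `trunc_norm_sample` and `trunc_norm_fill` are hypothetical, and the library's own implementation may work differently.

```python
import random


def trunc_norm_sample(mean=0.0, stdev=0.01, rng=random):
    """Draw one value from a normal distribution truncated at 2 stdev.

    Hypothetical helper for illustration; not part of gluonnlp.
    """
    while True:
        value = rng.gauss(mean, stdev)
        # Keep the draw only if it lies within 2 standard deviations
        # of the mean; otherwise drop it and draw again.
        if abs(value - mean) <= 2 * stdev:
            return value


def trunc_norm_fill(count, mean=0.0, stdev=0.01, seed=None):
    """Return `count` truncated-normal samples as a list."""
    rng = random.Random(seed)
    return [trunc_norm_sample(mean, stdev, rng) for _ in range(count)]


weights = trunc_norm_fill(1000, seed=0)
print(min(weights), max(weights))  # both lie within [-0.02, 0.02]
```

Since roughly 95% of normal draws already fall within 2 standard deviations, the redraw loop rarely repeats more than once or twice per value.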