gluonnlp.utils¶
GluonNLP Toolkit provides utility functions for training, such as gradient clipping, parameter and trainer-state serialization, and parallel execution of work across devices.
Parameter and Training¶
clip_grad_global_norm – Rescales gradients of parameters so that the sum of their 2-norm is smaller than max_norm.
Serialization and Deserialization¶
load_parameters – Load parameters from file previously saved by save_parameters.
load_states – Loads trainer states (e.g. optimizer, momentum) from a file.
save_parameters – Save parameters to file.
save_states – Saves trainer states (e.g. optimizer, momentum) to a file.
API Reference¶
Module for utility functions.
- class gluonnlp.utils.Parallelizable[source]¶
Base class for a parallelizable unit of work, which can be invoked by Parallel. The subclass must implement the forward_backward method and be used together with Parallel. For example:
class ParallelNet(Parallelizable):
    def __init__(self):
        self._net = Model()
        self._loss = gluon.loss.SoftmaxCrossEntropyLoss()

    def forward_backward(self, x):
        data, label = x
        with mx.autograd.record():
            out = self._net(data)
            loss = self._loss(out, label)
        loss.backward()
        return loss

net = ParallelNet()
ctx = [mx.gpu(0), mx.gpu(1)]
parallel = Parallel(len(ctx), net)
# Gluon block is initialized after forwarding the first batch
initialized = False

for batch in batches:
    for x in gluon.utils.split_and_load(batch, ctx):
        parallel.put(x)
    losses = [parallel.get() for _ in ctx]
    trainer.step()
- class gluonnlp.utils.Parallel(num_workers, parallizable, serial_init=True)[source]¶
Class for parallel processing with Parallelizable objects. It invokes a Parallelizable with multiple Python threads. For example:
class ParallelNet(Parallelizable):
    def __init__(self):
        self._net = Model()
        self._loss = gluon.loss.SoftmaxCrossEntropyLoss()

    def forward_backward(self, x):
        data, label = x
        with mx.autograd.record():
            out = self._net(data)
            loss = self._loss(out, label)
        loss.backward()
        return loss

net = ParallelNet()
ctx = [mx.gpu(0), mx.gpu(1)]
parallel = Parallel(len(ctx), net)

for batch in batches:
    for x in gluon.utils.split_and_load(batch, ctx):
        parallel.put(x)
    losses = [parallel.get() for _ in ctx]
    trainer.step()
- Parameters
num_workers (int) – Number of worker threads. If set to 0, the main thread is used as the worker for debugging purposes (see the short sketch after this parameter list).
parallelizable – Parallelizable net whose forward and backward methods are invoked by multiple worker threads.
serial_init (bool, default True) – Execute the first num_workers inputs in the main thread, so that the Block used in parallelizable is initialized serially. Initializing a Block from multiple threads may cause unexpected behavior.
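The num_workers=0 mode keeps all work in the main thread, which makes debugging straightforward. A minimal sketch, assuming put() and get() behave the same way as in the threaded case; the SquareWork class is made up for illustration:

import mxnet as mx
from gluonnlp.utils import Parallelizable, Parallel

class SquareWork(Parallelizable):
    """Hypothetical unit of work that just squares its input."""
    def forward_backward(self, x):
        return x * x

# With num_workers=0 the main thread executes forward_backward directly,
# so breakpoints and stack traces stay easy to follow.
parallel = Parallel(0, SquareWork())
parallel.put(mx.nd.array([1, 2, 3]))
print(parallel.get())  # expected: [1. 4. 9.]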
- gluonnlp.utils.grad_global_norm(parameters, max_norm=None)[source]¶
Calculate the 2-norm of the gradients of parameters, and how much they should be scaled down such that their 2-norm does not exceed max_norm, if max_norm is provided.
If gradients exist on more than one context for a parameter, the user needs to explicitly call trainer.allreduce_grads so that the gradients are summed first before calculating the 2-norm.
Note
This function is only for use when update_on_kvstore is set to False in trainer.
Example:
trainer = Trainer(net.collect_params(), update_on_kvstore=False, ...)
for x, y in mx.gluon.utils.split_and_load(X, [mx.gpu(0), mx.gpu(1)]):
    with mx.autograd.record():
        y = net(x)
        loss = loss_fn(y, label)
    loss.backward()
trainer.allreduce_grads()
norm = grad_global_norm(net.collect_params().values())
...
- Parameters
parameters (list of Parameters) – The parameters whose gradients contribute to the norm.
max_norm (NDArray, optional) – The maximum L2 norm threshold. If provided, ratio and is_finite will also be returned (see the sketch below).
- Returns
NDArray – Total norm. Shape is (1,).
NDArray – Ratio for rescaling gradients based on max_norm, such that grad = grad / ratio. If the total norm is NaN, ratio will be NaN too. Returned only if max_norm is provided. Shape is (1,).
NDArray – Whether the total norm is finite. Returned only if max_norm is provided. Shape is (1,).
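When max_norm is given, the returned ratio and is_finite can drive a manual rescaling step. The following is a minimal sketch, not the library's own recipe: net, trainer (created with update_on_kvstore=False), and batch_size are assumed to exist in the surrounding training loop.

import mxnet as mx
from gluonnlp.utils import grad_global_norm

# Gradients were already computed with loss.backward() and, on multiple
# contexts, summed with trainer.allreduce_grads().
max_norm = mx.nd.array([1.0])
norm, ratio, is_finite = grad_global_norm(net.collect_params().values(), max_norm)

if is_finite.asscalar():
    # Folding ratio into the effective batch size realizes grad = grad / ratio,
    # assuming the Trainer divides gradients by the value passed to update().
    trainer.update(batch_size * ratio.asscalar())
# else: skip this update, e.g. after a float16 overflow produced NaN gradients.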
- gluonnlp.utils.clip_grad_global_norm(parameters, max_norm, check_isfinite=True)[source]¶
Rescales gradients of parameters so that the sum of their 2-norm is smaller than max_norm. If gradients exist on more than one context for a parameter, the user needs to explicitly call trainer.allreduce_grads so that the gradients are summed first before calculating the 2-norm.
Note
This function is only for use when update_on_kvstore is set to False in trainer. In cases where training happens on multiple contexts, this method should be used in conjunction with trainer.allreduce_grads() and trainer.update() (not trainer.step()).
Example:
trainer = Trainer(net.collect_params(), update_on_kvstore=False, ...)
for x, y in mx.gluon.utils.split_and_load(X, [mx.gpu(0), mx.gpu(1)]):
    with mx.autograd.record():
        y = net(x)
        loss = loss_fn(y, label)
    loss.backward()
trainer.allreduce_grads()
nlp.utils.clip_grad_global_norm(net.collect_params().values(), max_norm)
trainer.update(batch_size)
...
- Parameters
parameters (list of Parameters) – The parameters whose gradients will be rescaled.
max_norm – The upper bound on the total 2-norm of the gradients after rescaling.
check_isfinite (bool, default True) – Whether to check that the computed total norm is finite.
- Returns
Total norm. Return type is NDArray of shape (1,) if check_isfinite is False. Otherwise a float is returned.
- Return type
NDArray or float
- gluonnlp.utils.save_parameters(model, filename)[source]¶
Save parameters to file.
Saved parameters can only be loaded with Block.load_parameters. Note that this method only saves parameters, not model structure.
Both local file system paths and S3 URIs are supported. For example, ‘s3://mybucket/folder/net.params’, ‘./folder/net.params’.
- Parameters
model (mx.gluon.Block) – The model to save.
filename (str) – Path to file.
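A quick round-trip sketch (the tiny network, directory, and file name below are made up for illustration):

import mxnet as mx
from mxnet.gluon import nn
import gluonnlp as nlp

net = nn.Dense(10)
net.initialize()
net(mx.nd.ones((2, 4)))  # forward once so deferred parameter shapes are known

nlp.utils.mkdir('./checkpoints')  # './checkpoints' is a made-up path for the example
nlp.utils.save_parameters(net, './checkpoints/net.params')

# Later: restore the weights into a network with the same structure.
net2 = nn.Dense(10)
nlp.utils.load_parameters(net2, './checkpoints/net.params', ctx=mx.cpu())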
- gluonnlp.utils.save_states(trainer, fname)[source]¶
Saves trainer states (e.g. optimizer, momentum) to a file.
Both local file system paths and S3 URIs are supported. For example, ‘s3://mybucket/folder/net.states’, ‘./folder/net.states’.
- Parameters
trainer (mxnet.gluon.Trainer) – The trainer whose states will be saved.
fname (str) – Path to output states file.
Note
optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult), will not be saved.
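Saving trainer states alongside the parameters makes it possible to resume training with the optimizer's accumulated state intact. A minimal sketch; the save_checkpoint/load_checkpoint helpers and the prefix naming are made up for illustration:

import mxnet as mx
import gluonnlp as nlp

def save_checkpoint(net, trainer, prefix):
    # Parameters and trainer states are written side by side under one prefix.
    nlp.utils.save_parameters(net, prefix + '.params')
    nlp.utils.save_states(trainer, prefix + '.states')

def load_checkpoint(net, trainer, prefix, ctx=mx.cpu()):
    nlp.utils.load_parameters(net, prefix + '.params', ctx=ctx)
    nlp.utils.load_states(trainer, prefix + '.states')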
- gluonnlp.utils.load_parameters(model, filename, ctx=None, allow_missing=False, ignore_extra=False, cast_dtype=None)[source]¶
Load parameters from file previously saved by save_parameters.
Both local file system paths and S3 URIs are supported. For example, ‘s3://mybucket/folder/net.params’, ‘./folder/net.params’.
- Parameters
model (mx.gluon.Block) – The model to load the parameters into.
filename (str) – Path to parameter file.
ctx (Context or list of Context, default cpu()) – Context(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
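For multi-GPU training, the loaded parameters can be placed on several contexts at once. A minimal sketch; net and the checkpoint path are assumed to come from the earlier save step:

import mxnet as mx
import gluonnlp as nlp

# Fall back to CPU when fewer than two GPUs are available.
ctx = [mx.gpu(0), mx.gpu(1)] if mx.context.num_gpus() >= 2 else [mx.cpu()]
nlp.utils.load_parameters(net, './checkpoints/net.params', ctx=ctx)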
- gluonnlp.utils.load_states(trainer, fname)[source]¶
Loads trainer states (e.g. optimizer, momentum) from a file.
Both local file system paths and S3 URIs are supported. For example, ‘s3://mybucket/folder/net.states’, ‘./folder/net.states’.
- Parameters
trainer (mxnet.gluon.Trainer) – The trainer whose states will be loaded.
fname (str) – Path to input states file.
Note
optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult), will not be loaded from the file, but rather set based on the current Trainer’s parameters.
- gluonnlp.utils.mkdir(dirname)[source]¶
Create a directory.
- Parameters
dirname (str) – The name of the target directory to create.
- gluonnlp.utils.glob(url, separator=', ')[source]¶
Return a list of paths matching a pathname pattern.
The pattern may contain simple shell-style wildcards. Input may also include multiple patterns, separated by separator.
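Because the input may contain several patterns joined by separator, one call can cover multiple directories. A small sketch; the directory names are made up:

import gluonnlp as nlp

# Two hypothetical patterns in one string, joined by the default ', ' separator.
files = nlp.utils.glob('./corpus/train/*.txt, ./corpus/valid/*.txt')
print(sorted(files))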
- gluonnlp.utils.remove(filename)[source]¶
Remove a file.
- Parameters
filename (str) – The name of the target file to remove.