gluonnlp.utils¶
GluonNLP Toolkit provides utility functions for training, such as gradient clipping, parameter and trainer-state serialization, and parallel execution of work across devices.
Parameter and Training¶
clip_grad_global_norm – Rescales gradients of parameters so that the sum of their 2-norm is smaller than max_norm.
Serialization and Deserialization¶
load_parameters – Load parameters from file previously saved by save_parameters.
load_states – Loads trainer states (e.g. optimizer, momentum) from a file.
save_parameters – Save parameters to file.
save_states – Saves trainer states (e.g. optimizer, momentum) to a file.
API Reference¶
Module for utility functions.
- class gluonnlp.utils.Parallelizable[source]¶
Base class for a parallelizable unit of work, which can be invoked by Parallel. The subclass must implement the forward_backward method and be used together with Parallel. For example:
class ParallelNet(Parallelizable):
    def __init__(self):
        self._net = Model()
        self._loss = gluon.loss.SoftmaxCrossEntropyLoss()

    def forward_backward(self, x):
        data, label = x
        with mx.autograd.record():
            out = self._net(data)
            loss = self._loss(out, label)
        loss.backward()
        return loss

net = ParallelNet()
ctx = [mx.gpu(0), mx.gpu(1)]
parallel = Parallel(len(ctx), net)
# Gluon block is initialized after forwarding the first batch
initialized = False

for batch in batches:
    for x in gluon.utils.split_and_load(batch, ctx):
        parallel.put(x)
    losses = [parallel.get() for _ in ctx]
    trainer.step()
- class gluonnlp.utils.Parallel(num_workers, parallizable, serial_init=True)[source]¶
Class for parallel processing with Parallelizable objects. It invokes a Parallelizable with multiple Python threads. For example:
class ParallelNet(Parallelizable):
    def __init__(self):
        self._net = Model()
        self._loss = gluon.loss.SoftmaxCrossEntropyLoss()

    def forward_backward(self, x):
        data, label = x
        with mx.autograd.record():
            out = self._net(data)
            loss = self._loss(out, label)
        loss.backward()
        return loss

net = ParallelNet()
ctx = [mx.gpu(0), mx.gpu(1)]
parallel = Parallel(len(ctx), net)

for batch in batches:
    for x in gluon.utils.split_and_load(batch, ctx):
        parallel.put(x)
    losses = [parallel.get() for _ in ctx]
    trainer.step()
- Parameters
num_workers (int) – Number of worker threads. If set to 0, the main thread is used as the worker for debugging purposes (see the short sketch after this parameter list).
parallelizable – Parallelizable net whose forward and backward methods are invoked by multiple worker threads.
serial_init (bool, default True) – Execute the first num_workers inputs in the main thread, so that the Block used in parallelizable is initialized serially. Initializing a Block from multiple threads may cause unexpected behavior.
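The num_workers=0 mode keeps all work in the main thread, which makes debugging straightforward. A minimal sketch, assuming put() and get() behave the same way as in the threaded case; the SquareWork class is made up for illustration:

import mxnet as mx
from gluonnlp.utils import Parallelizable, Parallel

class SquareWork(Parallelizable):
    """Hypothetical unit of work that just squares its input."""
    def forward_backward(self, x):
        return x * x

# With num_workers=0 the main thread executes forward_backward directly,
# so breakpoints and stack traces stay easy to follow.
parallel = Parallel(0, SquareWork())
parallel.put(mx.nd.array([1, 2, 3]))
print(parallel.get())  # expected: [1. 4. 9.]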
- gluonnlp.utils.grad_global_norm(parameters, max_norm=None)[source]¶
Calculate the 2-norm of the gradients of parameters, and how much they should be scaled down such that their 2-norm does not exceed max_norm, if max_norm is provided.
If gradients exist on more than one context for a parameter, the user needs to explicitly call trainer.allreduce_grads so that the gradients are summed first before calculating the 2-norm.
Note
This function is only for use when update_on_kvstore is set to False in trainer.
Example:
trainer = Trainer(net.collect_params(), update_on_kvstore=False, ...)
for x, y in mx.gluon.utils.split_and_load(X, [mx.gpu(0), mx.gpu(1)]):
    with mx.autograd.record():
        y = net(x)
        loss = loss_fn(y, label)
    loss.backward()
trainer.allreduce_grads()
norm = grad_global_norm(net.collect_params().values())
...
- Parameters
parameters (list of Parameters) – The parameters whose gradients contribute to the norm.
max_norm (NDArray, optional) – The maximum L2 norm threshold. If provided, ratio and is_finite will also be returned (see the sketch below).
- Returns
NDArray – Total norm. Shape is (1,).
NDArray – Ratio for rescaling gradients based on max_norm, such that grad = grad / ratio. If the total norm is NaN, ratio will be NaN too. Returned only if max_norm is provided. Shape is (1,).
NDArray – Whether the total norm is finite. Returned only if max_norm is provided. Shape is (1,).
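When max_norm is given, the returned ratio and is_finite can drive a manual rescaling step. The following is a minimal sketch, not the library's own recipe: net, trainer (created with update_on_kvstore=False), and batch_size are assumed to exist in the surrounding training loop.

import mxnet as mx
from gluonnlp.utils import grad_global_norm

# Gradients were already computed with loss.backward() and, on multiple
# contexts, summed with trainer.allreduce_grads().
max_norm = mx.nd.array([1.0])
norm, ratio, is_finite = grad_global_norm(net.collect_params().values(), max_norm)

if is_finite.asscalar():
    # Folding ratio into the effective batch size realizes grad = grad / ratio,
    # assuming the Trainer divides gradients by the value passed to update().
    trainer.update(batch_size * ratio.asscalar())
# else: skip this update, e.g. after a float16 overflow produced NaN gradients.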
- gluonnlp.utils.clip_grad_global_norm(parameters, max_norm, check_isfinite=True)[source]¶
Rescales gradients of parameters so that the sum of their 2-norm is smaller than max_norm. If gradients exist on more than one context for a parameter, the user needs to explicitly call trainer.allreduce_grads so that the gradients are summed first before calculating the 2-norm.
Note
This function is only for use when update_on_kvstore is set to False in trainer. In cases where training happens on multiple contexts, this method should be used in conjunction with trainer.allreduce_grads() and trainer.update() (not trainer.step()).
Example:
trainer = Trainer(net.collect_params(), update_on_kvstore=False, ...)
for x, y in mx.gluon.utils.split_and_load(X, [mx.gpu(0), mx.gpu(1)]):
    with mx.autograd.record():
        y = net(x)
        loss = loss_fn(y, label)
    loss.backward()
trainer.allreduce_grads()
nlp.utils.clip_grad_global_norm(net.collect_params().values(), max_norm)
trainer.update(batch_size)
...
- Parameters
parameters (list of Parameters) – The parameters whose gradients will be rescaled.
max_norm – The upper bound on the total 2-norm of the gradients after rescaling.
check_isfinite (bool, default True) – Whether to check that the computed total norm is finite.
- Returns
Total norm. Return type is NDArray of shape (1,) if check_isfinite is False. Otherwise a float is returned.
- Return type
NDArray or float
- gluonnlp.utils.save_parameters(model, filename)[source]¶
Save parameters to file.
Saved parameters can only be loaded with Block.load_parameters. Note that this method only saves parameters, not model structure.
Both local file system paths and S3 URIs are supported. For example, ‘s3://mybucket/folder/net.params’, ‘./folder/net.params’.
- Parameters
model (mx.gluon.Block) – The model to save.
filename (str) – Path to file.
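A quick round-trip sketch (the tiny network, directory, and file name below are made up for illustration):

import mxnet as mx
from mxnet.gluon import nn
import gluonnlp as nlp

net = nn.Dense(10)
net.initialize()
net(mx.nd.ones((2, 4)))  # forward once so deferred parameter shapes are known

nlp.utils.mkdir('./checkpoints')  # './checkpoints' is a made-up path for the example
nlp.utils.save_parameters(net, './checkpoints/net.params')

# Later: restore the weights into a network with the same structure.
net2 = nn.Dense(10)
nlp.utils.load_parameters(net2, './checkpoints/net.params', ctx=mx.cpu())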
- gluonnlp.utils.save_states(trainer, fname)[source]¶
Saves trainer states (e.g. optimizer, momentum) to a file.
Both local file system paths and S3 URIs are supported. For example, ‘s3://mybucket/folder/net.states’, ‘./folder/net.states’.
- Parameters
trainer (mxnet.gluon.Trainer) – The trainer whose states will be saved.
fname (str) – Path to output states file.
Note
optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult), will not be saved.
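Saving trainer states alongside the parameters makes it possible to resume training with the optimizer's accumulated state intact. A minimal sketch; the save_checkpoint/load_checkpoint helpers and the prefix naming are made up for illustration:

import mxnet as mx
import gluonnlp as nlp

def save_checkpoint(net, trainer, prefix):
    # Parameters and trainer states are written side by side under one prefix.
    nlp.utils.save_parameters(net, prefix + '.params')
    nlp.utils.save_states(trainer, prefix + '.states')

def load_checkpoint(net, trainer, prefix, ctx=mx.cpu()):
    nlp.utils.load_parameters(net, prefix + '.params', ctx=ctx)
    nlp.utils.load_states(trainer, prefix + '.states')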
- gluonnlp.utils.load_parameters(model, filename, ctx=None, allow_missing=False, ignore_extra=False, cast_dtype=None)[source]¶
Load parameters from file previously saved by save_parameters.
Both local file system paths and S3 URIs are supported. For example, ‘s3://mybucket/folder/net.params’, ‘./folder/net.params’.
- Parameters
model (mx.gluon.Block) – The model to load the parameters into.
filename (str) – Path to parameter file.
ctx (Context or list of Context, default cpu()) – Context(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
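For multi-GPU training, the loaded parameters can be placed on several contexts at once. A minimal sketch; net and the checkpoint path are assumed to come from the earlier save step:

import mxnet as mx
import gluonnlp as nlp

# Fall back to CPU when fewer than two GPUs are available.
ctx = [mx.gpu(0), mx.gpu(1)] if mx.context.num_gpus() >= 2 else [mx.cpu()]
nlp.utils.load_parameters(net, './checkpoints/net.params', ctx=ctx)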
- gluonnlp.utils.load_states(trainer, fname)[source]¶
Loads trainer states (e.g. optimizer, momentum) from a file.
Both local file system paths and S3 URIs are supported. For example, ‘s3://mybucket/folder/net.states’, ‘./folder/net.states’.
- Parameters
trainer (mxnet.gluon.Trainer) – The trainer whose states will be loaded.
fname (str) – Path to input states file.
Note
optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult), will not be loaded from the file, but rather set based on the current Trainer’s parameters.
- gluonnlp.utils.mkdir(dirname)[source]¶
Create a directory.
- Parameters
dirname (str) – The name of the target directory to create.
- gluonnlp.utils.glob(url, separator=', ')[source]¶
Return a list of paths matching a pathname pattern.
The pattern may contain simple shell-style wildcards. Input may also include multiple patterns, separated by separator.
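Because the input may contain several patterns joined by separator, one call can cover multiple directories. A small sketch; the directory names are made up:

import gluonnlp as nlp

# Two hypothetical patterns in one string, joined by the default ', ' separator.
files = nlp.utils.glob('./corpus/train/*.txt, ./corpus/valid/*.txt')
print(sorted(files))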
- gluonnlp.utils.remove(filename)[source]¶
Remove a file.
- Parameters
filename (str) – The name of the target file to remove.