Dependency Parsing¶
Deep Biaffine Dependency Parser¶
This package contains an implementation of Deep Biaffine Attention for Neural Dependency Parsing proposed by Dozat and Manning (2016), with SOTA accuracy.
Train¶
As the Penn Treebank dataset (PTB) is proprietary, we are unable to distribute it.
If you have a legal copy, please place it in tests/data/biaffine/ptb
, use this pre-processing script to convert it into conllx format.
The tree view of data folder should be as follows.
$ tree tests/data/biaffine
tests/data/biaffine
└── ptb
├── dev.conllx
├── test.conllx
└── train.conllx
Then Run the following code to train the biaffine model.
parser = DepParser()
parser.train(train_file='tests/data/biaffine/ptb/train.conllx',
dev_file='tests/data/biaffine/ptb/dev.conllx',
test_file='tests/data/biaffine/ptb/test.conllx', save_dir='tests/data/biaffine/model',
pretrained_embeddings=('glove', 'glove.6B.100d'))
parser.evaluate(test_file='tests/data/biaffine/ptb/test.conllx', save_dir='tests/data/biaffine/model')
The expected UAS should be around 96%
(see training log and evaluation log). The trained model will be saved in following folder.
$ tree tests/data/biaffine/model
tests/data/biaffine/model
├── config.pkl
├── model.bin
├── test.log
├── train.log
└── vocab.pkl
Note that the embeddings are not kept in model.bin
, in order to reduce file size.
Users need to keep embeddings at the same place after training.
A good practice is to place embeddings in the model folder and distribute them together.
Decode¶
Once we trained a model or downloaded a pre-trained one, we can load it and decode raw sentences.
parser = DepParser()
parser.load('tests/data/biaffine/model')
sentence = [('Is', 'VBZ'), ('this', 'DT'), ('the', 'DT'), ('future', 'NN'), ('of', 'IN'), ('chamber', 'NN'),
('music', 'NN'), ('?', '.')]
print(parser.parse(sentence))
The output should be as follows.
1 Is _ _ VBZ _ 4 cop _ _
2 this _ _ DT _ 4 nsubj _ _
3 the _ _ DT _ 4 det _ _
4 future _ _ NN _ 0 root _ _
5 of _ _ IN _ 4 prep _ _
6 chamber _ _ NN _ 7 nn _ _
7 music _ _ NN _ 5 pobj _ _
8 ? _ _ . _ 4 punct _ _