Model Catalog

Language Model

Language Model Model Zoo Index

Word Language Model

Dataset: WikiText-2

| Pre-trained Model | Test Perplexity | Training Command | Log |
|---|---|---|---|
| standard_lstm_lm_200_wikitext-2 [1] | 101.64 | command | log |
| standard_lstm_lm_650_wikitext-2 [1] | 86.91 | command | log |
| standard_lstm_lm_1500_wikitext-2 [1] | 82.29 | command | log |
| awd_lstm_lm_600_wikitext-2 [1] | 80.67 | command | log |
| awd_lstm_lm_1150_wikitext-2 [1] | 65.62 | command | log |
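
These weights can typically be fetched by name through GluonNLP's model API. A minimal sketch, assuming a GluonNLP installation in which the names above are registered (argument names can vary across versions):

```python
import mxnet as mx
import gluonnlp as nlp

# Load pre-trained AWD-LSTM weights plus the WikiText-2 vocabulary by name;
# any other model name from the table above can be swapped in.
model, vocab = nlp.model.get_model('awd_lstm_lm_1150',
                                   dataset_name='wikitext-2',
                                   pretrained=True,
                                   ctx=mx.cpu())
print(model)        # network architecture
print(len(vocab))   # vocabulary behind the perplexity numbers above
```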

Cache Language Model

Dataset: WikiText-2

| Pre-trained Model | Test Perplexity | Training Command | Log |
|---|---|---|---|
| cache_awd_lstm_lm_1150_wikitext-2 [2] | 51.46 | command | log |
| cache_awd_lstm_lm_600_wikitext-2 [2] | 62.19 | command | log |
| cache_standard_lstm_lm_1500_wikitext-2 [2] | 62.79 | command | log |
| cache_standard_lstm_lm_650_wikitext-2 [2] | 65.85 | command | log |
| cache_standard_lstm_lm_200_wikitext-2 [2] | 73.74 | command | log |
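
The cache models reuse the corresponding word language models above and add the continuous-cache mechanism of Grave et al. [2] at evaluation time: the next-word distribution is a linear interpolation of the base model's softmax with a distribution built from recently seen hidden states. A minimal NumPy sketch of that interpolation (all names and the theta/lambda defaults are illustrative, not the repository's implementation):

```python
import numpy as np

def cache_interpolate(p_lm, cache_hiddens, cache_words, query, vocab_size,
                      theta=0.6, lam=0.2):
    """Continuous-cache mixing in the spirit of Grave et al. [2] (sketch)."""
    # Match the current hidden state (query) against the cached hidden states.
    scores = np.exp(theta * cache_hiddens @ query)
    scores /= scores.sum()
    # Each cached position votes for the word that was emitted there, so
    # probability mass accumulates on recently repeated words.
    p_cache = np.zeros(vocab_size)
    for w, s in zip(cache_words, scores):
        p_cache[w] += s
    # Linear interpolation between the base LM and the cache distribution.
    return (1.0 - lam) * p_lm + lam * p_cache
```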

Large Scale Word Language Model

Dataset: Google’s 1 billion words dataset

| Pre-trained Model | Test Perplexity | Training Command | Log |
|---|---|---|---|
| LSTM-2048-512 [3] | 43.62 | command | log |
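
The billion-word model should load the same way; a sketch, with the caveat that the registered name ('big_rnn_lm_2048_512') and dataset tag ('gbw') are assumptions about the GluonNLP registry and should be checked against your installed version:

```python
import gluonnlp as nlp

# Assumed registry name for the LSTM-2048-512 row above; verify before use.
model, vocab = nlp.model.get_model('big_rnn_lm_2048_512',
                                   dataset_name='gbw',
                                   pretrained=True)
```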

Machine Translation

Machine Translation Model Zoo Index

Google Neural Machine Translation

Dataset: IWSLT2015-en-vi

| Pre-trained Model | Test BLEU | Training Command | Log |
|---|---|---|---|
| GNMT [4] | 26.2 | command | |

Transformers

Dataset: WMT14-en-de

Prerequisite: the sacremoses package: `pip install sacremoses --user`

| Pre-trained Model | Test BLEU | Training Command | Log |
|---|---|---|---|
| transformer_en_de_512_WMT2014 | 27.65 | command | log |
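
The translation model can be fetched by name together with its source and target vocabularies. A sketch, assuming the registry name 'transformer_en_de_512' with dataset tag 'WMT2014en_de' used in GluonNLP's demos (verify against your version):

```python
import gluonnlp as nlp

# Returns the Transformer plus separate source and target vocabularies.
model, src_vocab, tgt_vocab = nlp.model.get_model('transformer_en_de_512',
                                                  dataset_name='WMT2014en_de',
                                                  pretrained=True)
print(len(src_vocab), len(tgt_vocab))
```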

Sentiment Analysis

Sentiment Analysis Model Zoo Index

Through Fine-tuning Word Language Model

Dataset: IMDB

| Model | Test Accuracy | Training Command | Log |
|---|---|---|---|
| LSTM from scratch | 85.60% | command | log |
| LSTM with pre-trained model | 86.46% | command | log |
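
The second row seeds the classifier with a pre-trained word language model from the table at the top of this page: the LM's embedding and encoder weights initialize the classifier, and only the output layer starts from scratch. A sketch of fetching those weights (the attribute names in the comment are assumptions about GluonNLP's LM container):

```python
import gluonnlp as nlp

# Pre-trained word LM whose weights seed the sentiment classifier; the
# classifier reuses (assumed attributes) lm_model.embedding and
# lm_model.encoder and adds a fresh output layer for the two classes.
lm_model, vocab = nlp.model.get_model('standard_lstm_lm_200',
                                      dataset_name='wikitext-2',
                                      pretrained=True)
```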

TextCNN

Dataset: MR

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 75.80% | command | log |
| TextCNN-static [5] | 79.40% | command | log |
| TextCNN-non-static [5] | 80.00% | command | log |
| TextCNN-multichannel [5] | 80.00% | command | log |

Dataset: Subj

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 89.30% | command | log |
| TextCNN-static [5] | 91.80% | command | log |
| TextCNN-non-static [5] | 91.90% | command | log |
| TextCNN-multichannel [5] | 92.10% | command | log |

Dataset: CR

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 79.50% | command | log |
| TextCNN-static [5] | 83.10% | command | log |
| TextCNN-non-static [5] | 82.90% | command | log |
| TextCNN-multichannel [5] | 83.30% | command | log |

Dataset: MPQA

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 85.30% | command | log |
| TextCNN-static [5] | 89.60% | command | log |
| TextCNN-non-static [5] | 89.20% | command | log |
| TextCNN-multichannel [5] | 89.60% | command | log |

Dataset: SST-1

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 44.30% | command | log |
| TextCNN-static [5] | 48.10% | command | log |
| TextCNN-non-static [5] | 47.00% | command | log |
| TextCNN-multichannel [5] | 48.10% | command | log |

Dataset: SST-2

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 82.10% | command | log |
| TextCNN-static [5] | 87.10% | command | log |
| TextCNN-non-static [5] | 85.60% | command | log |
| TextCNN-multichannel [5] | 85.80% | command | log |

Dataset: TREC

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 90.20% | command | log |
| TextCNN-static [5] | 91.40% | command | log |
| TextCNN-non-static [5] | 93.20% | command | log |
| TextCNN-multichannel [5] | 93.20% | command | log |
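
All four variants share the architecture of Kim [5]: parallel 1-D convolutions over word embeddings followed by max-over-time pooling. They differ only in the embedding table, which is randomly initialized (rand), frozen pre-trained vectors (static), fine-tuned pre-trained vectors (non-static), or duplicated into one frozen and one fine-tuned channel (multichannel). A minimal Gluon sketch of the shared architecture (hyperparameters are illustrative, not the training scripts' exact settings):

```python
import mxnet as mx
from mxnet.gluon import nn

class TextCNN(nn.HybridBlock):
    """Kim-style TextCNN sketch: parallel convolutions + max-over-time pooling."""
    def __init__(self, vocab_size, embed_size=300, num_filters=100,
                 kernel_sizes=(3, 4, 5), num_classes=2, **kwargs):
        super().__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.convs = nn.HybridSequential()
        for k in kernel_sizes:
            self.convs.add(nn.Conv1D(num_filters, k, activation='relu'))
        self.pool = nn.GlobalMaxPool1D()
        self.dropout = nn.Dropout(0.5)
        self.output = nn.Dense(num_classes)

    def hybrid_forward(self, F, x):                    # x: (batch, seq_len)
        emb = self.embedding(x).transpose((0, 2, 1))   # -> (batch, embed, seq_len)
        # One feature vector per kernel size via max-over-time pooling.
        feats = [self.pool(conv(emb)).flatten() for conv in self.convs]
        return self.output(self.dropout(F.concat(*feats, dim=1)))

net = TextCNN(vocab_size=20000)
net.initialize()
print(net(mx.nd.ones((8, 50))).shape)   # (8, 2)
```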

Finetuning

BERT Model Zoo Index
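
All fine-tuning rows below start from a pre-trained backbone fetched by name. A hedged sketch of loading BERT-base for a downstream task (the model name and flags follow GluonNLP's conventions, but argument availability varies across versions):

```python
import mxnet as mx
import gluonnlp as nlp

# BERT-base, uncased, pre-trained on BooksCorpus + English Wikipedia.
bert, vocab = nlp.model.get_model('bert_12_768_12',
                                  dataset_name='book_corpus_wiki_en_uncased',
                                  pretrained=True,
                                  ctx=mx.cpu(),
                                  use_decoder=False,     # drop masked-LM head
                                  use_classifier=False)  # drop next-sentence head
# A task-specific head (e.g. a Dense layer over the pooled output) is then
# attached, and the whole network is fine-tuned as in the commands below.
```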

Task: Sentence Classification

Dataset: MRPC

| Pre-trained Model | Validation Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 88.70% | command | log |

Dataset: RTE

| Pre-trained Model | Validation Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 70.80% | command | log |

Dataset: SST-2

| Pre-trained Model | Validation Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 93% | command | log |
| RoBERTa-base | 95.3% | command | log |

Dataset: MNLI-M/MM

| Pre-trained Model | Validation Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 84.55%/84.66% | command | log |
| RoBERTa-base | 87.69%/87.23% | command | log |

Dataset: XNLI (Chinese)

| Pre-trained Model | Validation Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 78.27% | command | log |

Task: Question Answering

Dataset: SQuAD 1.1

| Pre-trained Model | F1/EM | Training Command | Log |
|---|---|---|---|
| BERT-base | 88.53%/80.98% | command | log |
| BERT-large | 90.97%/84.05% | command | log |

Dataset: SQuAD 2.0

| Pre-trained Model | F1/EM | Training Command | Log |
|---|---|---|---|
| BERT-large | 81.02%/77.96% | command | log |

Task: Named Entity Recognition

Prerequisite: Python 3 and the seqeval package: `pip3 install seqeval --user`

Dataset: CoNLL-2003

| Pre-trained Model | F1 | Training Command | Log |
|---|---|---|---|
| BERT-large | 92.20% | | log |

Task: Joint Intent Classification and Slot Labelling

Prerequisite: Python 3 and the seqeval and tqdm packages: `pip3 install seqeval --user` and `pip3 install tqdm --user`

Dataset: ATIS

| Pre-trained Model | F1/Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 95.83%/98.66% | | |

Dataset: SNIPS

| Pre-trained Model | F1/Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 96.06%/98.71% | | |

References

[1] Merity, S., et al. "Regularizing and optimizing LSTM language models". ICLR 2018.

[2] Grave, E., et al. "Improving neural language models with a continuous cache". ICLR 2017.

[3] Jozefowicz, R., et al. "Exploring the limits of language modeling". arXiv preprint arXiv:1602.02410 (2016).

[4] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … & Klingner, J. (2016). "Google's neural machine translation system: Bridging the gap between human and machine translation". arXiv preprint arXiv:1609.08144.

[5] Kim, Y. (2014). "Convolutional neural networks for sentence classification". arXiv preprint arXiv:1408.5882.