# Model Catalog
## Language Model
### Word Language Model
Dataset: WikiText-2

| Pre-trained Model | Test Perplexity | Training Command | Log |
|---|---|---|---|
| standard_lstm_lm_200_wikitext-2 [1] | 101.64 | | |
| standard_lstm_lm_650_wikitext-2 [1] | 86.91 | | |
| standard_lstm_lm_1500_wikitext-2 [1] | 82.29 | | |
| awd_lstm_lm_600_wikitext-2 [1] | 80.67 | | |
| awd_lstm_lm_1150_wikitext-2 [1] | 65.62 | | |
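Any model in the table can be loaded by name, together with the vocabulary it was trained with, through GluonNLP's `get_model` API. A minimal sketch, assuming GluonNLP 0.x with MXNet (return values and output shapes may differ slightly across versions):

```python
import mxnet as mx
import gluonnlp as nlp

# Download the pre-trained weights and the matching WikiText-2 vocabulary.
model, vocab = nlp.model.get_model('standard_lstm_lm_200',
                                   dataset_name='wikitext-2',
                                   pretrained=True)

# Run the LM over a short token sequence.
tokens = ['the', 'quick', 'brown', 'fox']
inputs = mx.nd.array(vocab[tokens]).reshape((-1, 1))        # (seq_len, batch=1)
hidden = model.begin_state(batch_size=1, func=mx.nd.zeros)  # zeroed LSTM state
output, hidden = model(inputs, hidden)                      # logits over the vocabulary
print(output.shape)
```

The same call works for every row: the part of the name before `_wikitext-2` selects the architecture, and `dataset_name` selects the weights and vocabulary to download.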
### Cache Language Model
Dataset: WikiText-2

| Pre-trained Model | Test Perplexity | Training Command | Log |
|---|---|---|---|
| cache_awd_lstm_lm_1150_wikitext-2 [2] | 51.46 | | |
| cache_awd_lstm_lm_600_wikitext-2 [2] | 62.19 | | |
| cache_standard_lstm_lm_1500_wikitext-2 [2] | 62.79 | | |
| cache_standard_lstm_lm_650_wikitext-2 [2] | 65.85 | | |
| cache_standard_lstm_lm_200_wikitext-2 [2] | 73.74 | | |
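The cache models wrap the pre-trained LMs above with the continuous-cache mechanism of [2] at evaluation time, so no weights are retrained; the gain comes from interpolating the LM softmax with a similarity-weighted lookup over recently seen hidden states. A hedged sketch, assuming the `gluonnlp.model.train.get_cache_model` helper and its hyper-parameter names (`window`, `theta`, `lambdas`) from GluonNLP 0.x; the values below are illustrative placeholders, not the tuned settings behind the table:

```python
import mxnet as mx
import gluonnlp as nlp

# Wrap a pre-trained AWD-LSTM in a continuous cache.
cache_model = nlp.model.train.get_cache_model(
    'awd_lstm_lm_1150',
    dataset_name='wikitext-2',
    window=2000,    # how many recent hidden states the cache keeps
    theta=0.6,      # flatness of the cache distribution
    lambdas=0.2,    # interpolation weight between cache and LM softmax
    ctx=mx.cpu())
```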
## Machine Translation
## Sentiment Analysis
### Fine-tuning a Word Language Model
Dataset: IMDB

| Model | Test Accuracy | Training Command | Log |
|---|---|---|---|
| LSTM from scratch | 85.60% | | |
| LSTM with pre-trained model | 86.46% | | |
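The second row reuses the embedding and encoder of a pre-trained word language model from the section above and trains only a fresh output layer on IMDB. An illustrative sketch of that wiring; the `embedding`/`encoder` attribute names follow GluonNLP's `StandardRNN`, and the actual training script adds pooling, bucketing, and optimization details omitted here:

```python
import mxnet as mx
from mxnet import gluon
import gluonnlp as nlp

lm_model, vocab = nlp.model.get_model('standard_lstm_lm_200',
                                      dataset_name='wikitext-2',
                                      pretrained=True)

class SentimentNet(gluon.Block):
    """Pre-trained LM encoder + a new binary sentiment head."""
    def __init__(self, lm, **kwargs):
        super().__init__(**kwargs)
        self.embedding = lm.embedding    # reused, pre-trained
        self.encoder = lm.encoder        # reused, pre-trained
        self.output = gluon.nn.Dense(1)  # trained from scratch

    def forward(self, inputs):           # inputs: (seq_len, batch)
        encoded = self.encoder(self.embedding(inputs))
        # Average the hidden states over time, then emit one sentiment logit.
        return self.output(encoded.mean(axis=0))

net = SentimentNet(lm_model)
net.output.initialize(mx.init.Xavier())  # only the new head needs initializing
```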
### TextCNN
Dataset: MR

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 75.80% | | |
| TextCNN-static [5] | 79.40% | | |
| TextCNN-non-static [5] | 80.00% | | |
| TextCNN-multichannel [5] | 80.00% | | |

Dataset: Subj

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 89.30% | | |
| TextCNN-static [5] | 91.80% | | |
| TextCNN-non-static [5] | 91.90% | | |
| TextCNN-multichannel [5] | 92.10% | | |

Dataset: CR

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 79.50% | | |
| TextCNN-static [5] | 83.10% | | |
| TextCNN-non-static [5] | 82.90% | | |
| TextCNN-multichannel [5] | 83.30% | | |

Dataset: MPQA

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 85.30% | | |
| TextCNN-static [5] | 89.60% | | |
| TextCNN-non-static [5] | 89.20% | | |
| TextCNN-multichannel [5] | 89.60% | | |

Dataset: SST-1

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 44.30% | | |
| TextCNN-static [5] | 48.10% | | |
| TextCNN-non-static [5] | 47.00% | | |
| TextCNN-multichannel [5] | 48.10% | | |

Dataset: SST-2

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 82.10% | | |
| TextCNN-static [5] | 87.10% | | |
| TextCNN-non-static [5] | 85.60% | | |
| TextCNN-multichannel [5] | 85.80% | | |

Dataset: TREC

| Model | Cross-Validation Accuracy | Training Command | Log |
|---|---|---|---|
| TextCNN-rand [5] | 90.20% | | |
| TextCNN-static [5] | 91.40% | | |
| TextCNN-non-static [5] | 93.20% | | |
| TextCNN-multichannel [5] | 93.20% | | |
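The four rows per dataset share the architecture of [5] and differ only in how the word embeddings are handled: `rand` learns them from scratch, `static` freezes pre-trained vectors, `non-static` fine-tunes them, and `multichannel` combines a frozen and a fine-tuned copy. An illustrative single-channel Gluon sketch of the shared architecture; filter widths 3/4/5 with 100 feature maps each follow the paper, but this is not the script's exact code:

```python
from mxnet import gluon, init, nd

class TextCNN(gluon.Block):
    """Parallel convolutions over embeddings + max-over-time pooling."""
    def __init__(self, vocab_size, embed_size=300, num_classes=2, **kwargs):
        super().__init__(**kwargs)
        self.embedding = gluon.nn.Embedding(vocab_size, embed_size)
        self.convs = gluon.nn.Sequential()
        for width in (3, 4, 5):          # filter widths from the paper
            self.convs.add(gluon.nn.Conv1D(100, width, activation='relu'))
        self.pool = gluon.nn.GlobalMaxPool1D()
        self.dropout = gluon.nn.Dropout(0.5)
        self.output = gluon.nn.Dense(num_classes)

    def forward(self, inputs):           # inputs: (batch, seq_len)
        # Conv1D expects (batch, channels, seq_len): embed, then move the
        # embedding dimension onto the channel axis.
        embedded = self.embedding(inputs).transpose((0, 2, 1))
        features = nd.concat(*[self.pool(conv(embedded)).flatten()
                               for conv in self.convs], dim=1)
        return self.output(self.dropout(features))

net = TextCNN(vocab_size=20000)
net.initialize(init.Xavier())
```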
## Fine-tuning
### Task: Sentence Classification
Dataset: MRPC

| Pre-trained Model | Validation Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 88.70% | | |

Dataset: RTE

| Pre-trained Model | Validation Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 70.80% | | |

Dataset: SST-2

| Pre-trained Model | Validation Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 93.00% | | |
| RoBERTa-base | 95.30% | | |

Dataset: MNLI-M/MM

| Pre-trained Model | Validation Accuracy (M/MM) | Training Command | Log |
|---|---|---|---|
| BERT-base | 84.55% / 84.66% | | |
| RoBERTa-base | 87.69% / 87.23% | | |

Dataset: XNLI (Chinese)

| Pre-trained Model | Validation Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 78.27% | | |
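Every row in this task follows the same recipe: load the pre-trained encoder with `get_model` and fine-tune it end-to-end with a small classification head on the pooled `[CLS]` representation. A hedged sketch for BERT-base; the `get_model` call follows GluonNLP 0.x, and the head class is illustrative rather than the fine-tuning script's exact code:

```python
import mxnet as mx
from mxnet import gluon
import gluonnlp as nlp

bert, vocabulary = nlp.model.get_model(
    'bert_12_768_12',
    dataset_name='book_corpus_wiki_en_uncased',
    pretrained=True,
    use_pooler=True,       # keep the pooled [CLS] output for classification
    use_decoder=False,     # drop the masked-LM decoder
    use_classifier=False)  # drop the next-sentence-prediction head

class BERTClassifier(gluon.Block):
    """Pre-trained BERT encoder + dropout + a small Dense head."""
    def __init__(self, bert, num_classes=2, dropout=0.1, **kwargs):
        super().__init__(**kwargs)
        self.bert = bert
        self.classifier = gluon.nn.HybridSequential()
        self.classifier.add(gluon.nn.Dropout(dropout))
        self.classifier.add(gluon.nn.Dense(num_classes))

    def forward(self, inputs, token_types, valid_length):
        _, pooled = self.bert(inputs, token_types, valid_length)
        return self.classifier(pooled)

net = BERTClassifier(bert)
net.classifier.initialize(mx.init.Normal(0.02))  # only the new head is initialized
```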
### Task: Question Answering
Dataset: SQuAD 1.1

| Pre-trained Model | F1 / EM | Training Command | Log |
|---|---|---|---|
| BERT-base | 88.53% / 80.98% | | |
| BERT-large | 90.97% / 84.05% | | |
Dataset: SQuAD 2.0

| Pre-trained Model | F1 / EM | Training Command | Log |
|---|---|---|---|
| BERT-large | 81.02% / 77.96% | | |
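Question answering replaces the classification head with a span-prediction head: two logits per token, one scoring the answer start and one the answer end. An illustrative sketch on top of the BERT sequence output, assuming the encoder was loaded with `use_pooler=False` so it returns per-token states; this is not the script's exact code:

```python
from mxnet import gluon

class BertForQA(gluon.Block):
    """Predict answer start/end positions over the token sequence."""
    def __init__(self, bert, **kwargs):
        super().__init__(**kwargs)
        self.bert = bert
        # One Dense layer applied position-wise: 2 logits per token.
        self.span_classifier = gluon.nn.Dense(2, flatten=False)

    def forward(self, inputs, token_types, valid_length):
        seq_out = self.bert(inputs, token_types, valid_length)  # (batch, seq_len, hidden)
        logits = self.span_classifier(seq_out)                  # (batch, seq_len, 2)
        start_logits, end_logits = logits.split(axis=2, num_outputs=2)
        return start_logits.squeeze(axis=2), end_logits.squeeze(axis=2)
```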
### Task: Named Entity Recognition
Requisite: Python 3 and the seqeval package: `pip3 install seqeval --user`
Dataset: CoNLL-2003

| Pre-trained Model | F1 | Training Command | Log |
|---|---|---|---|
| BERT-large | 92.20% | | |
### Task: Joint Intent Classification and Slot Labelling
Requisite: Python 3 and the seqeval and tqdm packages: `pip3 install seqeval --user` and `pip3 install tqdm --user`
Dataset: ATIS

| Pre-trained Model | Slot F1 / Intent Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 95.83% / 98.66% | | |

Dataset: SNIPS

| Pre-trained Model | Slot F1 / Intent Accuracy | Training Command | Log |
|---|---|---|---|
| BERT-base | 96.06% / 98.71% | | |
## References

1. Merity, S., et al. "Regularizing and Optimizing LSTM Language Models". ICLR 2018.
2. Grave, E., et al. "Improving Neural Language Models with a Continuous Cache". ICLR 2017.
3. Jozefowicz, R., et al. "Exploring the Limits of Language Modeling". arXiv preprint arXiv:1602.02410 (2016).
4. Wu, Y., et al. "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation". arXiv preprint arXiv:1609.08144 (2016).
5. Kim, Y. "Convolutional Neural Networks for Sentence Classification". arXiv preprint arXiv:1408.5882 (2014).