Evaluating Pre-trained Word Embeddings¶
Word embeddings can be evaluated on intrinsic and extrinsic tasks. gluonnlp facilitates work with both by providing common datasets and helpful abstractions. In this notebook we show how to evaluate embeddings on the intrinsic similarity and analogy tasks.
The GloVe and fastText word embeddings used in this tutorial are from the following sources:
- GloVe project website: https://nlp.stanford.edu/projects/glove/
- fastText project website: https://fasttext.cc/
Let us first import the following packages.
In [1]:
import warnings
warnings.filterwarnings('ignore')
import mxnet as mx
import gluonnlp as nlp
Intrinsic evaluation¶
While word embeddings are mainly of interest in industry for improving performance on downstream tasks, evaluating directly on those tasks can be expensive and infeasible when experimenting with a large number of embeddings. Evaluation of word embeddings on such downstream tasks is called extrinsic evaluation.
Intrinsic evaluation tasks, in contrast, aim to judge the quality of word embeddings directly.
Word Similarity and Relatedness Task¶
Word embeddings should capture the relationship between words in natural language. In the Word Similarity and Relatedness Task, word embeddings are evaluated by comparing word similarity scores computed from a pair of words with human labels for the similarity or relatedness of the pair.
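As a concrete illustration (not part of the original notebook), the model-side score for a word pair is typically the cosine similarity of the two word vectors. A minimal NumPy sketch with toy vectors:
import numpy as np

def cosine_similarity(u, v):
    # cosine similarity: dot product of the two vectors divided by the
    # product of their norms; this is the score later compared to human labels
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy 4-dimensional vectors standing in for two hypothetical word embeddings
vec_a = np.array([0.3, 0.8, 0.1, 0.4])
vec_b = np.array([0.2, 0.7, 0.0, 0.5])
print(cosine_similarity(vec_a, vec_b))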
gluonnlp includes a number of common datasets for the Word Similarity and Relatedness Task. The included datasets are listed in the API documentation. We use several of them in the evaluation example below.
We first show a few samples from the WordSim353 dataset to get an overall feel for the dataset's structure.
In [2]:
wordsim353 = nlp.data.WordSim353()
for i in range(15):
    print(*wordsim353[i])
computer keyboard 7.62
Jerusalem Israel 8.46
planet galaxy 8.11
canyon landscape 7.53
OPEC country 5.63
day summer 3.94
day dawn 7.53
country citizen 7.31
planet people 5.75
environment ecology 8.81
Maradona football 8.62
OPEC oil 8.59
money bank 8.5
computer software 8.5
law lawyer 8.38
Evaluation: Loading the embeddings¶
To evaluate word embeddings on the WordSim353 dataset, we first load pretrained embeddings and construct a vocabulary object. Here we load the fastText word embeddings created from the crawl-300d-2M source. As they are quite large, executing the following cell may take a minute or two.
In [3]:
embedding = nlp.embedding.create('fasttext', source='crawl-300d-2M')
In [4]:
counter = nlp.data.utils.Counter(w for wpair in wordsim353 for w in wpair[:2])
vocab = nlp.vocab.Vocab(counter)
vocab.set_embedding(embedding)
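As a quick, illustrative sanity check (not part of the original notebook), every word in the WordSim353 vocabulary should now map to a 300-dimensional fastText vector:
# Illustrative: look up the embedding vector attached to one vocabulary word.
# Expected output: (300,)
print(vocab.embedding['computer'].shape)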
We then replace the words in the WordSim353 dataset with indices from the vocabulary.
In [5]:
wordsim353_coded = [[vocab[d[0]], vocab[d[1]], d[2]] for d in wordsim353]
words1, words2, scores = zip(*wordsim353_coded)
Evaluation: Running the task¶
The gluonnlp toolkit contains helpers for evaluating word embeddings on the word similarity and relatedness task. In the following we create a WordEmbeddingSimilarity block, which predicts a similarity score between word pairs given an embedding matrix.
In [6]:
# context = mx.cpu()  # Replace this with mx.gpu(0) if you have a GPU
context = mx.gpu(0)  # Replace this with mx.cpu() if you have no GPU
evaluator = nlp.embedding.evaluation.WordEmbeddingSimilarity(
    idx_to_vec=vocab.embedding.idx_to_vec,
    similarity_function="CosineSimilarity")
evaluator.initialize(ctx=context)
evaluator.hybridize()
The similarities can be predicted by passing the two arrays of words through the evaluator: the i-th word in words1 is compared with the i-th word in words2.
In [7]:
pred_similarity = evaluator(
mx.nd.array(words1, ctx=context), mx.nd.array(words2, ctx=context))
print(pred_similarity[:10])
[0.4934404 0.69630307 0.5902223 0.31201977 0.16985895 0.3822252
0.42938995 0.36722115 0.22559652 0.51560944]
<NDArray 10 @gpu(0)>
We can evaluate the predicted similarities, and thereby the word embeddings, by computing the Spearman rank correlation between the predicted similarities and the ground-truth human similarity scores from the dataset:
In [8]:
import numpy as np
from scipy import stats
sr = stats.spearmanr(pred_similarity.asnumpy(), np.array(scores))
print('Spearman rank correlation on {}: {}'.format(wordsim353.__class__.__name__,
sr.correlation.round(3)))
Spearman rank correlation on WordSim353: 0.792
Word Analogy Task¶
In the Word Analogy Task, word embeddings are evaluated by inferring an analogous word D, which is related to a given word C in the same way as a given pair of words A, B are related.
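To make this inference concrete, here is a minimal, illustrative NumPy sketch of the common 3CosAdd rule (the gluonnlp evaluator used below encapsulates this kind of computation): for a question A : B :: C : ?, it scores every candidate word d by the cosine between vec(d) and vec(B) - vec(A) + vec(C), excluding the question words themselves.
import numpy as np

def three_cos_add(idx_to_vec, a, b, c):
    # idx_to_vec: NumPy matrix of word vectors; a, b, c: indices of the question words
    vecs = idx_to_vec / np.linalg.norm(idx_to_vec, axis=1, keepdims=True)
    query = vecs[b] - vecs[a] + vecs[c]
    scores = vecs @ (query / np.linalg.norm(query))
    scores[[a, b, c]] = -np.inf  # exclude the question words themselves
    return int(np.argmax(scores))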
gluonnlp includes a number of common datasets for the Word Analogy Task. The included datasets are listed in the API documentation.
In this notebook we use the GoogleAnalogyTestSet dataset.
In [9]:
google_analogy = nlp.data.GoogleAnalogyTestSet()
We first demonstrate the structure of the dataset by printing a few examples.
In [10]:
sample = []
print(('Printing every 1000th analogy question '
       'from the {} questions '
       'in the Google Analogy Test Set:').format(len(google_analogy)))
print('')
for i in range(0, 19544, 1000):
    print(*google_analogy[i])
    sample.append(google_analogy[i])
Printing every 1000th analogy question from the 19544 questions in the Google Analogy Test Set:
athens greece baghdad iraq
baku azerbaijan dushanbe tajikistan
dublin ireland kathmandu nepal
lusaka zambia tehran iran
rome italy windhoek namibia
zagreb croatia astana kazakhstan
philadelphia pennsylvania tampa florida
wichita kansas shreveport louisiana
shreveport louisiana oxnard california
complete completely lucky luckily
comfortable uncomfortable clear unclear
good better high higher
young younger tight tighter
weak weakest bright brightest
slow slowing describe describing
ireland irish greece greek
feeding fed sitting sat
slowing slowed decreasing decreased
finger fingers onion onions
play plays sing sings
In [11]:
words1, words2, words3, words4 = list(zip(*sample))
We again construct a vocabulary object from the loaded pretrained embeddings. To speed up computation, we restrict ourselves here to the most frequent 300000 words in the vocabulary.
In [12]:
counter = nlp.data.utils.Counter(embedding.idx_to_token[:300000])
vocab = nlp.vocab.Vocab(counter)
vocab.set_embedding(embedding)
We then throw away all analogy questions that contain words not in the frequent words subset selected above.
In [13]:
google_analogy_subset = [
    d for d in google_analogy if (d[0] in vocab and d[1] in vocab
                                  and d[2] in vocab and d[3] in vocab)
]
print('Dropped {} pairs from {} as they were OOV.'.format(
    len(google_analogy) - len(google_analogy_subset),
    len(google_analogy)))
Dropped 5108 pairs from 19544 as they were OOV.
In [14]:
google_analogy_coded = [[vocab[d[0]], vocab[d[1]], vocab[d[2]], vocab[d[3]]]
for d in google_analogy_subset]
google_analogy_coded_batched = mx.gluon.data.DataLoader(
google_analogy_coded, batch_size=64)
In [15]:
evaluator = nlp.embedding.evaluation.WordEmbeddingAnalogy(
idx_to_vec=vocab.embedding.idx_to_vec,
exclude_question_words=True,
analogy_function="ThreeCosMul")
evaluator.initialize(ctx=context)
evaluator.hybridize()
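The result tables later in this notebook compare two analogy scoring functions, threecosadd and threecosmul. For reference, 3CosMul (Levy and Goldberg, 2014) replaces the additive offset of 3CosAdd with a multiplicative combination of cosine similarities. An illustrative NumPy sketch, following the same conventions as the 3CosAdd sketch above (not part of the original notebook):
import numpy as np

def three_cos_mul(idx_to_vec, a, b, c, eps=1e-3):
    # Score every candidate d by cos(d, B) * cos(d, C) / (cos(d, A) + eps),
    # with cosines shifted into [0, 1] to keep the ratio well behaved.
    vecs = idx_to_vec / np.linalg.norm(idx_to_vec, axis=1, keepdims=True)
    sim = lambda w: (vecs @ vecs[w] + 1) / 2
    scores = sim(b) * sim(c) / (sim(a) + eps)
    scores[[a, b, c]] = -np.inf  # exclude the question words themselves
    return int(np.argmax(scores))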
To show a visual progress bar, make sure the progressbar2 package is installed. You can remove the # from the cell below to optionally install it.
In [16]:
#! pip install --user progressbar2
In [17]:
try:
    import progressbar
except ImportError:
    progressbar = None

acc = mx.metric.Accuracy()
if progressbar is not None:
    google_analogy_coded_batched = progressbar.progressbar(google_analogy_coded_batched)
for batch in google_analogy_coded_batched:
    batch = batch.as_in_context(context)
    words1, words2, words3, words4 = (batch[:, 0], batch[:, 1],
                                      batch[:, 2], batch[:, 3])
    pred_idxs = evaluator(words1, words2, words3)
    acc.update(pred_idxs[:, 0], words4.astype(np.float32))

print('Accuracy on %s: %s' % (google_analogy.__class__.__name__, acc.get()[1].round(3)))
100% (226 of 226) |######################| Elapsed Time: 0:00:32 Time: 0:00:32
Accuracy on GoogleAnalogyTestSet: 0.794
Aggregated Results on all datasets¶
We have precomputed the results on the similarity and analogy tasks on all respective datasets and all pretrained embeddings (targeted at English) included in the Gluon NLP toolkit. If you are interested in reproducing the results, please run the run_all.sh bash script in the scripts/word_embedding_evaluation folder. That folder also contains a notebook with extended, unaggregated results that detail the performance of the different embeddings on each category in the datasets.
We first load the CSV file containing the results and define a highlighter function that will help us to highlight the best-performing method per dataset.
In [18]:
import pandas as pd
pd.options.display.max_rows = 999
pd.options.display.precision = 3
df = pd.read_table("../../../scripts/word_embedding_evaluation/results-vocablimit.csv",
header=None, names=[
"evaluation_type", "dataset", "kwargs", "embedding_name",
"embedding_source", "evaluation", "value", "num_samples"
])
Similarity task¶
We then select the results from the similarity task and generate a table. To keep this page concise, we report the mean value over all datasets. Please see the extended results notebook at the Scripts page for detailed results.
In [19]:
dfs = df[~df["dataset"].isin(["BiggerAnalogyTestSet", "GoogleAnalogyTestSet"])].drop(["evaluation_type", "evaluation", "num_samples"], axis=1)
dfs = dfs[dfs["embedding_source"].isin([
"glove.42B.300d",
"glove.6B.100d",
"glove.6B.200d",
"glove.6B.300d",
"glove.6B.50d",
"glove.840B.300d",
"glove.twitter.27B.100d",
"glove.twitter.27B.200d",
"glove.twitter.27B.25d",
"glove.twitter.27B.50d",
"wiki.en",
"wiki.simple",
"crawl-300d-2M",
"wiki-news-300d-1M",
"wiki-news-300d-1M-subword"
])]
dfs = dfs.groupby(["embedding_name", "embedding_source"]).mean()
dfs.sort_values(by='value', ascending=False)
Out[19]:
| embedding_name | embedding_source | value |
|---|---|---|
| fasttext | crawl-300d-2M | 0.690 |
| fasttext | wiki-news-300d-1M-subword | 0.658 |
| fasttext | wiki-news-300d-1M | 0.649 |
| glove | glove.840B.300d | 0.629 |
| fasttext | wiki.en | 0.569 |
| glove | glove.42B.300d | 0.520 |
| glove | glove.6B.300d | 0.518 |
| glove | glove.6B.200d | 0.495 |
| fasttext | wiki.simple | 0.476 |
| glove | glove.6B.100d | 0.464 |
| glove | glove.6B.50d | 0.432 |
| glove | glove.twitter.27B.200d | 0.373 |
| glove | glove.twitter.27B.100d | 0.356 |
| glove | glove.twitter.27B.50d | 0.323 |
| glove | glove.twitter.27B.25d | 0.253 |
Analogy task¶
For the analogy task, we report the aggregate results per category type in the datasets.
Note that the analogy task is an open-vocabulary task: given a query of 3 words, we ask the model to select a 4th word from the whole vocabulary. Different pre-trained embeddings have vocabularies of different sizes. In general, embeddings pretrained on more tokens (indicated by a larger number before the "B" in the embedding source name) have larger vocabularies. While training embeddings on more tokens improves their quality, the larger vocabulary also makes the analogy task harder.
In this experiment, all results are reported after reducing the vocabulary to the 300k most frequent tokens. Questions containing out-of-vocabulary words are ignored.
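For example (illustrative, assuming the embedding object loaded earlier in this notebook is still in memory), the crawl-300d-2M vocabulary alone contains roughly two million tokens:
# Illustrative: number of tokens in the pretrained embedding's vocabulary.
print(len(embedding.idx_to_token))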
Google Analogy Test Set¶
We first display the results on the Google Analogy Test Set.
- Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations (ICLR).
The Google Analogy Test Set contains the following categories. All analogy questions per category follow the pattern specified by the category name. We group them into semantic and syntactic analogy questions.
In [20]:
import json
pd.Series(df[df["dataset"] == "GoogleAnalogyTestSet"]["kwargs"].unique()).apply(
json.loads).apply(lambda x: x['category'])
Out[20]:
0 capital-common-countries
1 capital-world
2 currency
3 city-in-state
4 family
5 gram1-adjective-to-adverb
6 gram2-opposite
7 gram3-comparative
8 gram4-superlative
9 gram5-present-participle
10 gram6-nationality-adjective
11 gram7-past-tense
12 gram8-plural
13 gram9-plural-verbs
dtype: object
We then select the results on the Google Analogy Test Set from the output of the word_embedding_evaluation.py script.
In [21]:
dfa_google = df[df["dataset"] == "GoogleAnalogyTestSet"].drop(
["evaluation_type", "num_samples", "dataset"], axis=1)
dfa_google = dfa_google[dfa_google["embedding_source"].isin([
"glove.42B.300d",
"glove.6B.100d",
"glove.6B.200d",
"glove.6B.300d",
"glove.6B.50d",
"glove.840B.300d",
"glove.twitter.27B.100d",
"glove.twitter.27B.200d",
"glove.twitter.27B.25d",
"glove.twitter.27B.50d",
"wiki.en",
"wiki.simple",
"crawl-300d-2M",
"wiki-news-300d-1M",
"wiki-news-300d-1M-subword",
])]
dfa_google["category"] = dfa_google["kwargs"].apply(json.loads).apply(lambda x: str(x['category']))
dfa_google.drop("kwargs", axis=1, inplace=True)
groups = dfa_google["category"].apply(lambda x: "syntactic" if x.startswith("gram") else "semantic")
dfa_google_aggregate = dfa_google.drop("category", axis=1)
dfa_google_aggregate["group"] = groups
google_aggregate = dfa_google_aggregate.groupby(["group", "embedding_name", "embedding_source", "evaluation"]).mean()
google_aggregate = google_aggregate.sort_values(by='value', ascending=False).sort_index(level=[0], sort_remaining=False)
Syntactic¶
We first present aggregate results over syntactic analogy questions.
In [22]:
google_aggregate.loc["syntactic"]
Out[22]:
| embedding_name | embedding_source | evaluation | value |
|---|---|---|---|
fasttext | wiki-news-300d-1M-subword | threecosmul | 0.871 |
threecosadd | 0.863 | ||
wiki-news-300d-1M | threecosmul | 0.809 | |
threecosadd | 0.794 | ||
crawl-300d-2M | threecosmul | 0.787 | |
threecosadd | 0.764 | ||
glove | glove.840B.300d | threecosmul | 0.728 |
fasttext | wiki.en | threecosmul | 0.724 |
glove | glove.42B.300d | threecosmul | 0.702 |
fasttext | wiki.en | threecosadd | 0.701 |
glove | glove.840B.300d | threecosadd | 0.700 |
glove.42B.300d | threecosadd | 0.670 | |
glove.6B.300d | threecosmul | 0.654 | |
threecosadd | 0.634 | ||
glove.6B.200d | threecosadd | 0.625 | |
threecosmul | 0.622 | ||
fasttext | wiki.simple | threecosmul | 0.596 |
glove | glove.6B.100d | threecosadd | 0.579 |
fasttext | wiki.simple | threecosadd | 0.552 |
glove | glove.6B.100d | threecosmul | 0.545 |
glove.twitter.27B.200d | threecosmul | 0.536 | |
threecosadd | 0.529 | ||
glove.twitter.27B.100d | threecosadd | 0.467 | |
threecosmul | 0.436 | ||
glove.6B.50d | threecosadd | 0.405 | |
threecosmul | 0.322 | ||
glove.twitter.27B.50d | threecosadd | 0.319 | |
threecosmul | 0.271 | ||
glove.twitter.27B.25d | threecosadd | 0.135 | |
threecosmul | 0.102 |
Semantic¶
We then present aggregate results over semantic analogy questions.
In [23]:
google_aggregate.loc["semantic"]
Out[23]:
| embedding_name | embedding_source | evaluation | value |
|---|---|---|---|
glove | glove.42B.300d | threecosmul | 0.751 |
threecosadd | 0.747 | ||
glove.6B.300d | threecosmul | 0.712 | |
fasttext | wiki.en | threecosmul | 0.711 |
glove | glove.6B.300d | threecosadd | 0.708 |
fasttext | wiki.en | threecosadd | 0.703 |
glove | glove.6B.200d | threecosadd | 0.684 |
threecosmul | 0.676 | ||
glove.6B.100d | threecosadd | 0.619 | |
threecosmul | 0.589 | ||
glove.840B.300d | threecosmul | 0.580 | |
threecosadd | 0.574 | ||
fasttext | crawl-300d-2M | threecosmul | 0.569 |
threecosadd | 0.560 | ||
glove | glove.6B.50d | threecosadd | 0.481 |
glove.twitter.27B.200d | threecosadd | 0.439 | |
threecosmul | 0.427 | ||
fasttext | wiki-news-300d-1M | threecosmul | 0.404 |
threecosadd | 0.401 | ||
glove | glove.6B.50d | threecosmul | 0.400 |
fasttext | wiki-news-300d-1M-subword | threecosmul | 0.349 |
threecosadd | 0.348 | ||
glove | glove.twitter.27B.100d | threecosadd | 0.324 |
threecosmul | 0.293 | ||
fasttext | wiki.simple | threecosmul | 0.261 |
threecosadd | 0.205 | ||
glove | glove.twitter.27B.50d | threecosadd | 0.188 |
threecosmul | 0.155 | ||
glove.twitter.27B.25d | threecosadd | 0.108 | |
threecosmul | 0.080 |
Bigger Analogy Test Set¶
We then display the results on the Bigger Analogy Test Set (BATS).
- Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
Unlike the Google Analogy Test Set, BATS is balanced across 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics).
We first load the results for the BATS dataset:
In [24]:
dfa_bats = df[df["dataset"] == "BiggerAnalogyTestSet"].drop(
["evaluation_type", "num_samples", "dataset"], axis=1)
dfa_bats = dfa_bats[dfa_bats["embedding_source"].isin([
"glove.42B.300d",
"glove.6B.100d",
"glove.6B.200d",
"glove.6B.300d",
"glove.6B.50d",
"glove.840B.300d",
"glove.twitter.27B.100d",
"glove.twitter.27B.200d",
"glove.twitter.27B.25d",
"glove.twitter.27B.50d",
"wiki.en",
"wiki.simple",
"crawl-300d-2M",
"wiki-news-300d-1M",
"wiki-news-300d-1M-subword",
])]
dfa_bats["category"] = dfa_bats["kwargs"].apply(json.loads).apply(lambda x: str(x['category']))
dfa_bats.drop("kwargs", axis=1, inplace=True)
groups = dfa_bats["category"].str[0].apply(lambda x: {
'I':'Inflectional morphology',
'D':'Derivational morphology',
'L':'Lexicographic semantics',
'E':'Encyclopedic semantics'}[x])
dfa_bats_aggregate = dfa_bats.drop("category", axis=1)
dfa_bats_aggregate["group"] = groups
bats_aggregate = dfa_bats_aggregate.groupby(
["group", "embedding_name", "embedding_source", "evaluation"]).mean()
bats_aggregate = bats_aggregate.sort_values(
by='value', ascending=False).sort_index(level=[0], sort_remaining=False)
For BATS we present the results aggregated over all categories grouped by the respective 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics):
Inflectional morphology¶
In [25]:
bats_aggregate.loc["Inflectional morphology"]
Out[25]:
| embedding_name | embedding_source | evaluation | value |
|---|---|---|---|
fasttext | wiki-news-300d-1M-subword | threecosmul | 0.923 |
threecosadd | 0.917 | ||
wiki-news-300d-1M | threecosmul | 0.856 | |
threecosadd | 0.847 | ||
crawl-300d-2M | threecosmul | 0.835 | |
threecosadd | 0.799 | ||
glove | glove.840B.300d | threecosmul | 0.768 |
threecosadd | 0.760 | ||
glove.42B.300d | threecosmul | 0.674 | |
fasttext | wiki.en | threecosmul | 0.643 |
glove | glove.42B.300d | threecosadd | 0.630 |
glove.6B.300d | threecosmul | 0.627 | |
fasttext | wiki.en | threecosadd | 0.601 |
glove | glove.6B.200d | threecosmul | 0.598 |
glove.6B.300d | threecosadd | 0.593 | |
glove.6B.200d | threecosadd | 0.591 | |
glove.6B.100d | threecosadd | 0.574 | |
threecosmul | 0.552 | ||
fasttext | wiki.simple | threecosmul | 0.494 |
threecosadd | 0.433 | ||
glove | glove.twitter.27B.200d | threecosmul | 0.431 |
threecosadd | 0.425 | ||
glove.twitter.27B.100d | threecosadd | 0.394 | |
glove.6B.50d | threecosadd | 0.391 | |
glove.twitter.27B.100d | threecosmul | 0.362 | |
glove.6B.50d | threecosmul | 0.311 | |
glove.twitter.27B.50d | threecosadd | 0.282 | |
threecosmul | 0.232 | ||
glove.twitter.27B.25d | threecosadd | 0.135 | |
threecosmul | 0.098 |
Derivational morphology¶
In [26]:
bats_aggregate.loc["Derivational morphology"]
Out[26]:
| embedding_name | embedding_source | evaluation | value |
|---|---|---|---|
fasttext | wiki-news-300d-1M-subword | threecosmul | 0.414 |
threecosadd | 0.356 | ||
wiki-news-300d-1M | threecosmul | 0.307 | |
crawl-300d-2M | threecosmul | 0.278 | |
wiki.simple | threecosmul | 0.268 | |
wiki-news-300d-1M | threecosadd | 0.248 | |
wiki.simple | threecosadd | 0.228 | |
wiki.en | threecosmul | 0.212 | |
crawl-300d-2M | threecosadd | 0.193 | |
wiki.en | threecosadd | 0.179 | |
glove | glove.42B.300d | threecosmul | 0.146 |
threecosadd | 0.118 | ||
glove.6B.300d | threecosmul | 0.087 | |
threecosadd | 0.079 | ||
glove.6B.200d | threecosadd | 0.078 | |
glove.6B.100d | threecosadd | 0.077 | |
glove.6B.200d | threecosmul | 0.076 | |
glove.6B.100d | threecosmul | 0.063 | |
glove.6B.50d | threecosadd | 0.047 | |
glove.twitter.27B.200d | threecosadd | 0.037 | |
threecosmul | 0.034 | ||
glove.twitter.27B.100d | threecosadd | 0.026 | |
glove.6B.50d | threecosmul | 0.023 | |
glove.twitter.27B.100d | threecosmul | 0.019 | |
glove.twitter.27B.50d | threecosadd | 0.016 | |
threecosmul | 0.007 | ||
glove.twitter.27B.25d | threecosadd | 0.005 | |
threecosmul | 0.002 |
Lexicographic semantics¶
In [27]:
bats_aggregate.loc["Lexicographic semantics"]
Out[27]:
| embedding_name | embedding_source | evaluation | value |
|---|---|---|---|
fasttext | wiki-news-300d-1M | threecosadd | 0.087 |
wiki-news-300d-1M-subword | threecosadd | 0.087 | |
threecosmul | 0.087 | ||
wiki-news-300d-1M | threecosmul | 0.087 | |
crawl-300d-2M | threecosmul | 0.065 | |
glove | glove.6B.300d | threecosadd | 0.063 |
fasttext | crawl-300d-2M | threecosadd | 0.062 |
glove | glove.6B.200d | threecosadd | 0.061 |
glove.6B.100d | threecosadd | 0.059 | |
glove.twitter.27B.200d | threecosadd | 0.056 | |
glove.6B.300d | threecosmul | 0.051 | |
fasttext | wiki.en | threecosadd | 0.051 |
threecosmul | 0.048 | ||
glove | glove.twitter.27B.200d | threecosmul | 0.045 |
glove.6B.200d | threecosmul | 0.045 | |
glove.twitter.27B.100d | threecosadd | 0.042 | |
glove.6B.100d | threecosmul | 0.037 | |
glove.6B.50d | threecosadd | 0.034 | |
glove.twitter.27B.100d | threecosmul | 0.027 | |
fasttext | wiki.simple | threecosadd | 0.024 |
glove | glove.twitter.27B.50d | threecosadd | 0.023 |
fasttext | wiki.simple | threecosmul | 0.022 |
glove | glove.twitter.27B.50d | threecosmul | 0.014 |
glove.6B.50d | threecosmul | 0.014 | |
glove.twitter.27B.25d | threecosadd | 0.009 | |
threecosmul | 0.005 |
Encyclopedic semantics¶
In [28]:
bats_aggregate.loc["Encyclopedic semantics"]
Out[28]:
| embedding_name | embedding_source | evaluation | value |
|---|---|---|---|
glove | glove.42B.300d | threecosadd | 0.272 |
fasttext | wiki.en | threecosmul | 0.256 |
glove | glove.42B.300d | threecosmul | 0.254 |
glove.6B.300d | threecosadd | 0.242 | |
threecosmul | 0.240 | ||
fasttext | wiki.en | threecosadd | 0.236 |
glove | glove.6B.200d | threecosadd | 0.230 |
threecosmul | 0.214 | ||
glove.6B.100d | threecosadd | 0.198 | |
fasttext | crawl-300d-2M | threecosmul | 0.177 |
threecosadd | 0.166 | ||
glove | glove.6B.100d | threecosmul | 0.164 |
glove.twitter.27B.200d | threecosadd | 0.142 | |
fasttext | wiki-news-300d-1M | threecosmul | 0.139 |
glove | glove.6B.50d | threecosadd | 0.135 |
fasttext | wiki-news-300d-1M | threecosadd | 0.131 |
glove | glove.twitter.27B.200d | threecosmul | 0.128 |
fasttext | wiki-news-300d-1M-subword | threecosmul | 0.116 |
threecosadd | 0.114 | ||
glove | glove.twitter.27B.100d | threecosadd | 0.101 |
fasttext | wiki.simple | threecosmul | 0.099 |
glove | glove.6B.50d | threecosmul | 0.090 |
fasttext | wiki.simple | threecosadd | 0.077 |
glove | glove.twitter.27B.100d | threecosmul | 0.076 |
glove.twitter.27B.50d | threecosadd | 0.054 | |
threecosmul | 0.035 | ||
glove.twitter.27B.25d | threecosadd | 0.028 | |
threecosmul | 0.017 |