Evaluating Pre-trained Word Embeddings

Word embeddings can be evaluated on intrinsic and extrinsic tasks. gluonnlp facilitates working with both by providing common datasets and helpful abstractions. In this notebook we show how to evaluate embeddings on the intrinsic similarity and analogy tasks.

The GloVe and fastText word embeddings used in this tutorial are from the following sources:

Let us first import the following packages.

In [1]:
import warnings
warnings.filterwarnings('ignore')

import mxnet as mx
import gluonnlp as nlp

Intrinsic evaluation

In industry, word embeddings are mainly interesting for the performance gains they bring to downstream tasks, but evaluating directly on those tasks can be expensive and infeasible when experimenting with a large number of embeddings. Evaluation of word embeddings on such downstream tasks is called extrinsic evaluation.

Intrinsic evaluation tasks, in contrast, aim to judge the quality of word embeddings directly.

Word Similarity and Relatedness Task

Word embeddings should capture the relationship between words in natural language. In the Word Similarity and Relatedness Task, word embeddings are evaluated by comparing word similarity scores computed for a pair of words with human labels for the similarity or relatedness of that pair.
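To make the scoring concrete, here is a minimal sketch of the cosine similarity that is typically used as the model's predicted score for a word pair. It uses plain numpy and two made-up 4-dimensional vectors for illustration, not the actual pre-trained embeddings loaded later in this notebook:

import numpy as np

# Hypothetical low-dimensional vectors for the pair ("computer", "keyboard");
# real pre-trained embeddings typically have 50-300 dimensions.
vec_computer = np.array([0.2, -0.5, 0.1, 0.7])
vec_keyboard = np.array([0.3, -0.4, 0.0, 0.6])

# Cosine similarity: dot product of the L2-normalized vectors.
cos_sim = vec_computer.dot(vec_keyboard) / (
    np.linalg.norm(vec_computer) * np.linalg.norm(vec_keyboard))
print(cos_sim)

The predicted scores for many pairs are then compared against the human annotations via rank correlation, as done further below.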

gluonnlp includes a number of common datasets for the Word Similarity and Relatedness Task. The included datasets are listed in the API documentation. We use several of them in the evaluation example below.

We first show a few samples from the WordSim353 dataset, to get a feeling for the dataset structure.

In [2]:
wordsim353 = nlp.data.WordSim353()
for i in range(15):
    print(*wordsim353[i])
computer keyboard 7.62
Jerusalem Israel 8.46
planet galaxy 8.11
canyon landscape 7.53
OPEC country 5.63
day summer 3.94
day dawn 7.53
country citizen 7.31
planet people 5.75
environment ecology 8.81
Maradona football 8.62
OPEC oil 8.59
money bank 8.5
computer software 8.5
law lawyer 8.38

Evaluation: Loading the embeddings

To evaluate word embeddings on the WordSim353 dataset, we first load pretrained embeddings and construct a vocabulary object. Here we load the fastText word embeddings created from the crawl-300d-2M source. As they are quite large, executing the following cell may take a minute or two.

In [3]:
embedding = nlp.embedding.create('fasttext', source='crawl-300d-2M')
In [4]:
counter = nlp.data.utils.Counter(w for wpair in wordsim353 for w in wpair[:2])
vocab = nlp.vocab.Vocab(counter)
vocab.set_embedding(embedding)

We then replace the words in the WordSim353 dataset with indices from the vocabulary.

In [5]:
wordsim353_coded = [[vocab[d[0]], vocab[d[1]], d[2]] for d in wordsim353]
words1, words2, scores = zip(*wordsim353_coded)

Evaluation: Running the task

The gluonnlp toolkit contains helpers for evaluating word embeddings on the word similarity and relatedness task.

In the following we create a WordEmbeddingSimilarity block, which predicts similarity scores for word pairs given an embedding matrix.

In [6]:
# context = mx.cpu()  # Replace this with mx.gpu(0) if you have a GPU
context = mx.gpu(0)  # Replace this with mx.cpu() if you do not have a GPU


evaluator = nlp.embedding.evaluation.WordEmbeddingSimilarity(
    idx_to_vec=vocab.embedding.idx_to_vec,
    similarity_function="CosineSimilarity")
evaluator.initialize(ctx=context)
evaluator.hybridize()

The similarities can be predicted by passing the two arrays of words through the evaluator. The i-th word in words1 will be compared with the i-th word in words2.

In [7]:
pred_similarity = evaluator(
    mx.nd.array(words1, ctx=context), mx.nd.array(words2, ctx=context))
print(pred_similarity[:10])

[0.4934404  0.69630307 0.5902223  0.31201977 0.16985895 0.3822252
 0.42938995 0.36722115 0.22559652 0.51560944]
<NDArray 10 @gpu(0)>

We can evaluate the predicted similarities, and thereby the word embeddings, by computing the Spearman rank correlation between the predicted similarities and the ground-truth human similarity scores from the dataset:

In [8]:
import numpy as np
from scipy import stats

sr = stats.spearmanr(pred_similarity.asnumpy(), np.array(scores))
print('Spearman rank correlation on {}: {}'.format(wordsim353.__class__.__name__,
                                                   sr.correlation.round(3)))
Spearman rank correlation on WordSim353: 0.792

Word Analogy Task

In the Word Analogy Task, word embeddings are evaluated by inferring an analogous word D that is related to a given word C in the same way as a given pair of words A, B are related to each other.
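As a rough illustration of how such a prediction can be made, here is a self-contained numpy sketch of the common 3CosAdd approach: D is chosen as the word whose vector is most cosine-similar to vec(B) - vec(A) + vec(C), excluding the three query words. The tiny embedding matrix below is made up for illustration and is not part of gluonnlp:

import numpy as np

# Tiny hypothetical embedding matrix: one 3-dimensional vector per word.
vocab_words = ['king', 'queen', 'man', 'woman']
emb = np.array([[0.9, 0.8, 0.1],   # king
                [0.9, 0.1, 0.8],   # queen
                [0.5, 0.9, 0.2],   # man
                [0.5, 0.2, 0.9]])  # woman
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows

# Analogy question: man : woman :: king : ?
a, b, c = (vocab_words.index(w) for w in ('man', 'woman', 'king'))
query = emb[b] - emb[a] + emb[c]                  # 3CosAdd query vector
scores = emb.dot(query / np.linalg.norm(query))   # cosine similarity to all words
scores[[a, b, c]] = -np.inf                       # exclude the question words
print(vocab_words[int(scores.argmax())])          # -> 'queen'

The WordEmbeddingAnalogy evaluator used below applies this idea (and the related 3CosMul variant) over the full embedding matrix.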

gluonnlp includes a number of common datasets for the Word Analogy Task. The included datasets are listed in the API documentation. In this notebook we use the GoogleAnalogyTestSet dataset.

In [9]:
google_analogy = nlp.data.GoogleAnalogyTestSet()

We first demonstrate the structure of the dataset by printing a few examples.

In [10]:
sample = []
print(('Printing every 1000th analogy question '
       'from the {} questions '
       'in the Google Analogy Test Set:').format(len(google_analogy)))
print('')
for i in range(0, 19544, 1000):
    print(*google_analogy[i])
    sample.append(google_analogy[i])
Printing every 1000th analogy question from the 19544 questions in the Google Analogy Test Set:

athens greece baghdad iraq
baku azerbaijan dushanbe tajikistan
dublin ireland kathmandu nepal
lusaka zambia tehran iran
rome italy windhoek namibia
zagreb croatia astana kazakhstan
philadelphia pennsylvania tampa florida
wichita kansas shreveport louisiana
shreveport louisiana oxnard california
complete completely lucky luckily
comfortable uncomfortable clear unclear
good better high higher
young younger tight tighter
weak weakest bright brightest
slow slowing describe describing
ireland irish greece greek
feeding fed sitting sat
slowing slowed decreasing decreased
finger fingers onion onions
play plays sing sings
In [11]:
words1, words2, words3, words4 = list(zip(*sample))

We again construct a vocabulary object from the loaded pretrained embeddings. To speed up computation, we restrict ourselves here to the 300,000 most frequent words in the vocabulary.

In [12]:
counter = nlp.data.utils.Counter(embedding.idx_to_token[:300000])
vocab = nlp.vocab.Vocab(counter)
vocab.set_embedding(embedding)

We then discard all analogy questions that contain words not in the frequent-word subset selected above.

In [13]:
google_analogy_subset = [
    d for d in google_analogy if (d[0] in vocab and d[1] in vocab
    and d[2] in vocab and d[3] in vocab)
]
print('Dropped {} pairs from {} as they were OOV.'.format(
    len(google_analogy) - len(google_analogy_subset),
    len(google_analogy)))
Dropped 5108 pairs from 19544 as they were OOV.
In [14]:
google_analogy_coded = [[vocab[d[0]], vocab[d[1]], vocab[d[2]], vocab[d[3]]]
                 for d in google_analogy_subset]
google_analogy_coded_batched = mx.gluon.data.DataLoader(
    google_analogy_coded, batch_size=64)
In [15]:
evaluator = nlp.embedding.evaluation.WordEmbeddingAnalogy(
    idx_to_vec=vocab.embedding.idx_to_vec,
    exclude_question_words=True,
    analogy_function="ThreeCosMul")
evaluator.initialize(ctx=context)
evaluator.hybridize()

To show a visual progressbar, make sure the progressbar2 package is installed. You can remove the # from the cell below to install it.

In [16]:
#! pip install --user progressbar2
In [17]:
try:
    import progressbar
except ImportError:
    progressbar = None

acc = mx.metric.Accuracy()

if progressbar is not None:
    google_analogy_coded_batched = progressbar.progressbar(google_analogy_coded_batched)
for batch in google_analogy_coded_batched:
    batch = batch.as_in_context(context)
    words1, words2, words3, words4 = (batch[:, 0], batch[:, 1],
                                      batch[:, 2], batch[:, 3])
    pred_idxs = evaluator(words1, words2, words3)
    acc.update(pred_idxs[:, 0], words4.astype(np.float32))

print('Accuracy on %s: %s' % (google_analogy.__class__.__name__, acc.get()[1].round(3)))
100% (226 of 226) |######################| Elapsed Time: 0:00:32 Time:  0:00:32
Accuracy on GoogleAnalogyTestSet: 0.794

Aggregated Results on all datasets

We have precomputed the results on the similarity and analogy tasks on all respective datasets and all pretrained embeddings (targeted at English) included in the Gluon NLP toolkit. If you are interested in reproducing the results, please run the run_all.sh bash script in the scripts/word_embedding_evaluation folder. That folder also contains a notebook with extended, unaggregated results that detail the performance of the different embeddings on each category in the datasets.

We first load the CSV file containing the results and define a highlighter function that will help us highlight the best-performing method per dataset.

In [18]:
import pandas as pd
pd.options.display.max_rows = 999
pd.options.display.precision = 3

df = pd.read_table("../../../scripts/word_embedding_evaluation/results-vocablimit.csv",
                   header=None, names=[
                       "evaluation_type", "dataset", "kwargs", "embedding_name",
                       "embedding_source", "evaluation", "value", "num_samples"
                   ])

Similarity task

We then select the results from the similarity task and generate a table. To keep this page concise, we report the mean value over all datasets. Please see the extended results notebook at the Scripts page for detailed results.

In [19]:
dfs = df[~df["dataset"].isin(["BiggerAnalogyTestSet", "GoogleAnalogyTestSet"])].drop(["evaluation_type", "evaluation", "num_samples"], axis=1)
dfs = dfs[dfs["embedding_source"].isin([
    "glove.42B.300d",
    "glove.6B.100d",
    "glove.6B.200d",
    "glove.6B.300d",
    "glove.6B.50d",
    "glove.840B.300d",
    "glove.twitter.27B.100d",
    "glove.twitter.27B.200d",
    "glove.twitter.27B.25d",
    "glove.twitter.27B.50d",
    "wiki.en",
    "wiki.simple",
    "crawl-300d-2M",
    "wiki-news-300d-1M",
    "wiki-news-300d-1M-subword"
])]

dfs = dfs.groupby(["embedding_name", "embedding_source"]).mean()
dfs.sort_values(by='value', ascending=False)
Out[19]:
value
embedding_name embedding_source
fasttext crawl-300d-2M 0.690
wiki-news-300d-1M-subword 0.658
wiki-news-300d-1M 0.649
glove glove.840B.300d 0.629
fasttext wiki.en 0.569
glove glove.42B.300d 0.520
glove.6B.300d 0.518
glove.6B.200d 0.495
fasttext wiki.simple 0.476
glove glove.6B.100d 0.464
glove.6B.50d 0.432
glove.twitter.27B.200d 0.373
glove.twitter.27B.100d 0.356
glove.twitter.27B.50d 0.323
glove.twitter.27B.25d 0.253

Analogy task

For the analogy task, we report the aggregate results per category type in the datasets.

Note that the analogy task is an open-vocabulary task: given a query of 3 words, we ask the model to select a 4th word from the whole vocabulary. Different pre-trained embeddings have vocabularies of different sizes. In general, embeddings pretrained on more tokens (indicated by a larger number before the 'B' in the embedding source name) have larger vocabularies. While training on more tokens improves embedding quality, the larger vocabulary also makes the analogy task harder.

In this experiment, all results are reported after restricting the vocabulary to the 300k most frequent tokens. Questions containing out-of-vocabulary words are ignored.
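For reference, the full vocabulary size of a loaded pre-trained embedding can be inspected via its idx_to_token attribute; for example, for the fastText embedding loaded earlier in this notebook:

# Number of tokens in the pre-trained embedding's vocabulary
# (about 2 million for the crawl-300d-2M fastText source).
print(len(embedding.idx_to_token))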

Google Analogy Test Set

We first display the results on the Google Analogy Test Set.

  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations (ICLR).

The Google Analogy Test Set contains the following categories. All analogy questions per category follow the pattern specified by the category name. We group them into semantic and syntactic analogy questions.

In [20]:
import json
pd.Series(df[df["dataset"] == "GoogleAnalogyTestSet"]["kwargs"].unique()).apply(
    json.loads).apply(lambda x: x['category'])
Out[20]:
0        capital-common-countries
1                   capital-world
2                        currency
3                   city-in-state
4                          family
5       gram1-adjective-to-adverb
6                  gram2-opposite
7               gram3-comparative
8               gram4-superlative
9        gram5-present-participle
10    gram6-nationality-adjective
11               gram7-past-tense
12                   gram8-plural
13             gram9-plural-verbs
dtype: object

We first load the results from the output of the word_embedding_evaluation.py script.

In [21]:
dfa_google = df[df["dataset"] == "GoogleAnalogyTestSet"].drop(
    ["evaluation_type", "num_samples", "dataset"], axis=1)
dfa_google = dfa_google[dfa_google["embedding_source"].isin([
    "glove.42B.300d",
    "glove.6B.100d",
    "glove.6B.200d",
    "glove.6B.300d",
    "glove.6B.50d",
    "glove.840B.300d",
    "glove.twitter.27B.100d",
    "glove.twitter.27B.200d",
    "glove.twitter.27B.25d",
    "glove.twitter.27B.50d",
    "wiki.en",
    "wiki.simple",
    "crawl-300d-2M",
    "wiki-news-300d-1M",
    "wiki-news-300d-1M-subword",
])]
dfa_google["category"] = dfa_google["kwargs"].apply(json.loads).apply(lambda x: str(x['category']))
dfa_google.drop("kwargs", axis=1, inplace=True)

groups = dfa_google["category"].apply(lambda x: "syntactic" if x.startswith("gram") else "semantic")
dfa_google_aggregate = dfa_google.drop("category", axis=1)
dfa_google_aggregate["group"] = groups
google_aggregate = dfa_google_aggregate.groupby(["group", "embedding_name", "embedding_source", "evaluation"]).mean()
google_aggregate = google_aggregate.sort_values(by='value', ascending=False).sort_index(level=[0], sort_remaining=False)

Syntactic

We first present aggregate results over syntactic analogy questions.

In [22]:
google_aggregate.loc["syntactic"]
Out[22]:
value
embedding_name embedding_source evaluation
fasttext wiki-news-300d-1M-subword threecosmul 0.871
threecosadd 0.863
wiki-news-300d-1M threecosmul 0.809
threecosadd 0.794
crawl-300d-2M threecosmul 0.787
threecosadd 0.764
glove glove.840B.300d threecosmul 0.728
fasttext wiki.en threecosmul 0.724
glove glove.42B.300d threecosmul 0.702
fasttext wiki.en threecosadd 0.701
glove glove.840B.300d threecosadd 0.700
glove.42B.300d threecosadd 0.670
glove.6B.300d threecosmul 0.654
threecosadd 0.634
glove.6B.200d threecosadd 0.625
threecosmul 0.622
fasttext wiki.simple threecosmul 0.596
glove glove.6B.100d threecosadd 0.579
fasttext wiki.simple threecosadd 0.552
glove glove.6B.100d threecosmul 0.545
glove.twitter.27B.200d threecosmul 0.536
threecosadd 0.529
glove.twitter.27B.100d threecosadd 0.467
threecosmul 0.436
glove.6B.50d threecosadd 0.405
threecosmul 0.322
glove.twitter.27B.50d threecosadd 0.319
threecosmul 0.271
glove.twitter.27B.25d threecosadd 0.135
threecosmul 0.102

Semantic

We then present aggregate results over semantic analogy questions.

In [23]:
google_aggregate.loc["semantic"]
Out[23]:
value
embedding_name embedding_source evaluation
glove glove.42B.300d threecosmul 0.751
threecosadd 0.747
glove.6B.300d threecosmul 0.712
fasttext wiki.en threecosmul 0.711
glove glove.6B.300d threecosadd 0.708
fasttext wiki.en threecosadd 0.703
glove glove.6B.200d threecosadd 0.684
threecosmul 0.676
glove.6B.100d threecosadd 0.619
threecosmul 0.589
glove.840B.300d threecosmul 0.580
threecosadd 0.574
fasttext crawl-300d-2M threecosmul 0.569
threecosadd 0.560
glove glove.6B.50d threecosadd 0.481
glove.twitter.27B.200d threecosadd 0.439
threecosmul 0.427
fasttext wiki-news-300d-1M threecosmul 0.404
threecosadd 0.401
glove glove.6B.50d threecosmul 0.400
fasttext wiki-news-300d-1M-subword threecosmul 0.349
threecosadd 0.348
glove glove.twitter.27B.100d threecosadd 0.324
threecosmul 0.293
fasttext wiki.simple threecosmul 0.261
threecosadd 0.205
glove glove.twitter.27B.50d threecosadd 0.188
threecosmul 0.155
glove.twitter.27B.25d threecosadd 0.108
threecosmul 0.080

Bigger Analogy Test Set

We then display the results on the Bigger Analogy Test Set (BATS).

  • Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf

Unlike the Google Analogy Test Set, BATS is balanced across 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics).

We first load the results for the BATS dataset:

In [24]:
dfa_bats = df[df["dataset"] == "BiggerAnalogyTestSet"].drop(
    ["evaluation_type", "num_samples", "dataset"], axis=1)
dfa_bats = dfa_bats[dfa_bats["embedding_source"].isin([
    "glove.42B.300d",
    "glove.6B.100d",
    "glove.6B.200d",
    "glove.6B.300d",
    "glove.6B.50d",
    "glove.840B.300d",
    "glove.twitter.27B.100d",
    "glove.twitter.27B.200d",
    "glove.twitter.27B.25d",
    "glove.twitter.27B.50d",
    "wiki.en",
    "wiki.simple",
    "crawl-300d-2M",
    "wiki-news-300d-1M",
    "wiki-news-300d-1M-subword",
])]
dfa_bats["category"] = dfa_bats["kwargs"].apply(json.loads).apply(lambda x: str(x['category']))
dfa_bats.drop("kwargs", axis=1, inplace=True)

groups = dfa_bats["category"].str[0].apply(lambda x: {
    'I':'Inflectional morphology',
    'D':'Derivational morphology',
    'L':'Lexicographic semantics',
    'E':'Encyclopedic semantics'}[x])
dfa_bats_aggregate = dfa_bats.drop("category", axis=1)
dfa_bats_aggregate["group"] = groups
bats_aggregate = dfa_bats_aggregate.groupby(
    ["group", "embedding_name", "embedding_source", "evaluation"]).mean()
bats_aggregate = bats_aggregate.sort_values(
    by='value', ascending=False).sort_index(level=[0], sort_remaining=False)

For BATS we present the results aggregated over all categories grouped by the respective 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics):

Inflectional morphology

In [25]:
bats_aggregate.loc["Inflectional morphology"]
Out[25]:
value
embedding_name embedding_source evaluation
fasttext wiki-news-300d-1M-subword threecosmul 0.923
threecosadd 0.917
wiki-news-300d-1M threecosmul 0.856
threecosadd 0.847
crawl-300d-2M threecosmul 0.835
threecosadd 0.799
glove glove.840B.300d threecosmul 0.768
threecosadd 0.760
glove.42B.300d threecosmul 0.674
fasttext wiki.en threecosmul 0.643
glove glove.42B.300d threecosadd 0.630
glove.6B.300d threecosmul 0.627
fasttext wiki.en threecosadd 0.601
glove glove.6B.200d threecosmul 0.598
glove.6B.300d threecosadd 0.593
glove.6B.200d threecosadd 0.591
glove.6B.100d threecosadd 0.574
threecosmul 0.552
fasttext wiki.simple threecosmul 0.494
threecosadd 0.433
glove glove.twitter.27B.200d threecosmul 0.431
threecosadd 0.425
glove.twitter.27B.100d threecosadd 0.394
glove.6B.50d threecosadd 0.391
glove.twitter.27B.100d threecosmul 0.362
glove.6B.50d threecosmul 0.311
glove.twitter.27B.50d threecosadd 0.282
threecosmul 0.232
glove.twitter.27B.25d threecosadd 0.135
threecosmul 0.098

Derivational morphology

In [26]:
bats_aggregate.loc["Derivational morphology"]
Out[26]:
value
embedding_name embedding_source evaluation
fasttext wiki-news-300d-1M-subword threecosmul 0.414
threecosadd 0.356
wiki-news-300d-1M threecosmul 0.307
crawl-300d-2M threecosmul 0.278
wiki.simple threecosmul 0.268
wiki-news-300d-1M threecosadd 0.248
wiki.simple threecosadd 0.228
wiki.en threecosmul 0.212
crawl-300d-2M threecosadd 0.193
wiki.en threecosadd 0.179
glove glove.42B.300d threecosmul 0.146
threecosadd 0.118
glove.6B.300d threecosmul 0.087
threecosadd 0.079
glove.6B.200d threecosadd 0.078
glove.6B.100d threecosadd 0.077
glove.6B.200d threecosmul 0.076
glove.6B.100d threecosmul 0.063
glove.6B.50d threecosadd 0.047
glove.twitter.27B.200d threecosadd 0.037
threecosmul 0.034
glove.twitter.27B.100d threecosadd 0.026
glove.6B.50d threecosmul 0.023
glove.twitter.27B.100d threecosmul 0.019
glove.twitter.27B.50d threecosadd 0.016
threecosmul 0.007
glove.twitter.27B.25d threecosadd 0.005
threecosmul 0.002

Lexicographic semantics

In [27]:
bats_aggregate.loc["Lexicographic semantics"]
Out[27]:
value
embedding_name embedding_source evaluation
fasttext wiki-news-300d-1M threecosadd 0.087
wiki-news-300d-1M-subword threecosadd 0.087
threecosmul 0.087
wiki-news-300d-1M threecosmul 0.087
crawl-300d-2M threecosmul 0.065
glove glove.6B.300d threecosadd 0.063
fasttext crawl-300d-2M threecosadd 0.062
glove glove.6B.200d threecosadd 0.061
glove.6B.100d threecosadd 0.059
glove.twitter.27B.200d threecosadd 0.056
glove.6B.300d threecosmul 0.051
fasttext wiki.en threecosadd 0.051
threecosmul 0.048
glove glove.twitter.27B.200d threecosmul 0.045
glove.6B.200d threecosmul 0.045
glove.twitter.27B.100d threecosadd 0.042
glove.6B.100d threecosmul 0.037
glove.6B.50d threecosadd 0.034
glove.twitter.27B.100d threecosmul 0.027
fasttext wiki.simple threecosadd 0.024
glove glove.twitter.27B.50d threecosadd 0.023
fasttext wiki.simple threecosmul 0.022
glove glove.twitter.27B.50d threecosmul 0.014
glove.6B.50d threecosmul 0.014
glove.twitter.27B.25d threecosadd 0.009
threecosmul 0.005

Encyclopedic semantics

In [28]:
bats_aggregate.loc["Encyclopedic semantics"]
Out[28]:
value
embedding_name embedding_source evaluation
glove glove.42B.300d threecosadd 0.272
fasttext wiki.en threecosmul 0.256
glove glove.42B.300d threecosmul 0.254
glove.6B.300d threecosadd 0.242
threecosmul 0.240
fasttext wiki.en threecosadd 0.236
glove glove.6B.200d threecosadd 0.230
threecosmul 0.214
glove.6B.100d threecosadd 0.198
fasttext crawl-300d-2M threecosmul 0.177
threecosadd 0.166
glove glove.6B.100d threecosmul 0.164
glove.twitter.27B.200d threecosadd 0.142
fasttext wiki-news-300d-1M threecosmul 0.139
glove glove.6B.50d threecosadd 0.135
fasttext wiki-news-300d-1M threecosadd 0.131
glove glove.twitter.27B.200d threecosmul 0.128
fasttext wiki-news-300d-1M-subword threecosmul 0.116
threecosadd 0.114
glove glove.twitter.27B.100d threecosadd 0.101
fasttext wiki.simple threecosmul 0.099
glove glove.6B.50d threecosmul 0.090
fasttext wiki.simple threecosadd 0.077
glove glove.twitter.27B.100d threecosmul 0.076
glove.twitter.27B.50d threecosadd 0.054
threecosmul 0.035
glove.twitter.27B.25d threecosadd 0.028
threecosmul 0.017