TensorFlow

How do I use a pre-trained word2vec model?

  • March 13, 2017

Where can I find a reliable word2vec model that has been trained on English text?

I need word2vec as a black box: for example, I pass in a sentence as an array: ["London", "is", "the", "capital", "of", "Great", "Britain"]

and get back: [some_vector_of_floats1, some_vector_of_floats2, some_vector_of_floats3, some_vector_of_floats4, some_vector_of_floats5, some_vector_of_floats6, some_vector_of_floats7]

In Python, you can use Gensim:

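import gensim

# Load pre-trained vectors stored in the word2vec text format.
# If your vector file is in binary format, change to binary=True.
# (Gensim 1.0 and later expose this loader on KeyedVectors, not on Word2Vec.)
model = gensim.models.KeyedVectors.load_word2vec_format('path-to-vectors.txt', binary=False)
sentence = ["London", "is", "the", "capital", "of", "Great", "Britain"]
# Look up one vector per word; a word missing from the vocabulary raises KeyError.
vectors = [model[w] for w in sentence]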
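
A plain lookup fails with a KeyError as soon as the sentence contains a word that the pre-trained vocabulary does not have. The sketch below is one possible way to wrap the lookup into the "black box" the question asks for. The function name sentence_to_vectors and the toy_model dictionary are made up for illustration; the dictionary only stands in for a loaded model so the example runs without a vector file. It assumes the model supports `word in model` and `model[word]`, which Gensim's KeyedVectors does.

```python
def sentence_to_vectors(model, sentence, unknown=None):
    """Return one entry per token: the token's vector, or `unknown` if the
    token is not in the model's vocabulary."""
    return [model[w] if w in model else unknown for w in sentence]


# Toy stand-in for a loaded model (hypothetical 2-dimensional vectors).
toy_model = {
    "London": [0.1, 0.2],
    "is": [0.0, 0.3],
    "capital": [0.5, 0.1],
}

sentence = ["London", "is", "the", "capital"]
vectors = sentence_to_vectors(toy_model, sentence)
# "the" is not in toy_model, so its slot holds None instead of raising.
```

Keeping a placeholder for unknown words preserves the one-vector-per-word alignment with the input; whether to skip such words, use a zero vector, or something else depends on what you do with the vectors afterwards.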

These vectors should give better performance than pre-trained vectors obtained with word2vec.

Source: https://stats.stackexchange.com/questions/267169
