How to change word2vec_inner.pyx and compile it using Cython

If you work with word2vec, you sometimes need to change the train_batch_sg method. Changing it in the Python code is easy, but pure Python is too slow for training on data with billions of words.


For that reason, the gensim library also ships a Cython file for word2vec, which combines Python and C features for much faster training. To keep that speed, make your changes in word2vec_inner.pyx.



However, after you change word2vec_inner.pyx, you need to compile it with the cython command:


$ cython word2vec_inner.pyx




You also need to generate the .so file. Create the following setup.py file and run it with the command below:


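setup.py:
import numpy
from setuptools import setup  # distutils was removed in Python 3.12
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("word2vec_inner.pyx"),
    # word2vec_inner.pyx cimports numpy, so the compiler needs the numpy headers
    include_dirs=[numpy.get_include()],
)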


$ python setup.py build_ext --inplace


This generates word2vec_inner.c and word2vec_inner.so in your folder (on Python 3 the .so name may also carry a platform tag). Your updated code is now ready to run.
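To confirm that the build really produced an importable module, a small check like the one below can help. It is only a sketch: the helper name find_built_extensions is hypothetical, and it simply looks in a folder for a file whose name ends in one of the extension suffixes that the current Python interpreter accepts.

```python
import importlib.machinery
from pathlib import Path


def find_built_extensions(module_name, folder="."):
    """Return compiled extension files for module_name found in folder."""
    folder = Path(folder)
    matches = []
    # EXTENSION_SUFFIXES lists the file endings this interpreter can import
    # as compiled modules, e.g. ".so" or a platform-tagged variant on Linux.
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        candidate = folder / (module_name + suffix)
        if candidate.is_file():
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    built = find_built_extensions("word2vec_inner")
    if built:
        print("Found compiled module:", ", ".join(str(p) for p in built))
    else:
        print("No compiled word2vec_inner found; re-run the build step.")
```

Run it from the folder where you ran the build command. If nothing is found, the build step most likely failed or wrote its output somewhere else.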



Recommendation:


It is better to work on a copy of word2vec_inner.pyx in your own local directory, so that you can go back to the original file if you make a mistake. To build the file in a local directory, you also need to copy voidptr.h and word2vec_inner.pxd from the gensim models directory.
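The copy step can be scripted. The following is a minimal sketch, not part of gensim: copy_cython_sources and FILES_TO_COPY are hypothetical names, and the script assumes you pass it the path of the gensim models directory yourself.

```python
import shutil
from pathlib import Path

# Files named in the text above; the .pyx itself is included so that
# the local copy is the one you edit.
FILES_TO_COPY = ["word2vec_inner.pyx", "word2vec_inner.pxd", "voidptr.h"]


def copy_cython_sources(gensim_models_dir, work_dir):
    """Copy the Cython sources into work_dir and return the copied paths."""
    src = Path(gensim_models_dir)
    dst = Path(work_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in FILES_TO_COPY:
        source_file = src / name
        if not source_file.is_file():
            # Fail early instead of leaving a half-populated work directory.
            raise FileNotFoundError(f"{source_file} is missing")
        copied.append(Path(shutil.copy2(source_file, dst / name)))
    return copied
```

For an installed gensim, the models directory is usually os.path.dirname(gensim.models.__file__). If your installed copy does not include the .pyx sources, take them from the gensim source tree instead.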

Notes for gensim 3.4.0:

- Newer versions of gensim come with additional models such as FastText and Doc2Vec. To keep the same data structure across all of these models, the previous word2vec structure was changed, and the word vectors are now stored in model.wv. You can find the relevant classes in the following modules:

from gensim.models.keyedvectors import Vocab, Word2VecKeyedVectors
from gensim.models.base_any2vec import BaseWordEmbeddingsModel

- For debugging, we use logger.info(). If it does not print any log messages in your program, first compile the main file, such as word2vec.py or fasttext.py.
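Another common reason for missing log lines is that Python's logging module has not been configured, because INFO messages are hidden by default. A minimal sketch of enabling them is shown below. The format string is only a suggestion, and the logger name assumes the module obtains its logger with logging.getLogger(__name__), as gensim's Python modules do.

```python
import logging

# Configure the root logger once, before training starts.
# force=True (Python 3.8+) replaces handlers that another library
# may already have installed.
logging.basicConfig(
    format="%(asctime)s : %(levelname)s : %(message)s",
    level=logging.INFO,
    force=True,
)

logger = logging.getLogger("gensim.models.word2vec")
logger.info("logging is configured")  # should now appear on stderr
```

Place this at the top of your training script so that it runs before any gensim model is created.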





References:
http://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html
https://groups.google.com/forum/#!topic/gensim/LTdrGBysMyw
