Let's look at how to set up an environment for spaCy and install the models it needs. I don't recommend the default way of installing models, because you'd have to download the model again for every virtualenv you create, which leads to a lot of bloat.
Setting Up a Python Env
I'm going to assume that we're using Python 3. If you're using Python 2, you'll have to make slight syntax adjustments before the sample code will run.
I recommend installing Anaconda. Select the Python 3.x version, 64-bit installer (most systems today are 64-bit). The instructions that follow assume you're on a UNIX-like environment.
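Create a new virtual environment for your project with its own Python interpreter, then activate it (on older conda releases the activation command is source activate my-env):
conda create -n my-env python=3
conda activate my-env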
Installing spaCy should be as simple as doing:
pip install -U spacy
It could take a while for the compilation to finish.
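If you want to double-check that the install landed in the environment you just created, and not in some other Python on your machine, here is a minimal sketch. The helper names is_installed and environment_report are made up for illustration; they only use the standard library and don't import spaCy itself.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if this interpreter can find a top-level package with that name."""
    return importlib.util.find_spec(package_name) is not None


def environment_report(package_name="spacy"):
    """Summarise which interpreter is running and whether the package is visible to it."""
    return {
        "python": sys.executable,   # path of the interpreter running this script
        "prefix": sys.prefix,       # root folder of the active environment
        "installed": is_installed(package_name),
    }


# Run this with the environment activated; "prefix" should point inside it.
print(environment_report())
```

If "installed" comes back False, or "prefix" points somewhere other than your new environment, the pip command most likely ran against a different interpreter.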
Install spaCy models
This is where I'm going to give you slightly different advice from the default method that the documentation suggests.
As you use spaCy in more and more projects, you'll create a lot of virtual environments, and you won't want a copy of these large data files in each of them.
- Go to the list of models here: https://spacy.io/models/en
- You'll see a button called "Release Details" on the top right of the en_core_web_sm model. Click on it, and it will take you to GitHub: https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-2.0.0
- Download the model from within the "Assets" subsection in GitHub: https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz
- Create a new folder in your home directory called spacy_models. Extract the above archive into this folder.
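If you'd rather script the download and extraction than click through GitHub, the steps above can be sketched roughly like this. It uses only the standard library; the function names download_archive and extract_archive are hypothetical, and the folder is assumed to be spacy_models in your home directory, as in the last step.

```python
import tarfile
import urllib.request
from pathlib import Path

# Download link of the en_core_web_sm 2.0.0 release, as listed under "Assets".
MODEL_URL = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz"
)


def download_archive(url, target_dir):
    """Download `url` into `target_dir` (skipping files already there) and return the path."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    archive_path = target_dir / url.rsplit("/", 1)[-1]
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))
    return archive_path


def extract_archive(archive_path, target_dir):
    """Unpack a .tar.gz archive into `target_dir` and return that folder."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as archive:
        archive.extractall(str(target_dir))
    return target_dir


def fetch_model(models_dir=Path.home() / "spacy_models"):
    """Download the model archive and extract it; the folder name is an assumption."""
    archive_path = download_archive(MODEL_URL, models_dir)
    return extract_archive(archive_path, models_dir)
```

Calling fetch_model() once should leave the extracted en_core_web_sm-2.0.0 folder inside spacy_models, ready for the linking step.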
Linking the spaCy model
You now have to associate the above model with your spaCy installation so that you can load it in your code.
python -m spacy link "<path to extracted model>/en_core_web_sm-2.0.0/en_core_web_sm/" en --force
Note that we're linking the model to the en alias. You can use any alias that you want here, as long as you remember to use the same alias while loading the model in your code.
Note that we always have just one copy of the model on disk, which can be shared amongst installations. This will save you a lot of disk space.
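Since you'd repeat the link step in every new virtual environment, it can be handy to wrap it in a small script. The sketch below just builds and runs the same python -m spacy link command shown above; build_link_command and link_model are hypothetical helper names. The link command belongs to spaCy 2.x (it was removed in spaCy 3), so this assumes the 2.0.0 setup used throughout this post.

```python
import subprocess
import sys
from pathlib import Path


def build_link_command(model_dir, alias="en"):
    """Return the `spacy link` command as an argument list for the current interpreter."""
    return [
        sys.executable, "-m", "spacy", "link",
        str(Path(model_dir).expanduser()),  # folder that contains the model package
        alias,                              # name you will pass to spacy.load()
        "--force",                          # overwrite an existing link with this alias
    ]


def link_model(model_dir, alias="en"):
    """Run the link command; raises CalledProcessError if spaCy reports a failure."""
    subprocess.check_call(build_link_command(model_dir, alias))
```

With the environment activated, you would call something like link_model("~/spacy_models/en_core_web_sm-2.0.0/en_core_web_sm"), assuming you extracted the archive into spacy_models in your home directory.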
Using the spaCy model
In your Python code, the following should load the model:
import spacy
nlp = spacy.load('en')
Note that we used the same alias en as in the previous step.
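To confirm that the linked model actually works, you can run a sentence through it. This is a minimal sketch: token_summary and demo are illustrative names, the sample sentence is made up, and the alias is assumed to be en as above. spaCy is imported inside demo so the helper can be read and reused without it.

```python
def token_summary(doc):
    """Return (text, coarse part-of-speech tag) pairs for every token in a parsed doc."""
    return [(token.text, token.pos_) for token in doc]


def demo(alias="en"):
    """Load the linked model and tag a sample sentence (requires spaCy and the link)."""
    import spacy  # imported lazily; only needed when the demo is actually run

    nlp = spacy.load(alias)
    doc = nlp("One copy of the model is shared by all my projects.")
    return token_summary(doc)
```

Calling demo() from an activated environment should return a list of word and tag pairs; an error at spacy.load usually means the alias was never linked in that environment.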