Before we jump into using spaCy, we need to understand what problems it is designed to solve. You should also be aware of the problems it is not designed for.

What kinds of problems does spaCy solve?

At its core, spaCy is concerned with solving problems associated with the task of Information Extraction. The technique it uses is called Natural Language Processing (NLP).

For any problem you tackle with spaCy, you need some unstructured text from which you want to extract structure. Examples include, but are not limited to:

  1. Extracting named entities
  2. Extracting parts of speech
  3. Extracting relationships between entities and phrases
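As a minimal sketch of the first item, the snippet below extracts named entities with spaCy v3. To keep it runnable without downloading a trained model, it assumes a blank English pipeline plus a rule-based `entity_ruler`; the patterns, labels and the `extract_entities` helper are illustrative assumptions, not anything spaCy provides out of the box.

```python
import spacy

# Blank English pipeline: a tokenizer with no trained components,
# so this sketch runs without downloading a model (assumes spaCy v3).
nlp = spacy.blank("en")

# Rule-based entity recognition. These patterns and labels are
# illustrative assumptions, not something spaCy ships with.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": [{"LOWER": "matthew"}, {"LOWER": "honnibal"}]},
    {"label": "PRODUCT", "pattern": "spaCy"},
])


def extract_entities(text):
    """Hypothetical helper: return (text, label) pairs for each entity in `text`."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


print(extract_entities("Matthew Honnibal created spaCy."))
# → [('Matthew Honnibal', 'PERSON'), ('spaCy', 'PRODUCT')]
```

With a trained pipeline, for example `en_core_web_sm` installed via `python -m spacy download en_core_web_sm` and loaded with `spacy.load("en_core_web_sm")`, the same `doc.ents` loop returns statistically predicted entities, and `token.pos_` gives each token's part of speech.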

Things that spaCy doesn't do

A lot of spaCy's users come from the machine learning community. However, spaCy is not a machine learning library in itself. Sure, it comes with pre-trained models for natural language processing, and a few hooks for customizing the training of your own models. But when it comes to training, it mainly integrates with other deep learning frameworks.

Generally, your workflow would look like this:

  1. Pick a pre-trained spaCy model or train your own
  2. Use spaCy to extract information out of unstructured text
  3. Either use the structured output from spaCy to train other models, or just use spaCy's structured data as the final output for your pipeline
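The three steps can be sketched as follows, assuming spaCy v3. A blank English pipeline with one hypothetical `entity_ruler` pattern stands in for step 1 so the code runs offline, and `to_records` is a hypothetical helper that turns each processed `Doc` into plain dictionaries, the kind of structured output you could feed to another model or keep as the final result.

```python
import json

import spacy

# Step 1 (assumption): a blank English pipeline plus one rule-based pattern
# stands in for a pre-trained model, so the sketch runs offline. In practice
# you would load a trained pipeline, e.g. spacy.load("en_core_web_sm").
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PRODUCT", "pattern": "spaCy"}])  # hypothetical pattern


def to_records(texts):
    """Hypothetical helper: turn raw texts into plain, JSON-friendly records."""
    records = []
    # Step 2: nlp.pipe processes the texts as a stream of Doc objects.
    for doc_id, doc in enumerate(nlp.pipe(texts)):
        # Step 3: keep only the structure we care about.
        records.append({
            "doc_id": doc_id,
            "tokens": [token.text for token in doc],
            "numbers": [token.text for token in doc if token.like_num],
            "entities": [(ent.text, ent.label_) for ent in doc.ents],
        })
    return records


texts = ["We parsed 3 documents with spaCy.", "No entities here."]
print(json.dumps(to_records(texts), indent=2))
```

The records are plain Python data, so they can be saved as JSON, loaded into a dataframe, or used as features for another model, which covers both options in step 3.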

Here's what Matthew Honnibal has to say about the main use-case behind spaCy:

Matthew Honnibal on the major use-case for spaCy


It is important to have a source of unstructured text, and some idea of what structured data you want to extract from it. This is a prerequisite for using spaCy effectively.