Structured metadata can be used to improve both precision and recall. Let's look at examples of how to do it.

Structured Metadata

When we think of search, we usually think of full-text search. That's where we look at TF-IDF, match documents based on which keywords appear in which documents, and rank them based on some algorithm.

However, we could also do search on structured data.

Let's say you have a clothing e-commerce website. Often, an item won't have a title or description, just a couple of images. You can't search images directly using ElasticSearch, so you'll need to improvise.

You could hire a team to manually go through each clothing item and tag them by:

  1. The color
  2. The type of fabric
  3. The general style of clothing
  4. The season where it'd be the most suitable to wear

The tags above can be indexed in ES and can then cater to search queries like:

  1. red tees
  2. striped shirts
  3. denim

Netflix

This kind of tagging also powers Netflix (https://medium.com/swlh/how-netflix-uses-big-data-20b5419c1edf). When you type "stand-up comedy," you get a lot of stand-up comedy items, even though the phrase isn't explicitly mentioned in their titles. You could even support queries like "movies with cliffhangers."

Indexing and Querying for Structured Data

A document in your ElasticSearch index could look like:

{
    "metadata": ["red", "shirt", "summer", "denim"]
}

It's not always essential to separate the metadata into multiple fields as follows:

{
    "color": ["red", "green"],
    "type": ["shirt"],
    "season": ["summer", "spring"]
}

Separating metadata into such fields is okay at the database level, but it can slow down your search, because each query then has to look across several fields. Your users expect very quick response times.
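If your database already stores the tags in separate fields, they can be merged into the single metadata field at indexing time. Here is a minimal Python sketch of that idea. The helper name flatten_tags and the fixed list of field names are assumptions made for illustration, not part of ElasticSearch.

```python
import json

def flatten_tags(item, fields=("color", "type", "season")):
    """Merge per-field tag lists into one deduplicated `metadata` list."""
    metadata = []
    for field in fields:
        for tag in item.get(field, []):
            if tag not in metadata:  # keep the first occurrence, preserve order
                metadata.append(tag)
    return {"metadata": metadata}

# The separated document from the example above.
item = {
    "color": ["red", "green"],
    "type": ["shirt"],
    "season": ["summer", "spring"],
}

doc = flatten_tags(item)
print(json.dumps(doc))
# {"metadata": ["red", "green", "shirt", "summer", "spring"]}
```

The database keeps the structured fields, while the search index only sees the flat list.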

Also, it is often helpful to keep this field analyzed so that you can support different inflections (for example, a search for "shirts" matching the tag "shirt").
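One way to get this behaviour is to map the metadata field as analyzed text with a stemming analyzer. The sketch below builds such a mapping as a Python dict. It assumes the typeless mapping format used by recent ElasticSearch versions and picks the built-in "english" analyzer; your language and analyzer choice may differ.

```python
import json

# Body for an index-creation request. "english" is a built-in language
# analyzer that stems tokens, so "shirts" and "shirt" reduce to the same term.
index_body = {
    "mappings": {
        "properties": {
            "metadata": {"type": "text", "analyzer": "english"}
        }
    }
}

print(json.dumps(index_body, indent=4))
```

The printed JSON would be sent as the body when creating the index. The same analyzer is then applied to the user's query at search time by default.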

Your query could look like:

{
    "query": {
        "match": {
            "metadata": "user's query"
        }
    }
}

This will help you greatly improve precision, at the cost of some recall, because we're searching only the metadata. If you have a full-text field as well, you could add it to the search too using ElasticSearch's multi_match.
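A multi_match query over both fields could be built as in the sketch below. The full-text field name "description" and the boost value of 2 are assumptions for illustration. The "field^boost" syntax is how multi_match weights one field above another.

```python
import json

def build_query(user_query, text_field="description", metadata_boost=2):
    """Search metadata and a full-text field together, favouring metadata hits."""
    return {
        "query": {
            "multi_match": {
                "query": user_query,
                "fields": [f"metadata^{metadata_boost}", text_field],
            }
        }
    }

body = build_query("red tees")
print(json.dumps(body, indent=4))
```

Boosting metadata keeps the precise tag matches near the top, while the full-text field brings back documents the tags alone would miss.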

Tagging Items in your ElasticSearch Index Automatically

If you have images that you'd like to index, then Google Cloud Vision API could come in handy.

For enriching text data (by tagging them automatically), you could use something like MonkeyLearn.
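Whichever tagging service you use, its output usually needs some cleanup before it goes into the metadata field. The sketch below assumes the service returns (label, confidence score) pairs. That shape, the helper name labels_to_metadata, the 0.8 threshold, and the sample labels are all hypothetical and are not taken from any specific API.

```python
def labels_to_metadata(labels, min_score=0.8):
    """Keep confident labels, lowercased and deduplicated, as metadata tags."""
    metadata = []
    for description, score in labels:
        tag = description.strip().lower()
        if score >= min_score and tag not in metadata:
            metadata.append(tag)
    return {"metadata": metadata}

# Made-up labels for illustration only, not real output from any service.
labels = [("Denim", 0.97), ("Shirt", 0.91), ("Sleeve", 0.55), ("denim", 0.83)]

doc = labels_to_metadata(labels)
print(doc)
```

Dropping low-confidence labels trades a little recall for precision, so the threshold is worth tuning against real queries.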

Takeaway

Whether you have a full-text field or an image that you need to search on, use manual work or automatic tagging wherever possible to add structured metadata.

What this metadata should contain depends entirely on the information need. We briefly saw the e-commerce and Netflix examples.

Your first step should be to find out what kind of structured information you can isolate about your dataset, and how it will help users get better results.