What is Hierarchical Clustering? (And What Does it Mean for Your SEO Strategy?)

Hierarchical clustering has become one of the hottest – and most misunderstood – online marketing buzzwords of late. Despite the sound of its name, it has nothing to do with astrological signs or big pharma’s latest miracle drug.

It could, however, be the miracle cure that sends your search engine optimization results into orbit.

Before you can harness the full power of this content strategy, you need to understand that it is only a very distant relative of the complex mathematical technique used by market forecasters and scientists in many disciplines. You don’t need to know how to construct a mathematical cluster; you only need to understand the surface definitions.

Defining Complex Hierarchical Clustering

More than 61 scholarly papers have been devoted to defining hierarchical clusters and topic modeling over the last decade. Each leap in machine learning has brought even more attention and importance to understanding what these clusters can be and how they are used.

To boil all the research down to the bottom line, clustering is a way of identifying groups within a large dataset. One very common way of picturing this in action is found on your own computer.

All the files and folders on your hard drive right now form a dataset arranged in a hierarchy. The clusters within it can be identified from the top down (divisive) or from the bottom up (agglomerative).

Breaking Things Down and Building Them Up

In divisive clustering, you start with one massive folder, like a “My Documents” folder on your hard drive. Then you open that folder and look for ways to break the dataset down into different sub-folders within it.

To continue our example, these would be “My Images,” “My Word docs,” “My Videos,” etc. You continue to open each folder and break each down into smaller pieces until you get to the final, individual file level.

The agglomerative method works in the opposite direction. Here you start with a pile of individual files and look for commonalities between them to build up your folder structure.

It’s important to note that even though you may be looking at the exact same dataset, applying one method or the other can result in very different outcomes.
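
For readers who want to see the bottom-up idea in action, here is a minimal sketch of agglomerative clustering in Python. It assumes each file has already been given a few descriptive tags by hand; the file names, the tags, the `agglomerate` function, and the 0.5 similarity threshold are all hypothetical choices made for illustration, not a standard recipe.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two tag sets, from 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(items, threshold=0.5):
    """Bottom-up clustering: start with one cluster per file, then keep
    merging the two most similar clusters until none are similar enough."""
    # Each cluster is a pair: (list of file names, combined set of tags).
    clusters = [([name], set(tags)) for name, tags in items.items()]
    while len(clusters) > 1:
        # Find the pair of clusters whose tags overlap the most.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]),
        )
        if jaccard(clusters[i][1], clusters[j][1]) < threshold:
            break  # the closest pair is still too different to share a folder
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(names) for names, _ in clusters]


# Hypothetical files with hand-assigned tags.
files = {
    "beach.jpg": {"image", "vacation"},
    "sunset.png": {"image", "vacation"},
    "resume.docx": {"document", "work"},
    "invoice.docx": {"document", "work"},
    "wedding.mp4": {"video", "family"},
}

folders = agglomerate(files)
print(sorted(folders))
# [['beach.jpg', 'sunset.png'], ['invoice.docx', 'resume.docx'], ['wedding.mp4']]
```

Raising or lowering the threshold changes how readily files are merged, which is one small illustration of how the same dataset can end up in different folder structures.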

Google Begins Reading Topics, Not Just Keywords

In 2013, Google made a radical change to the way it read and classified web pages within its index. Prior to that, search engine optimization specialists knew that the best way to get Google to value a webpage was to build it around one keyword only.

Even very closely related topics would have separate pages built to maximize the value of that page’s focus keyword. For example, a dentist’s website might have one page devoted to “pediatric dental services” and a second page on “children’s dental services.”

Of course, many black hat SEO practitioners would take one page of content and just change the focus keyword and a few random words of text on a page to spawn hundreds of pages of content that could rank for hundreds of different keywords. Sometimes they would get the same page listed multiple times on one search result page. 

The 2013 Hummingbird update made that type of search optimization work obsolete overnight. Google had learned how to read your web pages well enough to understand what the overall topic was, not just the individual keywords within it.

Search Engine Optimizers Overcompensated

Overnight, topics, not keywords, had become the key to high Google rankings. SEOs and business owners themselves spent the next couple of years frantically rewriting their websites to fit a new “one-topic, one-page” model.

This model was just as flawed, but in reverse. Instead of having hundreds of articles on a site, each outlining one of the ways you could prepare eggs for eating, for example, you now had one.

That one page didn’t just cover ways to cook eggs. It also covered every different type of egg available, where they could be found, all the different ways you could dye eggshells or make decorations from eggs, the health benefits and allergic reactions people can have to eggs and more.

Whew! No wonder both human visitors and search engine spiders became even more confused and disappointed in search results.  

Topic Clustering for Real Results

Hierarchical clustering for search engine optimization offers a middle-ground approach to building a site’s content. A good clustering strategy can be used on your website, your blog, or one or more of your social media sites. It can even be centered in one of these places and spread to encompass all of the others.

Let’s take another look at the topic of “eggs” and how it would be handled under today’s best clustering practices. First, take a divisive view of the overall topic as it appeared under the one-topic, one-page model. These sub-topics were identified:

  • Ways to cook eggs
  • Different kinds and where they can be found
  • Ways to decorate with eggs
  • Health benefits and ailments associated with eggs

Each of these sub-topics could be covered in a single, usually long and detailed, piece of content. These pieces are frequently found on a site’s blog rather than on its category-level pages, and are called “pillar” or “cornerstone” posts.

Linking Binds the Clusters

Other, more specific blog posts would be planned around each pillar post and have a tighter keyword focus on the page. For example, you could have several blog posts, each covering a different breakfast egg recipe. You could also have separate posts on preparing eggs for entertaining, including one on egg dishes for the Fourth of July and another on cooked-egg casseroles for a crowd.

Not all of these sub-topic posts need to be placed on your site to help you with this strategy. You could offer to do a guest post on any one of these topics for another blog or site in a field related to yours and it will still help.

The linchpin of this strategy is linking. Each of those specific blog posts would link back to the pillar post. In that way, they increase the authority of that pillar post for that topic and the overall authority of your site on the topic.
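
If you keep a list of your posts and the links inside them, a few lines of Python can flag supporting posts that forgot to link back to their pillar. This is only a sketch: the `Post` class, the `missing_backlinks` helper, and every URL below are hypothetical, and in practice the link data would have to come from your own site export or crawl.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """One piece of content and the URLs it links out to (hypothetical model)."""
    url: str
    links_to: set = field(default_factory=set)


def missing_backlinks(pillar_url, supporting_posts):
    """Return the URLs of supporting posts that do not link to the pillar post."""
    return [p.url for p in supporting_posts if pillar_url not in p.links_to]


# Hypothetical cluster built around the "ways to cook eggs" pillar post.
pillar = "/blog/ways-to-cook-eggs"
supporting = [
    Post("/blog/breakfast-egg-scramble", {pillar}),
    Post("/blog/fourth-of-july-egg-dishes", {pillar, "/blog/breakfast-egg-scramble"}),
    Post("/blog/egg-casseroles-for-a-crowd", set()),  # forgot the link back
]

missing = missing_backlinks(pillar, supporting)
print(missing)  # ['/blog/egg-casseroles-for-a-crowd']
```

The same check works for guest posts on other sites, as long as you record the full URL of the pillar post they should point to.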

Get Started Today

Once you understand the strategy, it’s relatively easy to see how it could be applied to your products and services. The hardest part of actually implementing the strategy may be in deciding whether to begin working from the top down on each topic or the bottom up.

The hierarchical clustering content strategy can take a long time to set up. One pillar post and five to ten supporting blog posts can take months to write.

It frequently takes more than six months to begin seeing the results of your effort as well.