How Search Engines Work: Crawling, Indexing, and Ranking

Jon Bennion | July 23, 2021 | 30 min read

If search engines can't find your website, nothing else you do online matters.

There are up to 60 billion searches happening on Google in the United States each month. 60 BILLION!

And a massive 70-80% of search engine users only focus on the organic results (HubSpot).

So, how do you get your content ranking high in the search results and seen by your future customers?

You need to first get it noticed by the search engines.

Whether you partner with an SEO agency or try it yourself, it pays to know how search engines work before you invest any time or effort into search engine optimization.

The ultimate goal of this article is to help you build your knowledge about the ins and outs of search, so you can optimize your pages to rank higher on Google and get noticed.

Let's dive in...

What are search engines?

Search engines discover, understand and organize the internet's content to offer the most useful, high quality and relevant results to a searcher's query.

Google does that better than any other search engine (unsurprisingly!), which is why more people go to Google than any other search engine. In fact, more than 90% of global web searches happen on Google, followed by Bing at 2.61% and Yahoo at 1.9% (StatCounter).

The search goliath processes over 40,000 search queries every second on average. That's more than 3.5 billion searches every single day and 1.2 trillion searches per year (Internet Live Stats).

To understand how search engines work, you need to know their goal - which is to keep users coming back. Everything a search engine does revolves around that objective.

They do that by investing billions every year in developing algorithms that predict as accurately as possible which web pages searchers will find most useful and relevant in search results.

So, how does a search engine find the most relevant results for searcher queries? Especially when there's so much new content being created for the web every second?

How search engines work

To provide the most relevant, useful web pages, search engines do three things:

  1. Crawl: Search engines use robots (known as “spiders” or “web crawlers”) to scour the internet for content. These crawlers look through the code and content of each URL, whether that’s a blog article, image, video, PDF, web page, or any other format.

  2. Index: Once crawled, the content is organized into the search engine index. This happens fast. In 1999, it took Google one month to crawl and build an index of about 50 million pages. In 2012, the same task was accomplished in less than one minute! These “indexed” pages can then be accessed quickly by the search engine when a user types in a search query.

  3. Rank: When a user types in a search query, the search engine uses a ranking algorithm to calculate the quality, relevance and usefulness of web pages based on what the user is searching for. The results are then ordered from most to least relevant on the search engine results page (SERP). This is what searchers see. 

Google has an index of “hundreds of billions” of web pages. So, when you search on Google, the search engine scans the index and runs it through an algorithm to find the set of pages that best answers your search query.

This means what you see on the search engine results pages are the web pages that Google finds to be the most relevant, trustworthy, and authoritative on the topic or keywords you’re searching.
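To make the three steps concrete, here is a toy sketch in Python. It is not how Google works under the hood: the PAGES dictionary is a made-up stand-in for the web, the index is a simple word-to-URL map, and the ranking is a naive count of matching words. Every name in it is hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical mini "web": URL -> (page text, outgoing links)
PAGES = {
    "/": ("seo agency home", ["/blog", "/services"]),
    "/blog": ("how search engines work crawling indexing ranking", ["/"]),
    "/services": ("seo services and seo audits", ["/"]),
    "/orphan": ("nobody links to this seo page", []),
}

def crawl(start):
    """Crawl: follow links breadth-first from `start` and return the URLs found."""
    seen, queue = [start], deque([start])
    while queue:
        url = queue.popleft()
        for link in PAGES[url][1]:
            if link not in seen:
                seen.append(link)
                queue.append(link)
    return seen

def build_index(urls):
    """Index: map each word to the URLs containing it, with a count per URL."""
    index = defaultdict(dict)
    for url in urls:
        for word in PAGES[url][0].split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def rank(index, query):
    """Rank: order URLs by how often they contain the query words."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically by URL
    return sorted(scores, key=lambda u: (-scores[u], u))

crawled = crawl("/")
index = build_index(crawled)
results = rank(index, "seo audits")
```

Notice that "/orphan" never makes it into the results. No page links to it, so the crawler never finds it, and a page that isn't crawled can't be indexed or ranked.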

Crawling

It's absolutely critical to make it as easy as possible for search engines to crawl your website. Crawling and indexing are everything in search engine optimization. If search engine crawlers can’t crawl your website, they cannot index or rank it, which means it won’t be shown to searchers.

Can search engines find your pages?

Check to see if your web pages are indexed. Use the Index Coverage report in Google Search Console. Start by signing up for a free Google Search Console account if you don't already have one.

Then, submit sitemaps for your website and monitor how many submitted pages have actually been added to Google's index.

If you're not showing up anywhere in the search results, there are a few possible reasons:

  • Your site is new and Google hasn't crawled it yet

  • Your site has no links from any external websites

  • Your structure makes it hard for a robot to crawl it

  • Your site contains some basic code that is blocking search engines

  • Your site has been penalized by Google

How can you make sure search engines can crawl your website?

There are a few techniques you can use right now to make sure search engines can crawl your web pages with ease. The most important thing is to improve your site’s code and structure so that Google’s web crawlers can understand it.

In other words, you need to tell search engines how to crawl your site.

Here are two things you need for this:

  1. Robots.txt files - A robots.txt file is located in the root directory of websites and instructs search engines on which parts of your site they should and shouldn't crawl. Implementing a robots.txt file into your website is one of the easiest technical SEO techniques to introduce, but it’s also one that you can just as easily botch. Use our guide to create the perfect robots.txt file.

  2. Sitemap - Create a sitemap file that ticks all the boxes for Google's standards and submit it through Google Search Console. This will help crawlers follow a path to your important pages.
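If you want to sanity-check robots.txt rules before you publish them, Python's standard library includes a robots.txt parser. The sketch below is only an illustration: the rules and the example.com URLs are made up, and real crawlers such as Googlebot may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def can_crawl(url, user_agent="Googlebot"):
    """Return True if the rules above let `user_agent` fetch `url`."""
    return parser.can_fetch(user_agent, url)

blog_allowed = can_crawl("https://www.example.com/blog/how-search-engines-work")
cart_allowed = can_crawl("https://www.example.com/cart/")
```

Here the blog post is allowed and the cart page is blocked. Running a handful of your most important URLs through a check like this is a cheap way to catch a botched rule.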

[Image: Setting up a sitemap. Image credit: Magento]
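Most platforms and plugins will generate a sitemap for you, but it helps to see how simple the file is. Here is a minimal sketch that builds one with Python's standard library. The page list and the build_sitemap helper are hypothetical. The urlset, url, loc and lastmod elements come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from a list of (url, 'YYYY-MM-DD' last modified) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages, for illustration only
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2021-07-23"),
    ("https://www.example.com/blog/how-search-engines-work", "2021-07-23"),
])
```

The string returned here has no XML declaration, so add one (or let your CMS handle it) before saving the result as sitemap.xml and submitting it in Google Search Console.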

Here are common mistakes that stop search engines from effectively crawling and indexing your website:

Poor site navigation

How you structure your website really matters. The better your site structure, the easier it is for web crawlers to access and crawl the content. Lots of navigational issues hinder crawlers, including orphan pages (pages that no other page links to). Also, if your mobile navigation is different from your desktop navigation, this is a barrier for web crawlers. The reality is crawlers cannot automatically discover everything on your site. Google admits “[there are] pages on your site we might not…discover.” (That’s why sitemaps are necessary - more on that later.) However, web crawlers will make easier work of accessing, crawling, indexing, and returning the pages of your site if it has a strong structure.
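Orphan pages are easy to spot once you have a map of your internal links. The sketch below assumes you already have that map (for example, exported from a crawl tool). The LINKS data and the find_orphans helper are made up for illustration.

```python
# Hypothetical internal link map: page -> pages it links to
LINKS = {
    "/": ["/blog", "/services"],
    "/blog": ["/", "/blog/how-search-engines-work"],
    "/blog/how-search-engines-work": ["/blog"],
    "/services": ["/"],
    "/old-landing-page": ["/"],
}

def find_orphans(links, home="/"):
    """Return pages that no other page links to (the homepage is excluded)."""
    linked_to = {target for targets in links.values() for target in targets}
    return sorted(page for page in links if page not in linked_to and page != home)

orphans = find_orphans(LINKS)
```

In this example the only orphan is "/old-landing-page". It links out to the homepage, but nothing links in, so a crawler following links has no path to it.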

Gated content

Is your content hidden behind forms? If you ask users to log in or fill out forms before accessing content, search engine crawlers can’t see the protected pages either. Every time the crawler hits a form or gated piece of content, you run the risk of creating a barrier for search engines to discover any more content on that URL pathway. 

Search forms

Crawlers can’t use search forms, so if you rely on search boxes to help a user navigate your website and discover content, you need to rethink the structure.

Text hidden within non-text content

Everyone loves GIFs and motivational quotes on images, but if you want this text to be indexed, you need to find another way to display it. Avoid embedding text in images, especially important text elements like page headings and menu items. To ensure maximum accessibility of your content to crawlers, keep text in HTML and provide alt text for images.
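A quick way to audit a page for missing alt text is to parse its HTML and list every image without one. This is a rough sketch using Python's built-in HTML parser. The sample HTML and the AltTextChecker class are hypothetical.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Collect the src of every <img> whose alt text is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # An empty alt can be fine for purely decorative images;
            # this check flags it anyway so a human can decide.
            if not (attributes.get("alt") or "").strip():
                self.missing_alt.append(attributes.get("src", ""))

# Hypothetical page fragment, for illustration only
HTML = """
<h1>Our services</h1>
<img src="team.jpg" alt="The team at our office">
<img src="quote.png">
<img src="banner.gif" alt="">
"""

checker = AltTextChecker()
checker.feed(HTML)
```

After running this, checker.missing_alt holds "quote.png" and "banner.gif", the two images a crawler would get no text from.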

Errors

Site errors are crawl errors that prevent the search engine crawler from accessing your website. They might be page-not-found errors (404) or internal server errors (500). If your Google Search Console shows server errors, this means the crawler wasn’t able to access your website and the request might have timed out. Server errors also happen when there are issues with your code that prevent a page from loading, or when your website has so many visitors that the server can’t handle all the requests.
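If you have a list of URLs and the HTTP status codes they returned (from your server logs or a crawl tool), sorting them into buckets takes only a few lines. The crawl log below is invented, and classify_status is a hypothetical helper. The ranges follow the standard HTTP status classes.

```python
def classify_status(code):
    """Group an HTTP status code into a broad category."""
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"  # e.g. 404 Not Found
    if 500 <= code < 600:
        return "server error"  # e.g. 500 Internal Server Error
    return "unknown"

# Hypothetical crawl log: URL -> status code returned
CRAWL_LOG = {"/": 200, "/old-page": 404, "/checkout": 500, "/blog": 301}

# Keep only the URLs a crawler could not load
problems = {
    url: classify_status(code)
    for url, code in CRAWL_LOG.items()
    if code >= 400
}
```

Here "/old-page" is flagged as a client error and "/checkout" as a server error, which tells you whether to fix a link or look at the server.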

One more thing about crawling -

Google only allocates a certain amount of crawling per website. This is known as the crawl budget.

Let's say your site has millions of pages that change frequently, as Amazon's does. Google might not be able to crawl your whole site as often as you'd like, which delays your content showing up in search results. So, you need to point Google to the most important pages on your website - the ones you really want to rank.

You (or your SEO agency) can do this by listing your most recently updated or most important pages in your sitemaps, and even hiding less important pages using robots.txt rules.
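As a rough illustration of that prioritizing, the sketch below picks which pages to list in a sitemap: anything flagged as important, plus anything updated since a cutoff date, newest first. The page data, the "important" flag and the cutoff are all assumptions you would replace with your own.

```python
from datetime import date

# Hypothetical pages: (url, last updated, flagged as important?)
PAGES = [
    ("/", date(2021, 7, 1), True),
    ("/blog/how-search-engines-work", date(2021, 7, 23), False),
    ("/tag/misc", date(2019, 1, 5), False),
    ("/services", date(2020, 3, 10), True),
]

def pages_for_sitemap(pages, since):
    """Return important or recently updated URLs, most recently updated first."""
    newest_first = sorted(pages, key=lambda page: page[1], reverse=True)
    return [
        url
        for url, updated, important in newest_first
        if important or updated >= since
    ]

selected = pages_for_sitemap(PAGES, since=date(2021, 1, 1))
```

The stale, unimportant "/tag/misc" page is left out, so the sitemap points crawlers at the pages you care about most.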

Indexing

How do search engines interpret and store your pages?

A page is indexed by Google if it has been visited by the Google crawler, analyzed for content and meaning, and stored in the Google index. It's only these indexed pages that will be shown in Google search results pages (provided they also follow Google's webmaster guidelines).

As Google explains, "The index is like a library, except it contains more info than in all the world’s libraries put together."

As you can imagine, the Google Search index contains hundreds of billions of webpages and is over 100,000,000 gigabytes in size. There's an entry for every word found on every webpage that Google indexes. Google also uses something called the Knowledge Graph to go beyond keyword matching and better understand and index the things its users care about, such as travel times, data from the World Bank and more.

Why might a page not be indexed?

You have rogue "Noindex" tags

Sometimes, you might want to prevent a page from appearing in search results - maybe the data is private, or you don't want Google to index the shopping cart or checkout pages on your e-commerce store.

You can do this by including a noindex meta tag in the page's HTML code, or by sending a noindex directive in the page's HTTP response header. The next time a crawler crawls that page and sees the tag or header, it will drop the page entirely from search results, even if other sites link to it. A user will still be able to visit that web page by following links from external sites, but it won't show up in search results.

If you have rogue noindex tags, you might be stopping pages from being indexed when you want them to show.

To find all pages with a noindex meta tag on your site, you can use a tool like Ahrefs and run a crawl audit. This will show you “Noindex page” warnings, so you can remove the noindex meta tag from any pages where it doesn’t belong.
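If you'd rather spot-check a single page yourself, you can look for the robots meta tag in its HTML. The sketch below uses Python's built-in HTML parser. The two sample pages and the is_noindex helper are hypothetical, and it only covers the meta tag, not a noindex sent in an HTTP header.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> or <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())

def is_noindex(html):
    """Return True if the page's robots meta tag contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

# Hypothetical pages, for illustration only
CART_PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Cart</body></html>'
BLOG_PAGE = '<html><head><meta name="description" content="How search works"></head><body>Post</body></html>'

cart_is_noindex = is_noindex(CART_PAGE)
blog_is_noindex = is_noindex(BLOG_PAGE)
```

The cart page is correctly flagged as noindex and the blog post is not. If a page you want ranked comes back as noindex, you've found a rogue tag.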

You aren’t using canonical tags to tell Google about duplicate pages

Do you have a web page with multiple URLs - maybe it's published on both your US and AU websites? Or do you have a web page that is essentially the same as an existing page but with minor variations? Google sees these as duplicates. It chooses one URL as the main version to crawl and index. This is known as the canonical version. All other URLs will be considered duplicates, which means they will be crawled and indexed less often.

Here's the important part: you can tell Google which URL is canonical using the canonical tag. In other words, tell them which page you want them to show in the search results. If you don't, Google will choose for you.

It's worth noting that duplicate content doesn't result in a penalty BUT if you have a large amount of duplicate content, Google may begin to doubt your trustworthiness, which could damage your rankings.

As with noindex tags, you might have some rogue canonical tags that are preventing the right pages from being indexed. To find rogue canonical tags across your entire site, run a crawl in a site audit tool like Ahrefs.
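To see what a canonical check looks like in practice, the sketch below reads the canonical link tag from each page and groups URLs by the canonical they declare. The URLs, the HTML snippets and the canonical_of helper are all made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Find the href of <link rel="canonical"> if the page declares one."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and (attributes.get("rel") or "").lower() == "canonical":
            self.canonical = attributes.get("href")

def canonical_of(html):
    """Return the canonical URL a page declares, or None if it has no tag."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical

# Hypothetical pages: URL -> HTML head fragment
PAGES = {
    "https://www.example.com/shoes": '<link rel="canonical" href="https://www.example.com/shoes">',
    "https://www.example.com/shoes?sort=price": '<link rel="canonical" href="https://www.example.com/shoes">',
    "https://www.example.com/au/shoes": "<title>Shoes</title>",
}

# Group every URL under the canonical it points to
groups = defaultdict(list)
for url, html in PAGES.items():
    groups[canonical_of(html)].append(url)
```

The first two URLs point to the same canonical, which is what you want for a sorted variation of one page. The AU page ends up under None because it has no canonical tag, so Google would choose for you.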

Using your robots.txt file to tell search engines how to index your site

The robots.txt file is arguably the most important tool you can use to tell search engines how to crawl and index your site.

Use it to tell Google to ignore certain pages so it can pay more attention to others.

Because it's so powerful, getting your robots.txt wrong can lead to pages not being indexed when you want them to be.

To check which pages are currently blocked by robots.txt, go to Google Search Console and look for the “Blocked by robots.txt” and “Indexed, though blocked by robots.txt” statuses in the Index Coverage report.

Ranking

Once a search engine has indexed your site, how does it rank it?

Just when you thought you had the hang of all this, we're going to throw a spanner into the works:

Not all search engines do things the same way.

That's right - Google and Bing, for example, have different ways of ranking sites.

We’ll focus mostly on Google here, for the obvious reason that it’s the search engine used by most people in the US and globally.

Google uses complex search algorithms to sort through the hundreds of billions of pages in its search index to find the most useful, relevant results for its searchers - that’s its whole reason for being. 

These are known as “organic search results”.

This means they rank based 100% on merit and are not paid for.

As the search engine explains in How Search Works, "Google never accepts payment to crawl a site more frequently — we provide the same tools to all websites to ensure the best possible results for our users."

Search engines use lots of different ranking factors to rank the organic search results, including:

  • Social metrics

  • Keyword usage

  • Brand signals

  • User interactions

  • Quality content

  • User generated content links

  • Links from authority URLs

  • ...and many more.  

Google uses more than 200 ranking factors in its algorithm, so we won’t list all the ranking factors here. But it's worth knowing the key ranking factors when understanding how search engines work.

Some ranking factors are more important than others in the algorithm. Here's how some SEO experts weigh up the importance of different ranking factors in Google:

[Image: SEO ranking factors diagram. Image credit: Moz]

Ultimately, all of these factors come back to three key themes:
  1. Relevance: Google looks for pages that are most closely related to the searcher's query and keywords.
  2. Authority: Google determines whether the content is accurate and trustworthy, and whether the website itself is authoritative.
  3. Usefulness: Content can be relevant and authoritative, but it must be deemed useful if Google is going to position it at the top of the search engine results page.

But how search engines work is not that simple. 

Google regularly changes its search algorithm to make sure it’s always meeting its objective of providing useful results. 

While most changes are minor, Google periodically rolls out a major update that significantly affects search rankings. Knowing these Google updates (or working with an SEO specialist who does) can help you prepare and improve your SEO efforts.

For example, have you heard about the Local Pack?

The Local Pack is a section of Google's search results that shows the local businesses most relevant to the search query. Google looks for signals in content, links, social profiles and directory listings so it can provide the most relevant local results for the searcher's query. Knowing this means you can invest in local SEO to get your local business seen.

The 3 pillars of SEO

Now that you know how search engines work, you have a better understanding of what you need to do to rank well through SEO. Here are the three SEO pillars to focus on:

  • Technical SEO – This is all the technical work you need to cover to make sure your site is crawled and indexed, such as structured data, sitemaps and site architecture, canonical tags, and more.

  • Onsite SEO – Onsite (or on-page) optimization is the practice of creating a site structure, pages and content that answer users’ questions and engage them from the first click, while making sure that search engines know exactly what your content is about and why it deserves to rank high. Examples of onsite SEO are headlines, meta descriptions and internal links.

  • Offsite SEO – This is all the stuff you do externally to your website to help your rankings. If you do nothing else offsite, invest in generating quality backlinks.

Recap

A lot goes into how search engines work, and it’s critical that you know the basics before you invest in SEO. Whether you're doing it yourself or hiring an SEO specialist, you should know how search engines work if you are going to get the best return on your investment.

Now that you know how search engines work, find out how your pages are being crawled, indexed and ranked with our FREE SEO audit.
