• Impact Web Design

Get Your Website Indexed By Google


It all begins with crawling!


Crawling is the process search engines use to discover and read the pages of a website so that its content can be analysed, understood, and indexed. Most CMSs provide a simple way to control this, and you should understand what your optimal indexation strategy is.


Here is what this guide covers:

  1. Robots.txt File

  2. Sitemap

  3. Meta Robots Tag (Page Headers)

  4. Canonicals

  5. Bonus

  6. No Blah-Blah


1. Robots.txt File


A well-configured site is easier and faster to crawl. In other words, if you optimise these components, Google can crawl your pages faster and more frequently (potentially daily).


If not, Google will spend its crawling effort on other sites that make its job simpler.


Robots.txt:

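  • Must be called robots.txt (in lowercase only)

  • Must not exceed 500 KiB (Google ignores any content beyond that limit)

  • Must return an HTTP 200 status code to Google

  • Must be at the root of your site, i.e. at yoursite.com/robots.txt

  • Must use only the 4 directives Google supports (others, such as Crawl-delay, are ignored), with each group of rules starting with a User-agent line: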

○ User-agent:

○ Disallow:

○ Allow:

○ Sitemap:
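The location, status and size requirements above can be checked mechanically. The Python sketch below is one way to do it; the function names robots_url_for and check_robots_response are hypothetical, and fetching the file is left to whatever HTTP client you use, so the checks only look at a status code and a response body you pass in.

```python
from urllib.parse import urlsplit, urlunsplit

MAX_ROBOTS_BYTES = 500 * 1024  # Google reads at most 500 KiB of robots.txt


def robots_url_for(page_url: str) -> str:
    """robots.txt sits at the root of the scheme + host (+ port) of a page."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def check_robots_response(status: int, body: bytes) -> list[str]:
    """List problems with a fetched robots.txt response (empty list = OK)."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if len(body) > MAX_ROBOTS_BYTES:
        problems.append(
            f"{len(body)} bytes; content beyond {MAX_ROBOTS_BYTES} bytes is ignored"
        )
    return problems


print(robots_url_for("https://www.yoursite.com/blog/post?id=3"))
# https://www.yoursite.com/robots.txt
print(check_robots_response(200, b"User-agent: *\nDisallow:\n"))
# []
```

Note that each host (for example a shop subdomain) gets its own robots.txt, which is why the sketch keeps the full host and port from the page URL.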


Examples:


Full permission:

User-agent: *
Disallow:


Full permission + Sitemap:

User-agent: *
Disallow:
Sitemap: https://www.yoursite.com/sitemap_index.xml


Total exclusion with one exception (Google):

User-agent: *
Disallow: /

User-agent: googlebot
Disallow:


Partial exclusion on admin:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
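Google does not simply apply these rules in the order they are written: within the group that matches a crawler, the most specific (longest) matching path wins, and Allow wins a tie. That is why the admin example still lets Googlebot fetch admin-ajax.php. The Python sketch below illustrates that precedence under simplifying assumptions: it ignores the * and $ wildcards, and it selects a group only by an exact User-agent name, falling back to *. The function names are hypothetical, not a Google API.

```python
# Minimal sketch of Google-style rule precedence in robots.txt.
# Assumptions: no '*' or '$' wildcards; a crawler uses the group whose
# User-agent equals its name (case-insensitive), otherwise the '*' group.

def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercase user-agent to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []
    previous_was_agent = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not previous_was_agent:
                agents = []  # consecutive User-agent lines share one group
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            previous_was_agent = True
        else:
            if field in ("allow", "disallow"):
                for agent in agents:
                    groups[agent].append((field, value))
            previous_was_agent = False  # Sitemap: etc. are not group rules
    return groups


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """True if `agent` may crawl `path` under longest-match precedence."""
    groups = parse_groups(robots_txt)
    rules = groups.get(agent.lower(), groups.get("*", []))
    best_length, allowed = -1, True  # no matching rule means allowed
    for directive, rule_path in rules:
        if not rule_path or not path.startswith(rule_path):
            continue  # an empty "Disallow:" blocks nothing
        is_allow = directive == "allow"
        longer = len(rule_path) > best_length
        tie_to_allow = len(rule_path) == best_length and is_allow
        if longer or tie_to_allow:
            best_length, allowed = len(rule_path), is_allow
    return allowed


ADMIN_RULES = """User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

GOOGLE_ONLY = """User-agent: *
Disallow: /

User-agent: googlebot
Disallow:
"""

print(is_allowed(ADMIN_RULES, "Googlebot", "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(ADMIN_RULES, "Googlebot", "/wp-admin/options.php"))     # False
print(is_allowed(GOOGLE_ONLY, "Googlebot", "/any-page"))                 # True
print(is_allowed(GOOGLE_ONLY, "Bingbot", "/any-page"))                   # False
```

Generic robots.txt parsers do not all follow Google's precedence, so it is still worth confirming the result with Google's own tools.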


Important: Disallowing a page in robots.txt stops Google from crawling it, but it does not guarantee that the page stays out of the index.

Google may still index the URL if it finds links to it on other pages or domains (backlinks). To keep a page out of search results, use a noindex meta robots tag (step 3), and do not block that page in robots.txt, or Google will never see the tag.


You can edit and verify your robots.txt file using:

  • Edit it: Yoast → Tools → File editor (if you are using WordPress)

  • Check it: Google Search Console → robots.txt report (formerly the robots.txt Tester)


2. Sitemap


Your sitemap should:

  • Be written in XML

  • Be accessible at the root of your site (yoursite.com/sitemap.xml)

  • Contain at most 50,000 URLs and weigh no more than 50 MB (uncompressed)

  • Be split into several sitemaps referenced in a sitemap index if it exceeds these limits (or to separate sections of the site, e.g. blog vs. showcase pages)

  • Be complete and up to date

  • Only contain URLs you want indexed

  • Be submitted directly to search engines (Google Search Console)
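To make the size rules concrete, here is a minimal Python sketch that writes sitemaps with the standard library's xml.etree.ElementTree and splits them into chunks of 50,000 URLs referenced from a sitemap index. The function names and the sitemap-1.xml, sitemap-2.xml file naming are hypothetical, and the sketch does not check the 50 MB limit, which you would still need to verify on the written files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # limit from the sitemaps.org protocol
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_urlset(urls: list[str]) -> str:
    """Serialize one <urlset> sitemap; ElementTree escapes '&', '<', etc."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    return XML_DECLARATION + ET.tostring(urlset, encoding="unicode")


def build_sitemaps(urls: list[str], base_url: str) -> tuple[str, list[str]]:
    """Split `urls` into chunks of at most 50,000 and reference each chunk
    from a sitemap index. Returns (index_xml, [sitemap_xml, ...])."""
    chunks = [urls[i:i + MAX_URLS_PER_SITEMAP]
              for i in range(0, len(urls), MAX_URLS_PER_SITEMAP)]
    sitemaps = [build_urlset(chunk) for chunk in chunks]
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for number in range(1, len(sitemaps) + 1):
        entry = ET.SubElement(index, "sitemap")
        # Hypothetical file naming: sitemap-1.xml, sitemap-2.xml, ...
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemap-{number}.xml"
    return XML_DECLARATION + ET.tostring(index, encoding="unicode"), sitemaps


pages = [f"https://www.yoursite.com/product/{n}" for n in range(60_000)]
index_xml, sitemap_files = build_sitemaps(pages, "https://www.yoursite.com")
print(len(sitemap_files))  # 2 (50,000 + 10,000 URLs)
```

The index is the single file you submit in Search Console; it plays the same role as the sitemap_index.xml that Yoast generates.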



Edit it: Yoast → General settings → Features → XML sitemap (if you are using WordPress).


As an alternative, create a sitemap online with XML-Sitemaps or with Screaming Frog.


Check your sitemap file using Google Search Console and submit it









3. Meta Robots Tag


It specifies whether:

  • A page should appear in the search engines.

  • The links on the page should be explored.

Without a meta robots tag, the default is index, follow, which is equivalent to: <meta name="robots" content="index, follow">.


Three variations of the meta robots tag:


<meta name="robots" content="noindex, nofollow"> : the page will not be indexed and its links will not be followed.

<meta name="robots" content="noindex, follow"> : the page will not be indexed, but its links will be followed.

<meta name="robots" content="index, nofollow"> : the page will be indexed, but we tell robots not to follow the links on it, for example because we are unsure about the quality of the sites it links to.
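To check which of these variations a page actually sends, you can read its meta robots tags with Python's built-in html.parser. The sketch below is a simplification: the class and function names are hypothetical, and it ignores crawler-specific tags such as <meta name="googlebot"> as well as the X-Robots-Tag HTTP header, both of which can also carry these directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_policy(html: str) -> dict[str, bool]:
    """Resolve the page's policy; with no tag, the default is index, follow."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    blocked_all = "none" in found  # "none" is shorthand for noindex, nofollow
    return {
        "index": "noindex" not in found and not blocked_all,
        "follow": "nofollow" not in found and not blocked_all,
    }


print(robots_policy('<head><meta name="robots" content="noindex, follow"></head>'))
# {'index': False, 'follow': True}
print(robots_policy("<head><title>No tag</title></head>"))
# {'index': True, 'follow': True}
```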


Edit it: Yoast → in the Yoast box under each page's editor (if you are using WordPress).

Most other CMSs offer an equivalent option.



4. Canonicals


Canonical tags solve a common e-commerce problem: several product pages that are nearly identical (same product, different color). Google cannot tell which page is the reference page and treats the repeated content as duplicate content.


You can use Siteliner to check your site for duplicate content.


A canonical tag shows Google that these pages are related and that only the reference page should appear in search results. Placed on a variant, it tells Google that the page is a variation of the reference page.


Insert the canonical link in the <head> of each variant page:


Example:

<head>
<link rel="canonical" href="http://yoursite.com/favoritepage" />
</head>


Yoast and most other CMSs can add it for you.
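To audit your variant pages, you can extract the canonical each page declares and compare it with the page's own URL. This is a minimal Python sketch using the standard html.parser and urllib.parse.urljoin; the class and function names are hypothetical, and it only reads the first <link rel="canonical"> in the HTML (Google can also take canonicals from HTTP headers and sitemaps, which this ignores).

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Record the href of the first <link rel="canonical"> on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.href is None:
            self.href = attributes.get("href")


def declared_canonical(html: str, page_url: str) -> Optional[str]:
    """Absolute canonical URL declared by the page, or None if it has none."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.href:
        return None
    return urljoin(page_url, parser.href)  # resolves relative hrefs


variant = '<head><link rel="canonical" href="/favoritepage" /></head>'
print(declared_canonical(variant, "http://yoursite.com/favoritepage-blue"))
# http://yoursite.com/favoritepage
```

If the result equals the page's own URL, the page presents itself as the reference page; if it points elsewhere, the page is declared as a variant.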



5. Bonus


What If I Want To Request That Google Try Reindexing Right Now?


In Google Search Console, enter the URL you want to reindex in the URL Inspection tool and click Request Indexing.



What If I Want to Urgently Deindex One Page?


Change the page's meta robots tag from index to noindex, then use Search Console → Removals to speed up the process.



You May Also Want to Exclude Malicious Bots

  • File-sharing Bots

These bots respond to users' queries by claiming that the requested file is available for download. When the user clicks the link, they unknowingly infect their computer.

  • Spambots

These bots flood your chats with spam and bombard you with instant messages. Some advertisers also use them to collect users' demographic information.

  • Zombie Bots Or Botnet

A botnet is a group of compromised computers that carry out tasks and commands together, typically to launch large-scale attacks.


robots.txt won't stop them, because malicious bots simply ignore it: you will need a protection service such as Cloudflare.


Wouldn't It Be Easier to Noindex My Similar Pages Instead of Setting Canonicals?


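The noindex tag is a quick way to take duplicate pages out of ranking consideration, but those pages then disappear from search results entirely, which can hurt your organic traffic. A canonical keeps the reference page visible in search while telling Google how the variants relate to it.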


NO BLAH-BLAH

[Infographics: robot crawling definition, robots.txt testing tool, sitemap]