Get Your Website Indexed By Google
Updated: Sep 23, 2021
It all begins with crawling!
Crawling is the process used by search engines to analyse, understand, and index the content of a website. Most CMSs provide a simple way to set this up, but you should still understand what your optimal indexation strategy is.
Crawling is what makes indexation possible, and the following components control it:
1. Robots.txt File
A well-structured site is easier to crawl. In other words, if you optimise the components below, Google can crawl your pages faster and more frequently (daily).
If not, Google will turn its attention to other sites that make its job simpler.
Your robots.txt file:
Must be called robots.txt (in lowercase only)
Must be less than 500KB
Must return a status 200 to Google
Must be at the root of your site = URL yoursite.com/robots.txt
Must contain only the 4 directives Google supports: User-agent, Disallow, Allow, and Sitemap. Some common configurations:
Full permission:
User-agent: *

Full permission + Sitemap:
User-agent: *

Total exclusion with one exception (Google):
User-agent: *
Disallow: /
User-agent: googlebot

Partial exclusion on admin:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
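To see how a parser reads rules like these, here is a minimal sketch using Python's standard urllib.robotparser. The rule sets are completed versions of the examples above: the empty "Disallow:" line for googlebot and the placement of Allow before Disallow are assumptions made for this sketch, and yoursite.com is a placeholder. Note that Python's parser applies the first matching rule, whereas Google applies the most specific one, so results may differ from Googlebot's in edge cases.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Total exclusion with one exception: Google.
# Assumption: an empty "Disallow:" (allow everything) completes the googlebot group.
GOOGLE_ONLY = """\
User-agent: *
Disallow: /

User-agent: googlebot
Disallow:
"""

# Partial exclusion on admin.
# Assumption: Allow is listed first because this parser stops at the first match.
ADMIN_EXCLUDED = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

google_only = build_parser(GOOGLE_ONLY)
admin_excluded = build_parser(ADMIN_EXCLUDED)

if __name__ == "__main__":
    page = "https://yoursite.com/blog/"
    print(google_only.can_fetch("Googlebot", page))
    print(google_only.can_fetch("SomeOtherBot", page))
    print(admin_excluded.can_fetch("Googlebot", "https://yoursite.com/wp-admin/options.php"))
    print(admin_excluded.can_fetch("Googlebot", "https://yoursite.com/wp-admin/admin-ajax.php"))
```

A local check like this is only a first pass; Google's own tooling remains the reference for how Googlebot interprets the file.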
Important: You could simply use robots.txt to tell Google not to crawl certain pages of your website, but this does not guarantee that they stay out of the index.
The bot may still index a page if it reaches it through a link from another domain (a backlink). Follow the third step (the meta robots noindex tag) to ensure this does not happen.
You can edit and verify your robots.txt file using:
Edit it: Yoast → Tools → File editor (if you are using WordPress).
Check it: Google Search Console → robots.txt Tester.
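The requirements listed earlier in this section can also be turned into a quick automated check. The sketch below is a hypothetical helper (check_robots_txt and its parameters are not part of any tool mentioned here). It takes the URL, HTTP status, and body you have already fetched, so it makes no network calls. Reading "500KB" as 500 × 1024 bytes is an assumption, and the four accepted directives are the ones Google documents as supported.

```python
from urllib.parse import urlparse

MAX_BYTES = 500 * 1024  # assumption: "500KB" read as 500 * 1024 bytes
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}


def check_robots_txt(url: str, status: int, body: bytes) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if urlparse(url).path != "/robots.txt":
        problems.append("file must be at the site root and named robots.txt in lowercase")
    if status != 200:
        problems.append(f"expected HTTP status 200, got {status}")
    if len(body) >= MAX_BYTES:
        problems.append("file must be smaller than 500 KB")
    text = body.decode("utf-8", errors="replace")
    for number, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        directive = line.split(":", 1)[0].strip().lower()
        if directive not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unexpected directive '{directive}'")
    return problems
```

For example, calling check_robots_txt("https://yoursite.com/robots.txt", 200, body) on a file containing only the four directives returns an empty list.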
2. XML Sitemap
Your sitemap must:
Be structured using the XML language
Be accessible at the root of your site (yoursite.com/sitemap.xml)
Contain a maximum of 50,000 URLs and weigh no more than 50 MB (uncompressed)
Be split into several sitemaps referenced in a sitemap index file if its size requires it (or if the site has several distinct sections, such as a blog and a showcase)
Be complete and up to date
Only contain URLs that you want indexed
Be submitted directly to search engines (Google Search Console)
Edit it: Yoast → General settings → Features → XML sitemap (if you are using WordPress).
As an alternative, generate a sitemap manually with the online tool XML-Sitemaps or with Screaming Frog.
Check it: open your sitemap file in Google Search Console and submit it.
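For sites without a plugin, the splitting rule above can be sketched in a few lines of Python using only the standard library. The function names, file names, and yoursite.com URLs are placeholders chosen for this example. The 50,000-URL limit comes from the list above, and the namespace is the one defined by the sitemaps.org protocol. The sketch does not check the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the list above


def build_sitemap(urls: list[str]) -> bytes:
    """Serialise one <urlset> document for up to MAX_URLS URLs."""
    if len(urls) > MAX_URLS:
        raise ValueError("too many URLs for a single sitemap")
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_sitemaps(urls: list[str], base: str) -> dict[str, bytes]:
    """Split URLs into numbered sitemaps plus an index file that references them."""
    chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
    files = {f"sitemap-{n}.xml": build_sitemap(chunk) for n, chunk in enumerate(chunks, 1)}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in files:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base}/{name}"
    files["sitemap_index.xml"] = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return files
```

Calling build_sitemaps(urls, "https://yoursite.com") returns a mapping of file names to XML bytes that you could then write to the site root and submit in Search Console.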
3. Meta Robots Tag
It specifies whether:
A page should appear in the search engines.
The links on the page should be explored.
The default behaviour is index, follow, which is equivalent to: <meta name="robots" content="index, follow">.
Three versions of the modified meta robots tag:
<meta name="robots" content="noindex, nofollow"> :
There will be no indexing of the page nor consideration of links.
<meta name="robots" content="noindex, follow"> :
The page will not be indexed, but the links will be taken into account.
<meta name="robots" content="index, nofollow"> :
The page will be indexed, but we will inform robots that we are unsure about the quality of the external links and that they should not be followed.
Edit it: Yoast → under each page in the editor (if you are using WordPress).
This option is available in most other CMSs as well.
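To audit pages in bulk, the tag can also be read programmatically. The sketch below uses Python's standard html.parser to report whether a page's meta robots tag permits indexing and link following. The class and function names are illustrative, only the directives discussed above are handled, and a missing tag is treated as the default index, follow.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(part.strip().lower() for part in content.split(","))


def robots_policy(html: str) -> tuple[bool, bool]:
    """Return (indexable, links_followed); a missing tag means (True, True)."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return ("noindex" not in parser.directives, "nofollow" not in parser.directives)
```

Feeding it the HTML of each page you want hidden from search is a quick way to confirm that the noindex setting was actually saved.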
4. Canonical URLs
Canonical URLs can help you solve a common problem in e-commerce: having several product pages that are quite similar (same product, different color). Without guidance, Google cannot determine which page is the reference page and will treat the repeated content as duplication.
You can use Siteliner to check for your duplicate content.
Show Google that several pages are related to one another and that only the reference page should be indexed. The canonical tag tells Google that the current page is a variation of that reference page.
Insert a canonical link element in the <head> of each variation page.
Yoast and most CMSs will do it for you.
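What the resulting markup looks like, and how to verify it, can be sketched as follows. The product URL is invented for illustration, and canonical_tag and find_canonical are hypothetical helper names, not functions from Yoast or any CMS.

```python
from html import escape
from html.parser import HTMLParser
from typing import Optional


def canonical_tag(reference_url: str) -> str:
    """Build the <link> element to place in the <head> of each variation page."""
    return f'<link rel="canonical" href="{escape(reference_url, quote=True)}">'


class _CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.href is None:
            self.href = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the first canonical URL declared in the page, or None."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.href


# Hypothetical variation page (same product, different color).
blue_page = (
    f"<html><head>{canonical_tag('https://yoursite.com/shoes')}</head>"
    "<body>Blue shoes</body></html>"
)
```

Running find_canonical over every color variation and confirming that they all return the same reference URL is a simple way to catch pages the plugin missed.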
What If I Want To Request That Google Try Reindexing Right Now?
Go to your Google Search Console, enter the URL you want to reindex, and submit an indexing request.
What If I Want to Urgently Deindex One Page?
Change the meta robots tag from index to noindex, then go to Search Console → Removals to speed up the process.
You May As Well Exclude Malicious Bots
Some bots respond to a user's query by claiming that a file is available for download. The user then clicks on the link and unknowingly infects their computer.
Other bots send spam to interrupt your chats and bombard you with instant messages. Some advertisers use these bots to collect users' demographic information.
Zombie Bots Or Botnet
It's a group of compromised computers that perform various tasks and commands together. They are used to carry out large-scale attacks.
Robots.txt won't stop them; you will need a protection service such as Cloudflare.
Wouldn't It Be Easier To Noindex My Similar Pages Instead Of Setting Canonicals?
While the meta robots noindex tag is a fast way to remove duplicate content from ranking consideration, it is detrimental to your organic traffic, because noindexed pages are no longer visible in search.