Making your site more visible on search engines

While the content of your website matters most, it is also important that everything else on the site helps search engines find that content. The following sections explain some ways you can do this.

Directing search spiders to your site map

Site maps inform search engines of the existence of pages within your site that are otherwise not discoverable; perhaps they are not linked to from other pages on your site, or from external sites.

Some CMSs provide plugins to generate site maps; a list of them is at code.google.com/p/sitemap-generators/wiki/SitemapGenerators. Alternatively, you can write a site map yourself using the guidelines at www.sitemaps.org/protocol.html.
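
If you write the site map yourself, a short script can generate it. The following is a minimal sketch in Python, assuming you already have a list of absolute page URLs. The function name build_sitemap and the example URLs are illustrative only and are not part of any CMS or plugin. It emits just the required urlset, url, and loc elements described at www.sitemaps.org/protocol.html.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemap protocol (www.sitemaps.org/protocol.html)
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (str) listing the given absolute URLs.

    build_sitemap is a hypothetical helper written for illustration.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL text
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example URLs are placeholders; substitute the pages of your own site
print(build_sitemap(["http://example.com/", "http://example.com/about"]))
```

Saving the returned string as sitemap.xml in the root of the site would match the /sitemap.xml path used in the link element.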

Once you have written your site map, add the following link to your HTML page so that search engine spiders can discover the site map when they crawl your website:

<link rel="sitemap" type="application/xml" title="Sitemap" href="/sitemap.xml">

You can also submit the site map to individual search engines instead of linking to the site map within the HTML page, if you would like to make your page as small as possible.

Implementing X-Robots-Tag headers

You will likely have a staging server at some point, such as staging.example.com for your site example.com. If an external site links to files on the staging server (say you linked to it from a forum while asking why some feature was not working), search engines are likely to index the staging server, even if it is not mentioned in your robots.txt file or has no robots.txt file of its own.

To prevent this, you can send the X-Robots-Tag HTTP header by appending the following code snippet to the .htaccess file on the staging server and then uncommenting the directives at the end of it:

# ------------------------------------------------------------
# Disable URL indexing by crawlers (FOR DEVELOPMENT/STAGE)
# ------------------------------------------------------------

# Avoid search engines (Google, Yahoo, etc) indexing website's content
# http://yoast.com/prevent-site-being-indexed/
# http://code.google.com/web/controlcrawlindex/docs/robots_meta_tag.html
# Matt Cutts (from Google Webmaster Central) on this topic:
# http://www.youtube.com/watch?v=KBdEwpRQRD0

# IMPORTANT: serving this header is recommended only for
# development/stage websites (or for live websites that don't
# want to be indexed). This will avoid the website
# being indexed in SERPs (search engine results pages).
# This is a better approach than using robots.txt
# to disallow the SE robots crawling your website,
# because disallowing the robots doesn't exactly
# mean that your website won't get indexed (read links above).

# <IfModule mod_headers.c>
#   Header set X-Robots-Tag "noindex, nofollow, noarchive"
#   <FilesMatch "\.(doc|pdf|png|jpe?g|gif)$">
#     Header set X-Robots-Tag "noindex, noarchive, nosnippet"
#   </FilesMatch>
# </IfModule>
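
To see which header value each kind of request would receive once the directives are uncommented, the logic can be mirrored in a few lines of Python. This is only an illustrative sketch: expected_x_robots_tag is a hypothetical helper, it matches against the request path rather than the file name that Apache resolves, and it assumes the dot before the extension is escaped so that it matches a literal period.

```python
import re

# Same extensions as the FilesMatch pattern, with the dot escaped
DOCUMENT_OR_IMAGE = re.compile(r"\.(doc|pdf|png|jpe?g|gif)$")


def expected_x_robots_tag(path):
    """Return the X-Robots-Tag value the snippet would set for a path."""
    if DOCUMENT_OR_IMAGE.search(path):
        # The inner FilesMatch block overrides the site-wide header
        return "noindex, noarchive, nosnippet"
    return "noindex, nofollow, noarchive"


# Example paths are placeholders chosen for illustration
for path in ["/index.html", "/docs/manual.pdf", "/img/photo.jpeg"]:
    print(path, "->", expected_x_robots_tag(path))
```

In this sketch, documents and images lose the nofollow value and gain nosnippet, while every other response keeps the site-wide value.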

Trailing slash redirects

Search engines treat the folder URLs http://example.com/foo and http://example.com/foo/ as two different URLs, so the same content served at both is considered duplicate content. To prevent this, redirect one form to the other: either http://example.com/foo to http://example.com/foo/, or http://example.com/foo/ to http://example.com/foo.

To do this, edit the .htaccess file on your Apache server and add one of the following sets of rewrite rules (see Chapter 5, Customizing the Apache Server, for details on how we edit .htaccess files).

Option 1: Rewrite example.com/foo to example.com/foo/

The following code snippet helps us to rewrite example.com/foo to example.com/foo/:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/|#(.*))$
RewriteRule ^(.*)$ $1/ [R=301,L]

Option 2: Rewrite example.com/foo/ to example.com/foo

The following code snippet helps us to rewrite example.com/foo/ to example.com/foo:

RewriteRule ^(.*)/$ $1 [R=301,L]
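
The effect of the two options can be previewed before touching the server. The following Python sketch mimics the patterns under stated assumptions: the function names are hypothetical, the is_file argument stands in for Apache's !-f check against the file system, the dot in the Option 1 condition is treated as a literal period, and paths are given with their leading slash as they appear in the URL. Each function returns the redirect target, or None when the rule would not fire.

```python
import re

# Option 1 condition: skip paths ending in a file extension, a slash,
# or a fragment (the dot is escaped to match a literal period)
SKIP_ADDING_SLASH = re.compile(r"(\.[a-zA-Z0-9]{1,5}|/|#(.*))$")


def add_trailing_slash(path, is_file=False):
    """Option 1: return the 301 target for /foo -> /foo/, else None."""
    if is_file:  # stands in for RewriteCond %{REQUEST_FILENAME} !-f
        return None
    if SKIP_ADDING_SLASH.search(path):
        return None
    return path + "/"


def remove_trailing_slash(path):
    """Option 2: return the 301 target for /foo/ -> /foo, else None."""
    # In .htaccess, the rule pattern sees the path without its leading slash
    relative = path[1:] if path.startswith("/") else path
    match = re.match(r"^(.*)/$", relative)
    if match is None:
        return None
    return "/" + match.group(1)


# Example paths are placeholders chosen for illustration
for path in ["/foo", "/foo/", "/css/style.css", "/"]:
    print(path, add_trailing_slash(path), remove_trailing_slash(path))
```

Only one of the two options should be enabled on a given site, since each redirects to the form that the other redirects away from.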

If you have existing rewrite rules, perform the following steps to make sure you set up your rewrite rules correctly. Not doing so can cause incorrect redirects and 404 errors.

  • Keep a backup: Back up the .htaccess file you are going to add redirects to, before you start adding them. This way you can quickly go back to the backup file if you are unable to access your site because of an error in the .htaccess file.
  • Do not append to or replace existing rewrite rules: If the CMS you are using already provides rewrite rules, merge the new rules into them instead of appending the new rules or replacing the existing ones.
  • Watch the order of rewrite rules: Place the trailing-slash rule before your existing rules, which might otherwise rewrite the end of the path first.
  • Confirm the RewriteBase path: If your website is in a subfolder, ensure you have set the right RewriteBase path for your rewrite rules. If you have a working RewriteBase path, do not remove it.

    Note

    Finally, consider implementing guidelines from Google's SEO Starter Guide at http://googlewebmastercentral.blogspot.com/2008/11/googles-seo-starter-guide.html.
