How Google's New Sitemap Protocol Sets a New Standard for Webpage Submissions
Get more of your site indexed faster, easier, and updated automatically!
Ya know how we've been saying for the past several years that submitting your webpages to Google is a virtual waste of time? ...how Google likes to find links to your site's pages and discover your new pages on its own? Well, to paraphrase Emily Litella—Never mind, sort of.
It's all-of-a-sudden very different now that Google is beta-testing their new Sitemaps submission protocol. The sun is rising on a fresh new way to tell Google all about your new and updated webpages.
Google says...
Google Sitemaps is an easy way for you to help improve your coverage in the Google index. It's a collaborative crawling system that enables you to communicate directly with Google to keep us informed of all your web pages, and when you make changes to these pages.
With Google Sitemaps you get:
- Better crawl coverage to help people find more of your web pages
- Fresher search results
- A smarter crawl because you can provide specific information about all your web pages, such as when a page was last modified or how frequently a page changes
Creating your Sitemap is easy
Use the Sitemap Generator to create an XML Sitemap or submit a simple text file with all your URLs.
Get started today — it's free
Send us your sitemap today and help increase the visibility of your web pages.
A More Efficient Way to Submit Your Webpages ...could even become 'The Standard' for all search engines.
Up until now, getting new pages or sites indexed by Google depended on external links from pages that Google already knew about. Google's spider typically revisits pages that are already indexed and discovers new links that point to new pages. The advent of Sitemaps (Beta), however, means that telling Google about new or updated content can be as straightforward as presenting a specifically formatted list directly to them. In short, the submission process is back—reincarnated in an altered form that outsources some of the heavy lifting to webmasters and site managers.
Google Sitemaps invites you to place a specially formatted site map file on your web server. Then, whenever you notify them of new sites, pages, or updated content, their spider will crawl your pages. You're even invited to prioritize your pages and inform Google (they call it a hint) of your update frequencies. Imagine that! The only catch so far is that there's no actual guarantee your pages will get indexed, although we strongly suspect that most pages will.
Getting Started with the Sitemap Protocol
To get started you'll first need to sign-up (registered Google Accounts users can skip the sign-up). After logging in, you'll get a welcome page like this...
Once you get past the technical terms, it's actually not so hard.
Rest easy–Google accepts site maps in either plain text or XML to accommodate the cross-section of webmaster expertise. Now, if you're a bit unsure about XML, relax—we'll simplify what you need to know in a minute. Highly technical site managers will, however, recognize how XML allows them to create a fully automated XML feed—a script on your web server that monitors site changes in order to automatically regenerate and resubmit your feed to Google.
Even so, for the less technically inclined webmaster, a simple list of website URLs can be submitted and resubmitted manually whenever there is new site content for Google to crawl and index. In other words, Google offers the best of both worlds—a simple submit process for the non-technical webmaster and a more advanced option that allows for useful automation and maintenance of the submission process for the technologically enabled.
Here's how it works
The least technical way to submit your site's webpages is to create a simple sitemap.txt file with a list of URLs, one per line, as such...
http://www.domain.com/
http://www.domain.com/products.html
http://www.domain.com/products.php?product_id=292
http://www.domain.com/products.php?product_id=983&cat=10
...and while the example above is a perfectly valid way to submit, it doesn't fully utilize the more advanced options Google Sitemaps offers.
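Producing such a file programmatically is straightforward. Here's a minimal Python sketch (the URLs and output file name are illustrative placeholders) that writes one URL per line:

```python
# Write a plain-text Sitemap: one absolute URL per line, nothing else.
# The URLs and file name below are placeholders, not real pages.
urls = [
    "http://www.domain.com/",
    "http://www.domain.com/products.html",
    "http://www.domain.com/products.php?product_id=292",
]

with open("sitemap.txt", "w") as f:
    for url in urls:
        f.write(url + "\n")
```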
To harness the full power being offered by Sitemaps, you'll need to use the following sitemap.xml format (which we will simplify for you in a moment as well as introduce tools that will auto-generate the file).
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
                            http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
  <url>
    <loc>http://www.domain.com</loc>
    <lastmod>2005-06-03T04:20:36Z</lastmod>
    <changefreq>always</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.domain.com/products.html</loc>
    <lastmod>2005-06-02T20:20:36Z</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
As you can see, the XML file contains extra data (known as metadata—data that describes other data) not found in our simple sitemap.txt example file. The XML tags defined by Google that make up a Sitemap file are very specific and must be used precisely. Some tags are optional and some are required.
Here's the precise breakdown regarding the purpose, meaning and requirement (or not) of each of the XML tags used in the example above:
The <urlset> attribute block [Required]
xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
http://www.google.com/schemas/sitemap/0.84/sitemap.xsd"
A no-brainer—there's nothing you need to know about this block other than to include it in the opening <urlset> tag exactly as it is displayed above. This is a required part of the document which simply describes the document itself.
<urlset> [Required]
Indicates the beginning and end of a set of URLs to be crawled.
<url> [Required] Specifies the start and finish of an individual URL (webpage) entry.
<loc> [Required] The full URL of the web page you wish to submit, including the domain name and path just as it would be entered in a web browser's address bar. You're limited to 2,048 characters (which would be an unbelievably long URL, anyway).
<lastmod> [Optional]
The date and time the document was last modified. This date and time must be specified using the ISO 8601 standard.
<changefreq> [Optional]
Here's where you can suggest how often Google should revisit this URL. Bear in mind it's not a command, but rather a hint. The value can be set to any one of the following: always, hourly, daily, weekly, monthly, yearly, or never.
<priority> [Optional]
The relative priority of this URL compared to other URLs on your own site. Here's where you can assign a crawl preference to your more important pages. The scale runs from 0.0 to 1.0 in increments of 0.1. For example, 0.3, 0.5, and 1.0 would be priorities listed from lowest to highest.
This has no direct effect on your actual search engine ranking, and its actual importance is thus far untested. We can only speculate that one of its uses would be to help Google decide which pages to crawl in the event a spider is unwilling to crawl them all.
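If you do build the file by script, the timestamps for the last-modified tag are easy to produce. Here's a minimal Python sketch (the epoch value is illustrative; a file's modification time from os.path.getmtime would be a typical real input):

```python
from datetime import datetime, timezone

def lastmod_string(timestamp):
    """Format a Unix timestamp in the ISO 8601 / W3C datetime form
    the Sitemap protocol expects, e.g. 2005-06-03T04:20:36Z."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

# Illustrative input: this epoch value corresponds to the date
# used in the example Sitemap above.
print(lastmod_string(1117772436))  # → 2005-06-03T04:20:36Z
```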
Google provides complete technical specifications of the Sitemaps protocol at: https://www.google.com/webmasters/sitemaps/docs/en/protocol.html
Sitemap Generators
Now you can take a deep breath and relax. The good news is that you don't actually have to (and ideally you should not) manage this Sitemap XML file manually.
When Google announced the beta availability of their new Sitemap protocol, they also provided an open source Python script that can generate a Sitemap for you. And, since the script's release, a number of third-party developers have created a variety of tools that enable you to do the job more easily.
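To give a feel for what such generators do under the hood, here's a minimal Python sketch that builds a Sitemap document from a list of page records. The page data is illustrative, and this is a deliberate simplification of what any real generator (including Google's script) actually does:

```python
from xml.sax.saxutils import escape

# Illustrative page data; a real generator would gather this by
# crawling files, parsing logs, or querying a database.
pages = [
    {"loc": "http://www.domain.com/", "lastmod": "2005-06-03T04:20:36Z",
     "changefreq": "always", "priority": "1.0"},
    {"loc": "http://www.domain.com/products.html",
     "lastmod": "2005-06-02T20:20:36Z",
     "changefreq": "daily", "priority": "0.8"},
]

def build_sitemap(pages):
    # Note the escape() call: URLs containing & must be entity-escaped
    # to keep the XML well-formed.
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">']
    for page in pages:
        lines.append("  <url>")
        lines.append("    <loc>%s</loc>" % escape(page["loc"]))
        for tag in ("lastmod", "changefreq", "priority"):  # all optional
            if tag in page:
                lines.append("    <%s>%s</%s>" % (tag, page[tag], tag))
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

print(build_sitemap(pages))
```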
Working within Content Management Systems
Site managers who use a Content Management System (CMS) to manage their website should check with the developers of their CMS software for a plug-in that will generate a site map file automatically.
Here's a list of a few popular CMS plug-ins that have already been released, most of which were developed by third parties. If you'd rather use an 'official' version, contact the creator of your particular CMS directly.
- MovableType
http://www.jacobsen.no/anders/blog/archives/2005/06/06/google_sitemaps_
for_movable_type_now_with_correct_last_modified_dates.html
- Plone
http://plone.org/documentation/how-to/google-sitemap
- OsCommerce
http://www.oscommerce.com/community/contributions,3233/category,all/search,seo
- Wordpress
http://www.arnebrachhold.de/2005/06/05/google-sitemaps-generator-v2-final
http://www.socialpatterns.com/search-engine-optimization/
google-sitemaps-with-wordpress/
- NucleusCMS
http://freshmeat.net/projects/nucmap/?branch_id=58613&release_id=198255
Server-Based Scripts
Also available are scripts that you can run directly on your web server. These scripts will crawl the actual file structure of your web server to develop the site map file.
It's important to understand that this method will find files that are not necessarily linked from other pages and that you may not want indexed in a search engine. Therefore...
...be sure to examine the generated site map file carefully to ensure that it contains only URLs you want indexed.
Generally speaking, these server-based scripts require an intermediate to advanced level of technical ability.
Web-based Generators
You can also use a web-based generator to create a site map file for you. Web-based generators will only locate pages on your website that are linked from the initial page you specify.
A web-based generator will crawl your site in much the same way that a search engine would. So, if the generator can find it, chances are the engine already knows about it. The exception to this would be for new sites not yet indexed.
Windows-based Generators
A Windows-based generator is a software program you install on your Windows-based desktop computer. The best software program we've found so far is the freeware Gsitemap, by VIGOS Software.
This program will connect to your web site and generate a sitemap based on the criteria you specify. You can also import log files or URL lists to generate your sitemap. Once created, Gsitemap supports uploading the sitemap file and automatically notifying Google.
Submitting your Sitemap
Once you've created your site map, upload it to a publicly accessible location on your web server. Unlike a robots.txt file, your site map file doesn't have to be located in the root web directory of your web server. Google will accept all URLs under the directory where you post the Sitemap file.
For example, if you post a Sitemap at www.domain.com/dir/sitemap.xml, they'll assume that you have permission to submit information about URLs that begin with www.domain.com/dir/ since, obviously, server access is required for someone to post a file at that location.
To submit a site map to Google, you must first create a Google Account. If you're already using Gmail or another of Google's services that require a login (other than AdWords or AdSense), you already have a Google account.
If you don't yet have an account, you can create one for free at the Google Sitemaps homepage.
Once you've logged into your account and arrived at the Sitemaps homepage, click the Add a Sitemap link and paste in the URL of your site map file. Within a few hours of submitting your site map, Google will download it and let you know whether or not it encountered any errors. Be sure to check back with Google within a day of submitting to verify that everything got processed without a hitch.
Remember, the service is in testing (beta) mode so, for now, it's reasonable to expect a few bumps in the road.
Benefits of Using a Site Map
More efficient crawling
Providing the search engines with a personalized map to the pages within your site is your best strategy for getting your site categorized and indexed exactly how you'd like it to be. With the metadata supported sitemap you're providing a ready-to-use map of your pages that's prioritized and tagged with hints about update frequencies and last-modified dates.
You're basically taking all the guesswork out of crawling your site. Over the long run we believe this will facilitate more frequent updates of your important content within Google's index.
No waiting lines
Since you can resubmit your site map at any time, you don't have to wait until the spiders come crawling for the engines to pick up your new pages.
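Resubmission can even be automated. Google's Sitemaps documentation describes an HTTP "ping" address you can fetch to tell them your Sitemap has changed; the sketch below just builds that address (the sitemap URL is a placeholder) and leaves the actual HTTP request to you:

```python
from urllib.parse import quote

def ping_url(sitemap_url):
    """Build the notification URL described in Google's Sitemaps docs.
    Fetching it (e.g. with urllib.request.urlopen) asks Google to
    re-download the Sitemap at sitemap_url."""
    return ("http://www.google.com/webmasters/sitemaps/ping?sitemap="
            + quote(sitemap_url, safe=""))

print(ping_url("http://www.domain.com/sitemap.xml"))
```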
An alternative to Yahoo's paid inclusion?!?
In the past we've recommended Yahoo's paid inclusion as an alternative to getting hard-to-index pages found by Google. Now Google is effectively offering a free channel into their index. They're also making the entire protocol available for use by any other search engine which means the Sitemaps protocol is likely to become an industry standard for webpage submission and update notification.
Not a replacement for 'regular' SEO
Google states that using a Sitemap feed doesn't automatically mean better listings in the organic search engine results pages. Your pages will still be subject to the same ranking algorithms as sites that don't use a Sitemap feed. Sitemaps are intended to be a complement to, not a replacement for, their regular crawling of the web to index pages. Google's hope, however, is that the hints being offered through the use of server-based Sitemap XML files will help them do a better job than the regular crawl while saving bandwidth, resources and, ultimately, money.
The advantages for the webmaster include getting more of your hard-to-crawl pages listed in the index than a site that doesn't use the Sitemap protocol—and sometimes that's all that's standing between you and your competition.
A word to the wise, don't spam this format...
Over the past five years, Google has considerably increased their ability to detect and eliminate search engine spam. Our opinion is that it would be foolish to list pages that use objectionable techniques in a Sitemap XML feed—something akin to raising one's hand in a police lineup.
Clearly it would be too easy to get caught. Bear in mind that Google can choose at any time to use their own nuclear option—banning a site for life. So, be smart. Forgo the temptation to push the envelope for short term gains and play it straight. We're sure you'll be better off in the long run for having done so.
For instance, bear in mind that the priority tag suggests relative priority. So, if you happen to set every page to 1.0 (defined as the highest priority) it'll literally mean that all of your pages are equal just as if you had set no priority at all.
If you exaggerate update frequency or fudge on last update tags, Google can easily figure that out and flag your domain as one that provides unreliable metadata. Remember, these tags aren't commands—they're hints. Google is under no obligation to follow your hints but you can bet they are taking notes, making a list, and checking it twice. There's every reason for you to play it straight and no clear benefit to gain by cheating. Ultimately, you'll want your site(s) listed in their white-hat database, not their black-hat one.
Also bear in mind that Google cannot guarantee they'll crawl or index all of your URLs. Their primary goal is to gain a relational understanding of the data in the hope of getting more of it into their crawls and, ultimately, into their indices. Spamming them at this stage would be like painting a bulls-eye over the heart of your business—not smart!
Learn how now, or at least soon...
We strongly suspect the Sitemap XML protocol will become the submission standard of the not-so-distant future. Therefore, we recommend that you budget in the time it takes to negotiate the learning curve—or at least assign the task to someone within your company.
While there's no doubt that Google will maintain their standard crawler method of finding and indexing pages in the near term, there are so many incentives for them to shift the emphasis to the Sitemap XML protocol in the long term. For those of us who make the adjustment now (or at least soon), the search-marketing-scape will be all-that-much more comprehensible when the protocol becomes the de facto standard for getting websites indexed.
In other words, Google has begun training us to do it their way. And we don't really see that there's much choice because the advantages to the engines of the Sitemap XML protocol are just too numerous and compelling for them to pass up.
Are we having fun yet?
