Google, Wherefore Art Thou Google? Sites Abandoned by Googlebot!

© August 30, 2004

As a search engine optimization specialist I often optimize
existing web pages for small business clients, upload them to
the site and see pages re-indexed by Google within a week.
This only happens with existing business sites that have been
online for a few years. Google seems to be updating its
index as often as every other week at this point, and older
established sites that are already indexed seem to be
re-crawled on that twice-a-month schedule on a fairly routine
basis.

Two clients that hired me for recent work saw their rankings
shoot to the top for a newly targeted search phrase over a
weekend: I did the optimization on a Thursday and they were
ranked by Saturday. Now keep in mind that this
doesn’t happen for everyone, only those that have been online
for some period and already have significant content that
simply needs tweaking and proper title and metatag information
added. They usually have relatively good existing PageRank and
do well for other RELEVANT search phrases already. I offer that
warning only to avoid instilling false hopes in anyone hoping
to achieve the same instant ranking boost overnight.

Those clients that do succeed in this way are often thrilled
with the results accomplished in such short order. I’d love
to be able to offer that type of ranking boost to everyone,
but some are more equal than others when it comes to easy,
inexpensive SEO tune-ups that rev up your rankings overnight.
Your mileage may vary.

WHY DO NEW SITES SUFFER?

What is going on with newer sites that don’t get crawled for
months? I’ve got a client, a newer attorney directory that
offers tons of great information in the form of articles on
specific areas of law, links to incredibly valuable and
relevant legal sites and over 600,000 attorneys listed by
practice area and state. Yet the site has not been re-crawled
by Google for over 3 months! Now this would not be such a big
issue for many sites, but this site is relatively new and we’ve
optimized all the titles, tags & page text, created a complete
site map and placed links to all these resources on the front
page.

I know that the site is not being crawled because Google’s
cached copy of the front page shows it before we did the
work four months ago, without the new links and without
title tags. We’ve submitted the site by hand once a month
for three months via the Google Add URL page
(http://www.google.com/addurl.html). When the hand submissions
failed to get it re-indexed for four months, we submitted
the sitemap page, which has not been crawled at all. Google
shows only ONE page on this site, when in fact it has
thousands of pages, a sitemap and dozens of static pages!

Part of the problem is that this site must be dynamic, since
a database of over 632,000 attorneys must be accessed,
retrieved and served for any of those law firms searched for
to be returned to the site visitor. Google warns owners of
dynamic sites that Googlebot may not crawl dynamically
generated pages with “?” (question marks) in the URL. This is
to avoid crashing the server with too many concurrent page
requests from Google’s spider.
http://www.google.com/webmasters/2.html#A1

The solution to this dynamic URL problem has been discussed
widely in search engine forums, and solutions have been bandied
about including software provided by SEOs, URL re-write
techniques for dynamic pages on Apache servers
http://www.alistapart.com/articles/urls/ and PHP pages
http://www.stargeek.com/php-seo.php to generate search engine
friendly URLs. Others recommend simply adding static HTML
sitemap pages as alternatives for the search engine spiders.
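
To make the idea of a search engine friendly URL concrete, here
is a minimal Python sketch of the mapping those re-write
techniques perform. It is only an illustration, not the method
from either article linked above: the script name search.php and
the parameters state and practice are hypothetical, and on a real
Apache server the translation would normally be handled by the
server's rewrite rules rather than by a script like this.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, quote, unquote


def to_friendly_path(dynamic_url):
    """Turn a query-string URL into a path with no '?' in it.

    Example shape: /search.php?state=TX -> /search/state/TX
    """
    parts = urlsplit(dynamic_url)
    # Drop the script extension: '/search.php' -> '/search'
    base = parts.path.rsplit(".", 1)[0]
    segments = []
    for key, value in parse_qsl(parts.query):
        # Percent-encode each piece so it is safe as a path segment
        segments.append(quote(key, safe=""))
        segments.append(quote(value, safe=""))
    return "/".join([base] + segments)


def to_dynamic_url(friendly_path, script_ext=".php"):
    """Reverse the mapping so the real script can still be called.

    script_ext is an assumption about how the site's scripts are named.
    """
    pieces = friendly_path.strip("/").split("/")
    base, rest = pieces[0], pieces[1:]
    # Re-pair the alternating key/value path segments
    pairs = [(unquote(k), unquote(v)) for k, v in zip(rest[0::2], rest[1::2])]
    return "/" + base + script_ext + "?" + urlencode(pairs)
```

The first function shows what the spider would see; the second
shows what the server would quietly translate it back into, so
the database lookup itself does not have to change.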

In this instance the client’s developer simply said “I can’t
do that (PHP solution) on this server”. So we resorted to
putting up the static HTML sitemap pages with hard-coded
URLs to the main 54 pages of the site at
http://lawfirm411.com/Law-Firm-411-sitemap.html This should
get at least those fifty pages crawled by Googlebot, but
Google’s spider appears not to be crawling this site at all.
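
For readers who want to try the static sitemap approach, a page
of hard-coded links can be generated rather than typed by hand.
The sketch below is an assumption about how one might do it, not
the way the LawFirm411 sitemap was actually built; the function
name, page labels and URLs are all hypothetical.

```python
from html import escape


def build_sitemap_html(title, pages):
    """Return a plain static HTML page linking to each (label, url) pair."""
    items = "\n".join(
        '  <li><a href="{0}">{1}</a></li>'.format(escape(url), escape(label))
        for label, url in pages
    )
    return (
        "<html>\n"
        "<head><title>{0}</title></head>\n"
        "<body>\n"
        "<h1>{0}</h1>\n"
        "<ul>\n{1}\n</ul>\n"
        "</body>\n"
        "</html>\n"
    ).format(escape(title), items)


# Hypothetical entries; a real list would name the site's main pages.
example_pages = [
    ("Texas Attorneys", "/attorneys/texas.html"),
    ("Tax Law Articles", "/articles/tax-law.html"),
]
sitemap_page = build_sitemap_html("Site Map", example_pages)
```

Writing sitemap_page to a file such as sitemap.html and linking
to it from the front page gives the spider a set of plain links
with no question marks in them.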

How do we know this? See for yourself by using the following
query in the search box at Google: allinurl:www.lawfirm411.com
where the result page shows ONE page in the results. If you
try that query on your own site (substitute your own domain name
for lawfirm411.com), you’ll see the results list ALL your
pages.

The site home page was crawled by Google four months ago, when
they took their “Cached Snapshot” of the page. You can see
this by visiting the Google cached page here:
http://66.102.7.104/search?sourceid=navclient&ie=UTF-8&q=cache:www.lawfirm411.com
where the date of this snapshot is “Apr 20, 2004 07:42:19 GMT”
and they haven’t been back since. The page in that snapshot
has none of the newly added links, an outdated title tag, and
old content.

This problem is not unique to this site. One client we worked
with two years ago had a dynamically generated, framed site!
Those two site structures have always given search engines
trouble. Their site was not crawled at all and only the front
page showed up. Our solution was to create a second domain
(owned by the client), which had static HTML pages that
precisely mirrored the content of the client’s framed,
dynamically generated site. Guess what happened after
Googlebot crawled the static site? Google indexed the framed
site in full and then banned the static site from the index!

Not an approach we advocate, but the one that worked for this
client.

We’re still searching for ways to get Googlebot back to
LawFirm411.com before creating that new static site, but
decided to share this odd experience with the SEO community
before going to any extremes. Google provides over 70% of
the search engine referred traffic to ALL of our clients,
and we realized we can’t sit idly by and see a major client
languish because Googlebot didn’t like what it found at the
client site on the first visit four months ago.

This issue dogs newer sites in other places as well. The Open
Directory Project has also become notoriously slow in adding
new sites to the directory and in this case, has not picked
up this site even after 6 regular monthly submissions. The
web playing field may have begun tilting toward older,
established sites and away from new ones.