Atomz
NB: Atomz has not fully indexed my site, pending a change to their system to address the Robots Meta Tag (noindex). It indexed 500 pages and stopped. See the discussion on the Extended Search Page for more details. However, a sufficient quantity and diversity of pages has been indexed to show the two different types of results pages. The Fully Custom version shows what is possible with HTML editing. The Template Configured page was created only by completing forms.
Fully Custom Results Page
Template Configured Results Page
FreeFind
PicoSearch
Searchbutton
NB: This is still under development. The Simple Search seems to be working. I'm still experimenting with the Advanced Search and don't quite have it right yet, but they are helping me with it. See the discussion on the Extended Search Page for more details.
Simple Search
Advanced Search
SiteMiner
Thunderstone
NB: Because of the total space limit, Thunderstone will not index my entire site. However, it is included so you can see how it works and how it can be configured. Also, because Thunderstone uses the "values" of the two buttons, these cannot be changed as they have been for all other forms on this page.
Simple Search
Advanced Search
whatUseek (IntraSearch)
Overview

Before I discuss the specific features and options of each engine I am using, I need to present a few overviews.

Which Engines

This review presents only free search engines that can be used to provide search services within a website. I am working with each one I can find. If you find one I don't know about, please email me.

Which Service Options

All of these free services also offer paid options, which provide features not available under the free options. At this time, this review does not include any of the paid options and does not explore the differences between the free and paid options.

Results Formatting

Free search services present their results in three basic formats: Configuration, Wrapper, and Complete Control.
Results Configuration

Under the Configuration option, you can control general colors, layouts, and fonts, and possibly include your site's logo. You might also be able to specify whether links open in the same window or a new window. All the services offer this level of results formatting.

Results Wrapper

Under the Wrapper option, you get everything you get with Configuration, plus you can specify HTML to "wrap" around the results. This lets you construct a page that "looks like" your site and put the Configured results onto it. Note that sites offering a Results Wrapper do so as an option; you can always use the simpler Configuration.

Complete Results Control

Under the Complete Control option, you can completely control the presentation of results. This is typically done by specifying the HTML for the results page and including special tags to control the formatting of the results. Note that sites offering Complete Control do so as an option; you can always use simpler solutions such as Configuration or Wrapper.

whatUseek offers a level of results formatting I have classified as Wrapper Plus. I could also have called it Complete Control Minus. It provides some scripting, but not complete control of the appearance of each result entry. For the best example so far of Complete Control, see the Atomz discussion below.

Search & Display Options

In addition, the services take two different approaches to controlling the search and display options. These include:
The two primary approaches to controlling these options are:
Search Options Configuration

Under the Configuration option, you can configure the options, but only once, for all searches. You can decide, for example, to show 10 results on a page, to show them in order of relevancy, and to display the summaries. However, all results pages will adhere to these "rules" and the search user cannot change them.

User Control of Search Options

Under the User Control option, the search user can specify, either through keywords or through search box controls, how to conduct and show a particular search.

Results Display

In displaying the results of a search, there are several items that can be displayed:
Two of the most important are the Description and Context.

Description and Context

HTML pages can include a Meta Tag named Description. This tag is designed to provide a description of the page, particularly for search engines, which can display it in the search results to help describe the page. If there is no Description Meta Tag, the engine usually displays the beginning of the text on the page.

In addition to the description, Context can also be displayed. This information (also called Results Context) shows the search words that were found "in context"; that is, it shows the portion of the page that caused the page to be selected, including the search words. The search words are often highlighted (e.g., bolded).

What To Index, What To Score and What To Search

Pages consist of many parts. For example, there are:
The questions for an engine include:
Some engines let the webmaster control which parts will be indexed and how to score words in different parts. (For example, a word in the title might "count" more than a word in the body of text.) And some engines let the user who is searching search a particular "part". (For example, the user could search only titles or only URLs for a particular word.)

Language Support

Searching a site is more than just doing an old-fashioned text match. Today, it involves understanding language subtleties. If you search for "hire", you'd also like to find "hiring". If you search for "tree", you'd also like to find "trees". The better you want this to work, the more the engine needs to know about the language. If your site is written in another language, this might be important.

Page Counting Problems

One of the problems of the free engines is how they handle page limits. My site includes hundreds of links to pages that should not be indexed; these pages include the "noindex" meta tag. It also includes tens of thousands of offsite links. So when an engine says it will handle up to 500 or 1000 or 2000 or 5000 pages, the question is "How does it count?" Does it count the pages it doesn't index?

Searchbutton might (and I emphasize might) be counting some of the pages that they do not (or should not) index. We are still investigating. Atomz did when I first encountered them, but they fixed it. See the discussion below for more on how quickly and affirmatively they responded to this issue. Kudos to them.

Robots.txt, Noindex and Partial Noindex

There are two "standard" ways on the web to tell a robot (also called a spider or crawler) not to index your page. Both are discussed on the Robots Exclusion Page.

The first is the robots.txt file. It can be used to specify directories that should not be indexed. It can also be used to apply these non-indexing requests to specific robots.

The second is the Robots Meta Tag.
This tag appears in the head of a particular page that should not be indexed. It contains the noindex option, and may also contain the nofollow option. These tell the robot not to index the page and/or not to follow any links on the page.

In addition, some indexing services have added proprietary extensions that let a webmaster specify a portion of a page that should not be indexed. I call this Partial Noindex. Although it doesn't appear to be a formal standard, a common approach is to delimit the section with <noindex></noindex>. If I simply indicate "Yes" (they support Partial Noindex), that means they use this "standard". A few have invented their own protocols; if a service takes a different approach, I'll tell you. (I cannot find many references for the use of <noindex></noindex>. One I did find is on the FAQ Page of the ht://Dig Open Source Search Engine site.)

Branding

These services are free. There are only a couple of ways for a site to offer free services.
I indicate, in my summary and in the reviews, which branding approach each service has chosen: Ads or Logos.

One insidious aspect of combining banner advertising with hosted search engines is that the ad company may use the search keywords to target advertising. This may lead to privacy issues, as ad banner companies associate searched words with cookie-based identification of the site's visitors. Webmasters need to carefully evaluate the ad policies of such services.

What I've Learned

This has been an exciting learning experience for me. I know these things, and I preach them in my consulting and professional speaking, but this experience has proved that they are true.
The Engines

I am using, or trying to use, several engines. These include:
Their complete reviews appear below in alphabetical order. I also have a table that summarizes the results.

Atomz
So far, this is the most customizable service I have encountered. It provides both Complete Control of results formatting and full User Control of search options. In addition to letting me use one of my "standard" pages as the "shell", it lets me completely customize the details of the results using their scripting language. And their search options are the most complete and flexible of any service I've encountered. For example, I am able to:
In other words, they provide completely customizable results. As for search options, they support:
The most severe limitation of their free offering is that it will index only 500 pages, and for most sites, considering all the other features, this is not a severe limitation at all.

When I began experimenting with them, I had two issues with how they processed the Robots Meta Tag (noindex).
But the day after I wrote to them, they wrote back and said:
In other words, they listened and changed their routine to address both issues. Kudos to them. (www.atomz.com)

FreeFind
I am working with FreeFind. My first attempt failed because:
Their refusal to honor the "noindex" tag is interesting because they provide their own proprietary substitute. If, instead of using the web standard "noindex", you are willing to use their "No Index" comment (www.freefind.com/faq.html#faq10), you will get the same result. In fact, they even have a proprietary extension (www.freefind.com/faq.html#faq11) that will inhibit indexing of "part" of a page. They do honor the robots.txt file. I am working both to extend the size permitted for my site and to add a robots.txt file as a further test. They wrote back to say that they had increased the permitted size for my site and would respider once the increase became effective. They also said:
After they increased the size allowed, they still attempted to index over 1000 pages and failed. I'm writing to them again to work through this. I'll keep you posted. (www.FreeFind.com)

PicoSearch
Setting up this service, one of the last I added, went very smoothly. Part of the reason is the simplicity of their offering; part is that I had gotten better at it. In the free option, it doesn't offer as much control over the results display, but it will index many pages. My inquiries about options and services were answered promptly and completely.

Searchbutton
It seemed to be taking forever to index my site. When I wrote to ask why it was taking so long, they promptly wrote back to say that they index only every 12 hours. (To be fair, their "Getting Started" section says they will advise me "Within one business day (often sooner)". I just hadn't read that section of their site. I skipped it (hey, I'm a male; I don't have to read directions) and went straight to the signup. I have recommended that they state their schedule more prominently on their site and also in their initial welcome letter.) I have not yet tested robots.txt compliance, but I take their word for it. For some reason, the initial pass indexed fewer pages on my site than expected. They are working closely with me to identify and resolve this issue, and they have been very responsive and helpful so far.

On the plus side, their support has been excellent. Throughout the Easter weekend (when I started this crazy idea) I received prompt answers to rather picky questions. Although not all the issues are resolved yet, I'm impressed by their support. I am still working to configure my results page and will report back as soon as I've completed that. (www.Searchbutton.com)

SiteMiner
Part of MyComputer.com.

Thunderstone
Thunderstone is an independent R&D company that has focused on information retrieval and document management problems for over 19 years. They claim that more Internet searches are conducted daily by their software than by any other available package. They primarily sell their software, but they provide a limited version of their Webinator service as an offsite search engine. They also offer a free version of the Webinator for download. The limits on their offsite service include:
Results customization permits changes to:
The 8-hour limit on reindexing is not a major issue, but it does make my "experimentation" a bit more tedious.

Their maintenance process is also interesting. You sign in with your email address and your website URL; they then email you a password to continue, and you need a new password for each access. It is the only site I've ever encountered that uses this technique.

whatUseek (IntraSearch)
This formatting is more than just a Wrapper: it provides some scripting. However, it is not Complete Control, because it does not provide full scripting; I cannot control the exact display of each result. It also does not seem to offer any user control of search options. (intra.whatUseek.com)
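Several of these reviews come down to robot compliance: whether an engine honors robots.txt, the Robots Meta Tag and Partial Noindex, and whether pages it should skip are counted against the page limit. As a rough way to check an engine's behavior against your own site, here is a minimal Python sketch of what a compliant spider might decide for each page. It assumes robots.txt rules as interpreted by Python's standard `urllib.robotparser`, a Robots Meta Tag whose content is a comma-separated list such as "noindex, nofollow", and the informal <noindex></noindex> convention. Proprietary schemes such as FreeFind's "No Index" comment are not modeled. The function names, the user-agent "ExampleBot" and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Reads one page roughly the way a compliant spider might (a sketch).

    Records directives from <meta name="robots" content="..."> and keeps
    the page text, skipping anything inside <noindex>...</noindex>
    (the informal "Partial Noindex" convention; proprietary schemes ignored).
    """

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.text = []
        self._noindex_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "noindex":
            self._noindex_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

    def handle_endtag(self, tag):
        if tag == "noindex" and self._noindex_depth:
            self._noindex_depth -= 1

    def handle_data(self, data):
        # Only text outside <noindex> regions is indexable.
        if not self._noindex_depth and data.strip():
            self.text.append(data.strip())


def load_robots_txt(robots_txt):
    """Build a robots.txt matcher from the file's text."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def check_page(rp, url, html, user_agent="ExampleBot"):
    """Return (index, follow, indexable_text) for one page."""
    if not rp.can_fetch(user_agent, url):
        return False, False, ""  # excluded by robots.txt: never fetched
    page = RobotsMetaParser()
    page.feed(html)
    page.close()
    index = "noindex" not in page.directives
    follow = "nofollow" not in page.directives
    return index, follow, " ".join(page.text) if index else ""


def count_indexed(rp, pages, user_agent="ExampleBot"):
    """Count only the pages a compliant spider would actually index."""
    return sum(check_page(rp, url, html, user_agent)[0] for url, html in pages.items())


if __name__ == "__main__":
    rp = load_robots_txt("User-agent: *\nDisallow: /private/\n")
    pages = {
        "http://example.com/index.html":
            "<html><body>Hello <noindex>menu menu</noindex>world</body></html>",
        "http://example.com/draft.html":
            '<html><head><meta name="robots" content="noindex, nofollow"></head>'
            "<body>Draft</body></html>",
        "http://example.com/private/notes.html": "<html><body>Secret</body></html>",
    }
    print(check_page(rp, "http://example.com/index.html",
                     pages["http://example.com/index.html"]))  # (True, True, 'Hello world')
    print(count_indexed(rp, pages))  # 1
```

Comparing the `count_indexed` figure for your own pages with the page count an engine reports may give a rough hint of whether it is counting pages it should not index, though each engine's actual counting rules are its own.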
This page created: before Wed, 16.Aug.2000
Last updated: