Why Search Is Important?
Search engines take advantage of reverse broadcast networks to help save time and money. They are instant answer systems that provide ultra-targeted advertising.
Search allows you to sell what your customers want, when they want it.
The web has become a direct marketer’s dream. In its infancy, the web was based around the core idea of sharing knowledge. The closer your ideas are in some way aligned with this idea, the easier it will be to promote your site.
The Goal of Search Engines & How They Work
Search Engine Relevancy
Many people think search engines have a hidden agenda. This simply is not true. The goal of the search engine is to provide high-quality content to people searching the Internet.
Search engines with the broadest distribution network sell the most advertising space. As I write this, Google is considered the search engine with the best relevancy. Their technologies power the bulk of web searches.
The Problem Listing a New Site
The biggest problem new websites have is that search engines have no idea they exist. Even when a search engine finds a new document, it has a hard time determining its quality. Search engines rely on links to help determine the quality of a document. Some engines, such as Google, also trust websites more as they age.
Parts of a Search Engine
While there are different ways to organize web content, every crawling search engine has the same basic parts:
an index (or catalog)
a search interface
Crawler (or Spider)
The crawler does just what its name implies. It scours the web following links, updating pages, and adding new pages when it comes across them. Each search engine has periods of deep crawling and periods of shallow crawling. There is also a scheduler mechanism to prevent a spider from overloading servers and to tell the spider what documents to crawl next and how frequently to crawl them.
Rapidly changing or highly important documents are more likely to get crawled frequently. The frequency of crawl should typically have little effect on search relevancy; it simply helps the search engines keep fresh content in their index. The home page of CNN.com might get crawled once every ten minutes. A popular, rapidly growing forum might get crawled a few dozen times each day. A static site with little link popularity and rarely changing content might only get crawled once or twice a month.
The best benefit of having a frequently crawled page is that you can get your new sites, pages, or projects crawled quickly by linking to them from a powerful or frequently changing page.
The index is where the spider-collected data are stored. When you perform a search on a major search engine, you are not searching the web, but the cache of the web provided by that search engine’s index.
Search engines organize their content in what is called a reverse index. A reverse index sorts web documents by words. When you search Google and it displays 1-10 out of 143,000 websites, it means that there are approximately 143,000 web pages that either have the words from your search on them or have inbound links containing them. Also, note that search engines do not store punctuation, just words.
Some people also use Cascading Style Sheets (CSS) or a blank table cell to place the page content ahead of the navigation. As far as how search engines evaluate what words are first, they look at how the words appear in the source code. I have not done significant testing to determine if it is worth the effort to make your unique page code appear ahead of the navigation, but if it does not take much additional effort, it is probably worth doing. Link analysis (discussed in depth later) is far more important than page copy to most search algorithms, but every little bit can help.
Google has also hired some people from Mozilla and is likely working on helping their spider understand how browsers render pages. Microsoft published visually segmenting research that may help them understand what page content is most important.
As well as storing the position of a word, search engines can also store how the data are marked up. For example, is the term in the page title? Is it a heading? What type of heading? Is it bold? Is it emphasized? Is it in part of a list? Is it in link text?
Words that are in a heading or are set apart from normal text in other ways may be given additional weighting in many search algorithms. However, keep in mind that it may be an unnatural pattern for your keyword phrases to appear many times in bold and headings without occurring in any of the regular textual body copy. Also, if a page looks like it is aligned too perfectly with a topic (i.e., overly-focused so as to have an abnormally high keyword density), then that page may get a lower relevancy score than a page with a lower keyword density and more natural page copy.
By storing where the terms occur, search engines can understand how close one term is to another. Generally, the closer the terms are together, the more likely the page with matching terms will satisfy your query.
If you only use an important group of words on the page once, try to make sure they are close together or right next to each other. If words also occur naturally, sprinkled throughout the copy many times, you do not need to try to rewrite the content to always have the words next to one another. Natural sounding content is best.
Words that are common do not help search engines understand documents. Exceptionally common terms, such as the, are called stop words. While search engines index stop words, they are not typically used or weighted heavily to determine relevancy in search algorithms. If I search for the Cat in the Hat, search engines may insert wildcards for the words the and in, so my search will look like
* cat * * hat.
Each page is standardized to a size. This prevents longer pages from having an unfair advantage by using a term many more times throughout long page copy. This also prevents short pages for scoring arbitrarily high by having a high