Behind the Algorithm Scene: How Do Search Engines Work?
One of the most common questions in the tech industry is how search engines actually work. Google, Bing and Yahoo all follow similar protocols, though their terminology differs slightly. Highlighted below are answers to how Google really works.
- Crawling Versus Indexing – Crawling means that Googlebot has fetched a web address and read its content. This is different from a page being indexed. Crawling comes first; indexing is the later step, in which a complicated proprietary algorithm decides whether the crawled page should be stored in Google's permanent index. This means that not all websites are crawled, and not all crawled pages are indexed. It's important to note that Google, Bing and Yahoo all respect a noindex directive placed on the page itself, but Google no longer honors noindex rules placed in robots.txt. To simplify, consider the following:
- Internet ⟷ Crawler ⟶ URLs ⟶ Parser ⟶ Indexing ⟶ Index ⟶ SEARCH
- It’s important to note that discovered URLs also feed a Scheduler, which queues them back to the Crawler. The Crawler can also bypass the URL queue and hand a fetched page directly to the Parser.
- Parsing is defined as the process of analyzing strings of symbols, whether in computer or natural languages.
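The Crawler ⟶ Parser ⟶ Index flow above can be sketched in miniature with Python's standard library. Everything here is illustrative: the page content and URLs are made up, no network requests are made, and a real search engine's parser and index are vastly more sophisticated.

```python
from html.parser import HTMLParser

# Hypothetical page content standing in for a fetched URL (no real network calls).
PAGE = """<html><body>
<p>Search engines crawl and index pages.</p>
<a href="https://example.com/about">About</a>
</body></html>"""

class LinkAndTextParser(HTMLParser):
    """Parser stage: pull out link URLs (for the scheduler) and words (for the index)."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(w.lower().strip(".,") for w in data.split())

def index_page(url, html, inverted_index):
    """Index stage: map each word to the set of URLs it appears on."""
    parser = LinkAndTextParser()
    parser.feed(html)
    for word in parser.words:
        inverted_index.setdefault(word, set()).add(url)
    return parser.links  # newly discovered URLs go back to the scheduler

index = {}
new_urls = index_page("https://example.com/", PAGE, index)
print(new_urls)                # → ['https://example.com/about']
print(sorted(index["crawl"]))  # → ['https://example.com/']
```

The key structural point matches the diagram: parsing one page produces two outputs, words for the index and fresh URLs for the scheduler, which is why crawling and indexing are separate stages.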
- Link Processing – Contrary to a common misconception, links are not processed at the moment a website is crawled. This means that PageRank is not determined by crawls; it is processed separately by Google.
- PageRank – This measures the quality and quantity of a site’s links; it has nothing to do with keywords. Bad links can be disavowed, which has essentially the same effect as a nofollow on the source link, and Google accepts both. To help PageRank flow, websites should avoid the four following items:
- Nofollow directives;
- Disallow directives;
- 404 errors on originating pages; and
- 404 errors on destination pages.
Additional tips and tricks for companies to remember include the following:
- Robots.txt blocks a page from being crawled, not from being indexed. This means that Google can still associate a blocked URL with the anchor text of links pointing at it, but it is unable to crawl the page’s content.
- Parameter exclusions, canonicals and similar signals are processed once Google learns about the URL, and again when the page is crawled and/or indexed.
- 302 redirects are acceptable and will pass PageRank.
- PageRank cannot be controlled with referrer-based tracking.
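The robots.txt behavior described above can be demonstrated with Python's standard-library `urllib.robotparser`. The rules and URLs below are made up for illustration; note that a Disallow rule only stops crawling of matching paths, and says nothing about whether the URL gets indexed.

```python
import urllib.robotparser

# Hypothetical robots.txt rules (parsed from a string, so no network access).
rules = """
User-agent: *
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Only the /private/ path is blocked from crawling, not the whole site:
blocked = rp.can_fetch("*", "https://example.com/private/report.html")  # False
allowed = rp.can_fetch("*", "https://example.com/public/page.html")     # True
print(blocked, allowed)
```

A well-behaved crawler checks `can_fetch` before requesting each URL, which is exactly why a blocked page's words stay unknown to the engine even when links pointing at it are known.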
Adaptivity Pro, a leading Utah web design expert, specializes in SLC web design and Utah SEO. Their team of advanced designers, programmers and copywriters works together to create high-quality content that ranks well in Google, Yahoo and Bing.