When was the last time you wanted some information and it wasn’t within your reach? Barring nuclear codes and Death Star plans, search engines have put information literally at your fingertips. It wasn’t always this easy, though: we’ve come a long way from primitive filing and indexing systems to modern search systems that return a million results in under a second.
The Memex Idea
While a quick Google search would tell you that web search originated around 1990, the concept of indexed search goes back much further. In his article ‘As We May Think’, published in The Atlantic Monthly in July 1945, Vannevar Bush proposed a virtually limitless, fast, reliable, extensible, associative memory storage and retrieval system. He called this device the memex.
Although his original vision was closer to what Wikipedia is today, the most important part of his concept was the “associative trail”, which closely resembles modern hypertext. Bush assumed that information would be stored on microfilm. An associative trail would tie an arbitrary number of microfilm records into a linear sequence, using physical pointers that permanently joined pairs of records. Chaining many such links together would let a user quickly traverse related information. Ted Nelson, a pioneer of the earliest hypertext systems in the 1960s and the person who coined the term ‘hypertext’, credited Bush as his main influence.
Ted Nelson coined the term hypertext
Ted Nelson is the creator of Project Xanadu, which he started in 1960 with the goal of solving problems such as attribution by building a computer network that was easy to use and connect to. While the project itself never flourished, many of its ideas laid the groundwork for the World Wide Web. Put simply, Nelson’s vision called for links that were two-way instead of one-way, so that every link carried context about both of its ends.
Gerard Salton is known as the “father of information retrieval”, mainly because he developed the vector space model that is still widely used in information retrieval. In this model, documents and search queries are represented as vectors of term counts (or, more generally, term weights). He also initiated the development of the SMART (System for the Mechanical Analysis and Retrieval of Text) Information Retrieval System while at Harvard, and his team at Cornell University went on to build the system in the 1960s.
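To make the idea concrete, here is a minimal sketch of vector space ranking. It assumes plain whitespace tokenization and raw term counts as the vector components; practical systems usually apply more refined weighting (tf-idf, for example). The function names and sample documents are illustrative only. Each document and the query become sparse term-count vectors, and documents are ranked by the cosine of the angle between their vector and the query vector.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Represent text as a sparse vector of lowercase term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so absent words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Score every document against the query and sort best-first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample collection.
docs = [
    "vector space model for information retrieval",
    "microfilm storage and the memex",
    "retrieval of text with the SMART system",
]
for score, doc in rank("information retrieval", docs):
    print(f"{score:.3f}  {doc}")
```

Cosine similarity is used here rather than a raw dot product so that long documents are not favored simply for containing more words.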
From Riverdale to Mountain View
From its inception in 1989 up to September 1993, the World Wide Web was indexed entirely by hand. Its inventor, Tim Berners-Lee, maintained a list of web servers and hosted it on the CERN web server, but the rapid growth in the number of websites soon made this approach unsustainable.
Tracing the emergence of search engines might divert your attention to a certain comic universe. Created in 1990, the first internet search engine that looked for content rather than users was named Archie by its creator, Alan Emtage of McGill University in Montreal. The name wasn’t born of any fondness for the character: he originally wanted to call it ‘archives’, but to follow the Unix convention of giving programs and files short, cryptic names like cat and grep, he shortened it to Archie.

At the time, files were mainly stored and retrieved via the File Transfer Protocol (FTP), which involved setting up FTP clients and servers. Even after anonymous FTP sites made file hosting easier, files remained scattered across many servers and could only be located through word of mouth, on forums or mailing lists announcing their availability. Archie changed that.
The first internet search engine - Archie
Using a script-based data gatherer to retrieve directory listings from these FTP sites and a regular expression matcher to match search queries against them, Archie scoured the internet for files and gave users access to the resulting database. Soon after, in 1993, Veronica (Very Easy Rodent-Oriented Netwide Index to Computer Archives) joined the party: the University of Nevada System Computing Services group built it as a system much like Archie, but for Gopher files. Another system, Jughead (Jonzy’s Universal Gopher Hierarchy Excavation And Display), a little rougher around the edges, arrived soon after, seemingly just to complete the trio. Veronica offered keyword search over most Gopher menu titles across all Gopher listings, whereas Jughead retrieved menu information from specific Gopher servers.
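Archie’s basic approach (gather directory listings, then match queries against file names with regular expressions) can be sketched as below. This is not Archie’s actual code: the host names, file paths and helper functions are hypothetical, and the listings are hard-coded in place of a real FTP gatherer.

```python
import re

# Hypothetical directory listings, standing in for what a gatherer script
# might have fetched from anonymous FTP sites: host -> list of file paths.
LISTINGS = {
    "ftp.example.com": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/docs/rfc959.txt"],
    "ftp.example.net": ["/mirror/x11/xterm.tar.Z", "/pub/gnu/gcc-1.40.tar.Z"],
}


def build_index(listings: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Flatten per-host listings into (host, path) records."""
    return [(host, path) for host, paths in listings.items() for path in paths]


def search(index: list[tuple[str, str]], pattern: str) -> list[tuple[str, str]]:
    """Return records whose file name (last path component) matches the regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (host, path)
        for host, path in index
        if regex.search(path.rsplit("/", 1)[-1])
    ]


index = build_index(LISTINGS)
for host, path in search(index, r"^gcc-.*\.tar"):
    print(host, path)
```

The user still had to fetch the file over FTP from the reported host; the index only answered the question of where a file lived.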
Around the same time, Oscar Nierstrasz of the University of Geneva wrote a series of Perl scripts for indexing. His work later became the foundation of W3Catalog, the world’s first primitive web search engine, released in September 1993. Earlier that year, in June, Matthew Gray at MIT created the world’s first web robot, the World Wide Web Wanderer, whose purpose was to generate an index called ‘Wandex’. By 1995, the Wanderer had been used to count the number of websites on the internet.
The title of the world’s second web search engine goes to Aliweb, released in November 1993. Rather than using automation, it relied on website administrators to announce the existence of each site in an index file. JumpStation, created by Jonathan Fletcher in December 1993, used a web robot to build its index and a web form to accept queries. That made JumpStation the first to combine the three essential features of a search engine: crawling, indexing and searching. Its main limitations came from the weak hardware it ran on. The first crawler-based full-text search engine was WebCrawler, which emerged in 1994 and let users search for words anywhere within a webpage. It was followed by a number of search engines that piqued public interest, such as Magellan, Excite, Infoseek, Inktomi, Northern Light, AltaVista and even Yahoo!
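The three stages JumpStation combined (crawling, indexing and searching) fit into a short sketch. It assumes a hypothetical in-memory “web” (a dictionary of page texts and links) in place of real HTTP fetching, builds an inverted index mapping words to pages, and answers queries with a simple boolean AND. It illustrates the pipeline only and is not a reconstruction of JumpStation or WebCrawler.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.html": ("welcome to the jumpstation demo", ["b.html", "c.html"]),
    "b.html": ("search engines crawl and index pages", ["c.html"]),
    "c.html": ("an index maps words to pages", ["a.html"]),
    "d.html": ("unreachable page nobody links to", []),
}


def crawl(seed: str) -> list[str]:
    """Crawling: breadth-first traversal of links from a seed URL."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in WEB[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_inverted_index(urls: list[str]) -> dict[str, set[str]]:
    """Indexing: map each word to the set of crawled pages containing it."""
    index: dict[str, set[str]] = {}
    for url in urls:
        for word in WEB[url][0].split():
            index.setdefault(word, set()).add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Searching: return pages containing every query word (boolean AND)."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()


index = build_inverted_index(crawl("a.html"))
print(sorted(search(index, "crawl index")))
```

Note that d.html never shows up in any result, because a crawler only discovers pages reachable by following links from its starting point.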
Google adopted the idea of selling search terms as advertising from goto.com, a small search engine company that began doing so in 1998, and the move made the search engine business highly lucrative. Google’s search engine started gaining popularity around the turn of the millennium. Google greatly improved the quality of its results with PageRank, an iterative algorithm that estimates the importance of each page. A page’s PageRank depends on how many pages link to it and on the PageRank of those linking pages. The logic is that pages with good content are linked to by outside sources more often than pages with poor content.
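The iterative computation can be sketched as a power iteration over a small link graph. This is a simplified textbook version, not Google’s production ranking, and it assumes the damping factor d = 0.85 suggested in the original PageRank paper, a hypothetical four-page graph, and one common convention for pages with no outgoing links (spreading their score evenly over all pages). In each round, a page’s new score is (1 - d)/N plus d times the sum, over pages linking to it, of each linking page’s score divided by its number of outgoing links.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Iteratively compute PageRank scores for a small link graph.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Every page first receives the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its current score evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # Dangling page: spread its score evenly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once scores have effectively converged
            break
    return rank


# Hypothetical link graph: page -> pages it links to.
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Page D ends with the lowest score because nothing links to it, while C, which three pages link to, ranks highest.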
Searching for the future
Google’s ranking has since evolved to take far more complex factors into account. Describing where Google Search was headed, then-CEO Eric Schmidt said in 2007 that the company wanted search to become more conversational and contextual, able to handle questions like “What should I eat today?”. With Caffeine, a new search architecture announced in August 2009, Google introduced changes such as heavier keyword weighting and greater emphasis on domain age. In 2013 it followed up with Hummingbird, an update that improved not only performance but also the handling of the more conversational searches people make nowadays. And that’s just one search engine.
Search by voice and search by gesture are almost here, especially given the growing variety of devices coming online. Google also uses all of this search data to show that a search company can be one of the most reliable sources of analytics and trend prediction.