I retired from personal blogging in July 2008.
But you can find me over at http://blog.xero.com.
Another good Cringely column. I enjoy reading his take on big boy strategy.
Universal Search is Google’s attempt to destroy its major competitors who, like Gorbachev in the waning years of the USSR, have to follow suit and start spending money they don’t have if they want to even appear to still be in competition with Google. This means for these companies more software development, more sweeps of the web, as well as the greater likelihood that among their top results will be pages located at Google properties like YouTube.

FYI, Cringely is not a real person: http://en.wikipedia.org/wiki/Robert_X_Cringely. Something I just learned recently.
I did not know that. Thanks.
The thing I do not like about the common search engines is that they do not recognize documents with similar content. It often happens on the Web that a post or document is spread over more than 50 websites. That is great for the author but not for the searcher, because it blows up the search results unnecessarily. With InfoCodex this does not happen, because its linguistic database recognizes similar documents and puts them into groups, so the search results stay compact.
http://www.ywesee.com/pmwiki.php/Ywesee/InfoCodexProcedure
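To make the idea of grouping similar documents concrete, here is a minimal sketch. It is not InfoCodex's method, which the comment above does not describe in detail; it swaps in a much simpler, well-known technique (word shingles compared by Jaccard similarity). The function names, the 0.5 threshold and the sample documents are all assumptions made up for illustration.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_similar(docs, threshold=0.5, k=3):
    """Greedily group documents whose shingle sets overlap enough.

    Each document joins the first group whose representative (the group's
    first member) has Jaccard similarity >= threshold; otherwise it starts
    a new group. Returns a list of lists of document indices.
    """
    sets = [shingles(d, k) for d in docs]
    groups = []
    for i, s in enumerate(sets):
        for g in groups:
            if jaccard(sets[g[0]], s) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

# Hypothetical sample: the first two texts are copies differing only in
# capitalisation and punctuation; the third is unrelated.
docs = [
    "Universal Search blends web, video and news results on one page.",
    "Universal search blends web, video and news results on one page!",
    "PageRank scores a page by the links that point to it.",
]
groups = group_similar(docs)
```

A search front end could then show one entry per group instead of one per copy. Shingling only catches near-verbatim copies; recognising documents that say the same thing in different words needs a content model such as the LSI discussed further down the thread.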
Three things a modern Search engine should do:
1. Automatically classify a document according to its content.
2. Automatically generate an abstract of a document.
3. Generate a Heat-Map of the Contents of a Search Result.
http://www.ywesee.com/uploads/Main/InfoCodex_22.2.2007.pdf
Cringely quoted:
No president could spend money like Ronald Reagan could spend money. His greatest legacy, in fact, was spending so much on defense projects like his “Star Wars” anti-missile system that the USSR was torn apart economically by simply trying to compete, thus ending the Cold War.
We all know that President Reagan hyped up the Star Wars program just to drive USSR military spending higher and perhaps bankrupt them. The Star Wars technology presented to the public was far inferior to what it was first made out to be. Such defense systems were portrayed as being like the movie Star Wars, with state-of-the-art lasers capable of shooting down ICBMs, when in fact the technology of the time could not have achieved such a capability.
I am not sure about Google’s announcement of its Universal Search Engine to the public, but I wouldn’t be surprised if it is really another Ronald Reagan hype. I am not disputing that Google’s R&D team is spearheading the development of innovative technologies, but they only integrated Latent Semantic Indexing (LSI) technology into their search engine over the last 2 or 3 years perhaps, whereas LSI has been published and available in the literature since the late 1980s. The adoption of LSI by Google is mentioned here. This to me shows that they are slow to adopt, or slow-paced in inventing, new technologies. Again, I am only making inferences based on information available in the public domain. Meanwhile Microsoft hired Prof. Susan Dumais, one of the inventors of LSI, to lead research in information retrieval.
There is no doubt that Google is driving its research to develop a universal engine, but I still think that they have some catching up to do with Microsoft on the sum of the parts, because clearly Microsoft has been working on the individual parts of such a universal search system for years. All Microsoft needs to do is bundle the parts into one whole (universal) system and they would be there. Here is another article from Microsoft in the area of image search and retrieval systems, where, to the best of my knowledge, Google is still learning how to do it.
Text-Search Tricks Speak Volumes in Image Search
Also, Microsoft recently hired Dr Ron Kohavi, the guy who led the data mining group at Amazon for the development of its automated online product recommendation engine. So I can see that the competition is fierce out there amongst the biggies.
Zeno said…
The thing I do not like about the common search engines is that they do not recognize documents with similar content.
That is exactly what Latent Semantic Indexing (LSI) is for: detecting content similarity between documents. LSI search is content-based, while Google is link-based (i.e., one document contains a link that points to another document, etc.) using PageRank, which is not content-aware at all, and I don’t know how Google has integrated LSI into its PageRank algorithm. The technique for combining those search algorithms into one is still in its infancy; researchers have published a number of papers over recent years on how to solve such problems using multilinear algebra. There were a few presentations on this very topic at the Algorithms for Modern Massive Datasets workshop at Stanford last year (2006). I suspect that Google computes the LSI and PageRank scores separately and then combines them in some way.
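As a rough illustration of the link-based side, and of what "compute separately, then combine" could mean, here is a minimal sketch. The `pagerank` function is the textbook power iteration with the usual 0.85 damping factor. The `blend` function is purely hypothetical: a plain weighted average that assumes a content-similarity score in the range 0 to 1 has already been produced elsewhere (for example by LSI). Nothing here is claimed to be how Google actually does it, and the three-page link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

def blend(content_score, link_score, weight=0.5):
    """Hypothetical combination: weighted average of the two scores."""
    return weight * content_score + (1.0 - weight) * link_score

# Made-up link graph: a links to b and c, b links to c, c links back to a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
```

In this toy graph page c ends up ranked highest because both other pages link to it, regardless of what any of the pages say. That is the sense in which PageRank is not content-aware, and why some second, content-based score has to be mixed in.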
Zeno said…
Three things a modern Search engine should do:
1. Automatically classify a document according to its content.
2. Automatically generate an abstract of a document.
3. Generate a Heat-Map of the Contents of a Search Result.
Some commercial search engines have already achieved those capabilities. No algorithm is 100% perfect, and LSI & PageRank are no exception. However, existing algorithms are always being improved by researchers in different computing disciplines. A tremendous number of publications in the area of search engines have appeared over the past few years in a variety of venues, such as the freely available Journal of Machine Learning Research (JMLR) and Neural Information Processing Systems (NIPS). I am sure that commercial search engine vendors are busy continually scouring the literature, such as JMLR, NIPS and others, to look for new ways of perfecting their products.
I thought that this might be of interest to some readers here at Rod’s blog site. This event was attended by reps from Google (Director of Research Prof. Peter Norvig), Yahoo, Microsoft, plus other vendors and also members of the academic research community. I subscribe to the BISC newsletter, and the links below are just cut & pasted from it:
*********************************************************************
Berkeley Initiative in Soft Computing (BISC)
*********************************************************************
The videos from the Cognitive Computing event held on May 2-3 are now online at
http://www.citris-uc.org/article/cognitive_computing_2007_videos
http://www.citris-uc.org/CognitiveComputing07
OR
http://www-bisc.eecs.berkeley.edu/CognitiveComputing07/CognitiveComputing2007Video.htm
http://www-bisc.eecs.berkeley.edu/CognitiveComputing07/
—————————
The videos from the Future of Search meeting held on May 4 are now online at:
http://www.citris-uc.org/article/future_search_2007_videos
and linked through the main meeting site:
http://www.citris-uc.org/FutureSearch
OR
http://www-bisc.cs.berkeley.edu/FutureSearch/FutureSearchVideo.htm
http://www-bisc.cs.berkeley.edu/FutureSearch/