I retired from personal blogging in July 2008.
But you can find me over at http://blog.xero.com.

Cringely on Google Universal Search
Posted by rod@drury.net.nz in Google, Microsoft, TechBiz at 1:20 pm on Sunday, 20 May 2007

Another good Cringely column. I enjoy reading his take on big-boy strategy.

Risk is for Losers.

Universal Search is Google’s attempt to destroy its major competitors who, like Gorbachev in the waning years of the USSR, have to follow suit and start spending money they don’t have if they want to even appear to still be in competition with Google. This means for these companies more software development, more sweeps of the web, as well as the greater likelihood that among their top results will be pages located at Google properties like YouTube.

Comments(6)

    Comment by Matt at 11:04 am on 21 May 2007

    FYI, Cringely is not a real person: http://en.wikipedia.org/wiki/Robert_X_Cringely. Something I just learned recently.




    Comment by Rod at 5:58 pm on 21 May 2007

    I did not know that. Thanks.




    Comment by Zeno Davatz at 9:23 pm on 21 May 2007

    The thing I do not like about the common search engines is that they do not recognize documents with similar content. It often happens on the Web that a post or document is spread over more than 50 websites. That is great for the author but not for the searcher, because it bloats the search result unnecessarily. With InfoCodex this does not happen, because the linguistic database recognizes similar documents and puts them into groups.

    http://www.ywesee.com/pmwiki.php/Ywesee/InfoCodexProcedure
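
To illustrate the grouping idea described in this comment, here is a minimal sketch in Python. It is not how InfoCodex works (the comment does not describe its internals); it swaps in a simple, well-known technique, Jaccard similarity over word shingles, and the function names, the sample documents and the 0.5 threshold are all assumptions made for the example.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_similar(docs, threshold=0.5):
    """Greedily group documents whose shingle sets overlap above threshold.

    Each document joins the first group whose representative (its first
    member) is similar enough; otherwise it starts a new group.
    The threshold is an assumed value, not a recommended setting.
    """
    sets = [shingles(d) for d in docs]
    groups = []  # list of lists of document indices
    for i, s in enumerate(sets):
        for g in groups:
            if jaccard(sets[g[0]], s) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

# Hypothetical sample documents: the first two are near-duplicates.
docs = [
    "Google announced universal search across web, video and news results",
    "Google announced universal search across web, video and news results today",
    "Microsoft hired a researcher to lead work on information retrieval",
]
print(group_similar(docs))  # → [[0, 1], [2]]
```

A search front end could then show one entry per group instead of one per copy. Comparing every document against every group representative is quadratic in the worst case, so a real engine would need an indexing scheme on top of this.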

    Three things a modern Search engine should do:

    1. Automatically classify a document according to its content.
    2. Automatically generate an abstract of a document.
    3. Generate a Heat-Map of the Contents of a Search Result.

    http://www.ywesee.com/uploads/Main/InfoCodex_22.2.2007.pdf




    Comment by Falafulu Fisi at 12:01 am on 22 May 2007

    Cringely quoted:
    No president could spend money like Ronald Reagan could spend money. His greatest legacy, in fact, was spending so much on defense projects like his “Star Wars” anti-missile system that the USSR was torn apart economically by simply trying to compete, thus ending the Cold War.

    We all know that President Reagan hyped up the Star Wars program just to drive the USSR's military spending higher and perhaps to bankrupt them. The Star Wars technology was far inferior to what it was first made out to be. The capability of such defense systems was portrayed as being like the movie Star Wars, with state-of-the-art lasers capable of shooting down ICBMs, when in fact the technology of the time could not have achieved such a capability.

    I am not sure about Google’s announcement of its Universal Search Engine to the public, but I wouldn’t be surprised if it is really another Ronald Reagan hype. I am not disputing that Google's R&D team is spearheading the development of innovative technologies, but they only integrated Latent Semantic Indexing (LSI) technology into their search engine over the last 2 or 3 years perhaps, whereas LSI has been published and available in the literature since the late 1980s. The adoption of LSI by Google is mentioned here. To me this shows that they are slow to adopt, or slow-paced in inventing, new technologies. Again, I am only making an inference based on information available in the public domain. Meanwhile Microsoft had hired Prof. Susan Dumais, one of the inventors of LSI, to lead research in information retrieval.

    There is no doubt that Google is driving its research to develop a universal engine, but I still think that they have some catching up to do with Microsoft on the sum of the parts, because clearly Microsoft had worked on the individual parts of such a universal search system years earlier. All Microsoft has to do is bundle the parts into one whole (universal) system and they would be there. Here is another article from Microsoft in the area of image search retrieval systems, an area where, to the best of my knowledge, Google is still learning how to do it.

    Text-Search Tricks Speak Volumes in Image Search

    Also, Microsoft recently hired Dr Ron Kohavi, the guy who led the data mining group at Amazon that developed its automated online product recommendation engine. So, I can see that the competition is fierce out there amongst the biggies.




    Comment by Falafulu Fisi at 11:43 am on 22 May 2007

    Zeno said…
    The thing I do not like about the common search engines is that they do not recognize documents with similar content.

    That is exactly what Latent Semantic Indexing (LSI) is for: detecting the content similarity of documents. LSI search is content-based, while Google is link-based (i.e., one document contains a link that points to another document, etc.) using PageRank, which is not content-aware at all, and I don’t know how Google has integrated LSI into its PageRank algorithm. The technique of combining those two types of search algorithm into one is still in its infancy, although researchers have published a number of papers in recent years on how to solve such problems using multilinear algebra. There were a few presentations on this very topic at the Algorithms for Modern Massive Datasets workshop at Stanford last year (2006). I suspect that Google computes the LSI and PageRank scores separately and then combines them in some way.
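
The speculation in this paragraph, that a content score and a link score are computed separately and then combined, can be sketched in Python as below. This is purely illustrative and says nothing about what Google actually does. Two things are assumptions: plain term-count cosine similarity stands in for LSI (real LSI would first project the term vectors into a low-rank space with a truncated SVD), and the weighted sum with `weight=0.5` is just one arbitrary way to blend the two scores. The sample pages, texts and links are made up.

```python
import math
from collections import Counter

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Assumes every link target also appears as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            targets = outs if outs else pages  # dangling page: spread evenly
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank

def cosine(query, text):
    """Cosine similarity between raw term-count vectors (stand-in for LSI)."""
    a, b = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def combined_scores(query, texts, links, weight=0.5):
    """Blend both signals: weight * content + (1 - weight) * scaled PageRank.

    PageRank is divided by its maximum so both terms lie in [0, 1].
    """
    pr = pagerank(links)
    top = max(pr.values())
    return {
        page: weight * cosine(query, texts[page]) + (1.0 - weight) * pr[page] / top
        for page in texts
    }

# Hypothetical three-page web: "a" and "b" link to "c", and "c" links to "a".
texts = {
    "a": "universal search blends video and news results",
    "b": "cooking recipes for the weekend",
    "c": "search engines rank web pages",
}
links = {"a": ["c"], "b": ["c"], "c": ["a"]}

scores = combined_scores("universal search", texts, links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

With these made-up inputs, page "a" ranks first because it matches the query well and is linked to, while "b" ranks last because it neither matches the query nor receives any links. A linear blend is the simplest possible choice; the multilinear-algebra approaches mentioned above treat content and links jointly instead of scoring them independently.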

    Zeno said…
    Three things a modern Search engine should do:

    1. Automatically classify a document according to its content.
    2. Automatically generate an abstract of a document.
    3. Generate a Heat-Map of the Contents of a Search Result.

    Some commercial search engines have already achieved those capabilities. No algorithm is 100% perfect, and LSI and PageRank are no exception. However, existing algorithms are constantly being improved by researchers in different computing disciplines. A tremendous number of publications on search engines have appeared over the past few years in a variety of venues, such as the freely available Journal of Machine Learning Research (JMLR) and the Neural Information Processing Systems (NIPS) conference proceedings. I am sure that commercial search engine vendors are busy continually scouring literature such as JMLR, NIPS and others, looking for new ways of perfecting their products.




    Comment by Falafulu Fisi at 6:05 pm on 25 May 2007

    I thought that this might be of interest to some readers here at Rod’s blog. These events were attended by reps from Google (Director of Research Peter Norvig), Yahoo and Microsoft, plus other vendors and members of the academic research community. I subscribe to the BISC newsletter, and the links below are simply copied and pasted from it:

    *********************************************************************
    Berkeley Initiative in Soft Computing (BISC)
    *********************************************************************

    The videos from the Cognitive Computing event held on May 2-3 are now online at
    http://www.citris-uc.org/article/cognitive_computing_2007_videos
    http://www.citris-uc.org/CognitiveComputing07

    OR

    http://www-bisc.eecs.berkeley.edu/CognitiveComputing07/CognitiveComputing2007Video.htm
    http://www-bisc.eecs.berkeley.edu/CognitiveComputing07/

    —————————

    The videos from the Future of Search meeting held on May 4 are now online at:
    http://www.citris-uc.org/article/future_search_2007_videos
    and linked through the main meeting site:
    http://www.citris-uc.org/FutureSearch

    OR

    http://www-bisc.cs.berkeley.edu/FutureSearch/FutureSearchVideo.htm
    http://www-bisc.cs.berkeley.edu/FutureSearch/