Monday, November 10, 2008

Unit 10 Readings

Some were interesting; one was very dense.

Web Search Engines: Part 1
-Amazing amount of data being searched.
-Interesting that crawlers observe a politeness delay, waiting between requests to the same server.
-Speed of retrieval is unbelievable, but having initiated searches & found relevant data, I am grateful for its existence.
-Ironic use of the word "crawl", which to me means to move slowly, while in reality these 'crawls' are done, I dare say, at the speed of light.
-Agree wholeheartedly that:
"...Engineering a web-scale crawler is not for the unskilled or fainthearted..." So English & History majors need not apply.
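To see what the politeness delay means in practice, here is a minimal sketch, assuming a fixed delay of 2 seconds per host (the delay value and the function name `polite_schedule` are hypothetical, not taken from the article). It gives each URL a start time so that no single server is hit more often than once per delay, while different servers can be fetched at the same moment.

```python
from urllib.parse import urlsplit


def polite_schedule(urls, delay=2.0):
    """Hypothetical sketch: assign each URL a start time (seconds from the
    start of the crawl) so that requests to the same host are at least
    `delay` seconds apart. Different hosts may be fetched simultaneously."""
    next_free = {}  # host -> earliest time that host may be contacted again
    schedule = []
    for url in urls:
        host = urlsplit(url).netloc
        start = next_free.get(host, 0.0)
        schedule.append((start, url))
        next_free[host] = start + delay  # reserve the next polite slot
    return sorted(schedule)


urls = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
    "http://a.example/3",
]
for start, url in polite_schedule(urls):
    print(start, url)
# 0.0 http://a.example/1
# 0.0 http://b.example/1
# 2.0 http://a.example/2
# 4.0 http://a.example/3
```

This is why a crawler needs many hosts in its queue to stay fast: the delay only throttles repeat visits to one server, not the crawl as a whole.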

Web Search Engines: Part 2
-?"... An inverted file is a concatenation of the posting lists for each distinct term..."
-# of machines & documents is astounding.
-?"... PageRank computation is an eigenvector calculation on the page-page link connectivity matrix..."
-Would say interesting article, not "fascinating".
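The inverted-file quote is easier to picture with a toy example. This is a minimal sketch, assuming plain whitespace tokenization and small integer document IDs (the function names are hypothetical; real engines also compress the lists and store term positions). Each distinct term gets a posting list of the documents containing it, and the "inverted file" is those lists laid end to end, with a lexicon recording where each term's list begins.

```python
from collections import defaultdict


def build_posting_lists(docs):
    """docs: dict of doc ID -> text. Returns term -> sorted list of doc IDs,
    with terms in alphabetical order."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def concatenate(posting_lists):
    """Lay the posting lists end to end (the 'inverted file') and record
    each term's (offset, length) in a lexicon."""
    inverted_file, lexicon = [], {}
    for term, plist in posting_lists.items():
        lexicon[term] = (len(inverted_file), len(plist))
        inverted_file.extend(plist)
    return inverted_file, lexicon


def lookup(term, inverted_file, lexicon):
    """Fetch one term's posting list by slicing the concatenated file."""
    offset, length = lexicon.get(term, (0, 0))
    return inverted_file[offset:offset + length]


docs = {1: "web search engines", 2: "deep web", 3: "search the deep web"}
inverted_file, lexicon = concatenate(build_posting_lists(docs))
print(inverted_file)                              # [2, 3, 1, 1, 3, 3, 1, 2, 3]
print(lookup("search", inverted_file, lexicon))   # [1, 3]
```

Answering a query then means reading only the posting lists for the query terms, which is what makes searching billions of documents feasible.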
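The PageRank quote can likewise be illustrated. This is a minimal sketch using power iteration, one common way to compute the principal eigenvector; the damping factor of 0.85, the uniform "random jump", and spreading a dangling page's rank evenly are assumptions drawn from the usual textbook formulation, not details from the article. The rank vector is multiplied by the link matrix over and over until it stops changing, and that fixed point is the eigenvector.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """links: dict of page -> list of pages it links to.
    Power iteration on the damped link matrix; returns page -> rank,
    with ranks summing to 1."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Every page gets a base share from the assumed random jump.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # A page passes its rank equally along its out-links.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page (no out-links): spread its rank evenly.
                for q in pages:
                    new[q] += damping * rank[p] / n
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In this three-page example, C ends up ranked highest because both A and B link to it, and A comes next because it receives all of C's rank.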

Current Developments:
-Dense article.
-Assumption of "... high level of familiarity with how the protocol works..." indicates I probably won't understand it.

The Deep Web: Surfacing Hidden Value:
-Nice image of the Deep Web as an ocean.
-Similar to today's knowledge of the ocean: only limited exploration of the deepest parts, because reaching them takes special equipment (directed queries) that can withstand the water's pressure.
-Was not aware of Northern Light and Fast as search engines, nor that there were both search engines and search directories.
-Excellent statement: "...Discovery comes from looking at the world in new ways and with new tools..."
-Pure scientific statement: "...It has been said that what cannot be seen cannot be defined, and what is not defined cannot be understood..." In other words, seeing is believing.
-Largest share of deep Web sites by subject area: Humanities leads with 13.5%, followed by News, Media with 12.2%.

