Full Text Preview

Exploring Search Engine Crawlers: Foreign Literature Translation for a Graduation Thesis


Document Introduction
…that was read by the URL server. Typically, three to four crawler machines were used, so the entire system required between four and eight machines. Research on Web crawling continues at Stanford even after Google has been transformed into a commercial effort. The Stanford WebBase project has implemented a high-performance distributed crawler, capable of downloading 50 to 100 documents per second. Cho and others have also developed models of document update frequencies to inform the download schedule of incremental crawlers.

The Internet Archive also used multiple machines to crawl the Web. Each crawler process was assigned up to 64 sites to crawl, and no site was assigned to more than one crawler. Each single-threaded crawler process read a list of seed URLs for its assigned sites from disk into per-site…
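The site-partitioning scheme described in the last two sentences can be sketched in a few lines of code. The Python snippet below is only an illustrative sketch, not the Internet Archive's actual implementation: the number of crawler processes, the hash-based site assignment, and the function names (owning_crawler, load_seed_queues) are assumptions made for this example; only the limit of 64 sites per process comes from the text.

from collections import defaultdict
from urllib.parse import urlparse
from zlib import crc32

NUM_CRAWLERS = 4            # assumed number of crawler processes
MAX_SITES_PER_CRAWLER = 64  # each process crawls at most 64 sites (from the text)

def owning_crawler(host, num_crawlers=NUM_CRAWLERS):
    # Hash the host name so that every site maps to exactly one crawler,
    # guaranteeing that no site is assigned to more than one process.
    return crc32(host.encode("utf-8")) % num_crawlers

def load_seed_queues(seed_urls, crawler_id):
    # Read the seed URLs belonging to this crawler into per-site queues
    # (one queue of URLs per host), as in the single-threaded design.
    queues = defaultdict(list)
    for url in seed_urls:
        host = urlparse(url).hostname
        if host is None or owning_crawler(host) != crawler_id:
            continue  # this site belongs to a different crawler process
        if host in queues or len(queues) < MAX_SITES_PER_CRAWLER:
            queues[host].append(url)
    return queues

if __name__ == "__main__":
    seeds = [
        "http://example.org/",
        "http://example.org/news",
        "http://example.net/index.html",
    ]
    for cid in range(NUM_CRAWLERS):
        print("crawler", cid, dict(load_seed_queues(seeds, cid)))

Hashing the host name is one simple way to satisfy the constraint that a site is handled by exactly one crawler; a real crawler would also need politeness delays and persistent frontier storage, which are omitted here.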
