[Nutch-dev] MD5 in fetchlist / fetcher

 accesine 2005-09-24

Michael Ji
Fri, 19 Aug 2005 20:09:27 -0700

Hi there,

I dumped the contents of segment/fetchlist and
segment/fetcher.

My question is: why isn't the MD5 signature of the
page content saved in the fetchlist?

I think it would save CPU time if we could detect that a
page is unchanged, because we could then skip the parsing
step. As I see it, if the MD5 were in the fetchlist, we
could do the comparison directly in memory. With the MD5
only in the fetcher output, we have to look it up in a local
file in order to compare it with the MD5 of the newly
fetched page content.
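
To make the idea concrete, here is a minimal sketch in plain Java. It is
not Nutch code: the class Md5SkipSketch, its in-memory map, and the method
needsParsing are hypothetical names made up for illustration, and it
assumes the MD5 from the previous fetch is already available in memory,
keyed by URL. Only java.security.MessageDigest is a real API here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Nutch.
class Md5SkipSketch {
    // Assumed in-memory table: URL -> MD5 hex string from the previous fetch.
    private final Map<String, String> previousMd5 = new HashMap<>();

    // Compute the MD5 of the raw page bytes as a lowercase hex string.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns true when the content differs from what was last seen for
    // this URL (or the URL is new), and records the new hash either way.
    boolean needsParsing(String url, byte[] content) {
        String current = md5Hex(content);
        String previous = previousMd5.put(url, current);
        return !current.equals(previous);
    }

    public static void main(String[] args) {
        Md5SkipSketch sketch = new Md5SkipSketch();
        byte[] page = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(sketch.needsParsing("http://www.sina.com/", page)); // first fetch
        System.out.println(sketch.needsParsing("http://www.sina.com/", page)); // unchanged refetch
    }
}
```

In this sketch the first call reports that parsing is needed and the
second, with identical bytes, reports that it can be skipped. Whether the
real fetcher could keep such a table in memory for a large crawl is
exactly the open question.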

Did I miss some important point, or is my dump
wrong?

Thanks,

Michael Ji 

----------------fetchlist--------------------
fetch: true
page: Version: 4
URL: http://www.sina.com/
ID: d6a83e9c17e05d5602709a63c241bf68
Next fetch: Sun Aug 21 20:15:06 CDT 2005
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 0
Score: 1.0
NextScore: 1.0

anchors: 0

----------------fetcher--------------------
fetch: true
page: Version: 4
URL: http://www.sina.com/
ID: d6a83e9c17e05d5602709a63c241bf68
Next fetch: Sun Aug 21 20:15:06 CDT 2005
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 0
Score: 1.0
NextScore: 1.0

anchors: 0
Fetch Result:
MD5Hash: 56eae3c2556cb10a00e7346738dcb318
ProtocolStatus: success(1), lastModified=0
FetchDate: Sun Aug 14 20:15:13 CDT 2005




_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.
https://lists./lists/listinfo/nutch-developers
