xautlx / nutch-ajax
FetcherJob <-batchId> command fails with an error
Open
#I7492
hustszh
Created on 2015-07-27 09:24
Hello. When I run FetcherJob with the -all argument, Nutch fetches page content without problems. But when I replace -all with the batchId that the current round's GeneratorJob has just generated, FetcherJob fails with the following error:

FetcherJob: starting at 2015-07-26 19:57:27
FetcherJob: batchId: 1437911548-1520592662
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
http.proxy.host = null
http.proxy.port = 8080
http.timeout = 10000
http.content.limit = -1
http.agent = Your Nutch Spider/Nutch-2.3
http.accept.language = ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3
http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
java.lang.IllegalArgumentException: can't serialize class org.apache.avro.util.Utf8
    at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:299)
    at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:194)
    at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:136)
    at com.mongodb.DefaultDBEncoder.writeObject(DefaultDBEncoder.java:36)
    at com.mongodb.OutMessage.putObject(OutMessage.java:289)
    at com.mongodb.OutMessage.writeQuery(OutMessage.java:211)
    at com.mongodb.OutMessage.query(OutMessage.java:86)
    at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:81)
    at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:66)
    at com.mongodb.DBCursor._check(DBCursor.java:498)
    at com.mongodb.DBCursor._hasNext(DBCursor.java:621)
    at com.mongodb.DBCursor.hasNext(DBCursor.java:657)
    at org.apache.gora.mongodb.query.MongoDBResult.nextInner(MongoDBResult.java:69)
    at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:119)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread0, activeThreads=9
-finishing thread FetcherThread3, activeThreads=7
-finishing thread FetcherThread1, activeThreads=7
-finishing thread FetcherThread2, activeThreads=6
-finishing thread FetcherThread4, activeThreads=5
-finishing thread FetcherThread8, activeThreads=4
-finishing thread FetcherThread5, activeThreads=3
-finishing thread FetcherThread6, activeThreads=2
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: finished at 2015-07-26 19:57:37, time elapsed: 00:00:09

What causes this problem, and how should I fix it? Thank you.

My relevant configuration files are as follows:

----------seed.txt
http://blog.tianya.cn/
----------

----------regex-urlfilter.txt
+^http://blog.tianya.cn/$
+^http://blog.tianya.cn/.*.shtml$
----------
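A side note on the regex-urlfilter.txt rules quoted in this issue, unrelated to the Utf8 serialization error itself: the dots in both rules are unescaped, so each one matches any character rather than a literal ".". The sketch below checks a few URLs against the two rules using plain java.util.regex. The class name UrlFilterSketch, the accepts helper, and the sample URLs are made up for illustration, and the sketch assumes the filter applies each rule with Matcher.find(); it is not Nutch's own filter code.

```java
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // The two accept rules from regex-urlfilter.txt, with the leading '+' removed.
    static final Pattern[] ACCEPT = {
        Pattern.compile("^http://blog.tianya.cn/$"),
        Pattern.compile("^http://blog.tianya.cn/.*.shtml$")
    };

    // Hypothetical helper: true if any accept rule is found in the URL.
    // Assumption: rules are applied with find(), not matches().
    static boolean accepts(String url) {
        for (Pattern p : ACCEPT) {
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative URLs only; none of them come from the issue except the seed.
        String[] urls = {
            "http://blog.tianya.cn/",
            "http://blog.tianya.cn/post-1.shtml",
            "http://blog.tianya.cn/xshtml",
            "http://blogxtianya.cn/",
            "http://blog.tianya.cn/index.html"
        };
        for (String url : urls) {
            System.out.println(accepts(url) + "  " + url);
        }
    }
}
```

If only a literal ".shtml" suffix on the literal host is intended, escaping the dots (for example `\.shtml$`) would tighten the rules; whether that matters for this crawl is a judgment call for the author.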