DotnetSpider is a .NET Standard web crawling library similar to WebMagic and Scrapy. It is a lightweight, efficient, and fast high-level web crawling and scraping framework for .NET.
https://github.com/dotnetcore/DotnetSpider/wiki
Please see the project DotnetSpider.Sample in the solution.
public class EntityModelSpider
{
    public static void Run()
    {
        Spider spider = new Spider();
        spider.Run();
    }

    private class Spider : EntitySpider
    {
        protected override void OnInit(params string[] arguments)
        {
            var word = "可乐|雪碧";
            // The dictionary attaches the keyword to the request so the entity can read it back.
            AddRequest(string.Format("http://news.baidu.com/ns?word={0}&tn=news&from=news&cl=2&pn=0&rn=20&ct=1", word), new Dictionary<string, dynamic> { { "Keyword", word } });
            AddEntityType<BaiduSearchEntry>();
            AddPipeline(new ConsoleEntityPipeline());
        }

        // Each div with class 'result' on the page is mapped to one BaiduSearchEntry.
        [Schema("baidu", "baidu_search_entity_model")]
        [Entity(Expression = ".//div[@class='result']", Type = SelectorType.XPath)]
        class BaiduSearchEntry : BaseEntity
        {
            [Column]
            [Field(Expression = "Keyword", Type = SelectorType.Enviroment)]
            public string Keyword { get; set; }

            [Column]
            [Field(Expression = ".//h3[@class='c-title']/a")]
            [ReplaceFormatter(NewValue = "", OldValue = "<em>")]
            [ReplaceFormatter(NewValue = "", OldValue = "</em>")]
            public string Title { get; set; }

            [Column]
            [Field(Expression = ".//h3[@class='c-title']/a/@href")]
            public string Url { get; set; }

            [Column]
            [Field(Expression = ".//div/p[@class='c-author']/text()")]
            [ReplaceFormatter(NewValue = "-", OldValue = "&nbsp;")]
            public string Website { get; set; }

            [Column]
            [Field(Expression = ".//div/span/a[@class='c-cache']/@href")]
            public string Snapshot { get; set; }

            [Column]
            [Field(Expression = ".//div[@class='c-summary c-row ']", Option = FieldOptions.InnerText)]
            [ReplaceFormatter(NewValue = "", OldValue = "<em>")]
            [ReplaceFormatter(NewValue = "", OldValue = "</em>")]
            [ReplaceFormatter(NewValue = " ", OldValue = "&nbsp;")]
            public string Details { get; set; }

            [Column(Length = 0)]
            [Field(Expression = ".", Option = FieldOptions.InnerText)]
            [ReplaceFormatter(NewValue = "", OldValue = "<em>")]
            [ReplaceFormatter(NewValue = "", OldValue = "</em>")]
            [ReplaceFormatter(NewValue = " ", OldValue = "&nbsp;")]
            public string PlainText { get; set; }
        }
    }
}

public static void Main()
{
    EntityModelSpider.Run();
}

Command: -s:[spider type name | TaskName attribute] -i:[identity] -a:[arg1,arg2...] --tid:[taskId] -n:[name] -c:[configuration file path or name]
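
To make the `-key:value` shape of the command above concrete, here is a minimal, self-contained C# sketch that splits such arguments into a dictionary. It is not DotnetSpider's own parser: the class name `CommandLineSketch`, the `Parse` helper, and the sample argument values are all hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CommandLineSketch
{
    // Hypothetical helper: splits arguments such as "-s:Name" or "--tid:1"
    // into key/value pairs. Only the first ':' is treated as the separator,
    // so a value such as a Windows path keeps its own colons.
    public static Dictionary<string, string> Parse(string[] args)
    {
        var options = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var arg in args)
        {
            var separator = arg.IndexOf(':');
            if (!arg.StartsWith("-", StringComparison.Ordinal) || separator < 0)
            {
                continue; // Not in the -key:value form; ignored in this sketch.
            }

            var key = arg.Substring(0, separator).TrimStart('-');
            var value = arg.Substring(separator + 1);
            options[key] = value;
        }
        return options;
    }

    public static void Main()
    {
        // Sample values are assumptions chosen for the example.
        var options = Parse(new[] { "-s:EntityModelSpider", "-i:demo-identity", "-a:arg1,arg2", "--tid:1" });

        // The -a option carries a comma-separated argument list.
        var spiderArguments = options["a"].Split(',');

        Console.WriteLine(options["s"]);
        Console.WriteLine(options["tid"]);
        Console.WriteLine(spiderArguments.Length);
    }
}
```

In this sketch the leading dashes are stripped, so `-s:` and `--tid:` are looked up as `s` and `tid`.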
To crawl a page whose content is loaded by JavaScript, you only need to do one thing: set the downloader to WebDriverDownloader.
Downloader = new WebDriverDownloader(Browser.Chrome);
NOTE:
https://github.com/zlzforever/DotnetSpider.Hub
timeout 0
tcp-keepalive 60
QQ Group: 477731655 Email: zlzforever@163.com