
Collection rule 7: 河溪小说网 (www.518cqdl.com), a collection rule for 河溪小说网, a novel site built on the 易读 system

Date: 2018-11-22 19:55:17


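Some readers have said they do not know how to set up replacements or find what to filter, so I will work through the sites one at a time. I am short on time, so I will post one rule a day. This one is for 雯雯文学.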

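First, the site's ads have to be filtered out. The filter entries are in <PubContentText>, which you can use as a reference. There may be ads I do not know about, so open a few of the site's inner pages and look for them yourself.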
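To make the filtering step concrete, here is a minimal Python sketch (not the collector's own code) that applies the <PubContentText> pattern and filter lines from this rule to a made-up page. It assumes the collector deletes every match of each filter line from the captured text, one line at a time. `extract_chapter_text` and `sample_page` are names invented for the illustration.

```python
import re

# Pattern and filter lines copied from the <PubContentText> rule,
# with the XML entities (&lt; and &gt;) decoded.
CONTENT_PATTERN = r'<div id="content">((.|\n)*?)</div>'
FILTER_PATTERNS = [
    r'河溪小说',
    r'手机站-',
    r'<script.+?</script>|<div.+?>|</div>|<p>|</p>',
    r'【<b>(.|\n)*?</B>】♂',
]


def extract_chapter_text(html):
    """Capture the chapter body, then delete every filter match.

    Assumption: each filter line is a separate regex whose matches are
    removed from the text. Matching is case-sensitive here, so the last
    filter only fires when the closing tag is the uppercase </B>.
    """
    match = re.search(CONTENT_PATTERN, html)
    if match is None:
        return ""
    text = match.group(1)
    for pattern in FILTER_PATTERNS:
        text = re.sub(pattern, "", text)
    return text.strip()


# Hypothetical page, written only to exercise each filter line.
sample_page = (
    '<html><body><div id="content">\n'
    '<p>Chapter one begins.</p>\n'
    '<script>showAd();</script>\n'
    '<p>The story continues.河溪小说</p>\n'
    '【<b>Ad banner</B>】♂\n'
    '</div></body></html>'
)

print(extract_chapter_text(sample_page))
```

If an ad survives this step on a real page, it probably needs one more line in <FilterPattern>.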

This rule works with the 易读 collector. I do not know whether it also works with 关关.

<?xml version="1.0" encoding="UTF-8"?>

<RuleConfigInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<NovelIntro>

<RegexName>NovelIntro</RegexName>

<Pattern>&lt;meta property="og:description" content="((.|\n)*?)"/&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelIntro>

<PubContentText>

<RegexName>PubContentText</RegexName>

<Pattern>&lt;div id="content"&gt;((.|\n)*?)&lt;/div&gt;</Pattern>

<Method/>

<FilterPattern>河溪小说
手机站-
&lt;script.+?&lt;/script&gt;|&lt;div.+?&gt;|&lt;/div&gt;|&lt;p&gt;|&lt;/p&gt;
【&lt;b&gt;(.|\n)*?&lt;/B&gt;】♂</FilterPattern>

<Options/>

</PubContentText>

<NovelSearchUrl>

<RegexName>NovelSearchUrl</RegexName>

<Pattern/>

<Method/>

<FilterPattern/>

<Options/>

</NovelSearchUrl>

<NovelList_GetNovelKey>

<RegexName>NovelList_GetNovelKey</RegexName>

<Pattern>&lt;span class="s2"&gt;&lt;a href="/info/.+?/(.+?).html"&gt;.+?&lt;/a&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelList_GetNovelKey>

<NovelListUrl>

<RegexName>NovelListUrl</RegexName>

<Pattern>/list/1.html
/list/2.html
/list/3.html
/list/4.html
/list/5.html
/list/6.html
/list/7.html
/list/8.html
/list/9.html
/list/10.html</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelListUrl>

<PubChapterRegion>

<RegexName>PubChapterRegion</RegexName>

<Pattern/>

<Method/>

<FilterPattern/>

<Options/>

</PubChapterRegion>

<NovelName>

<RegexName>NovelName</RegexName>

<Pattern>&lt;meta property="og:title" content="(.+?)"/&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelName>

<NovelSearch_GetNovelName>

<RegexName>NovelSearch_GetNovelName</RegexName>

<Pattern/>

<Method/>

<FilterPattern/>

<Options/>

</NovelSearch_GetNovelName>

<NovelList_GetNovelKey2>

<RegexName>NovelList_GetNovelKey2</RegexName>

<Pattern/>

<Method/>

<FilterPattern/>

<Options/>

</NovelList_GetNovelKey2>

<LagerSort>

<RegexName>LagerSort</RegexName>

<Pattern>&lt;meta property="og:novel:category" content="(.+?)"/&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</LagerSort>

<SmallSort>

<RegexName>SmallSort</RegexName>

<Pattern>&lt;meta property="og:novel:category" content="(.+?)"/&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</SmallSort>

<GetSiteUrl>

<RegexName>GetSiteUrl</RegexName>

<Pattern></Pattern>

<Method/>

<FilterPattern/>

<Options/>

</GetSiteUrl>

<TestSearchNovelName>

<RegexName>TestSearchNovelName</RegexName>

<Pattern/>

<Method/>

<FilterPattern/>

<Options/>

</TestSearchNovelName>

<NovelDegree>

<RegexName>NovelDegree</RegexName>

<Pattern>&lt;meta property="og:novel:status" content="(.+?)"/&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelDegree>

<PubContentText_FT2JT>

<RegexName>PubContentText_FT2JT</RegexName>

<Pattern>false</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</PubContentText_FT2JT>

<NovelAuthor>

<RegexName>NovelAuthor</RegexName>

<Pattern>&lt;meta property="og:novel:author" content="(.+?)"/&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelAuthor>

<NovelInfo_GetNovelPubKey>

<RegexName>NovelInfo_GetNovelPubKey</RegexName>

<Pattern>&lt;meta property="og:novel:read_url" content="(.+?)"/&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelInfo_GetNovelPubKey>

<PubContentText_ASCII>

<RegexName>PubContentText_ASCII</RegexName>

<Pattern>false</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</PubContentText_ASCII>

<NovelCover>

<RegexName>NovelCover</RegexName>

<Pattern>&lt;meta property="og:image" content="(.+?)"/&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelCover>

<RuleVersion>

<RegexName>RuleVersion</RegexName>

<Pattern>2</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</RuleVersion>

<PubContentText_BJ2QJ>

<RegexName>PubContentText_BJ2QJ</RegexName>

<Pattern>false</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</PubContentText_BJ2QJ>

<NovelInfoExtra>

<RegexName>NovelInfoExtra</RegexName>

<Pattern/>

<Method/>

<FilterPattern/>

<Options/>

</NovelInfoExtra>

<PubIndexUrl>

<RegexName>PubIndexUrl</RegexName>

<Pattern>{NovelPubKey}</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</PubIndexUrl>

<NovelDefaultCoverUrl>

<RegexName>NovelDefaultCoverUrl</RegexName>

<Pattern>/cover/nocover.jpg</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelDefaultCoverUrl>

<PubContentUrl2>

<RegexName>PubContentUrl2</RegexName>

<Pattern/>

<Method/>

<FilterPattern/>

<Options/>

</PubContentUrl2>

<PubContentUrl>

<RegexName>PubContentUrl</RegexName>

<Pattern>{ChapterKey}</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</PubContentUrl>

<GetSiteName>

<RegexName>GetSiteName</RegexName>

<Pattern></Pattern>

<Method/>

<FilterPattern/>

<Options/>

</GetSiteName>

<PubChapterName>

<RegexName>PubChapterName</RegexName>

<Pattern>&lt;a href=".+?" title=".+?"&gt;(.+?)&lt;/a&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</PubChapterName>

<GetSiteCharset>

<RegexName>GetSiteCharset</RegexName>

<Pattern>utf8</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</GetSiteCharset>

<PubChapter_GetChapterKey>

<RegexName>PubChapter_GetChapterKey</RegexName>

<Pattern>&lt;a href="(.+?)" title=".+?"&gt;.+?&lt;/a&gt;</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</PubChapter_GetChapterKey>

<NovelSearch_GetNovelKey>

<RegexName>NovelSearch_GetNovelKey</RegexName>

<Pattern/>

<Method/>

<FilterPattern/>

<Options/>

</NovelSearch_GetNovelKey>

<NovelKeyword>

<RegexName>NovelKeyword</RegexName>

<Pattern/>

<Method/>

<FilterPattern/>

<Options/>

</NovelKeyword>

<NovelUrl>

<RegexName>NovelUrl</RegexName>

<Pattern>/info/10/{NovelKey}.html</Pattern>

<Method/>

<FilterPattern/>

<Options/>

</NovelUrl>

</RuleConfigInfo>

I did not look at the filtering very closely. If you need this rule, browse the site's chapter content pages and check what ad content it inserts.
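One way to speed up that manual check, sketched in Python on the assumption that you have already saved the text of a few collected chapters: lines that recur in almost every chapter are likely ads or watermarks, so they are candidates for new filter lines. `recurring_lines` and `min_share` are hypothetical names and do not belong to any collector.

```python
from collections import Counter


def recurring_lines(chapters, min_share=0.8):
    """Return lines that appear in at least `min_share` of the chapters."""
    counts = Counter()
    for text in chapters:
        # Count each distinct non-empty line once per chapter.
        lines = {line.strip() for line in text.splitlines() if line.strip()}
        counts.update(lines)
    threshold = min_share * len(chapters)
    return sorted(line for line, n in counts.items() if n >= threshold)


# Made-up chapter texts; the repeated line stands in for an injected ad.
chapters = [
    "He opened the door.\nRead more on our mobile site\n",
    "She closed the window.\nRead more on our mobile site\n",
    "They left at dawn.\nRead more on our mobile site\n",
]
print(recurring_lines(chapters))
```

Anything this reports still needs a human look, since a genuinely repeated sentence in the novel would show up too.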

There are not many 易读 sites. I searched around and found a few:

www.infected-

www.next-

All of these sites can reuse this rule: just change the filters and the domain.
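A rough Python sketch of that adaptation, using the standard `xml.etree.ElementTree` module on a cut-down copy of the rule. It assumes the domain belongs in <GetSiteUrl> (that element is empty in the rule as posted) and that filter lines are separated by newlines. `retarget_rule`, the `www.example.com` URL, and the sample filter lines are placeholders.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the rule, kept short for the illustration.
RULE_FRAGMENT = r"""<RuleConfigInfo>
  <GetSiteUrl>
    <RegexName>GetSiteUrl</RegexName>
    <Pattern></Pattern>
  </GetSiteUrl>
  <PubContentText>
    <RegexName>PubContentText</RegexName>
    <Pattern>&lt;div id="content"&gt;((.|\n)*?)&lt;/div&gt;</Pattern>
    <FilterPattern>河溪小说</FilterPattern>
  </PubContentText>
</RuleConfigInfo>"""


def retarget_rule(rule_xml, site_url, filter_lines):
    """Return a copy of the rule with a new domain and new filter lines."""
    root = ET.fromstring(rule_xml)
    # Assumption: the collector reads the site domain from GetSiteUrl.
    root.find("GetSiteUrl/Pattern").text = site_url
    # One filter per line; ElementTree re-escapes < and > on output.
    root.find("PubContentText/FilterPattern").text = "\n".join(filter_lines)
    return ET.tostring(root, encoding="unicode")


new_rule = retarget_rule(
    RULE_FRAGMENT,
    "http://www.example.com",  # placeholder domain
    ["Example Novels", "<script.+?</script>"],  # placeholder filters
)
print(new_rule)
```

Other URL patterns in the rule, such as <NovelUrl> and <NovelListUrl>, may need the new domain as well, depending on how the collector resolves relative paths.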
