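Steps:
1. First install the 'save as we' extension in your browser (it saves a web page as an HTML file). <Firefox, QQ Browser, 360 Browser, Google Chrome, and others all support this extension>
2. Open a Baidu Wenku document; Word, PDF, and other formats all work (this walkthrough uses <富甲美国> as the example).
3. Click 'save as we'. When the prompt appears, choose continue save and the page is saved as an HTML file.
4. The preparation is now complete; the only thing still missing is the parser.
5. Build the HTML parsing program: on the form, add a button, a rich text box named RichTextBox1, and a TextBox control. The code also uses an OpenFileDialog component named OpenFileDialog1 and handles the click event of Button2, so name the controls accordingly. The project needs the HtmlAgilityPack NuGet package.
6. Straight to the code:
Imports HtmlAgilityPack
Imports System.Text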
Public Class Form1

    ' Loads the HTML named in TextBox1 and writes the document text to RichTextBox1.
    Sub Get_YBQ()
        If TextBox1.Text <> "" Then
            RichTextBox1.Clear()
            Dim url As String = TextBox1.Text
            Dim wc As New HtmlWeb With {
                .OverrideEncoding = Encoding.Default,
                .AutoDetectEncoding = True
            }
            Try
                ' HtmlWeb.Load accepts an http(s) URL or a local file path.
                Dim htmldoc As HtmlDocument = wc.Load(url)
                Dim rootNode As HtmlNode = htmldoc.DocumentNode
                ' The saved page keeps each paragraph in <div class="ie-fix"><p>...</p></div>.
                Dim xl As HtmlNodeCollection = rootNode.SelectNodes("//div[@class='ie-fix']/p")
                ' SelectNodes returns Nothing when no node matches.
                If xl IsNot Nothing Then
                    For Each node As HtmlNode In xl
                        ' One paragraph per line.
                        RichTextBox1.AppendText(node.InnerText & Environment.NewLine)
                    Next
                End If
            Catch ex As Exception
                MessageBox.Show(ex.Message)
            End Try
        End If
    End Sub

    ' Lets the user pick a saved HTML file, then parses it.
    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        OpenFileDialog1.Title = "请选择HTML文档"
        OpenFileDialog1.Filter = "HTML文件|*.html|HTM文件|*.htm"
        ' Parse only when the user confirmed a file.
        If OpenFileDialog1.ShowDialog() = DialogResult.OK AndAlso OpenFileDialog1.FileName <> "" Then
            TextBox1.Text = OpenFileDialog1.FileName
            Get_YBQ()
        End If
    End Sub

End Class
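7. The program can either take a URL and fetch the HTML, or open a local HTML file and parse it. (The online route is not used here because Baidu Wenku pages are protected, so their page source cannot be fetched directly.)
8. If you run into problems, join the QQ group and ask there.
9. Disclaimer: this HTML parser is for technical exchange only. Do not use it for illegal purposes; otherwise you bear the consequences yourself. Thank you for your cooperation!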
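
The XPath selection used in Get_YBQ can be tried on its own, without a saved page or a form. The following is only a minimal sketch: the module name XPathSketch, the function ExtractParagraphs, and the sample markup are hypothetical and not part of the original program. It assumes the HtmlAgilityPack NuGet package is installed, and that the saved page keeps its paragraphs in a div whose class attribute is exactly ie-fix, as the code above expects.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module XPathSketch

    ' Hypothetical helper: returns the text of every <p> that sits
    ' directly under a <div class="ie-fix"> in the given HTML string.
    Function ExtractParagraphs(html As String) As List(Of String)
        Dim result As New List(Of String)()
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[@class='ie-fix']/p")
        If nodes Is Nothing Then Return result

        For Each node As HtmlNode In nodes
            ' DeEntitize converts entities such as &amp; back to plain characters.
            result.Add(HtmlEntity.DeEntitize(node.InnerText).Trim())
        Next
        Return result
    End Function

    Sub Main()
        ' Made-up markup that imitates the assumed structure of a saved page.
        Dim sample As String =
            "<html><body>" &
            "<div class=""ie-fix""><p>First paragraph</p><p>Second &amp; last</p></div>" &
            "<div class=""other""><p>Ignored</p></div>" &
            "</body></html>"

        For Each paragraph As String In ExtractParagraphs(sample)
            Console.WriteLine(paragraph)
        Next
    End Sub

End Module
```

Because the predicate compares the whole class attribute, a div with several classes (for example class="ie-fix foo") would not match. If a saved page turns out to look like that, contains(@class, 'ie-fix') is one possible adjustment.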
From: 昵称37581541 > 《vb学习》