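This article scrapes rental listings for Tianjin from the Anjuke website and then runs a simple descriptive statistical analysis of the rents. Anjuke's rental listing pages are simple static pages. For example, the second page of listings for Hebei District in Tianjin is at:
https://tj.zu.anjuke.com/fangyuan/hebei/p2/
To switch districts, change the pinyin segment of the URL; to turn pages, change the number after p. Because all the numbers shown on the listing pages are encrypted, the script opens each listing's detail page individually to read its details.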
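The URL scheme described above can be sketched in a few lines. This is an illustration in Python rather than the VBA used in this article; the helper names `listing_url` and `listing_urls` are made up for the example, and the slug `heping` is only an assumed pinyin value for another district.

```python
# Base path taken from the example URL in the text.
BASE = "https://tj.zu.anjuke.com/fangyuan"


def listing_url(district: str, page: int) -> str:
    """Build the listing-page URL for a district (pinyin slug) and a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{BASE}/{district}/p{page}/"


def listing_urls(districts, pages):
    """Return the URLs for pages 1..pages of every district, district by district."""
    return [listing_url(d, p) for d in districts for p in range(1, pages + 1)]


if __name__ == "__main__":
    # "heping" is an assumed slug, used here only to show the loop over districts.
    for url in listing_urls(["hebei", "heping"], 2):
        print(url)
```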
The complete VBA macro:

Sub 安居客租房数据升级版()
    ' A field missing from a detail page leaves its cell blank instead of stopping the run.
    On Error Resume Next
    Dim arr(), drr()
    ' Only Hebei District is fetched here; add the pinyin of other districts to extend the list.
    brr = Array("hebei")
    Range("a1").Resize(1, 14) = Array("Community_name", "Price", "Layout", "area", "Direction", "Floor", "decoration", "house_type", "House_code", "release_time", "rent_type", "rent_type2", "url", "district")
    Set XML = CreateObject("msxml2.xmlhttp")
    Set reg = CreateObject("vbscript.regexp")
    reg.Global = True
    reg.IgnoreCase = True
    reg.MultiLine = True
    ActiveSheet.Range("a2:z10000").Clear

    ' Step 1: collect the detail-page URLs from each listing page.
    For m = 0 To UBound(brr)
        For Page = 1 To 1   ' raise the upper bound to fetch more pages
            XML.Open "get", "https://tj.zu.anjuke.com/fangyuan/" & brr(m) & "/p" & Page & "/", False
            XML.send
            Do While XML.ReadyState <> 4
                DoEvents
            Loop
            strText = XML.responseText
            reg.Pattern = " href=""(https://tj.zu.anjuke.com/fangyuan/\d{10}\?isauction=2&shangquan_id=\d+)"
            For Each mat In reg.Execute(strText)
                k = k + 1
                ReDim Preserve arr(1 To k)
                arr(k) = mat.SubMatches(0)   ' detail-page URL
                ReDim Preserve drr(1 To k)
                drr(k) = brr(m)              ' district the URL came from
            Next
        Next
    Next
    If k = 0 Then Exit Sub   ' no detail links were found

    ' Step 2: open each detail page and pull the fields out of its HTML.
    ReDim crr(1 To UBound(arr), 1 To 12)
    reg.Pattern = "\s+"
    For num = 1 To UBound(arr)
        XML.Open "get", arr(num), False
        XML.send
        Do While XML.ReadyState <> 4
            DoEvents
        Loop
        ' All whitespace is stripped first, which is why the markers below read "spanclass".
        result = reg.Replace(XML.responseText, "")
        crr(num, 1) = Split(Split(result, """propview"">")(1), "<")(0)
        crr(num, 2) = Split(Split(result, "price""><em>")(1), "<")(0)
        crr(num, 3) = Split(Split(result, "户型:</span><spanclass=""info"">")(1), "<")(0)
        crr(num, 4) = Split(Split(result, "面积:</span><spanclass=""info"">")(1), "<")(0)
        crr(num, 5) = Split(Split(result, "朝向:</span><spanclass=""info"">")(1), "<")(0)
        crr(num, 6) = Split(Split(result, "楼层:</span><spanclass=""info"">")(1), "<")(0)
        crr(num, 7) = Split(Split(result, "装修:</span><spanclass=""info"">")(1), "<")(0)
        crr(num, 8) = Split(Split(result, "类型:</span><spanclass=""info"">")(1), "<")(0)
        ' The leading apostrophe makes Excel store the listing code as text.
        crr(num, 9) = "'" & Split(Split(result, "房屋编码:")(1), ",")(0)
        crr(num, 10) = Split(Split(result, "发布时间:")(1), "<")(0)
        crr(num, 11) = Split(Split(result, "元/月</span><spanclass=""type"">")(1), "<")(0)
        crr(num, 12) = Split(Split(result, "<liclass=""title-label-itemrent"">")(1), "<")(0)
    Next

    ' Step 3: write the fields, the URLs and the district names to the sheet.
    ActiveSheet.Range("a2").Resize(UBound(crr), 12) = crr
    ActiveSheet.Range("m2").Resize(UBound(arr), 1) = Application.Transpose(arr)
    ActiveSheet.Range("n2").Resize(UBound(arr), 1) = Application.Transpose(drr)
    Columns.AutoFit
End Sub
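The macro's two extraction tricks, a regular expression for the detail links and a "text between two markers" lookup on whitespace-stripped HTML, may be easier to follow in isolation. The sketch below restates them in Python rather than VBA. The helper names (`detail_links`, `between`, `parse_detail`) and the sample HTML are invented for illustration and are not Anjuke's real markup; only the markers themselves come from the macro.

```python
import re

# Same link pattern as the macro, with the literal dots escaped.
LINK_RE = re.compile(
    r'href="(https://tj\.zu\.anjuke\.com/fangyuan/\d{10}\?isauction=2&shangquan_id=\d+)',
    re.IGNORECASE,
)


def detail_links(listing_html):
    """Return every detail-page URL found in a listing page."""
    return LINK_RE.findall(listing_html)


def between(text, start, end="<"):
    """Return the text after the first `start` marker up to the next `end`, or None."""
    _, found, rest = text.partition(start)
    if not found:
        return None
    return rest.split(end, 1)[0]


def parse_detail(html):
    """Strip all whitespace, then read a few fields by their surrounding markers."""
    compact = re.sub(r"\s+", "", html)
    return {
        "Price": between(compact, 'price"><em>'),
        "Layout": between(compact, '户型:</span><spanclass="info">'),
        "area": between(compact, '面积:</span><spanclass="info">'),
        "Floor": between(compact, '楼层:</span><spanclass="info">'),
    }


# Hypothetical fragments, written only to exercise the helpers above.
SAMPLE_LISTING = '<a href="https://tj.zu.anjuke.com/fangyuan/1234567890?isauction=2&shangquan_id=123">x</a>'
SAMPLE_DETAIL = """
<span class="price"><em>1500</em>元/月</span>
<li><span class="type">户型:</span><span class="info">2室1厅1卫</span></li>
<li><span class="type">面积:</span><span class="info">60平方米</span></li>
"""

if __name__ == "__main__":
    print(detail_links(SAMPLE_LISTING))
    print(parse_detail(SAMPLE_DETAIL))
```

Unlike `Split(...)(1)` in the macro, which raises an error when a marker is absent, `between` returns `None`, so a missing field is explicit and does not depend on `On Error Resume Next`.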
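The scrape covered the six urban districts and the four surrounding suburban districts of Tianjin (the former outer-suburban districts and Binhai New Area are not included), for a total of about 6,000 listings. Because of IP restrictions, the number of listings scraped is not exactly the same for every district.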
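The data were cleaned as follows:
① Remove outliers, including listings with a total monthly rent above 10,000 yuan or below 500 yuan, and listings with a floor area above 300 square metres.
② Remove self-contradictory, unrealistic records, for example a shared-rental listing with only one room.
③ Remove whole-unit rentals with three bedrooms and one living room under 50 square metres, and listings with one bedroom and one living room over 100 square metres.
The rental data were then given a simple visualization in Excel and Power BI.
[Charts: overall distribution of rents by price range; overall share of layouts on offer; shared versus whole-unit share, and layout share among whole-unit rentals; rent distribution for one-bedroom, two-bedroom and three-bedroom whole-unit rentals]
The charts above show clearly:
① Heping District, which sits in the centre of Tianjin and is also the smallest district by area, has an average rent far above that of every other district.
② Among the four surrounding districts, Xiqing District has the highest average rent, followed closely by Jinnan District. Xiqing has the university town and the Haitai high-tech zone, with many white-collar workers and students, so rents are relatively high. Jinnan's average rent is also relatively high, owing to factors such as the Haihe Education Park and Metro Line 1.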
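The three cleaning rules (outliers, contradictory records, implausible layout and area combinations) can be written as a single filter. The article did this step in Excel, so the Python below is only one possible restatement. The `Listing` record and its field names are assumptions, the sample rows are invented, and the one-bedroom rule is applied regardless of rental type because the original wording leaves that open.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Listing:
    price: float   # monthly rent in yuan
    area: float    # floor area in square metres
    rooms: int     # number of bedrooms
    halls: int     # number of living rooms
    shared: bool   # True for a shared rental, False for a whole-unit rental


def is_valid(x: Listing) -> bool:
    """Apply the three cleaning rules; return False if the listing should be dropped."""
    # Rule 1: price or area outliers.
    if x.price > 10000 or x.price < 500 or x.area > 300:
        return False
    # Rule 2: a shared rental with only one room contradicts itself.
    if x.shared and x.rooms == 1:
        return False
    # Rule 3: implausible layout and area combinations.
    if not x.shared and x.rooms == 3 and x.halls == 1 and x.area < 50:
        return False
    if x.rooms == 1 and x.halls == 1 and x.area > 100:
        return False
    return True


def clean(listings):
    """Keep only the listings that pass every rule."""
    return [x for x in listings if is_valid(x)]


# Invented rows, one per rule plus two that should survive.
SAMPLE = [
    Listing(2500, 60, 2, 1, False),    # kept
    Listing(12000, 120, 3, 2, False),  # rule 1: rent above 10,000
    Listing(800, 15, 1, 0, True),      # rule 2: one-room shared rental
    Listing(3000, 45, 3, 1, False),    # rule 3: three-bedroom under 50 square metres
    Listing(2000, 110, 1, 1, False),   # rule 3: one-bedroom over 100 square metres
    Listing(900, 20, 3, 1, True),      # kept: a room in a shared three-bedroom unit
]

if __name__ == "__main__":
    kept = clean(SAMPLE)
    print(len(kept), "listings kept, mean rent", mean(x.price for x in kept))
```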