配色: 字号:
Scalaz泛函数据结构: Tree-数据游览及维护
2016-09-21 | 阅:  转:  |  分享 
  
Scalaz泛函数据结构:Tree-数据游览及维护
上节我们讨论了Zipper-串形不可变集合(immutablesequentialcollection)游标,在串形集合中左右游走及元素维护操作。这篇我们谈谈Tree。在电子商务应用中对于xml,json等格式文件的处理要求非常之普遍,scalaz提供了Tree数据类型及相关的游览及操作函数能更方便高效的处理xml,json文件及系统目录这些树形结构数据的相关编程。scalazTree的定义非常简单:scalaz/Tree.scala

复制代码
Amulti-waytree,alsoknownasarosetree.AlsoknownasCofree[Stream,A].
/
sealedabstractclassTree[A]{

importTree._

/Thelabelattherootofthistree./
defrootLabel:A

/Thechildnodesofthistree./
defsubForest:Stream[Tree[A]]
...
复制代码
Tree是由一个A值rootLabel及一个流中子树Stream[Tree[A]]组成。Tree可以只由一个A类型值rootLabel组成,这时流中子树subForest就是空的Stream.empty。只有rootLabel的Tree俗称叶(leaf),有subForest的称为节(node)。scalaz为任何类型提供了leaf和node的构建注入方法:syntax/TreeOps.scala

复制代码
finalclassTreeOps[A](self:A){
defnode(subForest:Tree[A]):Tree[A]=Tree.node(self,subForest.toStream)

defleaf:Tree[A]=Tree.leaf(self)
}

traitToTreeOps{
implicitdefToTreeOps[A](a:A)=newTreeOps(a)
}
复制代码
实际上注入方法调用了Tree里的构建函数:

复制代码
traitTreeFunctions{
/ConstructanewTreenode./
defnode[A](root:=>A,forest:=>Stream[Tree[A]]):Tree[A]=newTree[A]{
lazyvalrootLabel=root
lazyvalsubForest=forest

overridedeftoString=""
}

/Constructatreenodewithnochildren./
defleaf[A](root:=>A):Tree[A]=node(root,Stream.empty)
复制代码
Tree提供了构建和模式拆分函数:

复制代码
objectTreeextendsTreeInstanceswithTreeFunctions{
/Constructatreenodewithnochildren./
defapply[A](root:=>A):Tree[A]=leaf(root)

objectNode{
defunapply[A](t:Tree[A]):Option[(A,Stream[Tree[A]])]=Some((t.rootLabel,t.subForest))
}
}
复制代码
我们可以直接构建Tree:

复制代码
1Tree("ALeaf")==="ALeaf".leaf//>res5:Boolean=true
2valtree:Tree[Int]=
31.node(
411.leaf,
512.node(
6121.leaf),
72.node(
821.leaf,
922.leaf)
10)//>tree:scalaz.Tree[Int]=
11tree.drawTree//>res6:String="1
12//||
13//|+-11
14//||
15//|+-12
16//|||
17//||`-121
18//||
19//|`-2
20//||
21//|+-21
22//||
23//|`-22
24//|"
复制代码
Tree实现了下面众多的接口函数:

复制代码
sealedabstractclassTreeInstances{
implicitvaltreeInstance:Traverse1[Tree]withMonad[Tree]withComonad[Tree]withAlign[Tree]withZip[Tree]=newTraverse1[Tree]withMonad[Tree]withComonad[Tree]withAlign[Tree]withZip[Tree]{
defpoint[A](a:=>A):Tree[A]=Tree.leaf(a)
defcobind[A,B](fa:Tree[A])(f:Tree[A]=>B):Tree[B]=facobindf
defcopoint[A](p:Tree[A]):A=p.rootLabel
overridedefmap[A,B](fa:Tree[A])(f:A=>B)=famapf
defbind[A,B](fa:Tree[A])(f:A=>Tree[B]):Tree[B]=faflatMapf
deftraverse1Impl[G[_]:Apply,A,B](fa:Tree[A])(f:A=>G[B]):G[Tree[B]]=fatraverse1f
overridedeffoldRight[A,B](fa:Tree[A],z:=>B)(f:(A,=>B)=>B):B=fa.foldRight(z)(f)
overridedeffoldMapRight1[A,B](fa:Tree[A])(z:A=>B)(f:(A,=>B)=>B)=(fa.flatten.reverse:@unchecked)match{
caseh#::t=>t.foldLeft(z(h))((b,a)=>f(a,b))
}
overridedeffoldLeft[A,B](fa:Tree[A],z:B)(f:(B,A)=>B):B=
fa.flatten.foldLeft(z)(f)
overridedeffoldMapLeft1[A,B](fa:Tree[A])(z:A=>B)(f:(B,A)=>B):B=fa.flattenmatch{
caseh#::t=>t.foldLeft(z(h))(f)
}
overridedeffoldMap[A,B](fa:Tree[A])(f:A=>B)(implicitF:Monoid[B]):B=fafoldMapf
defalignWith[A,B,C](f:(\&/[A,B])?C)={
defalign(ta:Tree[A],tb:Tree[B]):Tree[C]=
Tree.node(f(\&/(ta.rootLabel,tb.rootLabel)),Align[Stream].alignWith[Tree[A],Tree[B],Tree[C]]({
case\&/.This(sta)?stamap{a?f(\&/.This(a))}
case\&/.That(stb)?stbmap{b?f(\&/.That(b))}
case\&/.Both(sta,stb)?align(sta,stb)
})(ta.subForest,tb.subForest))
align_
}
defzip[A,B](aa:=>Tree[A],bb:=>Tree[B])={
vala=aa
valb=bb
Tree.node(
(a.rootLabel,b.rootLabel),
Zip[Stream].zipWith(a.subForest,b.subForest)(zip(_,_))
)
}
}

implicitdeftreeEqual[A](implicitA0:Equal[A]):Equal[Tree[A]]=
newTreeEqual[A]{defA=A0}

implicitdeftreeOrder[A](implicitA0:Order[A]):Order[Tree[A]]=
newOrder[Tree[A]]withTreeEqual[A]{
defA=A0
importstd.stream._
overridedeforder(x:Tree[A],y:Tree[A])=
A.order(x.rootLabel,y.rootLabel)match{
caseOrdering.EQ=>
Order[Stream[Tree[A]]].order(x.subForest,y.subForest)
casex=>x
}
}
复制代码
那么Tree就是个Monad,也是Functor,Applicative,还是traversable,foldable。Tree也实现了Order,Equal实例,可以进行值的顺序比较。我们就用些例子来说明吧:

复制代码
1//是Functor...
2(treemap{v:Int=>v+1})===
32.node(
412.leaf,
513.node(
6122.leaf),
73.node(
822.leaf,
923.leaf)
10)//>res7:Boolean=true
11
12//...是Monad
131.point[Tree]===1.leaf//>res8:Boolean=true
14valt2=tree>>=(x=>(x==2)?x.leaf|x.node((-x).leaf))
15//>t2:scalaz.Tree[Int]=
16t2===1.node((-1).leaf,2.leaf,3.node((-3).leaf,4.node((-4).leaf)))
17//>res9:Boolean=false
18t2.drawTree//>res10:String="1
19//||
20//|+--1
21//||
22//|+-11
23//|||
24//||`--11
25//||
26//|+-12
27//|||
28//||+--12
29//|||
30//||`-121
31//|||
32//||`--121
33//||
34//|`-2
35//||
36//|+-21
37//|||
38//||`--21
39//||
40//|`-22
41//||
42//|`--22
43//|"
44//...是Foldable
45tree.foldMap(_.toString)==="1111212122122"//>res11:Boolean=true
复制代码
说到构建Tree,偶然在网上发现了这么一个Tree构建函数:

复制代码
defpathTree[E](root:E,paths:Seq[Seq[E]]):Tree[E]={
root.node(pathsgroupBy(_.head)map{
case(parent,subpaths)=>
pathTree(parent,subpathscollect{
casepp+:restifrest.nonEmpty=>rest
})
}toSeq:_)
}
复制代码
据说这个pathTree函数能把List里的目录结构转化成Tree。先看看到底是不是具备如此功能:

复制代码
1valpaths=List(List("A","a1","a2"),List("B","b1"))
2//>paths:List[List[String]]=List(List(A,a1,a2),List(B,b1))
3pathTree("root",paths)drawTree//>res0:String=""root"
4//||
5//|+-"A"
6//|||
7//||`-"a1"
8//|||
9//||`-"a2"
10//||
11//|`-"B"
12//||
13//|`-"b1"
14//|"
15valpaths=List(List("A","a1","a2"),List("B","b1"),List("B","b2","b3"))
16//>paths:List[List[String]]=List(List(A,a1,a2),List(B,b1),List(B,b2,
17//|b3))
18pathTree("root",paths)drawTree//>res0:String=""root"
19//||
20//|+-"A"
21//|||
22//||`-"a1"
23//|||
24//||`-"a2"
25//||
26//|`-"B"
27//||
28//|+-"b2"
29//|||
30//||`-"b3"
31//||
32//|`-"b1"
33//|"
复制代码
果然能行,而且还能把"B"节点合并汇集。这个函数的作者简直就是个神人,起码是个算法和FP语法运用大师。我虽然还无法达到大师的程度能写出这样的泛函程序,但好奇心是挡不住的,总想了解这个函数是怎么运作的。可以用一些测试数据来逐步跟踪一下:



1valpaths=List(List("A"))//>paths:List[List[String]]=List(List(A))
2valgpPaths=paths.groupBy(_.head)//>gpPaths:scala.collection.immutable.Map[String,List[List[String]]]=Map(A->List(List(A)))
3List(List("A"))collect{casepp+:restifrest.nonEmpty=>rest}
4//>res0:List[List[String]]=List()


通过上面的跟踪约化我们看到List(List(A))在pathTree里的执行过程。这里把复杂的groupBy和collect函数的用法和结果了解了。实际上整个过程相当于:



1"root".node(
2"A".node(List().toSeq:_)
3)drawTree//>res3:String=""root"
4//||
5//|`-"A"
6//|"


如果再增加一个点就相当于:

复制代码
1"root".node(
2"A".node(List().toSeq:_),
3"B".node(List().toSeq:_)
4)drawTree//>res4:String=""root"
5//||
6//|+-"A"
7//||
8//|`-"B"
9//|"
复制代码
加多一层:

复制代码
1valpaths=List(List("A","a1"))//>paths:List[List[String]]=List(List(A,a1))
2valgpPaths=paths.groupBy(_.head)//>gpPaths:scala.collection.immutable.Map[String,List[List[String]]]=Map(A
3//|->List(List(A,a1)))
4List(List("A","a1"))collect{casepp+:restifrest.nonEmpty=>rest}
5//>res0:List[List[String]]=List(List(a1))
6
7//化解成
8"root".node(
9"A".node(
10"a1".node(
11List().toSeq:_)
12)
13)drawTree//>res3:String=""root"
14//||
15//|`-"A"
16//||
17//|`-"a1"
18//|"
复制代码
合并目录:



复制代码
1valpaths=List(List("A","a1"),List("A","a2"))//>paths:List[List[String]]=List(List(A,a1),List(A,a2))
2valgpPaths=paths.groupBy(_.head)//>gpPaths:scala.collection.immutable.Map[String,List[List[String]]]=Map(A
3//|->List(List(A,a1),List(A,a2)))
4List(List("A","a1"),List("A","a2"))collect{casepp+:restifrest.nonEmpty=>rest}
5//>res0:List[List[String]]=List(List(a1),List(a2))
6
7//相当产生结果
8"root".node(
9"A".node(
10"a1".node(
11List().toSeq:_)
12,
13"a2".node(
14List().toSeq:_)
15)
16)drawTree//>res3:String=""root"
17//||
18//|`-"A"
19//||
20//|+-"a1"
21//||
22//|`-"a2"
23//|"
复制代码


相信这些跟踪过程足够了解整个函数的工作原理了。
有了Tree构建方法后就需要Tree的游动和操作函数了。与串形集合的直线游动不同的是,树形集合游动方式是分岔的。所以Zipper不太适用于树形结构。scalaz特别提供了树形集合的定位游标TreeLoc,我们看看它的定义:scalaz/TreeLoc.scala



复制代码
finalcaseclassTreeLoc[A](tree:Tree[A],lefts:TreeForest[A],
rights:TreeForest[A],parents:Parents[A]){
...
traitTreeLocFunctions{
typeTreeForest[A]=
Stream[Tree[A]]

typeParent[A]=
(TreeForest[A],A,TreeForest[A])

typeParents[A]=
Stream[Parent[A]]
复制代码


树形集合游标TreeLoc由当前节点tree、左子树lefts、右子树rights及父树parents组成。lefts,rights,parents都是在流中的树形Stream[Tree[A]]。
用Tree.loc可以直接对目标树生成TreeLoc:



复制代码
1/ATreeLoczipperofthistree,focusedontherootnode./
2defloc:TreeLoc[A]=TreeLoc.loc(this,Stream.Empty,Stream.Empty,Stream.Empty)
3
4valtree:Tree[Int]=
51.node(
611.leaf,
712.node(
8121.leaf),
92.node(
1021.leaf,
1122.leaf)
12)//>tree:scalaz.Tree[Int]=
13
14tree.loc//>res7:scalaz.TreeLoc[Int]=TreeLoc(,Stream(),Stream(),Stream())
复制代码


TreeLoc的游动函数:



复制代码
defroot:TreeLoc[A]=
parentmatch{
caseSome(z)=>z.root
caseNone=>this
}

/Selecttheleftsiblingofthecurrentnode./
defleft:Option[TreeLoc[A]]=leftsmatch{
caset#::ts=>Some(loc(t,ts,tree#::rights,parents))
caseStream.Empty=>None
}

/Selecttherightsiblingofthecurrentnode./
defright:Option[TreeLoc[A]]=rightsmatch{
caset#::ts=>Some(loc(t,tree#::lefts,ts,parents))
caseStream.Empty=>None
}

/Selecttheleftmostchildofthecurrentnode./
deffirstChild:Option[TreeLoc[A]]=tree.subForestmatch{
caset#::ts=>Some(loc(t,Stream.Empty,ts,downParents))
caseStream.Empty=>None
/Selecttherightmostchildofthecurrentnode./
deflastChild:www.wang027.comOption[TreeLoc[A]]=tree.subForest.
reversematch{caset#::ts=>Some(loc(t,ts,Stream.Empty,
downParents))caseStream.Empty=>None
}

/Selectthenthchildofthecurrentnode./
defgetChild(n:Int):Option[TreeLoc[A]]=
for{lr<-splitChildren(Stream.Empty,tree.subForest,n)
ls=lr._1
}yieldloc(ls.head,ls.tail,lr._2,downParents)
复制代码


我们试着用这些函数游动:



复制代码
1valtree:Tree[Int]=
21.node(
311.leaf,
412.node(
5121.leaf),
62.node(
721.leaf,
822.leaf)
9)//>tree:scalaz.Tree[Int]=
10tree.loc//>res7:scalaz.TreeLoc[Int]=TreeLoc(,Stream(),Stream(),Stream())
11vall=for{
12l1<-tree.loc.some
13l2<-l1.firstChild
14l3<-l1.lastChild
15l4<-l3.firstChild
16}yield(l1,l2,l3,l4)//>l:Option[(scalaz.TreeLoc[Int],scalaz.TreeLoc[Int],scalaz.TreeLoc[Int],
17//|scalaz.TreeLoc[Int])]=Some((TreeLoc(,Stream(),Stream(),Stream()),T
18//|reeLoc(,Stream(),Stream(,),Stream((Stream(),1,Stream()),
19//|?)),TreeLoc(,Stream(,),Stream(),Stream((Stream(),1,Stre
20//|am()),?)),TreeLoc(,Stream(),Stream(,?),Stream((Stream(,
21//|),2,Stream()),?))))
22
23l.get._1.getLabel//>res8:Int=1
24l.get._2.getLabel//>res9:Int=11
25l.get._3.getLabel//>res10:Int=2
26l.get._4.getLabel//>res11:Int=21
复制代码


跳动函数:

复制代码
/Selectthenthchildofthecurrentnode./
defgetChild(n:Int):Option[TreeLoc[A]]=
for{lr<-splitChildren(Stream.Empty,tree.subForest,n)
ls=lr._1
}yieldloc(ls.head,ls.tail,lr._2,downParents)

/Selectthefirstimmediatechildofthecurrentnodethatsatisfiesthegivenpredicate./
deffindChild(p:Tree[A]=>Boolean):Option[TreeLoc[A]]={
@tailrec
defsplit(acc:TreeForest[A],xs:TreeForest[A]):Option[(TreeForest[A],Tree[A],TreeForest[A])]=
(acc,xs)match{
case(acc,Stream.cons(x,xs))=>if(p(x))Some((acc,x,xs))elsesplit(Stream.cons(x,acc),xs)
case_=>None
}
for(ltr<-split(Stream.Empty,tree.subForest))yieldloc(ltr._2,ltr._1,ltr._3,downParents)
}

/Selectthefirstdescendantnodeofthecurrentnodethatsatisfiesthegivenpredicate./
deffind(p:TreeLoc[A]=>Boolean):Option[TreeLoc[A]]=
Cobind[TreeLoc].cojoin(this).tree.flatten.find(p)
复制代码
find用法示范:



复制代码
1valtree:Tree[Int]=
21.node(
311.leaf,
412.node(
5121.leaf),
62.node(
721.leaf,
822.leaf)
9)//>tree:scalaz.Tree[Int]=
10tree.loc//>res7:scalaz.TreeLoc[Int]=TreeLoc(,Stream(),Stream(),Stream())
11vall=for{
12l1<-tree.loc.some
13l2<-l1.find{_.getLabel==2}
14l3<-l1.find{_.getLabel==121}
15l4<-l2.find{_.getLabel==22}
16l5<-l1.findChild{_.rootLabel==12}
17l6<-l1.findChild{_.rootLabel==2}
18}yieldl6//>l:Option[scalaz.TreeLoc[Int]]=Some(TreeLoc(,Stream(,?),St
19//|ream(),Stream((Stream(),1,Stream()),?)))
复制代码


注意:上面6个跳动都成功了。如果无法跳转结果会是None
insert,modify,delete这些操作函数:

复制代码
/Replacethecurrentnodewiththegivenone./
defsetTree(t:Tree[A]):TreeLoc[A]=loc(t,lefts,rights,parents)

/Modifythecurrentnodewiththegivenfunction./
defmodifyTree(f:Tree[A]=>Tree[A]):TreeLoc[A]=setTree(f(tree))

/Modifythelabelatthecurrentnodewiththegivenfunction./
defmodifyLabel(f:A=>A):TreeLoc[A]=setLabel(f(getLabel))

/Getthelabelofthecurrentnode./
defgetLabel:A=tree.rootLabel

/Setthelabelofthecurrentnode./
defsetLabel(a:A):TreeLoc[A]=modifyTree((t:Tree[A])=>node(a,t.subForest))

/Insertthegivennodetotheleftofthecurrentnodeandgiveitfocus./
definsertLeft(t:Tree[A]):TreeLoc[A]=loc(t,lefts,Stream.cons(tree,rights),parents)

/Insertthegivennodetotherightofthecurrentnodeandgiveitfocus./
definsertRight(t:Tree[A]):TreeLoc[A]=loc(t,Stream.cons(tree,lefts),rights,parents)

/Insertthegivennodeasthefirstchildofthecurrentnodeandgiveitfocus./
definsertDownFirst(t:Tree[A]):TreeLoc[A]=loc(t,Stream.Empty,tree.subForest,downParents)

/Insertthegivennodeasthelastchildofthecurrentnodeandgiveitfocus./
definsertDownLast(t:Tree[A]):TreeLoc[A]=loc(t,tree.subForest.reverse,Stream.Empty,downParents)

/Insertthegivennodeasthenthchildofthecurrentnodeandgiveitfocus./
definsertDownAt(n:Int,t:Tree[A]):Option[TreeLoc[A]]=
for(lr<-splitChildren(Stream.Empty,tree.subForest,n))yieldloc(t,lr._1,lr._2,downParents)

/Deletethecurrentnodeandallitschildren./
defdelete:Option[TreeLoc[A]]=rightsmatch{
caseStream.cons(t,ts)=>Some(loc(t,lefts,ts,parents))
case_=>leftsmatch{
caseStream.cons(t,ts)=>Some(loc(t,ts,rights,parents))
case_=>for(loc1<-parent)yieldloc1.modifyTree((t:Tree[A])=>node(t.rootLabel,Stream.Empty))
}
}
复制代码
用法示范:

复制代码
1valtr=1.leaf//>tr:scalaz.Tree[Int]=
2valtl=for{
3l1<-tr.loc.some
4l3<-l1.insertDownLast(12.leaf).some
5l4<-l3.insertDownLast(121.leaf).some
6l5<-l4.root.some
7l2<-l5.insertDownFirst(11.leaf).some
8l6<-l2.root.some
9l7<-l6.find{_.getLabel==12}
10l8<-l7.setLabel(102).some
11}yieldl8//>tl:Option[scalaz.TreeLoc[Int]]=Some(TreeLoc(,Stream(,?),S
12//|tream(),Stream((Stream(),1,Stream()),?)))
13
14tl.get.toTree.drawTree//>res8:String="1
15//||
16//|+-11
17//||
18//|`-102
19//||
20//|`-121
21//|"
22
复制代码
setTree和delete会替换当前节点下的所有子树:

复制代码
1valtree:Tree[Int]=
21.node(
311.leaf,
412.node(
5121.leaf),
62.node(
721.leaf,
822.leaf)
9)//>tree:scalaz.Tree[Int]=
10defmodTree(t:Tree[Int]):Tree[Int]={
11vall=for{
12l1<-t.loc.some
13l2<-l1.find{_.getLabel==22}
14l3<-l2.setTree{3.node(31.leaf)}.some
15}yieldl3
16l.get.toTree
17}//>modTree:(t:scalaz.Tree[Int])scalaz.Tree[Int]
18vall=for{
19l1<-tree.loc.some
20l2<-l1.find{_.getLabel==2}
21l3<-l2.modifyTree{modTree(_)}.some
22l4<-l3.root.some
23l5<-l4.find{_.getLabel==12}
24l6<-l5.delete
25}yieldl6//>l:Option[scalaz.TreeLoc[Int]]=Some(TreeLoc(,Stream(,?),St
26//|ream(),Stream((Stream(),1,Stream()),?)))
27l.get.toTree.drawTree//>res7:String="1
28//||
29//|+-11
30//||
31//|`-2
32//||
33//|+-21
34//||
35//|`-3
36//||
37//|`-31
38//|"
复制代码
通过scalaz的Tree和TreeLoc数据结构,以及一整套树形结构游览、操作函数,我们可以方便有效地实现FP风格的不可变树形集合编程。
献花(0)
+1
(本文系thedust79首藏)