Scalaz泛函数据结构:Tree-数据游览及维护 上节我们讨论了Zipper-串形不可变集合(immutablesequentialcollection)游标,在串形集合中左右游走及元素维护操作。这篇我们谈谈Tree。在电子商务应用中对于xml,json等格式文件的处理要求非常之普遍,scalaz提供了Tree数据类型及相关的游览及操作函数能更方便高效的处理xml,json文件及系统目录这些树形结构数据的相关编程。scalazTree的定义非常简单:scalaz/Tree.scala
复制代码 Amulti-waytree,alsoknownasarosetree.AlsoknownasCofree[Stream,A]. / sealedabstractclassTree[A]{
importTree._
/Thelabelattherootofthistree./ defrootLabel:A
/Thechildnodesofthistree./ defsubForest:Stream[Tree[A]] ... 复制代码 Tree是由一个A值rootLabel及一个流中子树Stream[Tree[A]]组成。Tree可以只由一个A类型值rootLabel组成,这时流中子树subForest就是空的Stream.empty。只有rootLabel的Tree俗称叶(leaf),有subForest的称为节(node)。scalaz为任何类型提供了leaf和node的构建注入方法:syntax/TreeOps.scala
复制代码 finalclassTreeOps[A](self:A){ defnode(subForest:Tree[A]):Tree[A]=Tree.node(self,subForest.toStream)
defleaf:Tree[A]=Tree.leaf(self) }
traitToTreeOps{ implicitdefToTreeOps[A](a:A)=newTreeOps(a) } 复制代码 实际上注入方法调用了Tree里的构建函数:
复制代码 traitTreeFunctions{ /ConstructanewTreenode./ defnode[A](root:=>A,forest:=>Stream[Tree[A]]):Tree[A]=newTree[A]{ lazyvalrootLabel=root lazyvalsubForest=forest
overridedeftoString="" }
/Constructatreenodewithnochildren./ defleaf[A](root:=>A):Tree[A]=node(root,Stream.empty) 复制代码 Tree提供了构建和模式拆分函数:
复制代码 objectTreeextendsTreeInstanceswithTreeFunctions{ /Constructatreenodewithnochildren./ defapply[A](root:=>A):Tree[A]=leaf(root)
objectNode{ defunapply[A](t:Tree[A]):Option[(A,Stream[Tree[A]])]=Some((t.rootLabel,t.subForest)) } } 复制代码 我们可以直接构建Tree:
复制代码 1Tree("ALeaf")==="ALeaf".leaf//>res5:Boolean=true 2valtree:Tree[Int]= 31.node( 411.leaf, 512.node( 6121.leaf), 72.node( 821.leaf, 922.leaf) 10)//>tree:scalaz.Tree[Int]= 11tree.drawTree//>res6:String="1 12//|| 13//|+-11 14//|| 15//|+-12 16//||| 17//||`-121 18//|| 19//|`-2 20//|| 21//|+-21 22//|| 23//|`-22 24//|" 复制代码 Tree实现了下面众多的接口函数:
复制代码 sealedabstractclassTreeInstances{ implicitvaltreeInstance:Traverse1[Tree]withMonad[Tree]withComonad[Tree]withAlign[Tree]withZip[Tree]=newTraverse1[Tree]withMonad[Tree]withComonad[Tree]withAlign[Tree]withZip[Tree]{ defpoint[A](a:=>A):Tree[A]=Tree.leaf(a) defcobind[A,B](fa:Tree[A])(f:Tree[A]=>B):Tree[B]=facobindf defcopoint[A](p:Tree[A]):A=p.rootLabel overridedefmap[A,B](fa:Tree[A])(f:A=>B)=famapf defbind[A,B](fa:Tree[A])(f:A=>Tree[B]):Tree[B]=faflatMapf deftraverse1Impl[G[_]:Apply,A,B](fa:Tree[A])(f:A=>G[B]):G[Tree[B]]=fatraverse1f overridedeffoldRight[A,B](fa:Tree[A],z:=>B)(f:(A,=>B)=>B):B=fa.foldRight(z)(f) overridedeffoldMapRight1[A,B](fa:Tree[A])(z:A=>B)(f:(A,=>B)=>B)=(fa.flatten.reverse:@unchecked)match{ caseh#::t=>t.foldLeft(z(h))((b,a)=>f(a,b)) } overridedeffoldLeft[A,B](fa:Tree[A],z:B)(f:(B,A)=>B):B= fa.flatten.foldLeft(z)(f) overridedeffoldMapLeft1[A,B](fa:Tree[A])(z:A=>B)(f:(B,A)=>B):B=fa.flattenmatch{ caseh#::t=>t.foldLeft(z(h))(f) } overridedeffoldMap[A,B](fa:Tree[A])(f:A=>B)(implicitF:Monoid[B]):B=fafoldMapf defalignWith[A,B,C](f:(\&/[A,B])?C)={ defalign(ta:Tree[A],tb:Tree[B]):Tree[C]= Tree.node(f(\&/(ta.rootLabel,tb.rootLabel)),Align[Stream].alignWith[Tree[A],Tree[B],Tree[C]]({ case\&/.This(sta)?stamap{a?f(\&/.This(a))} case\&/.That(stb)?stbmap{b?f(\&/.That(b))} case\&/.Both(sta,stb)?align(sta,stb) })(ta.subForest,tb.subForest)) align_ } defzip[A,B](aa:=>Tree[A],bb:=>Tree[B])={ vala=aa valb=bb Tree.node( (a.rootLabel,b.rootLabel), Zip[Stream].zipWith(a.subForest,b.subForest)(zip(_,_)) ) } }
implicitdeftreeEqual[A](implicitA0:Equal[A]):Equal[Tree[A]]= newTreeEqual[A]{defA=A0}
implicitdeftreeOrder[A](implicitA0:Order[A]):Order[Tree[A]]= newOrder[Tree[A]]withTreeEqual[A]{ defA=A0 importstd.stream._ overridedeforder(x:Tree[A],y:Tree[A])= A.order(x.rootLabel,y.rootLabel)match{ caseOrdering.EQ=> Order[Stream[Tree[A]]].order(x.subForest,y.subForest) casex=>x } } 复制代码 那么Tree就是个Monad,也是Functor,Applicative,还是traversable,foldable。Tree也实现了Order,Equal实例,可以进行值的顺序比较。我们就用些例子来说明吧:
复制代码 1//是Functor... 2(treemap{v:Int=>v+1})=== 32.node( 412.leaf, 513.node( 6122.leaf), 73.node( 822.leaf, 923.leaf) 10)//>res7:Boolean=true 11 12//...是Monad 131.point[Tree]===1.leaf//>res8:Boolean=true 14valt2=tree>>=(x=>(x==2)?x.leaf|x.node((-x).leaf)) 15//>t2:scalaz.Tree[Int]= 16t2===1.node((-1).leaf,2.leaf,3.node((-3).leaf,4.node((-4).leaf))) 17//>res9:Boolean=false 18t2.drawTree//>res10:String="1 19//|| 20//|+--1 21//|| 22//|+-11 23//||| 24//||`--11 25//|| 26//|+-12 27//||| 28//||+--12 29//||| 30//||`-121 31//||| 32//||`--121 33//|| 34//|`-2 35//|| 36//|+-21 37//||| 38//||`--21 39//|| 40//|`-22 41//|| 42//|`--22 43//|" 44//...是Foldable 45tree.foldMap(_.toString)==="1111212122122"//>res11:Boolean=true 复制代码 说到构建Tree,偶然在网上发现了这么一个Tree构建函数:
复制代码 defpathTree[E](root:E,paths:Seq[Seq[E]]):Tree[E]={ root.node(pathsgroupBy(_.head)map{ case(parent,subpaths)=> pathTree(parent,subpathscollect{ casepp+:restifrest.nonEmpty=>rest }) }toSeq:_) } 复制代码 据说这个pathTree函数能把List里的目录结构转化成Tree。先看看到底是不是具备如此功能:
复制代码 1valpaths=List(List("A","a1","a2"),List("B","b1")) 2//>paths:List[List[String]]=List(List(A,a1,a2),List(B,b1)) 3pathTree("root",paths)drawTree//>res0:String=""root" 4//|| 5//|+-"A" 6//||| 7//||`-"a1" 8//||| 9//||`-"a2" 10//|| 11//|`-"B" 12//|| 13//|`-"b1" 14//|" 15valpaths=List(List("A","a1","a2"),List("B","b1"),List("B","b2","b3")) 16//>paths:List[List[String]]=List(List(A,a1,a2),List(B,b1),List(B,b2, 17//|b3)) 18pathTree("root",paths)drawTree//>res0:String=""root" 19//|| 20//|+-"A" 21//||| 22//||`-"a1" 23//||| 24//||`-"a2" 25//|| 26//|`-"B" 27//|| 28//|+-"b2" 29//||| 30//||`-"b3" 31//|| 32//|`-"b1" 33//|" 复制代码 果然能行,而且还能把"B"节点合并汇集。这个函数的作者简直就是个神人,起码是个算法和FP语法运用大师。我虽然还无法达到大师的程度能写出这样的泛函程序,但好奇心是挡不住的,总想了解这个函数是怎么运作的。可以用一些测试数据来逐步跟踪一下:
1valpaths=List(List("A"))//>paths:List[List[String]]=List(List(A)) 2valgpPaths=paths.groupBy(_.head)//>gpPaths:scala.collection.immutable.Map[String,List[List[String]]]=Map(A->List(List(A))) 3List(List("A"))collect{casepp+:restifrest.nonEmpty=>rest} 4//>res0:List[List[String]]=List()
通过上面的跟踪约化我们看到List(List(A))在pathTree里的执行过程。这里把复杂的groupBy和collect函数的用法和结果了解了。实际上整个过程相当于:
1"root".node( 2"A".node(List().toSeq:_) 3)drawTree//>res3:String=""root" 4//|| 5//|`-"A" 6//|"
如果再增加一个点就相当于:
复制代码 1"root".node( 2"A".node(List().toSeq:_), 3"B".node(List().toSeq:_) 4)drawTree//>res4:String=""root" 5//|| 6//|+-"A" 7//|| 8//|`-"B" 9//|" 复制代码 加多一层:
复制代码 1valpaths=List(List("A","a1"))//>paths:List[List[String]]=List(List(A,a1)) 2valgpPaths=paths.groupBy(_.head)//>gpPaths:scala.collection.immutable.Map[String,List[List[String]]]=Map(A 3//|->List(List(A,a1))) 4List(List("A","a1"))collect{casepp+:restifrest.nonEmpty=>rest} 5//>res0:List[List[String]]=List(List(a1)) 6 7//化解成 8"root".node( 9"A".node( 10"a1".node( 11List().toSeq:_) 12) 13)drawTree//>res3:String=""root" 14//|| 15//|`-"A" 16//|| 17//|`-"a1" 18//|" 复制代码 合并目录:
复制代码 1valpaths=List(List("A","a1"),List("A","a2"))//>paths:List[List[String]]=List(List(A,a1),List(A,a2)) 2valgpPaths=paths.groupBy(_.head)//>gpPaths:scala.collection.immutable.Map[String,List[List[String]]]=Map(A 3//|->List(List(A,a1),List(A,a2))) 4List(List("A","a1"),List("A","a2"))collect{casepp+:restifrest.nonEmpty=>rest} 5//>res0:List[List[String]]=List(List(a1),List(a2)) 6 7//相当产生结果 8"root".node( 9"A".node( 10"a1".node( 11List().toSeq:_) 12, 13"a2".node( 14List().toSeq:_) 15) 16)drawTree//>res3:String=""root" 17//|| 18//|`-"A" 19//|| 20//|+-"a1" 21//|| 22//|`-"a2" 23//|" 复制代码
相信这些跟踪过程足够了解整个函数的工作原理了。 有了Tree构建方法后就需要Tree的游动和操作函数了。与串形集合的直线游动不同的是,树形集合游动方式是分岔的。所以Zipper不太适用于树形结构。scalaz特别提供了树形集合的定位游标TreeLoc,我们看看它的定义:scalaz/TreeLoc.scala
复制代码 finalcaseclassTreeLoc[A](tree:Tree[A],lefts:TreeForest[A], rights:TreeForest[A],parents:Parents[A]){ ... traitTreeLocFunctions{ typeTreeForest[A]= Stream[Tree[A]]
typeParent[A]= (TreeForest[A],A,TreeForest[A])
typeParents[A]= Stream[Parent[A]] 复制代码
树形集合游标TreeLoc由当前节点tree、左子树lefts、右子树rights及父树parents组成。lefts,rights,parents都是在流中的树形Stream[Tree[A]]。 用Tree.loc可以直接对目标树生成TreeLoc:
复制代码 1/ATreeLoczipperofthistree,focusedontherootnode./ 2defloc:TreeLoc[A]=TreeLoc.loc(this,Stream.Empty,Stream.Empty,Stream.Empty) 3 4valtree:Tree[Int]= 51.node( 611.leaf, 712.node( 8121.leaf), 92.node( 1021.leaf, 1122.leaf) 12)//>tree:scalaz.Tree[Int]= 13 14tree.loc//>res7:scalaz.TreeLoc[Int]=TreeLoc(,Stream(),Stream(),Stream()) 复制代码
TreeLoc的游动函数:
复制代码 defroot:TreeLoc[A]= parentmatch{ caseSome(z)=>z.root caseNone=>this }
/Selecttheleftsiblingofthecurrentnode./ defleft:Option[TreeLoc[A]]=leftsmatch{ caset#::ts=>Some(loc(t,ts,tree#::rights,parents)) caseStream.Empty=>None }
/Selecttherightsiblingofthecurrentnode./ defright:Option[TreeLoc[A]]=rightsmatch{ caset#::ts=>Some(loc(t,tree#::lefts,ts,parents)) caseStream.Empty=>None }
/Selecttheleftmostchildofthecurrentnode./ deffirstChild:Option[TreeLoc[A]]=tree.subForestmatch{ caset#::ts=>Some(loc(t,Stream.Empty,ts,downParents)) caseStream.Empty=>None /Selecttherightmostchildofthecurrentnode./ deflastChild:www.wang027.comOption[TreeLoc[A]]=tree.subForest. reversematch{caset#::ts=>Some(loc(t,ts,Stream.Empty, downParents))caseStream.Empty=>None }
/Selectthenthchildofthecurrentnode./ defgetChild(n:Int):Option[TreeLoc[A]]= for{lr<-splitChildren(Stream.Empty,tree.subForest,n) ls=lr._1 }yieldloc(ls.head,ls.tail,lr._2,downParents) 复制代码
我们试着用这些函数游动:
复制代码 1valtree:Tree[Int]= 21.node( 311.leaf, 412.node( 5121.leaf), 62.node( 721.leaf, 822.leaf) 9)//>tree:scalaz.Tree[Int]= 10tree.loc//>res7:scalaz.TreeLoc[Int]=TreeLoc(,Stream(),Stream(),Stream()) 11vall=for{ 12l1<-tree.loc.some 13l2<-l1.firstChild 14l3<-l1.lastChild 15l4<-l3.firstChild 16}yield(l1,l2,l3,l4)//>l:Option[(scalaz.TreeLoc[Int],scalaz.TreeLoc[Int],scalaz.TreeLoc[Int], 17//|scalaz.TreeLoc[Int])]=Some((TreeLoc(,Stream(),Stream(),Stream()),T 18//|reeLoc(,Stream(),Stream(,),Stream((Stream(),1,Stream()), 19//|?)),TreeLoc(,Stream(,),Stream(),Stream((Stream(),1,Stre 20//|am()),?)),TreeLoc(,Stream(),Stream(,?),Stream((Stream(, 21//|),2,Stream()),?)))) 22 23l.get._1.getLabel//>res8:Int=1 24l.get._2.getLabel//>res9:Int=11 25l.get._3.getLabel//>res10:Int=2 26l.get._4.getLabel//>res11:Int=21 复制代码
跳动函数:
复制代码 /Selectthenthchildofthecurrentnode./ defgetChild(n:Int):Option[TreeLoc[A]]= for{lr<-splitChildren(Stream.Empty,tree.subForest,n) ls=lr._1 }yieldloc(ls.head,ls.tail,lr._2,downParents)
/Selectthefirstimmediatechildofthecurrentnodethatsatisfiesthegivenpredicate./ deffindChild(p:Tree[A]=>Boolean):Option[TreeLoc[A]]={ @tailrec defsplit(acc:TreeForest[A],xs:TreeForest[A]):Option[(TreeForest[A],Tree[A],TreeForest[A])]= (acc,xs)match{ case(acc,Stream.cons(x,xs))=>if(p(x))Some((acc,x,xs))elsesplit(Stream.cons(x,acc),xs) case_=>None } for(ltr<-split(Stream.Empty,tree.subForest))yieldloc(ltr._2,ltr._1,ltr._3,downParents) }
/Selectthefirstdescendantnodeofthecurrentnodethatsatisfiesthegivenpredicate./ deffind(p:TreeLoc[A]=>Boolean):Option[TreeLoc[A]]= Cobind[TreeLoc].cojoin(this).tree.flatten.find(p) 复制代码 find用法示范:
复制代码 1valtree:Tree[Int]= 21.node( 311.leaf, 412.node( 5121.leaf), 62.node( 721.leaf, 822.leaf) 9)//>tree:scalaz.Tree[Int]= 10tree.loc//>res7:scalaz.TreeLoc[Int]=TreeLoc(,Stream(),Stream(),Stream()) 11vall=for{ 12l1<-tree.loc.some 13l2<-l1.find{_.getLabel==2} 14l3<-l1.find{_.getLabel==121} 15l4<-l2.find{_.getLabel==22} 16l5<-l1.findChild{_.rootLabel==12} 17l6<-l1.findChild{_.rootLabel==2} 18}yieldl6//>l:Option[scalaz.TreeLoc[Int]]=Some(TreeLoc(,Stream(,?),St 19//|ream(),Stream((Stream(),1,Stream()),?))) 复制代码
注意:上面6个跳动都成功了。如果无法跳转结果会是None insert,modify,delete这些操作函数:
复制代码 /Replacethecurrentnodewiththegivenone./ defsetTree(t:Tree[A]):TreeLoc[A]=loc(t,lefts,rights,parents)
/Modifythecurrentnodewiththegivenfunction./ defmodifyTree(f:Tree[A]=>Tree[A]):TreeLoc[A]=setTree(f(tree))
/Modifythelabelatthecurrentnodewiththegivenfunction./ defmodifyLabel(f:A=>A):TreeLoc[A]=setLabel(f(getLabel))
/Getthelabelofthecurrentnode./ defgetLabel:A=tree.rootLabel
/Setthelabelofthecurrentnode./ defsetLabel(a:A):TreeLoc[A]=modifyTree((t:Tree[A])=>node(a,t.subForest))
/Insertthegivennodetotheleftofthecurrentnodeandgiveitfocus./ definsertLeft(t:Tree[A]):TreeLoc[A]=loc(t,lefts,Stream.cons(tree,rights),parents)
/Insertthegivennodetotherightofthecurrentnodeandgiveitfocus./ definsertRight(t:Tree[A]):TreeLoc[A]=loc(t,Stream.cons(tree,lefts),rights,parents)
/Insertthegivennodeasthefirstchildofthecurrentnodeandgiveitfocus./ definsertDownFirst(t:Tree[A]):TreeLoc[A]=loc(t,Stream.Empty,tree.subForest,downParents)
/Insertthegivennodeasthelastchildofthecurrentnodeandgiveitfocus./ definsertDownLast(t:Tree[A]):TreeLoc[A]=loc(t,tree.subForest.reverse,Stream.Empty,downParents)
/Insertthegivennodeasthenthchildofthecurrentnodeandgiveitfocus./ definsertDownAt(n:Int,t:Tree[A]):Option[TreeLoc[A]]= for(lr<-splitChildren(Stream.Empty,tree.subForest,n))yieldloc(t,lr._1,lr._2,downParents)
/Deletethecurrentnodeandallitschildren./ defdelete:Option[TreeLoc[A]]=rightsmatch{ caseStream.cons(t,ts)=>Some(loc(t,lefts,ts,parents)) case_=>leftsmatch{ caseStream.cons(t,ts)=>Some(loc(t,ts,rights,parents)) case_=>for(loc1<-parent)yieldloc1.modifyTree((t:Tree[A])=>node(t.rootLabel,Stream.Empty)) } } 复制代码 用法示范:
复制代码 1valtr=1.leaf//>tr:scalaz.Tree[Int]= 2valtl=for{ 3l1<-tr.loc.some 4l3<-l1.insertDownLast(12.leaf).some 5l4<-l3.insertDownLast(121.leaf).some 6l5<-l4.root.some 7l2<-l5.insertDownFirst(11.leaf).some 8l6<-l2.root.some 9l7<-l6.find{_.getLabel==12} 10l8<-l7.setLabel(102).some 11}yieldl8//>tl:Option[scalaz.TreeLoc[Int]]=Some(TreeLoc(,Stream(,?),S 12//|tream(),Stream((Stream(),1,Stream()),?))) 13 14tl.get.toTree.drawTree//>res8:String="1 15//|| 16//|+-11 17//|| 18//|`-102 19//|| 20//|`-121 21//|" 22 复制代码 setTree和delete会替换当前节点下的所有子树:
复制代码 1valtree:Tree[Int]= 21.node( 311.leaf, 412.node( 5121.leaf), 62.node( 721.leaf, 822.leaf) 9)//>tree:scalaz.Tree[Int]= 10defmodTree(t:Tree[Int]):Tree[Int]={ 11vall=for{ 12l1<-t.loc.some 13l2<-l1.find{_.getLabel==22} 14l3<-l2.setTree{3.node(31.leaf)}.some 15}yieldl3 16l.get.toTree 17}//>modTree:(t:scalaz.Tree[Int])scalaz.Tree[Int] 18vall=for{ 19l1<-tree.loc.some 20l2<-l1.find{_.getLabel==2} 21l3<-l2.modifyTree{modTree(_)}.some 22l4<-l3.root.some 23l5<-l4.find{_.getLabel==12} 24l6<-l5.delete 25}yieldl6//>l:Option[scalaz.TreeLoc[Int]]=Some(TreeLoc(,Stream(,?),St 26//|ream(),Stream((Stream(),1,Stream()),?))) 27l.get.toTree.drawTree//>res7:String="1 28//|| 29//|+-11 30//|| 31//|`-2 32//|| 33//|+-21 34//|| 35//|`-3 36//|| 37//|`-31 38//|" 复制代码 通过scalaz的Tree和TreeLoc数据结构,以及一整套树形结构游览、操作函数,我们可以方便有效地实现FP风格的不可变树形集合编程。 |
|