Color- and Texture-Based Image Segmentation Using EM and Its Application to Content-Based Image Retrieval*

Serge Belongie, Chad Carson, Hayit Greenspan, and Jitendra Malik
Computer Science Division
University of California at Berkeley
Berkeley, CA 94720
{sjb, carson, hayit, malik}@cs.berkeley.edu

* To appear at ICCV '98. Copyright (c) 1998 IEEE.

Abstract

Retrieving images from large and varied collections using image content as a key is a challenging and important problem. In this paper we present a new image representation which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture space. This so-called "blobworld" representation is based on segmentation using the Expectation-Maximization algorithm on combined color and texture features. The texture features we use for the segmentation arise from a new approach to texture description and scale selection.

We describe a system that uses the blobworld representation to retrieve images. An important and unique aspect of the system is that, in the context of similarity-based querying, the user is allowed to view the internal representation of the submitted image and the query results. Similar systems do not offer the user this view into the workings of the system; consequently, the outcome of many queries on these systems can be quite inexplicable, despite the availability of knobs for adjusting the similarity metric.

1 Introduction

Very large collections of images are growing ever more common. From stock photo collections to proprietary databases to the Web, these collections are diverse and often poorly indexed; unfortunately, image retrieval systems have not kept pace with the collections they are searching. The shortcomings of these systems are due both to the image representations they use and to their methods of accessing those representations to find images:

- While users generally want to find images containing particular objects ("things") [4, 6], most existing image retrieval systems represent images based only on their low-level features ("stuff"), with little regard for the spatial organization of those features.

- Systems based on user querying are often unintuitive and offer little help in understanding why certain images were returned and how to refine the query. Often the user knows only that he has submitted a query for, say, a bear and retrieved very few pictures of bears in return.

- For general image collections, there are currently no systems that automatically classify images or recognize the objects they contain.

In this paper we present a new image representation, "blobworld," and a retrieval system based on this representation. While blobworld does not exist completely in the "thing" domain, it recognizes the nature of images as combinations of objects, and both querying and learning in blobworld are more meaningful than they are with simple "stuff" representations.

We use the Expectation-Maximization (EM) algorithm to perform automatic segmentation based on image features. EM iteratively models the joint distribution of color and texture with a mixture of Gaussians; the resulting pixel-cluster memberships provide a segmentation of the image. After the image is segmented into regions, a description of each region's color, texture, and spatial characteristics is produced. In a querying task, the user can access the regions directly, in order to see the segmentation of the query image and specify which aspects of the image are important to the query. When query results are returned, the user sees the blobworld representation of the returned images; this assists greatly in refining the query.

We begin this paper by briefly discussing the current state of image retrieval. In Section 2 we describe the blobworld representation, from features through segmentation to region description. In Section 3 we present a query system based on blobworld, as well as results from queries in a collection of highly varied natural images.

1.1 Background

Current image database systems include IBM's Query by Image Content (QBIC) [18], Photobook [20], Virage [10], Candid [14], and Chabot [19]. These systems primarily use low-level image properties; several of them include some degree of automatic segmentation. None of the systems codes spatial organization in a way that supports object queries.

Classical object recognition techniques usually rely on clean segmentation of the object from the rest of the image or are designed for fixed geometric objects such as machine parts. Neither constraint holds in our case: the shape, size, and color of objects like cheetahs and polar bears are quite variable, and segmentation is imperfect. Clearly, classical object recognition does not apply. More recent techniques [21] can identify specific objects drawn from a finite (on the order of 100) collection, but no present technique is effective at the general image analysis task, which requires both image segmentation and image classification.

Promising work by Lipson et al. [16] retrieves images based on spatial and photometric relationships within and across image regions. Little or no segmentation is done; the regions are derived from low-resolution images.

Earlier work has used the EM algorithm and/or the Minimum Description Length (MDL) principle to perform segmentation based on motion [1, 25] or scaled intensities [26], but EM has not previously been used on joint color and texture. Related approaches such as deterministic annealing [11] and classical clustering [12] have been applied to texture segmentation without color.

2 The blobworld image representation

The blobworld representation is related to the notion of photographic or artistic scene composition. In the sense discussed in [23], the blobworld descriptors constitute an example of a summary representation because they are concise and relatively easy to process in a querying framework.

Blobworld is distinct from color-layout matching as in QBIC in that it is designed to find objects or parts of objects. Each image may be visualized by an ensemble of 2-D ellipses, or "blobs," each of which possesses a number of attributes. The number of blobs in an image is typically less than ten. Each blob represents a region of the image which is roughly homogeneous with respect to color or texture. A blob is described by its dominant colors, mean texture descriptors, and spatial centroid and scatter matrix. (See Figs. 3-4 for a visualization of blobworld.)

2.1 Extracting color and texture features

Our goal is to assign the pixels in the original image to a relatively small number of groups, where each group represents a set of pixels that are coherent in their color and local texture properties; the motivation is to reduce the amount of raw data presented by the image while preserving the information needed for the image understanding task. Given the unconstrained nature of the images in our database, it is important that the tools we employ to meet this goal be as general as possible without sacrificing an undue amount of descriptive power.

2.1.1 Color

Color is a very important cue in extracting information from images. Color histograms are commonly used in content-based retrieval systems [18, 19, 24] and have proven to be very useful; however, the global characterization is poor at, for example, distinguishing between a field of orange flowers and a tiger, because it lacks information about how the color is distributed spatially. It is important to group color in localized regions and to fuse color with textural properties.

We treat the hue-saturation-value (HSV) color space as a cone: for a given point $(h, s, v)$, $h$ and $s$ are the angular and radial coordinates of the point on a disk of radius $v$ at height $v$; all coordinates range from 0 to 1. Points with small $v$ are black, regardless of their $h$ and $s$ values. The cone representation maps all such points to the apex of the cone, so they are close to one another. The Cartesian coordinates of points in the cone, $(sv\cos(2\pi h), sv\sin(2\pi h), v)$, can now be used to find color differences. This encoding allows us to operationalize the fact that hue differences are meaningless for very small saturations (those near the cone's axis). However, it ignores the fact that for large values and saturations, hue differences are perceptually more relevant than saturation and value differences.
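
As a concrete illustration of this mapping, the following sketch (a hypothetical helper named hsv_to_cone, not part of the system described here) converts HSV values in [0, 1] to the cone's Cartesian coordinates:

```python
import numpy as np

def hsv_to_cone(h, s, v):
    """Map HSV coordinates in [0, 1] to the Cartesian cone coordinates
    (s*v*cos(2*pi*h), s*v*sin(2*pi*h), v) used for color differences."""
    h, s, v = (np.asarray(x, dtype=float) for x in (h, s, v))
    return np.stack([s * v * np.cos(2 * np.pi * h),
                     s * v * np.sin(2 * np.pi * h),
                     v], axis=-1)

# Two dark pixels with very different hues land near the cone's apex,
# so their color difference is small, as intended:
print(hsv_to_cone(0.0, 1.0, 0.01))   # approx [ 0.01, 0.00, 0.01]
print(hsv_to_cone(0.5, 1.0, 0.01))   # approx [-0.01, 0.00, 0.01]
```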

2.1.2 Texture

Texture is a well-researched property of image regions, and many texture descriptors have been proposed, including multi-orientation filter banks [17] and the second-moment matrix [5, 8]. We will not elaborate here on the classical approaches to texture segmentation and classification, both of which are challenging and well-studied tasks. Rather, we introduce a new perspective related to texture descriptors and texture grouping motivated by the content-based retrieval task.

While color is a point property, texture is a local-neighborhood property. It does not make sense to talk about the texture of zebra stripes at a particular pixel without specifying a neighborhood around that pixel. In order for a texture descriptor to be useful, it must provide an adequate description of the underlying texture parameters and it must be computed in a neighborhood which is appropriate to the local structure being described.

Figure 1. Five sample patches from a zebra image. (a) and (b) have stripes (1-D flow) of different scales and orientations, (c) is a region of 2-D texture, (d) contains an edge, and (e) is a uniform region.

The first requirement could be met to an arbitrary degree of satisfaction by using multi-orientation filter banks such as steerable filters; we chose a simpler method that is sufficient for our purposes. The second requirement, which may be thought of as the problem of scale selection, does not enjoy the same level of attention in the literature. This is unfortunate, since texture descriptors computed at the wrong scale only confuse the issue.

In this work, we introduce a novel method of scale selection which works in tandem with a fairly simple but informative set of texture descriptors. The scale selection method is based on edge/bar polarity stabilization, and the texture descriptors arise from the windowed second moment matrix. Both are derived from the gradient of the image intensity, which we denote by $\nabla I$. We compute $\nabla I$ using the first difference approximation along each dimension. This operation is often accompanied by smoothing, but we have found this preprocessing operation unnecessary for the images in our collection.

To make the notion of scale concrete, we define the scale to be the width of the Gaussian window within which the gradient vectors of the image are pooled. The second moment matrix for the vectors within this window, computed about each pixel in the image, can be approximated using

$$M_\sigma(x, y) = G_\sigma(x, y) * (\nabla I)(\nabla I)^T \qquad (1)$$

where $G_\sigma(x, y)$ is a separable binomial approximation to a Gaussian smoothing kernel with variance $\sigma^2$.

At each pixel location, $M_\sigma(x, y)$ is a $2 \times 2$ symmetric positive semidefinite matrix; thus it provides us with three pieces of information about each pixel. Rather than work with the raw entries in $M_\sigma$, it is more common to deal with its eigenstructure [2, 5]. Consider a fixed scale and pixel location, let $\lambda_1$ and $\lambda_2$ ($\lambda_1 \ge \lambda_2$) denote the eigenvalues of $M_\sigma$ at that location, and let $\phi$ denote the argument of the principal eigenvector of $M_\sigma$. When $\lambda_1$ is large compared to $\lambda_2$, the local neighborhood possesses a dominant orientation, as specified by $\phi$. When the eigenvalues are comparable, there is no preferred orientation, and when both eigenvalues are negligible, the local neighborhood is approximately constant.
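
A minimal sketch of the windowed second moment matrix and its eigenstructure, assuming a 2-D float image and an isotropic Gaussian window in place of the separable binomial kernel above (np.gradient also uses central rather than first differences in the interior):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def second_moment_features(gray, sigma):
    """Pool (grad I)(grad I)^T in a Gaussian window of scale sigma and return
    per-pixel anisotropy a = 1 - lam2/lam1, normalized contrast
    c = 2*sqrt(lam1 + lam2), and dominant orientation phi."""
    iy, ix = np.gradient(gray.astype(float))       # image gradient
    mxx = gaussian_filter(ix * ix, sigma)          # entries of M_sigma
    mxy = gaussian_filter(ix * iy, sigma)
    myy = gaussian_filter(iy * iy, sigma)
    # Eigenvalues of the 2x2 symmetric matrix [[mxx, mxy], [mxy, myy]].
    half_tr = 0.5 * (mxx + myy)
    disc = np.sqrt(np.maximum(half_tr ** 2 - (mxx * myy - mxy ** 2), 0.0))
    lam1, lam2 = half_tr + disc, half_tr - disc    # lam1 >= lam2 >= 0
    anisotropy = 1.0 - lam2 / np.maximum(lam1, 1e-12)
    contrast = 2.0 * np.sqrt(np.maximum(lam1 + lam2, 0.0))
    phi = 0.5 * np.arctan2(2.0 * mxy, mxx - myy)   # principal eigenvector angle
    return anisotropy, contrast, phi
```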

Scale selection

We may think of $\sigma$ as controlling the size of the integration window around each pixel within which the outer product of the gradient vectors is averaged. $\sigma$ has been called the integration scale or artificial scale by various authors [5, 8] to distinguish it from the natural scale used in linear smoothing of raw image intensities. Note that $\sigma = \sigma(x, y)$; the scale varies across the image. (Strictly speaking, eqn. (1) is a sliding inner product, not a convolution, since $\sigma(x, y)$ is spatially variant.)

In order to select the scale at which $M_\sigma$ is computed, i.e. to determine the function $\sigma(x, y)$, we make use of a local image property known as polarity. (Polarity is related to the quadrature phase as discussed in [7, 9].) The polarity is a measure of the extent to which the gradient vectors in a certain neighborhood all point in the same direction. (In the computation of second moments, this information is lost in the outer product operation; i.e., gradient vector directions differing by $180^\circ$ are indistinguishable.) The polarity at a given pixel is computed with respect to the dominant orientation $\phi$ in the neighborhood of that pixel. For ease of notation, let us consider a fixed scale and pixel location. We define polarity as

$$p = \frac{|E_+ - E_-|}{E_+ + E_-}.$$

The definitions of $E_+$ and $E_-$ are

$$E_+ = \sum_{(x,y) \in \Omega} G_\sigma(x, y)\,[\nabla I \cdot \hat{n}]_+ \quad \text{and} \quad E_- = \sum_{(x,y) \in \Omega} G_\sigma(x, y)\,[\nabla I \cdot \hat{n}]_- ,$$

where $[q]_+$ and $[q]_-$ are the rectified positive and negative parts of their argument, $\hat{n}$ is a unit vector perpendicular to $\phi$, and $\Omega$ represents the neighborhood under consideration. We can think of $E_+$ and $E_-$ as measures of how many gradient vectors in $\Omega$ are on the "positive side" and "negative side" of the dominant orientation, respectively. Note that $p$ ranges from 0 to 1. A similar measure is used in [15] to distinguish a flow pattern from an edge.
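
A sketch of the polarity computation at a single scale. Two simplifications are assumed here, so this is an interpretation rather than the paper's implementation: the dominant orientation is estimated per pixel from the pooled second moment matrix (rather than held fixed over the neighborhood), and the gradients are projected onto the dominant gradient direction itself, which makes an ideal step edge yield $p$ close to 1 as described in the next paragraph.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def polarity(gray, sigma):
    """Sketch of p = |E+ - E-| / (E+ + E-): pool the rectified projections
    of grad I onto a reference axis within a Gaussian window of scale sigma."""
    blur = (lambda a: gaussian_filter(a, sigma)) if sigma > 0 else (lambda a: a)
    iy, ix = np.gradient(gray.astype(float))
    mxx, mxy, myy = blur(ix * ix), blur(ix * iy), blur(iy * iy)
    theta = 0.5 * np.arctan2(2.0 * mxy, mxx - myy)  # dominant gradient direction
    proj = ix * np.cos(theta) + iy * np.sin(theta)  # grad I projected onto that axis
    e_plus, e_minus = blur(np.maximum(proj, 0.0)), blur(np.maximum(-proj, 0.0))
    return np.abs(e_plus - e_minus) / (e_plus + e_minus + 1e-12)
```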

The polarity $p_\sigma$ varies as the scale $\sigma$ changes; its behavior in typical image regions can be summarized as follows:

Edge: The presence of an edge is signaled by $p_\sigma$ holding values close to 1 for all $\sigma$.

Texture: In regions with 2-D texture or 1-D flow, $p_\sigma$ decays with $\sigma$ due to the presence of multiple orientations.

Uniform: In a constant-intensity neighborhood, $p_\sigma$ takes on arbitrary values since the gradient vectors have negligible magnitudes and therefore arbitrary angles.

The process of selecting a scale is based on the derivative of the polarity with respect to scale. First, we compute the polarity at every pixel in the image for $\sigma_k = k/2$, $k = 0, 1, \ldots, 7$, thus producing a "stack" of polarity images across scale. Then, for each $k$, the polarity image computed at scale $\sigma_k$ is convolved with a Gaussian with standard deviation $2\sigma_k$ to yield a smoothed polarity image $\tilde{p}_{\sigma_k}$. For each pixel, we select the scale as the first value of $\sigma_k$ for which the difference between successive values of polarity ($\tilde{p}_{\sigma_k} - \tilde{p}_{\sigma_{k-1}}$) is less than 2%. In this manner, we are performing a soft version of local spatial frequency estimation, since the smoothed polarity tends to stabilize once the scale window encompasses one approximate period. Since we stop at $\sigma_k = 3.5$, the largest period we can detect is approximately 10 pixels. Note that when the period is undefined, as is the case in uniform regions, the selected scale is not meaningful and is set to zero. We declare a pixel to be uniform if its mean contrast across scale is less than 0.1.
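
A sketch of this stabilization rule, assuming a precomputed stack of polarity images (for instance from the polarity sketch above); the hypothetical helper select_scales omits the uniform-region test (mean contrast below 0.1, with the selected scale set to zero), and as a simplification lets pixels that never stabilize keep the largest scale:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def select_scales(polarity_stack, sigmas, tol=0.02):
    """Per pixel, pick the first sigma_k at which the smoothed polarity
    changes by less than 2% from the previous scale; sigmas = k/2, k = 0..7."""
    smoothed = np.stack([gaussian_filter(p, 2.0 * s) if s > 0 else np.asarray(p, float)
                         for p, s in zip(polarity_stack, sigmas)])
    selected = np.full(smoothed[0].shape, sigmas[-1], dtype=float)
    undecided = np.ones(smoothed[0].shape, dtype=bool)
    for k in range(1, len(sigmas)):
        stable = np.abs(smoothed[k] - smoothed[k - 1]) < tol
        newly = stable & undecided
        selected[newly] = sigmas[k]
        undecided[newly] = False
    return selected
```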

Another method of scale selection that has been proposed [8] is based on localizing extrema across scale of an invariant of $M_\sigma$, such as the trace or determinant. In this algorithm, which is applied to the problem of estimating the slant and tilt of surfaces with tangential texture, it is necessary to perform natural smoothing at a scale tied to the artificial scale. We found that this extra smoothing compromised the spatial localization ability of our scale selection method.

Texture features

Once a scale $\sigma^*$ is selected for each pixel, that pixel is assigned three texture descriptors. The first is the polarity, $p_{\sigma^*}$. The other two, which are taken from $M_{\sigma^*}$, are the anisotropy, defined as $a = 1 - \lambda_2/\lambda_1$, and the normalized texture contrast, defined as $c = 2\sqrt{\lambda_1 + \lambda_2}$. (If we use a centered first difference kernel in the gradient computation, the factor of 2 makes $c$ range from 0 to 1.) These are related to derived quantities reported in [8].

2.1.3 Combining color and texture features

The color/texture descriptor for a given pixel consists of six values: three for color and three for texture. The three color components are the color-cone coordinates found after spatial averaging using a Gaussian at the selected scale. The three texture components are $ac$, $pc$, and $c$, computed at the selected scale; the anisotropy and polarity are each modulated by the contrast in analogy to the construction of the color-cone coordinates. (Recall that anisotropy and polarity are meaningless in regions of low contrast.) In effect, a given textured patch in an image first has its texture properties extracted and is then replaced by a smooth patch of averaged color. In this manner, the color and texture properties in a given region are decoupled; for example, a zebra is a gray horse plus stripes.

Note that in this formulation of the color/texture descriptor, orientation and selected scale do not appear in the feature vector; as a result, grouping can occur across variations in scale and orientation.
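
The resulting per-pixel feature vector can be assembled as follows (a sketch; the inputs are assumed to have already been computed at the pixel's selected scale):

```python
import numpy as np

def pixel_descriptor(cone_color, anisotropy, polarity, contrast):
    """Build the 6-D color/texture descriptor: the three locally smoothed
    color-cone coordinates plus (a*c, p*c, c), with anisotropy and polarity
    modulated by the contrast."""
    return np.concatenate([np.asarray(cone_color, dtype=float),
                           [anisotropy * contrast,
                            polarity * contrast,
                            contrast]])
```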

2.2 Grouping with the EM Algorithm

Once an image has been processed using the above feature extraction scheme, the result is a large set of 6-D feature vectors, which we may regard as points in a 6-D feature space. In order to divide these points into groups, we make use of the Expectation-Maximization (EM) algorithm [3] to determine the maximum likelihood parameters of a mixture of $K$ Gaussians inside the 6-D feature space.

The EM algorithm is used for finding maximum likelihood parameter estimates when there is missing or incomplete data. In our case, the missing data is the region to which the points in the feature space belong. We estimate values to fill in for the incomplete data (the "E-step"), compute the maximum likelihood parameter estimates using this data (the "M-step"), and repeat until a suitable stopping criterion is reached.

The first step in applying the EM algorithm is to initialize a mean vector and covariance matrix to represent each of the $K$ groups. We initialize the means to random values and the covariances to identity matrices. (In earlier work we chose the initialization for EM carefully, but we have found that the initialization has little effect on the quality of the resulting segmentation.) The update scheme allows for full covariance matrices; variants include restricting the covariance to be diagonal or a constant times the identity matrix. Full covariance matrices suit our problem, since many plausible feature clusters require extruded covariance shapes, e.g. the shades of gray along the color cone axis.

Upon convergence, the Gaussian mixture parameters can be inspected to determine what color/texture properties are represented by each component of the mixture. Some examples of groups that can form include the following:

- bright, bluish, and textureless regions (e.g., sky)

- anisotropic and non-polar regions (e.g., zebra hide)

- green weak-isotropic texture (e.g., grass)
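
The grouping step can be sketched with an off-the-shelf mixture-of-Gaussians fit (which runs EM under the hood); the paper implements EM directly, so the scikit-learn call below is only a stand-in:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_features(features, k):
    """Fit a full-covariance mixture of k Gaussians to the 6-D per-pixel
    features (shape H x W x 6) and return per-pixel cluster labels,
    i.e. the highest-likelihood component for each pixel."""
    h, w, d = features.shape
    x = features.reshape(-1, d)
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          init_params="random", random_state=0).fit(x)
    return gmm.predict(x).reshape(h, w), gmm
```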

We have thus far not discussed how to choose $K$, the number of mixture components. Ideally we would like to choose that value of $K$ that best suits the natural number of groups present in the image. One readily available notion of goodness of fit is the log-likelihood. Given this indicator, we can apply the Minimum Description Length (MDL) principle [22] to select among values of $K$. As a consequence of this principle, when models using two values of $K$ fit the data equally well, the simpler model will be chosen. For our experiments, $K$ ranges from 2 to 5.

Once a model is selected, the next step is to perform spatial grouping of those pixels belonging to the same color/texture cluster. We first produce a $K$-level image which encodes pixel-cluster memberships by replacing each pixel with the label of the cluster for which it attains the highest likelihood (see Fig. 2(d)). To enforce a small amount of spatial smoothness in this representation, we apply a $3 \times 3$ maximum-vote filter to the raw cluster-membership image. Finally, we run the resulting image through a connected-components algorithm to produce a set of labeled image regions (see Fig. 2(e)). (Alternatively, one could enforce spatial constraints by appending the pixel coordinates to the feature vectors, though we observed that this method too often yields unsatisfactory segmentations.)
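
A sketch of this post-processing under the stated parameters (a 3 × 3 majority filter, connected components, and the 2% area threshold mentioned in the Fig. 2 caption); the helper name and return format are hypothetical:

```python
import numpy as np
from scipy import ndimage

def spatial_grouping(labels, min_area_frac=0.02):
    """Apply a 3x3 maximum-vote (majority) filter to the cluster-membership
    image, then extract connected components per cluster, keeping only
    components covering at least min_area_frac of the image."""
    def majority(window):
        values, counts = np.unique(window, return_counts=True)
        return values[np.argmax(counts)]

    smoothed = ndimage.generic_filter(labels, majority, size=3)
    regions = []
    for cluster in np.unique(smoothed):
        components, n = ndimage.label(smoothed == cluster)
        for i in range(1, n + 1):
            mask = components == i
            if mask.sum() >= min_area_frac * labels.size:
                regions.append(mask)          # one boolean mask per blob region
    return smoothed, regions
```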

2.3 Describing the regions

We store a simple description of each region's color, texture, and spatial characteristics. (See Fig. 2(f) for a visualization of the stored representation.)

Color and texture descriptors

The two dominant colors within a connected component are chosen by using the EM algorithm to fit a mixture of two Gaussians in the HSV cone. The details are as before except that in this case we restrict the covariances to be a constant times the identity matrix. Upon convergence, the two mean vectors are recorded as the dominant colors in the region. When the color distribution inside the HSV cone is in fact unimodal, both means become nearly coincident; we have not found it necessary to apply model selection between $K = 1$ and $K = 2$.

For each image region (blob) we store the mean texture descriptors (i.e., anisotropy, orientation, contrast) and the top two colors. We do not store the selected scale, since we want to be invariant to scales in the range $\sigma_k = 0, \ldots, 3.5$. Although polarity is used for scale selection, we discard it here, since in any textured or uniform region it is approximately zero by virtue of the scale selection process.
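
A sketch of the dominant-color step, assuming the region's pixels are given as color-cone coordinates; restricting the covariances to a constant times the identity corresponds to scikit-learn's "spherical" covariance type (again a stand-in for the paper's own EM implementation):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def dominant_colors(cone_pixels):
    """Fit two spherical Gaussians to an (N, 3) array of color-cone
    coordinates and return the two means as the region's dominant colors;
    for a unimodal distribution the two means come out nearly coincident."""
    gmm = GaussianMixture(n_components=2, covariance_type="spherical",
                          random_state=0).fit(np.asarray(cone_pixels, dtype=float))
    return gmm.means_
```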

Figure 2. Creating the blobworld representation.
(a) Original image.
(b) Scale estimated using polarity. The values range from $\sigma = 0$ (black) to $\sigma = 3.5$ (white).
(c) The six components of the color/texture feature vectors, each bounded between 0 (white) and 1 (black). Top: the locally smoothed color-cone coordinates. Bottom: the texture coordinates; from left to right, $ac$, $pc$, and $c$. The zebra hide is highly anisotropic and in general has high texture contrast. The polarity is largest around the edges, where the shading gradient points primarily in one direction.
(d) The results of clustering these feature vectors into $K = 2, 3, 4, 5$ groups using EM to learn a mixture of Gaussians. Pixel cluster memberships are shown as one of up to five gray levels. The MDL principle suggests that the rightmost image ($K = 5$) provides the best segmentation of the data. Most noticeable in this segmentation are oriented texture, which is found throughout the zebra hide, and uniform or low-contrast texture, which accounts for most of the background.
(e) The segmentation for $K = 5$ after application of a $3 \times 3$ max-vote filter. Each connected component in this image which possesses an area greater than 2% of the total image area produces a blob.
(f) The blobworld representation. Each blob encodes summary information about the underlying color, texture, and shape properties.

Spatial descriptors

The geometric descriptors of the blob are simply the centroid $c$ and scatter matrix $S$ of the blob region; the centroid provides a notion of position, while the scatter matrix provides an elementary shape description. In the querying process discussed in Section 3.1, centroid separations are expressed using Euclidean distance. The determination of the distance between scatter matrices is based on the three quantities $[\det(S)]^{1/2} = \sqrt{\kappa_1 \kappa_2}$, $1 - \kappa_2/\kappa_1$, and $\theta$. ($\kappa_1$ and $\kappa_2$ are the eigenvalues of $S$; $\theta$ is the argument of the principal eigenvector of $S$.) These three quantities represent approximate area, eccentricity, and orientation.
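
A sketch of these geometric descriptors for a blob given as a boolean mask (the helper blob_geometry is hypothetical; the paper does not specify an implementation):

```python
import numpy as np

def blob_geometry(mask):
    """Centroid and scatter matrix of a blob, plus the derived quantities
    used for matching: sqrt(det S) (approximate area), 1 - k2/k1
    (eccentricity), and theta (orientation of the principal eigenvector)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    centroid = pts.mean(axis=0)
    scatter = np.cov(pts.T, bias=True)              # 2x2 scatter matrix S
    evals, evecs = np.linalg.eigh(scatter)          # eigenvalues in ascending order
    k2, k1 = evals
    area = np.sqrt(max(k1 * k2, 0.0))               # [det(S)]^(1/2)
    eccentricity = 1.0 - k2 / max(k1, 1e-12)
    theta = np.arctan2(evecs[1, 1], evecs[0, 1])    # angle of principal eigenvector
    return centroid, scatter, area, eccentricity, theta
```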

3 Image retrieval by querying

Anyone who has used a search engine, text-based or otherwise, is familiar with the reality of unwanted matches. Often in the case of text searches this results from the use of ambiguous keywords, such as "bank" or "interest" [27]. Unfortunately, with image queries it is not always so clear why things go wrong. Unlike with text searches, in which the user can see the features (words) in a document, none of the current content-based image retrieval systems allows the user to see exactly what the system is looking for in response to a similarity-based query. Simply allowing the user to submit an arbitrary image (or sketch) and set some abstract knobs without knowing how they relate to the input image in particular implies a degree of complexity that searching algorithms do not have. As a result, a query for a bear can return just about any object under the sun if the query is not based on image regions, the segmentation routine fails to "find" the bear in the submitted image, or the submitted image contains other distinctive objects. Without realizing that the input image was not properly processed, the user can only wonder what went wrong. In order to help the user formulate effective queries and understand their results, as well as to minimize disappointment due to overly optimistic expectations of the system, the system should display its representation of the submitted image and the returned images.

3.1 Querying in blobworld

In our system, the user composes a query by submitting an image and seeing its blobworld representation, selecting the blobs to match, and finally specifying the relative importance of the blob features. The user may also submit blobs from several different images. (For example, a query might be the disjunction of the blobs corresponding to airplanes in several images, in order to provide a query that looks for airplanes of several shades.)

We define an "atomic query" as one which specifies a particular blob to match (e.g., "like-blob-1"). A "compound query" is defined as either an atomic query or a conjunction or disjunction of compound queries ("like-blob-1 and like-blob-2"). In the future, we might expand this definition to include negation ("not-like-blob-1") and to allow the user to specify two blobs with a particular spatial relationship as an atomic query ("like-blob-1-left-of-blob-2").

Once a compound query is specified, we score each database image based on how closely it satisfies the compound query. The score $\mu_i$ for each atomic query (like-blob-$i$) is calculated as follows:

1. Find the feature vector $v_i$ for the desired blob $b_i$. This vector consists of the stored color, texture, position, and shape descriptors.

2. For each blob $b_j$ in the database image:

   (a) Find the feature vector $v_j$ for $b_j$.

   (b) Find the Mahalanobis distance between $v_i$ and $v_j$ using the diagonal covariance matrix (feature weights) set by the user: $d_{ij} = \left[(v_i - v_j)^T \Sigma^{-1} (v_i - v_j)\right]^{1/2}$.

   (c) Measure the similarity between $b_i$ and $b_j$ using $\mu_{ij} = e^{-d_{ij}/2}$. This score is 1 if the blobs are identical in all relevant features; it decreases as the match becomes less perfect.

3. Take $\mu_i = \max_j \mu_{ij}$.

The compound query score for the database image is calculated using fuzzy-logic operations [13]. For example, if the query is "like-blob-1 and (like-blob-2 or like-blob-3)," the overall score for the image is $\min\{\mu_1, \max\{\mu_2, \mu_3\}\}$.

The user can also specify a weighting $\sigma_i$ for each atomic query. If "like-blob-$i$" is part of a disjunction in the compound query, the weighted score for atomic query $i$ is $\mu'_i = \sigma_i \mu_i$; if it is in a conjunction, its weighted score is $\mu'_i = 1 - \sigma_i \cdot (1 - \mu_i)$.
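
A sketch of the atomic and weighted scoring just described, assuming the stored descriptors are flattened into plain feature vectors and that the user's feature weights form the diagonal of $\Sigma$ (the exact feature layout is not specified here):

```python
import numpy as np

def atomic_score(v_query, v_blob, weights):
    """mu = exp(-d/2), with d the Mahalanobis distance under a diagonal
    covariance whose entries are the user-set feature weights."""
    diff = np.asarray(v_query, float) - np.asarray(v_blob, float)
    d = np.sqrt(np.sum(diff ** 2 / np.asarray(weights, float)))
    return float(np.exp(-d / 2.0))

def image_score(v_query, db_blobs, weights):
    """Score of 'like-blob-i' against one database image: the best match
    over that image's blobs (mu_i = max_j mu_ij)."""
    return max(atomic_score(v_query, v, weights) for v in db_blobs)

# Fuzzy-logic combination for "like-blob-1 and (like-blob-2 or like-blob-3)":
#   overall = min(mu_1, max(mu_2, mu_3))
# Weighted variants: sigma_i * mu_i inside a disjunction,
# 1 - sigma_i * (1 - mu_i) inside a conjunction.
```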

We then rank the images according to overall score and return the best matches, indicating for each image which set of blobs provides the highest score; this information will help the user refine the query. After reviewing the query results, the user may change the weighting of the blob features or may specify new blobs to match.

3.2 Results

We have performed a variety of queries using a set of 2000 images from the commercial Corel stock photo collection. We used the following categories: African Specialty Animals; Air Shows; Arabian Horses; Bald Eagles; Bears; Canadian Rockies; Caribbean; Cheetahs, Leopards, Jaguars; China; Death Valley; Deserts; Elephants; Fields; France; Kenya; Night Scenes; Sheep; Sunsets; Tigers; and Wild Animals. Sample queries are shown in Figs. 3-4.

Figure 3. Blobworld query for tiger images. 28% of the top 50 images are tigers; tiger images make up 5% of the database.

Figure 4. Blobworld query for zebra images. 24% of the top 50 images are zebras, while less than 2% of the images in the database are zebras.

Figure 5. Tiger query performance.

Figure 6. Zebra query performance.

Figure 7. Airplane query performance.

Figure 8. Sunset query performance.

3.2.1 Comparison to color histograms

We have compared our results to those obtained using color histogram matching, following the procedure of Swain and Ballard [24]. The color histogram for each image uses 8 divisions for the intensity axis and 16 for each opponent color axis. Given a query image with histogram $Q_i$, each database image (with histogram $D_i$) receives score $\sum_i |Q_i - D_i|$. As before, we rank the database images and return the best matches. Figures 5-8 show how the precision changes as more images are returned; the blobworld query results are better than the color histogram results, except for the tiger query. We believe the good color histogram results for the tiger query occur largely because of the limited nature of the test database; few non-tiger images in this collection have significant amounts of both orange and green. Adding pictures of, say, orange flowers in a field would degrade the color histogram performance without significantly affecting the blobworld performance.
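
A sketch of this histogram baseline. The text specifies only the bin counts, so the opponent axes (intensity = R+G+B, red-green = R-G, blue-yellow = 2B-R-G), the bin ranges, and the normalization below are assumptions; the score is the sum of absolute bin differences as stated above:

```python
import numpy as np

def opponent_histogram(rgb):
    """8 intensity bins and 16 bins per opponent color axis, computed from an
    (H, W, 3) uint8 image; axes and ranges here are assumptions, not taken
    from the paper."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    axes = np.stack([(r + g + b).ravel(),        # intensity axis
                     (r - g).ravel(),            # red-green opponent axis
                     (2 * b - r - g).ravel()],   # blue-yellow opponent axis
                    axis=1)
    hist, _ = np.histogramdd(axes, bins=(8, 16, 16),
                             range=((0, 765), (-255, 255), (-510, 510)))
    return hist / hist.sum()

def histogram_score(query_hist, db_hist):
    """Score for one database image: sum over bins of |Q_i - D_i| (lower is better)."""
    return float(np.abs(query_hist - db_hist).sum())
```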

4 Conclusions

We have proposed a new method which uses Expectation-Maximization on color and texture jointly to provide an image segmentation, as well as a new image representation (blobworld) which uses this segmentation and its associated descriptors to represent image regions explicitly. We have demonstrated a query mechanism that uses blobworld to retrieve images and help guide user queries.

Acknowledgments

We would like to thank David Forsyth, Joe Hellerstein, Ginger Ogle, and Robert Wilensky for useful discussions related to this work. This work was supported by an NSF Digital Library Grant (IRI 94-11334) and NSF graduate fellowships for Serge Belongie and Chad Carson.

References

[1] S. Ayer and H. Sawhney. Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding. In Proc. Int. Conf. Comp. Vis., pages 777-784, 1995.

[2] J. Bigün. Local symmetry features in image processing. PhD thesis, Linköping University, 1988.

[3] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Soc., Ser. B, 39(1):1-38, 1977.

[4] P. Enser. Query analysis in a visual information retrieval context. J. Doc. and Text Management, 1(1):25-52, 1993.

[5] W. Förstner. A framework for low level feature extraction. In Proc. Eur. Conf. Comp. Vis., pages 383-394, 1994.

[6] D. Forsyth, J. Malik, and R. Wilensky. Searching for digital pictures. Scientific American, 276(6):72-77, June 1997.

[7] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE Trans. Pattern Analysis and Machine Intelligence, 13(9):891-906, 1991.

[8] J. Gårding and T. Lindeberg. Direct computation of shape cues using scale-adapted spatial derivative operators. Int. J. Comp. Vis., 17(2):163-191, Feb 1996.

[9] G. H. Granlund and H. Knutsson. Signal Processing for Computer Vision. Kluwer Academic Publishers, 1995.

[10] A. Gupta and R. Jain. Visual information retrieval. Comm. Assoc. Comp. Mach., 40(5):70-79, May 1997.

[11] T. Hofmann, J. Puzicha, and J. M. Buhmann. Deterministic annealing for unsupervised texture segmentation. In Proc. Int. Workshop on Energy Min. Methods in Comp. Vis. and Patt. Rec., pages 213-228, 1997.

[12] A. K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24(12):1167-1186, 1991.

[13] J.-S. Jang, C.-T. Sun, and E. Mizutani. Neuro-Fuzzy and Soft Computing. Prentice Hall, 1997.

[14] P. Kelly, M. Cannon, and D. Hush. Query by image example: the CANDID approach. In SPIE Proc. Storage and Retrieval for Image and Video Databases, pages 238-248, 1995.

[15] T. Leung and J. Malik. Detecting, localizing and grouping repeated scene elements from an image. In Proc. Eur. Conf. Comp. Vis., pages 546-555, 1996.

[16] P. Lipson, E. Grimson, and P. Sinha. Configuration based scene classification and image indexing. In Proc. IEEE Comp. Soc. Conf. Comp. Vis. and Pattern Recogn., pages 1007-1013, 1997.

[17] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. J. Opt. Soc. Am. A, 7(5):923-932, 1990.

[18] W. Niblack et al. The QBIC project: querying images by content using colour, texture and shape. In SPIE Proc. Storage and Retrieval for Image and Video Databases, pages 173-187, 1993.

[19] V. Ogle and M. Stonebraker. Chabot: Retrieval from a relational database of images. IEEE Computer, 28(9):40-48, Sep 1995.

[20] A. Pentland, R. Picard, and S. Sclaroff. Photobook: Content-based manipulation of image databases. Int. J. Comp. Vis., 18(3):233-254, 1996.

[21] J. Ponce, A. Zisserman, and M. Hebert. Object Representation in Computer Vision II. Number 1144 in LNCS. Springer, 1996.

[22] J. Rissanen. Modeling by shortest data description. Automatica, 14:465-471, 1978.

[23] U. Shaft and R. Ramakrishnan. Data modeling and querying in the PIQ image DBMS. IEEE Data Engineering Bulletin, 19(4):28-36, Dec 1996.

[24] M. Swain and D. Ballard. Color indexing. Int. J. Comp. Vis., 7(1):11-32, 1991.

[25] Y. Weiss and E. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In Proc. IEEE Comp. Soc. Conf. Comp. Vis. and Pattern Recogn., pages 321-326, 1996.

[26] W. Wells, R. Kikinis, W. Grimson, and F. Jolesz. Adaptive segmentation of MRI data. In Int. Conf. on Comp. Vis., Virtual Reality and Robotics in Medicine, pages 59-69, 1995.

[27] D. Yarowsky. Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proc. Int. Conf. Comp. Linguistics, pages 454-460, 1992.
