Color- and Texture-Based Image Segmentation Using EM
and Its Application to Content-Based Image Retrieval*

Serge Belongie, Chad Carson, Hayit Greenspan, and Jitendra Malik
Computer Science Division
University of California at Berkeley
Berkeley, CA 94720
{sjb,carson,hayit,malik}@cs.berkeley.edu
Abstract

Retrieving images from large and varied collections using image content as a key is a challenging and important problem. In this paper we present a new image representation which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture space. This so-called "blobworld" representation is based on segmentation using the Expectation-Maximization algorithm on combined color and texture features. The texture features we use for the segmentation arise from a new approach to texture description and scale selection.

We describe a system that uses the blobworld representation to retrieve images. An important and unique aspect of the system is that, in the context of similarity-based querying, the user is allowed to view the internal representation of the submitted image and the query results. Similar systems do not offer the user this view into the workings of the system; consequently, the outcome of many queries on these systems can be quite inexplicable, despite the availability of knobs for adjusting the similarity metric.
1 Introduction

Very large collections of images are growing ever more common. From stock photo collections to proprietary databases to the Web, these collections are diverse and often poorly indexed; unfortunately, image retrieval systems have not kept pace with the collections they are searching. The shortcomings of these systems are due both to the image representations they use and to their methods of accessing those representations to find images:

- While users generally want to find images containing particular objects ("things") [4, 6], most existing image retrieval systems represent images based only on their low-level features ("stuff"), with little regard for the spatial organization of those features.

- Systems based on user querying are often unintuitive and offer little help in understanding why certain images were returned and how to refine the query. Often the user knows only that he has submitted a query for, say, a bear and retrieved very few pictures of bears in return.

- For general image collections, there are currently no systems that automatically classify images or recognize the objects they contain.

* To appear at ICCV '98. Copyright (c) 1998 IEEE.
In this paper we present a new image representation, "blobworld," and a retrieval system based on this representation. While blobworld does not exist completely in the "thing" domain, it recognizes the nature of images as combinations of objects, and both querying and learning in blobworld are more meaningful than they are with simple "stuff" representations.

We use the Expectation-Maximization (EM) algorithm to perform automatic segmentation based on image features. EM iteratively models the joint distribution of color and texture with a mixture of Gaussians; the resulting pixel-cluster memberships provide a segmentation of the image. After the image is segmented into regions, a description of each region's color, texture, and spatial characteristics is produced. In a querying task, the user can access the regions directly, in order to see the segmentation of the query image and specify which aspects of the image are important to the query. When query results are returned, the user sees the blobworld representation of the returned images; this assists greatly in refining the query.

We begin this paper by briefly discussing the current state of image retrieval. In Section 2 we describe the blobworld representation, from features through segmentation to region description. In Section 3 we present a query system based on blobworld, as well as results from queries in a collection of highly varied natural images.
1.1 Background

Current image database systems include IBM's Query by Image Content (QBIC) [18], Photobook [20], Virage [10], Candid [14], and Chabot [19]. These systems primarily use low-level image properties; several of them include some degree of automatic segmentation. None of the systems codes spatial organization in a way that supports object queries.

Classical object recognition techniques usually rely on clean segmentation of the object from the rest of the image or are designed for fixed geometric objects such as machine parts. Neither constraint holds in our case: the shape, size, and color of objects like cheetahs and polar bears are quite variable, and segmentation is imperfect. Clearly, classical object recognition does not apply. More recent techniques [21] can identify specific objects drawn from a finite (on the order of 100) collection, but no present technique is effective at the general image analysis task, which requires both image segmentation and image classification.

Promising work by Lipson et al. [16] retrieves images based on spatial and photometric relationships within and across image regions. Little or no segmentation is done; the regions are derived from low-resolution images.

Earlier work has used the EM algorithm and/or the Minimum Description Length (MDL) principle to perform segmentation based on motion [1, 25] or scaled intensities [26], but EM has not previously been used on joint color and texture. Related approaches such as deterministic annealing [11] and classical clustering [12] have been applied to texture segmentation without color.
2 The blobworld image representation

The blobworld representation is related to the notion of photographic or artistic scene composition. In the sense discussed in [23], the blobworld descriptors constitute an example of a summary representation because they are concise and relatively easy to process in a querying framework.

Blobworld is distinct from color-layout matching as in QBIC in that it is designed to find objects or parts of objects. Each image may be visualized by an ensemble of 2-D ellipses, or "blobs," each of which possesses a number of attributes. The number of blobs in an image is typically less than ten. Each blob represents a region of the image which is roughly homogeneous with respect to color or texture. A blob is described by its dominant colors, mean texture descriptors, and spatial centroid and scatter matrix. (See Figs. 3-4 for a visualization of blobworld.)
2.1 Extracting color and texture features

Our goal is to assign the pixels in the original image to a relatively small number of groups, where each group represents a set of pixels that are coherent in their color and local texture properties; the motivation is to reduce the amount of raw data presented by the image while preserving the information needed for the image understanding task. Given the unconstrained nature of the images in our database, it is important that the tools we employ to meet this goal be as general as possible without sacrificing an undue amount of descriptive power.
2.1.1 Color

Color is a very important cue in extracting information from images. Color histograms are commonly used in content-based retrieval systems [18, 19, 24] and have proven to be very useful; however, the global characterization is poor at, for example, distinguishing between a field of orange flowers and a tiger, because it lacks information about how the color is distributed spatially. It is important to group color in localized regions and to fuse color with textural properties.

We treat the hue-saturation-value (HSV) color space as a cone: for a given point (h, s, v), h and s are the angular and radial coordinates of the point on a disk of radius v at height v; all coordinates range from 0 to 1. Points with small v are black, regardless of their h and s values. The cone representation maps all such points to the apex of the cone, so they are close to one another. The Cartesian coordinates of points in the cone, (sv cos(2πh), sv sin(2πh), v), can now be used to find color differences. This encoding allows us to operationalize the fact that hue differences are meaningless for very small saturations (those near the cone's axis). However, it ignores the fact that for large values and saturations, hue differences are perceptually more relevant than saturation and value differences.
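The cone mapping above can be sketched directly (a minimal illustration; the function name is ours):

```python
import math

def hsv_cone(h, s, v):
    """Map HSV (each in [0, 1]) to Cartesian cone coordinates.

    The point sits at radius s*v and angle 2*pi*h on a disk of radius v
    at height v, so all low-value (near-black) colors collapse toward
    the cone's apex at the origin regardless of hue.
    """
    return (s * v * math.cos(2 * math.pi * h),
            s * v * math.sin(2 * math.pi * h),
            v)

# Two near-black pixels with opposite hues map to nearby points:
a = hsv_cone(0.0, 1.0, 0.01)   # "red", almost black
b = hsv_cone(0.5, 1.0, 0.01)   # "cyan", almost black
near = math.dist(a, b)

# The same hues at full value and saturation are far apart:
far = math.dist(hsv_cone(0.0, 1.0, 1.0), hsv_cone(0.5, 1.0, 1.0))
```

Euclidean distance in these coordinates thus behaves as the text describes: hue differences shrink with saturation and value.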
2.1.2 Texture

Texture is a well-researched property of image regions, and many texture descriptors have been proposed, including multi-orientation filter banks [17] and the second-moment matrix [5, 8]. We will not elaborate here on the classical approaches to texture segmentation and classification, both of which are challenging and well-studied tasks. Rather, we introduce a new perspective related to texture descriptors and texture grouping motivated by the content-based retrieval task.

While color is a point property, texture is a local-neighborhood property. It does not make sense to talk about the texture of zebra stripes at a particular pixel without specifying a neighborhood around that pixel. In order for a texture descriptor to be useful, it must provide an adequate description of the underlying texture parameters and it must be computed in a neighborhood which is appropriate to the local structure being described.
Figure 1. Five sample patches from a zebra image. (a) and (b) have stripes (1-D flow) of different scales and orientations (σ = 1.5 and σ = 2.5), (c) is a region of 2-D texture (σ = 1.5), (d) contains an edge (σ = 0), and (e) is a uniform region (σ = 0).
The first requirement could be met to an arbitrary degree of satisfaction by using multi-orientation filter banks such as steerable filters; we chose a simpler method that is sufficient for our purposes. The second requirement, which may be thought of as the problem of scale selection, does not enjoy the same level of attention in the literature. This is unfortunate, since texture descriptors computed at the wrong scale only confuse the issue.

In this work, we introduce a novel method of scale selection which works in tandem with a fairly simple but informative set of texture descriptors. The scale selection method is based on edge/bar polarity stabilization, and the texture descriptors arise from the windowed second moment matrix. Both are derived from the gradient of the image intensity, which we denote by ∇I. We compute ∇I using the first difference approximation along each dimension. This operation is often accompanied by smoothing, but we have found this preprocessing operation unnecessary for the images in our collection.
To make the notion of scale concrete, we define the scale to be the width of the Gaussian window within which the gradient vectors of the image are pooled. The second moment matrix for the vectors within this window, computed about each pixel in the image, can be approximated using

    M_σ(x, y) = G_σ(x, y) * (∇I)(∇I)^T    (1)

where G_σ(x, y) is a separable binomial approximation to a Gaussian smoothing kernel with variance σ².
At each pixel location, M_σ(x, y) is a 2×2 symmetric positive semidefinite matrix; thus it provides us with three pieces of information about each pixel. Rather than work with the raw entries in M_σ, it is more common to deal with its eigenstructure [2, 5]. Consider a fixed scale and pixel location, let λ1 and λ2 (λ1 ≥ λ2) denote the eigenvalues of M_σ at that location, and let φ denote the argument of the principal eigenvector of M_σ. When λ1 is large compared to λ2, the local neighborhood possesses a dominant orientation, as specified by φ. When the eigenvalues are comparable, there is no preferred orientation, and when both eigenvalues are negligible, the local neighborhood is approximately constant.
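The second moment matrix and its eigenstructure can be sketched as follows (an illustrative NumPy version, not the authors' implementation: `np.gradient` uses central rather than first differences, and a truncated Gaussian stands in for the paper's separable binomial kernel):

```python
import numpy as np

def second_moment_features(I, sigma=1.5):
    """Per-pixel eigenstructure of the windowed second moment matrix.

    Returns (lambda1, lambda2, phi): eigenvalues (lambda1 >= lambda2)
    and the angle of the principal eigenvector of M_sigma.
    """
    Iy, Ix = np.gradient(I.astype(float))

    def smooth(A):
        # Separable Gaussian window G_sigma (truncated at ~3 sigma).
        r = int(3 * sigma) + 1
        x = np.arange(-r, r + 1)
        g = np.exp(-x ** 2 / (2 * sigma ** 2))
        g /= g.sum()
        A = np.apply_along_axis(np.convolve, 0, A, g, mode='same')
        return np.apply_along_axis(np.convolve, 1, A, g, mode='same')

    # Smoothed entries of the gradient outer product (grad I)(grad I)^T.
    Mxx, Mxy, Myy = smooth(Ix * Ix), smooth(Ix * Iy), smooth(Iy * Iy)
    tr = Mxx + Myy
    det = Mxx * Myy - Mxy ** 2
    disc = np.sqrt(np.maximum((tr / 2) ** 2 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc       # eigenvalues, l1 >= l2
    phi = 0.5 * np.arctan2(2 * Mxy, Mxx - Myy)  # principal orientation
    return l1, l2, phi
```

On a vertically striped patch, λ1 dominates λ2 and φ points along the gradient axis, matching the "dominant orientation" case above.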
Scale selection

We may think of σ as controlling the size of the integration window around each pixel within which the outer product of the gradient vectors is averaged. σ has been called the integration scale or artificial scale by various authors [5, 8] to distinguish it from the natural scale used in linear smoothing of raw image intensities. Note that σ = σ(x, y); the scale varies across the image.¹

In order to select the scale at which M_σ is computed, i.e. to determine the function σ(x, y), we make use of a local image property known as polarity.² The polarity is a measure of the extent to which the gradient vectors in a certain neighborhood all point in the same direction. (In the computation of second moments, this information is lost in the outer product operation; i.e., gradient vector directions differing by 180° are indistinguishable.) The polarity at a given pixel is computed with respect to the dominant orientation φ in the neighborhood of that pixel. For ease of notation, let us consider a fixed scale and pixel location. We define polarity as

    p = |E+ − E−| / (E+ + E−)
The definitions of E+ and E− are

    E+ = Σ_{(x,y)∈Ω} G_σ(x, y) [∇I · n̂]+

and

    E− = Σ_{(x,y)∈Ω} G_σ(x, y) [∇I · n̂]−

where [q]+ and [q]− are the rectified positive and negative parts of their argument, n̂ is a unit vector perpendicular to φ, and Ω represents the neighborhood under consideration. We can think of E+ and E− as measures of how many gradient vectors in Ω are on the "positive side" and "negative side" of the dominant orientation, respectively. Note that p ranges from 0 to 1. A similar measure is used in [15] to distinguish a flow pattern from an edge.
¹ Strictly speaking, eqn. (1) is a sliding inner product, not a convolution, since σ(x, y) is spatially variant.
² Polarity is related to the quadrature phase as discussed in [7, 9].
The polarity p_σ varies as the scale σ changes; its behavior in typical image regions can be summarized as follows:

Edge: The presence of an edge is signaled by p_σ holding values close to 1 for all σ.

Texture: In regions with 2-D texture or 1-D flow, p_σ decays with σ due to the presence of multiple orientations.

Uniform: In a constant-intensity neighborhood, p_σ takes on arbitrary values since the gradient vectors have negligible magnitudes and therefore arbitrary angles.
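The polarity at a single scale can be sketched as below (an illustrative version; here φ is supplied as the local edge/flow tangent orientation, so n̂ perpendicular to it lies along the gradient axis, as in the definition above):

```python
import numpy as np

def polarity(Ix, Iy, phi, sigma=1.5):
    """Polarity p = |E+ - E-| / (E+ + E-) at one fixed scale.

    Ix, Iy: gradient components; phi: dominant (tangent) orientation
    per pixel; n_hat = (-sin phi, cos phi) is perpendicular to phi.
    """
    dot = -Ix * np.sin(phi) + Iy * np.cos(phi)  # grad I . n_hat
    pos = np.maximum(dot, 0.0)                  # rectified part [q]+
    neg = np.maximum(-dot, 0.0)                 # rectified part [q]-

    def smooth(A):
        # Gaussian window G_sigma, applied separably.
        r = int(3 * sigma) + 1
        x = np.arange(-r, r + 1)
        g = np.exp(-x ** 2 / (2 * sigma ** 2))
        g /= g.sum()
        A = np.apply_along_axis(np.convolve, 0, A, g, mode='same')
        return np.apply_along_axis(np.convolve, 1, A, g, mode='same')

    Ep, Em = smooth(pos), smooth(neg)
    return np.abs(Ep - Em) / (Ep + Em + 1e-12)  # epsilon guards 0/0
```

On a step edge all rectified gradients fall on one side of the dominant orientation, so p is near 1 at the edge and near 0 in the flat surround, as the summary above describes.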
The process of selecting a scale is based on the derivative of the polarity with respect to scale. First, we compute the polarity at every pixel in the image for σ_k = k/2, k = 0, 1, ..., 7, thus producing a "stack" of polarity images across scale. Then, for each k, the polarity image computed at scale σ_k is convolved with a Gaussian with standard deviation 2σ_k to yield a smoothed polarity image p̃_σk. For each pixel, we select the scale as the first value of σ_k for which the difference between successive values of polarity (p̃_σk − p̃_σk−1) is less than 2%. In this manner, we are performing a soft version of local spatial frequency estimation, since the smoothed polarity tends to stabilize once the scale window encompasses one approximate period. Since we stop at σ_k = 3.5, the largest period we can detect is approximately 10 pixels. Note that when the period is undefined, as is the case in uniform regions, the selected scale is not meaningful and is set to zero. We declare a pixel to be uniform if its mean contrast across scale is less than 0.1.
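The stopping rule can be sketched as follows, given a precomputed stack of smoothed polarity images (only the per-pixel selection step is shown; building the stack proceeds as in the text, and the uniform-pixel override that resets the scale to zero is omitted):

```python
import numpy as np

def select_scale(p_tilde, sigmas, tol=0.02):
    """Pick, per pixel, the first scale at which smoothed polarity
    stabilizes: |p_tilde[k] - p_tilde[k-1]| < tol.

    p_tilde: array of shape (n_scales, H, W) of smoothed polarity
    images; sigmas: the corresponding sigma_k values (k/2, k = 0..7).
    Pixels that never stabilize keep the largest scale.
    """
    n, H, W = p_tilde.shape
    sel = np.full((H, W), sigmas[-1], dtype=float)
    done = np.zeros((H, W), dtype=bool)
    for k in range(1, n):
        stable = np.abs(p_tilde[k] - p_tilde[k - 1]) < tol
        sel[stable & ~done] = sigmas[k]   # first stable scale wins
        done |= stable
    return sel
```

A pixel whose polarity is flat across scale (edge-like) selects a small scale immediately; one whose polarity keeps decaying (texture) selects the scale at which the decay levels off.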
Another method of scale selection that has been proposed [8] is based on localizing extrema across scale of an invariant of M_σ, such as the trace or determinant. In this algorithm, which is applied to the problem of estimating the slant and tilt of surfaces with tangential texture, it is necessary to perform natural smoothing at a scale tied to the artificial scale. We found that this extra smoothing compromised the spatial localization ability of our scale selection method.
Texture features

Once a scale σ* is selected for each pixel, that pixel is assigned three texture descriptors. The first is the polarity, p_σ*. The other two, which are taken from M_σ*, are the anisotropy, defined as a = 1 − λ2/λ1, and the normalized texture contrast, defined as c = 2√(λ1 + λ2).³ These are related to derived quantities reported in [8].
2.1.3 Combining color and texture features

The color/texture descriptor for a given pixel consists of six values: three for color and three for texture. The three color components are the color-cone coordinates found after spatial averaging using a Gaussian at the selected scale. The three texture components are ac, pc, and c, computed at the selected scale; the anisotropy and polarity are each modulated by the contrast in analogy to the construction of the color-cone coordinates. (Recall that anisotropy and polarity are meaningless in regions of low contrast.) In effect, a given textured patch in an image first has its texture properties extracted and is then replaced by a smooth patch of averaged color. In this manner, the color and texture properties in a given region are decoupled; for example, a zebra is a gray horse plus stripes.

Note that in this formulation of the color/texture descriptor, orientation and selected scale do not appear in the feature vector; as a result, grouping can occur across variations in scale and orientation.

³ If we use a centered first difference kernel in the gradient computation, the factor of 2 makes c range from 0 to 1.
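Assembling the six-value descriptor is then direct (a minimal sketch; the function name is ours):

```python
import numpy as np

def pixel_descriptor(color_cone, a, p, c):
    """Six-value color/texture descriptor for one pixel.

    color_cone: the three locally smoothed color-cone coordinates;
    a, p, c: anisotropy, polarity, and normalized texture contrast.
    Anisotropy and polarity are modulated by the contrast (ac, pc),
    so both vanish in low-contrast regions where they are meaningless.
    """
    return np.concatenate([np.asarray(color_cone, dtype=float),
                           [a * c, p * c, c]])
```

With c = 0 the texture half of the vector is zero regardless of a and p, which is exactly the decoupling the text describes.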
2.2 Grouping with the EM Algorithm

Once an image has been processed using the above feature extraction scheme, the result is a large set of 6-D feature vectors, which we may regard as points in a 6-D feature space. In order to divide these points into groups, we make use of the Expectation-Maximization (EM) algorithm [3] to determine the maximum likelihood parameters of a mixture of K Gaussians inside the 6-D feature space.

The EM algorithm is used for finding maximum likelihood parameter estimates when there is missing or incomplete data. In our case, the missing data is the region to which the points in the feature space belong. We estimate values to fill in for the incomplete data (the "E-Step"), compute the maximum likelihood parameter estimates using this data (the "M-Step"), and repeat until a suitable stopping criterion is reached.

The first step in applying the EM algorithm is to initialize a mean vector and covariance matrix to represent each of the K groups. We initialize the means to random values and the covariances to identity matrices. (In earlier work we chose the initialization for EM carefully, but we have found that the initialization has little effect on the quality of the resulting segmentation.) The update scheme allows for full covariance matrices; variants include restricting the covariance to be diagonal or a constant times the identity matrix. Full covariance matrices suit our problem, since many plausible feature clusters require extruded covariance shapes, e.g. the shades of gray along the color cone axis.

Upon convergence, the Gaussian mixture parameters can be inspected to determine what color/texture properties are represented by each component of the mixture. Some examples of groups that can form include the following:

- bright, bluish, and textureless regions (e.g., sky)
- anisotropic and non-polar regions (e.g., zebra hide)
- green weak-isotropic texture (e.g., grass)
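The EM iteration can be sketched in NumPy (an illustrative full-covariance mixture fit, not the authors' implementation; where the text initializes means randomly, we use deterministic farthest-point seeding so the example is reproducible):

```python
import numpy as np

def em_gmm(X, K, n_iter=60):
    """Minimal EM for a K-component, full-covariance Gaussian mixture."""
    n, d = X.shape
    mu = [X[0]]
    for _ in range(1, K):  # farthest-point initialization (deterministic)
        d2 = np.min([((X - m) ** 2).sum(1) for m in mu], axis=0)
        mu.append(X[int(d2.argmax())])
    mu = np.array(mu, dtype=float)
    cov = np.stack([np.eye(d)] * K)   # identity covariances, as in the text
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to pi_k N(x_i | mu_k, cov_k)
        logr = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            q = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov[k]), diff)
            logdet = np.linalg.slogdet(cov[k])[1]
            logr[:, k] = np.log(pi[k]) - 0.5 * (q + logdet + d * np.log(2 * np.pi))
        r = np.exp(logr - logr.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximum likelihood updates of weights, means, covariances
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (r[:, k, None] * diff).T @ diff / nk[k] + 1e-6 * np.eye(d)
    return pi, mu, cov, r.argmax(axis=1)
```

The returned argmax memberships are the per-pixel cluster labels used for the spatial grouping described next; full covariances let a component stretch along, e.g., the gray axis of the color cone.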
We have thus far not discussed how to choose K, the number of mixture components. Ideally we would like to choose the value of K that best suits the natural number of groups present in the image. One readily available notion of goodness of fit is the log-likelihood. Given this indicator, we can apply the Minimum Description Length (MDL) principle [22] to select among values of K. As a consequence of this principle, when models using two values of K fit the data equally well, the simpler model will be chosen. For our experiments, K ranges from 2 to 5.

Once a model is selected, the next step is to perform spatial grouping of those pixels belonging to the same color/texture cluster. We first produce a K-level image which encodes pixel-cluster memberships by replacing each pixel with the label of the cluster for which it attains the highest likelihood (see Fig. 2(d)). To enforce a small amount of spatial smoothness in this representation, we apply a 3×3 maximum-vote filter to the raw cluster-membership image. Finally, we run the resulting image through a connected-components algorithm to produce a set of labeled image regions (see Fig. 2(e)). (Alternatively, one could enforce spatial constraints by appending the pixel coordinates to the feature vectors, though we observed that this method too often yields unsatisfactory segmentations.)
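The two spatial post-processing steps can be sketched in plain NumPy (illustrative versions; a production system would use vectorized or library routines):

```python
import numpy as np

def max_vote_3x3(labels):
    """3x3 maximum-vote filter: each pixel takes the most common cluster
    label in its neighborhood (border pixels use the votes that exist)."""
    H, W = labels.shape
    out = labels.copy()
    for i in range(H):
        for j in range(W):
            patch = labels[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            vals, counts = np.unique(patch, return_counts=True)
            out[i, j] = vals[counts.argmax()]
    return out

def connected_components(labels):
    """Label 4-connected runs of equal cluster label via flood fill;
    returns the component image and the number of components."""
    H, W = labels.shape
    comp = -np.ones((H, W), dtype=int)
    n = 0
    for i in range(H):
        for j in range(W):
            if comp[i, j] >= 0:
                continue
            stack = [(i, j)]
            comp[i, j] = n
            while stack:
                y, x = stack.pop()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    v, u = y + dy, x + dx
                    if (0 <= v < H and 0 <= u < W and comp[v, u] < 0
                            and labels[v, u] == labels[i, j]):
                        comp[v, u] = n
                        stack.append((v, u))
            n += 1
    return comp, n
```

A single mislabeled pixel inside a coherent region is outvoted by its neighbors, so the filter removes speckle before components are extracted.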
2.3 Describing the regions

We store a simple description of each region's color, texture, and spatial characteristics. (See Fig. 2(f) for a visualization of the stored representation.)

Color and texture descriptors

The two dominant colors within a connected component are chosen by using the EM algorithm to fit a mixture of two Gaussians in the HSV cone. The details are as before except that in this case we restrict the covariances to be a constant times the identity matrix. Upon convergence, the two mean vectors are recorded as the dominant colors in the region. When the color distribution inside the HSV cone is in fact unimodal, both means become nearly coincident; we have not found it necessary to apply model selection between K = 1 and K = 2.

For each image region (blob) we store the mean texture descriptors (i.e., anisotropy, orientation, contrast) and the top two colors. We do not store the selected scale, since we want to be invariant to scales in the range σ_k = 0, ..., 3.5. Although polarity is used for scale selection, we discard it here, since in any textured or uniform region it is approximately zero by virtue of the scale selection process.
Figure 2. Creating the blobworld representation.
(a) Original image.
(b) Scale estimated using polarity. The values range from σ = 0 (black) to σ = 3.5 (white).
(c) The six components of the color/texture feature vectors, each bounded between 0 (white) and 1 (black). Top: the locally smoothed color-cone coordinates. Bottom: the texture coordinates; from left to right, ac, pc, and c. The zebra hide is highly anisotropic and in general has high texture contrast. The polarity is largest around the edges, where the shading gradient points primarily in one direction.
(d) The results of clustering these feature vectors into K = 2, 3, 4, 5 groups using EM to learn a mixture of Gaussians. Pixel cluster memberships are shown as one of up to five gray levels. The MDL principle suggests that the rightmost image (K = 5) provides the best segmentation of the data. Most noticeable in this segmentation are oriented texture, which is found throughout the zebra hide, and uniform or low-contrast texture, which accounts for most of the background.
(e) The segmentation for K = 5 after application of a 3×3 max-vote filter. Each connected component in this image which possesses an area greater than 2% of the total image area produces a blob.
(f) The blobworld representation. Each blob encodes summary information about the underlying color, texture and shape properties.
Spatial descriptors

The geometric descriptors of the blob are simply the centroid c and scatter matrix S of the blob region; the centroid provides a notion of position, while the scatter matrix provides an elementary shape description. In the querying process discussed in Section 3.1, centroid separations are expressed using Euclidean distance. The determination of the distance between scatter matrices is based on the three quantities [det(S)]^(1/2) = √(λ1 λ2), 1 − λ2/λ1, and θ. (λ1 and λ2 are the eigenvalues of S; θ is the argument of the principal eigenvector of S.) These three quantities represent approximate area, eccentricity, and orientation.
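These shape descriptors can be sketched directly from a region mask (an illustrative version; the function name is ours):

```python
import numpy as np

def blob_shape(mask):
    """Centroid and scatter matrix of a binary region mask, plus the
    derived approximate area, eccentricity, and orientation."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    c = pts.mean(axis=0)                      # centroid (x, y)
    d = pts - c
    S = d.T @ d / len(pts)                    # scatter matrix
    lam, vec = np.linalg.eigh(S)              # eigenvalues, lam[0] <= lam[1]
    area = np.sqrt(np.linalg.det(S))          # [det S]^(1/2) = sqrt(l1 * l2)
    ecc = 1.0 - lam[0] / lam[1]               # 1 - l2/l1 (l1 the larger)
    theta = np.arctan2(vec[1, 1], vec[0, 1])  # principal eigenvector angle
    return c, S, area, ecc, theta
```

A long horizontal bar, for instance, yields eccentricity near 1 and orientation along the x axis.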
3Imageretrievalbyquerying
Anyonewhohasusedasearchengine,text-basedoroth-
erwise,isfamiliarwiththerealityofunwantedmatches.
Ofteninthecaseoftextsearchesthisresultsfromtheuse
ofambiguouskeywords,suchas“bank”or“interest”[27].
Unfortunately,withimagequeriesitisnotalwayssoclear
whythingsgowrong.Unlikewithtextsearches,inwhich
theusercanseethefeatures(words)inadocument,none
ofthecurrentcontent-basedimageretrievalsystemsallows
theusertoseeexactlywhatthesystemislookingforin
responsetoasimilarity-basedquery.Simplyallowingthe
usertosubmitanarbitraryimage(orsketch)andsetsome
abstractknobswithoutknowinghowtheyrelatetothein-
putimageinparticularimpliesadegreeofcomplexitythat
searchingalgorithmsdonothave.Asaresult,aqueryfor
abearcanreturnjustaboutanyobjectunderthesunifthe
queryisnotbasedonimageregions,thesegmentationrou-
tinefailsto“find”thebearinthesubmittedimage,orthe
submittedimagecontainsotherdistinctiveobjects.Without
realizingthattheinputimagewasnotproperlyprocessed,
theusercanonlywonderwhatwentwrong.Inordertohelp
theuserformulateeffectivequeriesandunderstandtheirre-
sults,aswellastominimizedisappointmentduetooverly
optimisticexpectationsofthesystem,thesystemshould
displayitsrepresentationofthesubmittedimageandthe
returnedimages.
3.1 Querying in blobworld

In our system, the user composes a query by submitting an image and seeing its blobworld representation, selecting the blobs to match, and finally specifying the relative importance of the blob features. The user may also submit blobs from several different images. (For example, a query might be the disjunction of the blobs corresponding to airplanes in several images, in order to provide a query that looks for airplanes of several shades.)

We define an "atomic query" as one which specifies a particular blob to match (e.g., "like-blob-1"). A "compound query" is defined as either an atomic query or a conjunction or disjunction of compound queries ("like-blob-1 and like-blob-2"). In the future, we might expand this definition to include negation ("not-like-blob-1") and to allow the user to specify two blobs with a particular spatial relationship as an atomic query ("like-blob-1-left-of-blob-2").
Once a compound query is specified, we score each database image based on how closely it satisfies the compound query. The score μ_i for each atomic query (like-blob-i) is calculated as follows:

1. Find the feature vector v_i for the desired blob b_i. This vector consists of the stored color, texture, position, and shape descriptors.

2. For each blob b_j in the database image:

   (a) Find the feature vector v_j for b_j.

   (b) Find the Mahalanobis distance between v_i and v_j using the diagonal covariance matrix (feature weights) set by the user: d_ij = [(v_i − v_j)^T Σ^{-1} (v_i − v_j)]^(1/2).

   (c) Measure the similarity between b_i and b_j using μ_ij = e^(−d_ij/2). This score is 1 if the blobs are identical in all relevant features; it decreases as the match becomes less perfect.

3. Take μ_i = max_j μ_ij.

The compound query score for the database image is calculated using fuzzy-logic operations [13]. For example, if the query is "like-blob-1 and (like-blob-2 or like-blob-3)," the overall score for the image is min{μ_1, max{μ_2, μ_3}}.
The user can also specify a weighting σ_i for each atomic query. If "like-blob-i" is part of a disjunction in the compound query, the weighted score for atomic query i is μ′_i = σ_i μ_i; if it is in a conjunction, its weighted score is μ′_i = 1 − σ_i (1 − μ_i).

We then rank the images according to overall score and return the best matches, indicating for each image which set of blobs provides the highest score; this information will help the user refine the query. After reviewing the query results, the user may change the weighting of the blob features or may specify new blobs to match.
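The atomic scoring and fuzzy combination above can be sketched as follows (a minimal illustration; `weights` plays the role of the diagonal Σ^{-1} set by the user, and the function names are ours):

```python
import numpy as np

def atomic_score(v_i, image_blobs, weights):
    """Score of 'like-blob-i' against one database image: the best
    mu_ij over that image's blobs, where d_ij is a diagonal-covariance
    Mahalanobis distance and mu_ij = exp(-d_ij / 2)."""
    best = 0.0
    for v_j in image_blobs:
        diff = np.asarray(v_i, float) - np.asarray(v_j, float)
        d_ij = np.sqrt(np.sum(np.asarray(weights, float) * diff ** 2))
        best = max(best, float(np.exp(-d_ij / 2)))
    return best

# Fuzzy-logic combination of atomic scores:
def fuzzy_and(*scores):
    return min(scores)

def fuzzy_or(*scores):
    return max(scores)
```

An identical blob in the database image gives d_ij = 0 and hence a perfect atomic score of 1; the min/max operators then combine atomic scores exactly as in the "like-blob-1 and (like-blob-2 or like-blob-3)" example.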
3.2 Results

We have performed a variety of queries using a set of 2000 images from the commercial Corel stock photo collection. We used the following categories: African Specialty Animals; Air Shows; Arabian Horses; Bald Eagles; Bears; Canadian Rockies; Caribbean; Cheetahs, Leopards, Jaguars; China; Death Valley; Deserts; Elephants; Fields; France; Kenya; Night Scenes; Sheep; Sunsets; Tigers; and Wild Animals. Sample queries are shown in Figs. 3-4.
Figure 3. Blobworld query for tiger images. 28% of the top 50 images are tigers; tiger images make up 5% of the database.

Figure 4. Blobworld query for zebra images. 24% of the top 50 images are zebras, while less than 2% of the images in the database are zebras.

Figure 5. Tiger query performance.
Figure 6. Zebra query performance.
Figure 7. Airplane query performance.
Figure 8. Sunset query performance.
3.2.1 Comparison to color histograms

We have compared our results to those obtained using color histogram matching, following the procedure of Swain and Ballard [24]. The color histogram for each image uses 8 divisions for the intensity axis and 16 for each opponent color axis. Given a query image with histogram Q_i, each database image (with histogram D_i) receives score Σ_i |Q_i − D_i|. As before, we rank the database images and return the best matches. Figures 5-8 show how the precision changes as more images are returned; the blobworld query results are better than the color histogram results, except for the tiger query. We believe the good color histogram results for the tiger query occur largely because of the limited nature of the test database; few non-tiger images in this collection have significant amounts of both orange and green. Adding pictures of, say, orange flowers in a field would degrade the color histogram performance without significantly affecting the blobworld performance.
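The baseline can be sketched as below (an illustrative version; the opponent axes rg = r − g and by = 2b − r − g, the axis ranges, and RGB channels in [0, 1] are our assumptions about the setup):

```python
import numpy as np

def opponent_histogram(rgb):
    """Normalized histogram with 8 intensity bins and 16 bins per
    opponent-color axis, for an RGB image with channels in [0, 1]."""
    r, g, b = rgb[..., 0].ravel(), rgb[..., 1].ravel(), rgb[..., 2].ravel()
    feats = np.stack([r + g + b,        # intensity axis, range [0, 3]
                      r - g,            # red-green axis, range [-1, 1]
                      2 * b - r - g],   # blue-yellow axis, range [-2, 2]
                     axis=1)
    H, _ = np.histogramdd(feats, bins=(8, 16, 16),
                          range=((0, 3), (-1, 1), (-2, 2)))
    return H / H.sum()

def histogram_score(Q, D):
    """Sum of absolute bin differences; lower means more similar."""
    return float(np.abs(Q - D).sum())
```

Two copies of the same image score 0, while images whose mass lands in disjoint bins score the maximum of 2 for normalized histograms.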
4 Conclusions

We have proposed a new method which uses Expectation-Maximization on color and texture jointly to provide an image segmentation, as well as a new image representation (blobworld) which uses this segmentation and its associated descriptors to represent image regions explicitly. We have demonstrated a query mechanism that uses blobworld to retrieve images and help guide user queries.

Acknowledgments

We would like to thank David Forsyth, Joe Hellerstein, Ginger Ogle, and Robert Wilensky for useful discussions related to this work. This work was supported by an NSF Digital Library Grant (IRI 94-11334) and NSF graduate fellowships for Serge Belongie and Chad Carson.
References

[1] S. Ayer and H. Sawhney. Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding. In Proc. Int. Conf. Comp. Vis., pages 777-784, 1995.
[2] J. Bigün. Local symmetry features in image processing. PhD thesis, Linköping University, 1988.
[3] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Soc., Ser. B, 39(1):1-38, 1977.
[4] P. Enser. Query analysis in a visual information retrieval context. J. Doc. and Text Management, 1(1):25-52, 1993.
[5] W. Förstner. A framework for low level feature extraction. In Proc. Eur. Conf. Comp. Vis., pages 383-394, 1994.
[6] D. Forsyth, J. Malik, and R. Wilensky. Searching for digital pictures. Scientific American, 276(6):72-77, June 1997.
[7] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE Trans. Pattern Analysis and Machine Intelligence, 13(9):891-906, 1991.
[8] J. Gårding and T. Lindeberg. Direct computation of shape cues using scale-adapted spatial derivative operators. Int. J. Comp. Vis., 17(2):163-191, Feb 1996.
[9] G. H. Granlund and H. Knutsson. Signal Processing for Computer Vision. Kluwer Academic Publishers, 1995.
[10] A. Gupta and R. Jain. Visual information retrieval. Comm. Assoc. Comp. Mach., 40(5):70-79, May 1997.
[11] T. Hofmann, J. Puzicha, and J. M. Buhmann. Deterministic annealing for unsupervised texture segmentation. In Proc. Int. Workshop on Energy Min. Methods in Comp. Vis. and Patt. Rec., pages 213-228, 1997.
[12] A. K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24(12):1167-1186, 1991.
[13] J.-S. Jang, C.-T. Sun, and E. Mizutani. Neuro-Fuzzy and Soft Computing. Prentice Hall, 1997.
[14] P. Kelly, M. Cannon, and D. Hush. Query by image example: the CANDID approach. In SPIE Proc. Storage and Retrieval for Image and Video Databases, pages 238-248, 1995.
[15] T. Leung and J. Malik. Detecting, localizing and grouping repeated scene elements from an image. In Proc. Eur. Conf. Comp. Vis., pages 546-555, 1996.
[16] P. Lipson, E. Grimson, and P. Sinha. Configuration based scene classification and image indexing. In Proc. IEEE Comp. Soc. Conf. Comp. Vis. and Pattern Recogn., pages 1007-1013, 1997.
[17] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. J. Opt. Soc. Am. A, 7(5):923-932, 1990.
[18] W. Niblack et al. The QBIC project: querying images by content using colour, texture and shape. In SPIE Proc. Storage and Retrieval for Image and Video Databases, pages 173-187, 1993.
[19] V. Ogle and M. Stonebraker. Chabot: Retrieval from a relational database of images. IEEE Computer, 28(9):40-48, Sep 1995.
[20] A. Pentland, R. Picard, and S. Sclaroff. Photobook: Content-based manipulation of image databases. Int. J. Comp. Vis., 18(3):233-254, 1996.
[21] J. Ponce, A. Zisserman, and M. Hebert. Object Representation in Computer Vision II. Number 1144 in LNCS. Springer, 1996.
[22] J. Rissanen. Modeling by shortest data description. Automatica, 14:465-471, 1978.
[23] U. Shaft and R. Ramakrishnan. Data modeling and querying in the PIQ image DBMS. IEEE Data Engineering Bulletin, 19(4):28-36, Dec 1996.
[24] M. Swain and D. Ballard. Color indexing. Int. J. Comp. Vis., 7(1):11-32, 1991.
[25] Y. Weiss and E. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In Proc. IEEE Comp. Soc. Conf. Comp. Vis. and Pattern Recogn., pages 321-326, 1996.
[26] W. Wells, R. Kikinis, W. Grimson, and F. Jolesz. Adaptive segmentation of MRI data. In Int. Conf. on Comp. Vis., Virtual Reality and Robotics in Medicine, pages 59-69, 1995.
[27] D. Yarowsky. Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proc. Int. Conf. Comp. Linguistics, pages 454-460, 1992.