Java锁优化

来自：thedust79 > 馆藏分类

配色：

字号：大中小

Java锁优化

2016-10-11 | 阅：转： | 分享

Java锁优化

1.同步的原理

JVM规范规定JVM基于进入和退出Monitor对象来实现方法同步和代码块同步，但两者的实现细节不一样。代码块同步是使用monitorenter和monitorexit指令实现，而方法同步是使用另外一种方式实现的，细节在JVM规范里并没有详细说明，但是方法的同步同样可以使用这两个指令来实现。monitorenter指令是在编译后插入到同步代码块的开始位置，而monitorexit是插入到方法结束处和异常处，JVM要保证每个monitorenter必须有对应的monitorexit与之配对。任何对象都有一个monitor与之关联，当且一个monitor被持有后，它将处于锁定状态。线程执行到monitorenter指令时，将会尝试获取对象所对应的monitor的所有权，即尝试获得对象的锁。

2.Java对象头

锁存在Java对象头里。如果对象是数组类型，则虚拟机用3个Word（字宽）存储对象头，如果对象是非数组类型，则用2字宽存储对象头。在32位虚拟机中，一字宽等于四字节，即32bit。

长度内容说明

32/64bit MarkWord 存储对象的hashCode或锁信息等

32/64bit ClassMetadataAddress 存储到对象类型数据的指针

32/64bit Arraylength 数组的长度（如果当前对象是数组）

Java对象头里的MarkWord里默认存储对象的HashCode，分代年龄和锁标记位。32位JVM的MarkWord的默认存储结构如下：

25bit 4bit

1bit

是否是偏向锁

2bit

锁标志位

无锁状态对象的hashCode 对象分代年龄 0 01

在运行期间MarkWord里存储的数据会随着锁标志位的变化而变化。MarkWord可能变化为存储以下4种数据：

3.几种锁的类型

线程的阻塞和唤醒需要CPU从用户态转为核心态，频繁的阻塞和唤醒对CPU来说是一件负担很重的工作。

JavaSE1.6为了减少获得锁和释放锁所带来的性能消耗，引入了“偏向锁”和“轻量级锁”，所以在JavaSE1.6里锁一共有四种状态，无锁状态，偏向锁状态，轻量级锁状态和重量级锁状态，它会随着竞争情况逐渐升级。

锁可以升级但不能降级，意味着偏向锁升级成轻量级锁后不能降级成偏向锁。这种锁升级却不能降级的策略，目的是为了提高获得锁和释放锁的效率。

3.1偏向锁

Hotspot的作者经过以往的研究发现大多数情况下锁不仅不存在多线程竞争，而且总是由同一线程多次获得。偏向锁的目的是在某个线程获得锁之后，消除这个线程锁重入（CAS）的开销，看起来让这个线程得到了偏护。

偏向锁的进一步理解

偏向锁的释放不需要做任何事情，这也就意味着加过偏向锁的MarkValue会一直保留偏向锁的状态，因此即便同一个线程持续不断地加锁解锁，也是没有开销的。

另一方面，偏向锁比轻量锁更容易被终结，轻量锁是在有锁竞争出现时升级为重量锁，而一般偏向锁是在有不同线程申请锁时升级为轻量锁，这也就意味着假如一个对象先被线程1加锁解锁，再被线程2加锁解锁，这过程中没有锁冲突，也一样会发生偏向锁失效，不同的是这回要先退化为无锁的状态，再加轻量锁，如图：

另外，JVM对那种会有多线程加锁，但不存在锁竞争的情况也做了优化，听起来比较拗口，但在现实应用中确实是可能出现这种情况，因为线程之前除了互斥之外也可能发生同步关系，被同步的两个线程（一前一后）对共享对象锁的竞争很可能是没有冲突的。对这种情况，JVM用一个epoch表示一个偏向锁的时间戳（真实地生成一个时间戳代价还是蛮大的，因此这里应当理解为一种类似时间戳的identifier），对epoch，官方是这么解释的：

Asimilarmechanism,calledbulkrebiasing,optimizessituationsinwhichobjectsofaclassarelockedandunlockedbydifferentthreadsbutneverconcurrently.Itinvalidatesthebiasofallinstancesofaclasswithoutdisablingbiasedlocking.Anepochvalueintheclassactsasatimestampthatindicatesthevalidityofthebias.Thisvalueiscopiedintotheheaderworduponobjectallocation.Bulkrebiasingcanthenefficientlybeimplementedasanincrementoftheepochintheappropriateclass.Thenexttimeaninstanceofthisclassisgoingtobelocked,thecodedetectsadifferentvalueintheheaderwordandrebiasestheobjecttowardsthecurrentthread.

偏向锁的获取

当一个线程访问同步块并获取锁时，会在对象头和栈帧中的锁记录里存储锁偏向的线程ID，以后该线程在进入和退出同步块时不需要花费CAS操作来加锁和解锁，而只需简单的测试一下对象头的MarkWord里是否存储着指向当前线程的偏向锁，如果测试成功，表示线程已经获得了锁，如果测试失败，则需要再测试下MarkWord中偏向锁的标识是否设置成1（表示当前是偏向锁），如果没有设置，则使用CAS竞争锁，如果设置了，则尝试使用CAS将对象头的偏向锁指向当前线程。

偏向锁的撤销

偏向锁使用了一种等到竞争出现才释放锁的机制，所以当其他线程尝试竞争偏向锁时，持有偏向锁的线程才会释放锁。偏向锁的撤销，需要等待全局安全点（在这个时间点上没有字节码正在执行），它会首先暂停拥有偏向锁的线程，然后检查持有偏向锁的线程是否活着，如果线程不处于活动状态，则将对象头设置成无锁状态，如果线程仍然活着，拥有偏向锁的栈会被执行，遍历偏向对象的锁记录，栈中的锁记录和对象头的MarkWord要么重新偏向于其他线程，要么恢复到无锁或者标记对象不适合作为偏向锁，最后唤醒暂停的线程。下图中的线程1演示了偏向锁初始化的流程，线程2演示了偏向锁撤销的流程。

偏向锁的设置

关闭偏向锁：偏向锁在Java6和Java7里是默认启用的，但是它在应用程序启动几秒钟之后才激活，如有必要可以使用JVM参数来关闭延迟-XX：BiasedLockingStartupDelay=0。如果你确定自己应用程序里所有的锁通常情况下处于竞争状态，可以通过JVM参数关闭偏向锁-XX:-UseBiasedLocking=false，那么默认会进入轻量级锁状态。

3.2自旋锁

线程的阻塞和唤醒需要CPU从用户态转为核心态，频繁的阻塞和唤醒对CPU来说是一件负担很重的工作。同时我们可以发现，很多对象锁的锁定状态只会持续很短的一段时间，例如整数的自加操作，在很短的时间内阻塞并唤醒线程显然不值得，为此引入了自旋锁。

所谓“自旋”，就是让线程去执行一个无意义的循环，循环结束后再去重新竞争锁，如果竞争不到继续循环，循环过程中线程会一直处于running状态，但是基于JVM的线程调度，会出让时间片，所以其他线程依旧有申请锁和释放锁的机会。

自旋锁省去了阻塞锁的时间空间（队列的维护等）开销，但是长时间自旋就变成了“忙式等待”，忙式等待显然还不如阻塞锁。所以自旋的次数一般控制在一个范围内，例如10,100等，在超出这个范围后，自旋锁会升级为阻塞锁。

对自旋锁周期的选择上，HotSpot认为最佳时间应是一个线程上下文切换的时间，但目前并没有做到。经过调查，目前只是通过汇编暂停了几个CPU周期，除了自旋周期选择，HotSpot还进行许多其他的自旋优化策略，具体如下：

如果平均负载小于CPUs则一直自旋

如果有超过(CPUs/2)个线程正在自旋，则后来线程直接阻塞

如果正在自旋的线程发现Owner发生了变化则延迟自旋时间（自旋计数）或进入阻塞如果CPU处于节电模式则停止自旋

自旋时间的最坏情况是CPU的存储延迟（CPUA存储了一个数据，到CPUB得知这个数据直接的时间差）

3.3轻量级锁

轻量级锁加锁

线程在执行同步块之前，JVM会先在当前线程的栈桢中创建用于存储锁记录的空间，并将对象头中的MarkWord复制到锁记录中，官方称为DisplacedMarkWord。然后线程尝试使用CAS将对象头中的MarkWord替换为指向锁记录的指针。如果成功，当前线程获得锁，如果失败，则自旋获取锁，当自旋获取锁仍然失败时，表示存在其他线程竞争锁(两条或两条以上的线程竞争同一个锁)，则轻量级锁会膨胀成重量级锁。

轻量级锁解锁

轻量级解锁时，会使用原子的CAS操作来将DisplacedMarkWord替换回到对象头，如果成功，则表示同步过程已完成。如果失败，表示有其他线程尝试过获取该锁，则要在释放锁的同时唤醒被挂起的线程。下图是两个线程同时争夺锁，导致锁膨胀的流程图。

3.4重量级锁

重量锁在JVM中又叫对象监视器（Monitor），它很像C中的Mutex，除了具备Mutex互斥的功能，它还负责实现了Semaphore的功能，也就是说它至少包含一个竞争锁的队列，和一个信号阻塞队列（wait队列），前者负责做互斥，后一个用于做线程同步。

4.锁的优缺点对比

锁优点缺点适用场景

偏向锁加锁和解锁不需要额外的消耗，和执行非同步方法比仅存在纳秒级的差距如果线程间存在锁竞争，会带来额外的锁撤销的消耗适用于只有一个线程访问同步块场景

轻量级锁竞争的线程不会阻塞，提高了程序的响应速度如果始终得不到锁竞争的线程使用自旋会消耗CPU 追求响应时间,锁占用时间很短

重量级锁线程竞争不使用自旋，不会消耗CPU 线程阻塞，响应时间缓慢追求吞吐量,锁占用时间较长

BiasedLockinginHotSpot

ByDaveonAug18,2006

ThebiasedlockingschemeinHotSpotarosefromthefollowingpaperauthoredbymyself,MarkMoir,andBillScherer.Briefly,thecompare-and-swap(CAS)operationsnormallyusedtoacquireaJavamonitorincursconsiderablelocallatencyintheexecutingCPU.Sincemostobjectsarelockedbyatmostonethreadduringtheirlifetime,weallowthatthreadtobiasanobjecttowarditself.Oncebiased,thatthreadcansubsequentlylockandunlocktheobjectwithoutresortingtoexpensiveatomicinstructions.Obviously,anobjectcanbebiasedtowardatmostonethreadatanygiventime.(Werefertothatthreadasthebiasholdingthread).Ifanotherthreadtriestoacquireabiasedobject,however,weneedtorevokethebiasfromtheoriginalthread.(Atthisjuncturewecaneitherrebiastheobjectorsimplyreverttonormallockingfortheremainderoftheobject''slifetime).Thekeychallengeinrevocationistocoordinatetherevokerandtherevokee(thebiasholdingthread)--wemustensurethattherevokeedoesn''tlockorunlocktheobjectduringrevocation.Asnotedinthepaper,revocationcanbeimplementedinvariousways-signals,suspension,andsafepointstonameafew.Critically,forbiasedlockingtoproveprofitable,thebenefitconferredfromavoidingtheatomicinstructionsmustexceedtherevocationcosts.Anotherequivalentwaytoconceptualizebiasedlockingisthattheoriginalownersimplydefersunlockingtheobjectuntilitencounterscontention.Biasedlockingbearssomesemblancetotheconceptoflockreservation,whichiswell-describedinKawachiya''sdissertation.It''salsosimilarinspiritto"opportunisticlocks"(oplocks)foundinvariousfilesystems;theschemedescribeinMikeBurrows''s"Howtoimplementunnecessarymutexes"(InComputerSystems:Theory,TechnologyandApplications,Dec.2003);orChristianSiebert''sone-sidedmutualexclusionprimitives.

BiasedlockingisstrictlyaresponsetothelatencyofCAS.It''simportanttonotethatCASincurslocallatency,butdoesnotimpactscalabilityonmodernprocessors.AcommonmythisthateachCASoperation"goesonthebus",and,giventhattheinterconnectisafixedacontendedresource,useofCAScanimpairscalability.That''sfalse.CAScanbeaccomplishedlocally--thatis,withnobustransactions--ifthelineisalreadyinM-state.CASisusuallyimplementedontopoftheexistingMESIsnoop-basedcachecoherenceprotocol,butintermsofthebus,CASisnodifferentthanastore.Forexampleletssayyouhaveatrue16-waysystem.YoulaunchathreadthatCASes1billiontimestoathread-privatelocation,measuringtheelapsedtime.Ifyouthenlaunch16threads,allCASingtothread-privatelocations,theelapsedtimewillbethesame.Thethreadsdon''tinterferewithorimpedeeachotherinanyway.Ifyoulaunch16threadsallCASingtothesamelocationyou''lltypicallyseeamassiveslow-downbecauseofinterconnecttraffic.(ThesoleexceptiontothatclaimisSun''sNiagara,whichcangracefullytoleratesharingonamassivescaleastheL2$servesastheinterconnect).IfyouthenchangethatCAStoanormalstoreyou''llalsoseeasimilarslow-down;asnotedbefore,intermsofcoherencybustraffic,CASisn''tappreciablydifferentthananormalstore.SomeofthemisinformationregardingCASprobablyarisesfromtheoriginalimplementationoflock:cmpxchg(CAS)onIntelprocessors.Thelock:prefixcausedtheLOCK#signaltobeasserted,acquiringexclusiveaccesstothebus.Thisdidn''tscaleofcourse.Subsequentimplementationsoflock:cmpxchgleveragecachecoherencyprotocol--typicallysnoop-basedMESI--anddon''tassertLOCK#.Notethatlock:cmpxchgwillstilldriveLOCK#inoneextremelyexoticcase--whenthememoryaddressismisalignedandspans2cachelines.Finally,wecansafelyusecmpxchgonuniprocessorsbutmustuselock:cmpxchgonmultiprocessorsystems.Lock:cmpxchgincursmorelatency,butthenagainit''safundamentallydifferentinstructionthatcmpxchg.Lock:cmpxchgisserializing,providingbidirectionalmfence-equivalentsemantics.(Fenceorbarrierinstructionsareneverneededforuniprocessors)ThisfactmightalsohavecontributedtothemyththatCASismoreexpensiveonMPsystems.Butofcourselock:cmpxchgincursnomorelatencyona2xsystemthanonan8xsystem.

Whiledigressingonthetopicofbusoperations,letssaythataloadisfollowedcloselyinprogramorderbyastoreorCAStothesamecacheline.Ifthecachelineisnotpresentintheissuingprocessortheloadwillgeneratearequest-to-sharetransactiontogetthelineinS-stateandthestoreorCASwillresultinasubsequentrequest-to-owntransactiontoforcethelineintoM-state.This2ndtransactioncanbeavoidedonsomeplatformsbyusingaprefetch-for-writeinstructionbeforetheload,whichwillforcethelinedirectlyintoM-state.It''salsoworthmentioningthatontypicalclassicSMPsystems,pureread-sharingisveryefficient.Alltherequestingprocessorscanhavethecacheline(s)replicatedintheircaches.Butifevenoneprocessoriswritingtoasharedcacheline,thosewriteswillgenerateconsiderablecachecoherencetraffic;assumingawrite-invalidatecachecoherencepolicy(asopposedtowrite-update)thereaderswillcontinuallyre-loadthecachelinejusttohaveitsubsequentlyinvalidatedbythewriter(s).Putdifferently,loadstoacachelinearecheapifotherprocessorsareloadingfrombutnotstoringtothatsameline.Storesarecheaponlyifnootherprocessorsareconcurrentlystoringtoorloadingfromthatsameline.(Wecandrawanimpreciseanalogybetweencachecoherencyprotocolsandread-writelocksinthatforagivencachelinetherecanonlybeonewriteratanygiventime.That''stheprocessorwiththelineinM-state.Multiplereadersofthelineallowedandofcoursethelifetimeofareadercan''toverlapawrite.Unliketraditionalread-writelocks,however,thecachecoherencyprotocolallowswriterstoinvalidatereaders,sowecan''tpushtheanalogytoofar.Inatwistedsense,thecoherencyprotocolisobstruction-free).Coherencybandwidthisafixedandcontendedglobalresource,soinadditiontolocallatency,excessivesharingtrafficwillimpactoverallscalabilityandimpedetheprogressofthreadsrunningonotherprocessors.Aso-calledcoherencymiss--forexamplealoadonprocessorP1whereprocessorP2hasthecachelineinM-state--istypicallymuchslowerthananormalmiss(exceptonNiagara).Recalltoo,www.baiyuewang.netthatacquiringalockinvolvesastore(CAS,really)tothelockmetadata,soifyouhavethreadsonprocessorsP1andP2iterating,acquiringthesame,thelockacquisitionitselfwillgeneratecoherencytrafficandresultinthecache"sloshing"oftheline(s)holdingthemetadata.Generally,excessivecoherencytrafficistobeavoidedonclassicSMPsystems.Butasusual,there''sanexceptiontoanyrule,andinthiscasethatexceptionisSun''sNiagara,whichcantoleratesharinggracefully.WeshouldnowreturnbacktothetopicofCASlatency,asthatissuemotivatestheneedforbiasedlocking.

CASandthefence(AKAbarrierorMEMBAR)instructionsaresaidtobeserializing.Onmultiprocessorsystemsserializinginstructionscontroltheorderinwhichwhichstoresareloadsarevisiblewithrespecttomemory.Suchinstructionsareoftenimplementedinacrudefashiononmostcurrentprocessors.Typically,aserializinginstructionwillbringtheCPUtoanearhalt,killandinhibitanyout-of-order(OoO)instructions,andwaitforthelocalstorebuffertodrain.That''sOKforsimpleprocessors,butitresultsinconsiderablelossofperformanceonmoresophisticatedOoOprocessors.There''snofundamentalreasonwhyCASorfenceshouldbeslow.Untilrecently,thetrendwastowardworseCASandfenceperformance,butitappearsthatthistrendmaybereversing.IBMmadeconsiderableimprovementsinfencelatencybetweenthePower4andPower5.Likewise,Intelmaderemarkableimprovementsforbothlock:cmpxchgandmfencelatencyintherecentCore2processors.

Inthecaseofbiasedlocking--sinceweknowthattheobjectisunshared--it''sreasonabletoassumethattheactionsofthebiasholdingthreadwouldkeepcachelinesunderlyingtheobject''slockwordanddatainM-orE-state

Interestingly,ifaamemorybarrier,or"fence"instructionissufficientlyfasterthanCASwecanimplementbiasedlockingrevocationwithaDekker-likemechanism.Ifyou''reinterestedindelvingdeeperyoumightfindthefollowingofinterest:AsymmetricDekkerSynchronization.(ThatsamedocumentalsodescribesanoptimizationusedinHotSpotwherewewereabletoeliminateaMEMBARfromthestatetransitioncodepathonmultiprocessorsystems).

KennethRussellandDavidDetlefs(formerlyofSunLabs)haveapaperappearinginOOPSLA''06--EliminatingSynchronization-RelatedAtomicOperationswithBiasedLockingandBulkRebiasing--wheretheydescribeaschemetoreducethecostofrevocation.

It''simportanttonotethatbiasedlockingispermittedunderthetheJavaMemoryModelasclarifiedbyJSR133.RefertotheJavaLanguageSpecification,3rdedition,Chapter17,Section4.DougLea''sJSR-133CookbookisanexcellentresourceforJVMdevelopersorcuriousJavaprogrammers.

Finally,there''snotcurrentlyspaceinthemarkwordtosupportbothanidentityhashCode()valueaswellasthethreadIDneededforthebiasedlockingencoding.Giventhat,youcanavoidbiasedlockingonaper-objectbasisbycallingSystem.identityHashCode(o).Iftheobjectisalreadybiased,assigninganidentityhashCodewillresultinrevocation,otherwise,theassignmentofahashCode()willmaketheobjectineligibleforsubsequentbiasedlocking.Thispropertyisanartifactofourcurrentimplementation,ofcourse.

Asanaside,thelatencyofatomicread-modify-writeoperationssuchasCASwasinpartwhatmotivatedmyselfandAlexGarthwaitetodevelopourMostlyLock-FreeMalloc(ISMM2002)mechanism,whichusesprocessor-localallocationwithoutrequiringatomicsorthebindingofthreadstoprocessors.

SeealsoUS07814488(2002).

Category:GeneralTags:nonePermanentlinktothisentry

?Foraglimpseatthe...|Main|Letssayyou''re...?

Comments:

Thanksforwritingthisup,hopethere''smoretocome.

PostedbyMikaelGueckonAugust19,2006at12:39AMEDT#

btw,ifyouknowRussianthereareseveralotherpostsrelatedtoJIToptimization(includingBiasedLocking)here:http://blogs.sun.com/vmrobot/category/Compilers

PostedbykatyaonSeptember03,2006at01:14AMEDT#

Hi,

Ireadthisarticlewithinterest.BiasedlockingwasintroducedinJDK1.6andimproveduncontendedsynchronizationbyremovingatomiclocks.HoweverIhavereadelsewherethatAtomiclocks(AtomicIntergeretc)areusedinthejava.util.concurrentlibsandimproveperformance.Canyouexplaintheconflictplease.

Thankyou

PostedbyandyblackonJune22,2009at05:25AMEDT#

HelloAndy,

Insteadof''atomiclocks''letssay''atomicinstructions'',which,onx86typicallyhaveaLOCK:prefix(exceptXCHG).

AtomicIntegercouldlegitimatelybeimplementedinanumberofways,suchasvialocks("synchronized"oraj.u.c.ReentrantLock)orviaatomicread-modify-writeinstructionssuchasCAS/LOCK:CMPXCHGetc.WehappentousethelatterinHotSpot.Thatgivesusanon-blockingimplementationbutatthecostofanatomicinstruction.(Atomics,BTW,arebecomingcheaperasprocessordesignersputmoreeffortintheimplementation.Intelhasclearlypaidcloseattentiontoatomiclatency--it''smuchbetteronNehalem,forinstance).Thenon-blockingpropertyisgenerallydesirable.

Hypothetically,ifweimplementedAtomicIntegerwithsynchronizedthenitcouldalsobenefitfrombiasedlockinginthecasewhereonlyonethreadeveraccessedagiveninstance.Butbroadly,weexpectthej.u.cprimitivestobeusedinthecasewherethere''srealsharing.Thatis,they''reoptimizedforthecaseofrealconcurrentaccess,whichistherighttrade-off.

Regards,-Dave

PostedbyguestonJune22,2009at05:54AMEDT#

IsAtomicInteger.getAndIncrement()guaranteedtocompleteonallhwplatforms.ThewayIseeitforhwtoprovidethisguaranteelock:addAndIncrementonthatplatformmustnotallowthreadcontextswitchuntilitcompletes.Soona16threadsystemletsassumeall16threadswentforthisatthesametimetheyareallguaranteedtocompletebeforeotherthreadsrunonthosehwthreads.

PostedbybankkusonDecember11,2009at08:29PMEST#

Theprogresspropertiesaren''tspecifiedastheydependonthecapabilitiesoftheJVMandandthetheunderlyingplatform.Onanx86withLOCK:XADDitcouldbewait-free.OnotherplatformswithCASit''llbelock-free(LD...CASinaloop).OnsomeplatformsandJVMstheimplementationmaybelock-based.

Regardingcontextswitching,ifyouuseLOCK:XADDorsomeothersimilaratomictheneithertheatomicinstructioncompletesoritdoesn''t.IftheimplementationusesaLD...CASorLL..SCidiomthenwecanbepreemptedbetweentheLL/LDorCAS/SC.That''sharmless,however,asatworstwe''lljustretry.

PostedbydavediceonDecember13,2009at12:51AMEST#

HiDave

Thanksfortheclarification.CurioushowweimplementFairlocksinJUC?Clearlyticketlockingcannotbeimplementedonhwthatdoesnthaveawait-freefetchAndIncrement.

PostedbybankkusonDecember13,2009at08:22AMEST#

HiBank,goodquestion.

Thefairlocksare"fair"inthesensethatsuccessionismanagedbypassingownershipdirectlytoanotherthread,insteadofsimplyreleasingthelockandandusingcompetitivesuccession,whichcanallowso-called"barging"byotherthreadstryingtogainthelock.Wetypicallyuseanexplicitlistofthreadsblockedonalockaswewanttounparkthespecificsuccessor.(Wedothisbecausewecan''tallowunboundedspinning).

IIRC,fairlocksuseCASinthe"doorway"protocol(atleastonourreferenceplatforms),soit''spossiblethatathreadthatarrived1stinthedoorwaymightultimatelyenqueuebehindsomeotherthreadthatarrivedmorerecentlyatthelockdoorway.Soit''snotstrictlyfairinthatsense.

Regards

Dave

PostedbyDavidDiceonDecember14,2009at02:34AMEST#

InterestingIhadasimilarimplementationforaFairLock(orIthoughtwasfair)whereIwouldntreleasethelockvarinsteaddolockhandover.ThiswaytryLockstaysfairtooiftherearequeuedcontenders.

WhatIrealizedisthatwithhandovertolistofcontenderstheresstillaproblem.Addingathreadtothelistitselfisnotwait-free.PerhapstheJDKimplementationdoesntsufferfromthis.

BtwthisbloghasbeenveryeducationalformewishIhadfoudnitearlier:-)

PostedbybankkusonDecember14,2009at06:43AMEST#

HiBanks,

MCSandCLHaregoodalgorithms(atleastforstartingpoints)ifyouwanttoconstructalockwithawait-freedoorway.CLHarguablyhasbetterexecutionpropertiesbutalsorequiresasparedummynodeperlockinstance.

Regards,-Dave.

p.s.,someofthelockqueuesfor"synchronized"etc.inhotspotnativecodearelock-free.Iimplementedthemassuchtoavoid"doublecontention"whichcanoccurifwehaveahotlockandwheresome"inner"lockthatprotectsthequeuesisalsohot.

PostedbydavediceonDecember14,2009at07:00AMEST#

Dave-IreadwithinterestyourQRLpaperandtheotherassociatedmaterialsinyourpost.Forthosewhoareinterested,youcangetattheOOPSLApaperdirectlyfromSun,incaseyoudon''thaveanACMaccount:

http://java.sun.com/javase/technologies/hotspot/publications/biasedlocking_oopsla2006_wp.pdf

Iamspecificallyinterestedintheschemeyoumentionin3.2oftheQRLpaperregardingusingahalf-wordstorefollowedbyafull-wordreadtoensurethereadcannotbere-orderedbeforethewrite.

Inparticular,I''vebeentryingtodetermineifthiswillworkunderthex86memorymodel.IntelprovidesaprettydetaileddescriptionoftheIA-32behaviorin:

http://www.intel.com/Assets/PDF/manual/253668.pdf

...section8.2-buttheredoesn''tseemtobeenoughinformationtodeterminewhetherare-orderingthatwouldbreaktheQRLsemanticsispossible.

Forexample,under"8.2.3.5Intra-ProcessorForwardingIsAllowed",wehaveascenariowhereaCPUexecutingawritemayseethisright"before"writesofotherCPUs-andthatthoseotherCPUswillseetheirwritesfirst.ThisseemsdangerousfortheQRLtrick,sinceitmaybepossibleforthebiasedthreadtoseeitsownhalf-wordwritebeforeasuccessfulrevokingCASbyanotherCPUisseen,buttherevokingCPUmayseetheoppositecase-CASsucceededandnohalf-wordwriteyet.

ThisleadstotheoddcasewhereboththewriteandCAShavesucceeded,whichisclearlyimpossibleundersequentialconsistency,oranymodelwherememoryprocessesthroughaseriesofwell-definedstates.ItisnotcleartomewhetheritisallowedundertheIntelmodelhowever,because(a)theydon''tdiscussedextensivelytherelationshipbetweenregular(noLOCK/fence)reads/storesandinterlockedoperationsondifferentCPUs(theydopointoutthattheycan''tberelativelyreorderedonasingleCPU),or(b)whethertworead/storesareconsideredtobeat"sameaddress"iftheoperandswww.wang027.comhavedifferentlengths,butoneiscontainedwiththeotherand(c)don''tspecificallydiscussCASwhichisacompound,conditionalread-then-writeoperation-andwhetheritispossibleforanapparentre-orderingofastoreononeCPU(duetointra-processorforwarding)thatwouldhavecausedaCAStofail,toallowthetoCAStosucceed,butalsoletthelocalCPUseethestore.

Wellthatwasamouthful!Anyinsightyouhavewouldbeappreciated.ItismyunderstandingthatthecurrentmechanismintheJVMisnottousestoresatall,favoringinsteadaninvasiverevocationwhichretroactivelymodifiesthelockingrecords,soIguessthistopicdoesn''tdirectlythere(butI''malsocuriouswhythisapproachwasfavoredoveronewhichusestheregular-storetrickyoudescribe,whichispresumablyquitecheap).

PostedbyTravisDownsonJanuary04,2010at08:09AMEST#

HiTravis,

Thecollocationtrick(sometimesreferredto"wordtearing",althoughthattermisslightlyoverloaded),whileinteresting,isn''tguaranteedsafe.Asapracticalconcern,ifyouexecuteasubwordSTfollowedbyafullwordLDthatcoversthatstored-toaddress,mostprocessorswillstall,allowingthestoretodrainbeforetheloadissatisfied.(We''reassumingafullwordstoretotheaddressisn''talreadyinthestorebuffer,inwhichcaseanimplementation\might\mergethefullwordstorewiththesubsequentsubwordstoreandthensatisfytheloadbylookaside).InotherwordsonmostimplementationstheST;LDXtrickhasMEMBAR#storeloadsemantics,butthispropertyisn''tformallydefinedbythearchitecture,sowe''dneverconsiderusingitinaproduct.

AndinfactsincethiswaswrittenAMDandIntelhaveclarifiedtheirmemorymodels,clearlyindicatingthatsuchunsavorytricksaren''tsafeinthelongterm.(Perhapsitworkstoday,butitmaynotontomorrow''sprocessor).

Furthermore,theperformanceofthetechnique,evenifitweresomehowdeemedsafe,isn''tprofoundlybetterthanatraditionalfence,astheprocessorneedstostall.

Seealsohttp://blogs.sun.com/dave/entry/instruction_selection_for_volatile_fencesforsomemoreuptodateinformationonfence/barrier/membarcosts.

Asyoupointout,inourcurrentimplementationinhotspotweuseasafepointrendezvoustorevokebias.That''sratherheavy-handedbutit''sentirelysafe.

Regards,Dave

PostedbyDavidDiceonJanuary04,2010at02:28PMEST#

Thanks,Dave,foryourquickandcomprehensiveresponse.

Iguessthewarningin8.1.2.2ofthecurrentIntelguideisalongtheselines:

"Softwareshouldaccesssemaphores(sharedmemoryusedforsignallingbetween

multipleprocessors)usingidenticaladdressesandoperandlengths.Forexample,if

oneprocessoraccessesasemaphoreusingawordaccess,otherprocessorsshould

notaccessthesemaphoreusingabyteaccess."

Notrigorousbyanymeans(whytheystartreferringto"semaphores"hereratherthansimplememorylocationaccessasintheremainderofthesectionisbeyondme)-butahintthatthisso-called"wordtearing"maybeano-no.

PostedbyTravisDownsonJanuary04,2010at05:30PMEST#

HiTravis,Iagree--there''sagoodbitofinterpretationrequiredwhenreadingthevariousmemorymodeldocuments.BTW,Ihighlyrecommendthefollowing:http://www.cl.cam.ac.uk/~so294/documents/draft-scTSO.pdf.

Regards,Dave

PostedbyDavidDiceonJanuary05,2010at12:33AMEST#

Iwasre-readingyourpapertoday,andlookingattheHotSpotsourcecode,andwascuriousthatwhenrevokingabiasduetoalockattemptonanobjectbyathreadotherthanthebias-owneranddiscoveringthebias-ownerhastheobjectlocked,thatinadditiontomakingthebias-owner''sthreadappeartohavelightweightlockedtheobject,youdon''talsoinflatetheobject''smonitor.

Isthistoavoidmixinglockingandrevokingconcerns,orperhapsinthehopethatthebias-ownermightunlocktheobjectbeforetherevokingthreadactuallyruns.

Itjustseemslikethatsinceyou''realreadyatasafepointyoucouldavoidsomeofthespin-nastinessassociatedwithinflating.

PostedbygrahamsandersononJanuary06,2010at03:07PMEST#

HiGraham,

You''recorrectthatifobjectOisbiasedtoT1,T1hasOlogicallylocked,andT2triestolockOresultinginrevocationofbiasT1''sbiasfromO,thenwecouldtransitionOdirectlytoinflatedstate.Broadly,though,weweretryingtokeepthestatetransitionsassimpleaspossible.Asyoupointout,forthesakeofcodestewardshipsometimesit''sbettertoavoidmixingconcerns.Also,theinflationpath,asyou''dexpect,isburriedintheslow-pathcontended"enter"mechanism.It''sprudenttoleaveitthatway.Also,thedominantcaseisthatO,whilebiased,islogicallyunlocked.

Regards,-Dave

PostedbyDavidDiceonJanuary07,2010at07:07AMEST#

SomeinfoonSystem.identityHashCodeandusingittoavoidthebiasedlockingcausesanotherproblemw/thedefaultsettings:itusesos:random()thatcontendsonasingletonstate.

Thelatterresultsintoalotofsharing.-XX:hashCode=-1disablesitandreliesonMarsaliga''srandomthathasathreadlocalstate,i.e.concurrentfriendly.

PostedbyguestonApril04,2013at07:16AMEDT#

SomeinfoonSystem.identityHashCodeandusingittoavoidthebiasedlockingcausesanotherproblemw/thedefaultsettings:itusesos:random()thatcontendsonasingletonstate.

Thelatterresultsintoalotofsharing.-XX:hashCode=-1disablesitandreliesonMarsaliga''srandomthathasathreadlocalstate,i.e.concurrentfriendly.

PostedbybestsssonApril04,2013at07:17AMEDT#

ThanksforthecommentregardingidentityHashCode.Iimplementedthealternative-XX:hashCode=NvariantsafewyearsagobuttheJ2SEteamhasn''thadtimetoswitchtothedefaults.Asnoted,thedefaultisracy--that''slegalandarguablybenign--butundesirable.Theglobalstateforos::random()isalsoacoherencehotspot.WecanseetheimpactofthiswithasimplebenchmarkwhereotherwiseindependentthreadsassignhashCodes(weassignlazily)atahighrate.Thecoherencetrafficontheos::random()variablesquicklychokesscalability.Usingalternativeschemesprovidesrelief.That''spreciselywhyIprovidedtheMarsagliashift-xorPRNG.

Regards,-Dave

PostedbyguestonJune03,2013at10:13AMEDT#

Hello,

thanksforthisnicearticle.

IhavequestiontoQuicklyReacquirableLocks.Doesiunderstandit

wellthatThreadperformingRevokewillspinlockonline47untilthebiasthreadreleasesthelock,possiblyafterlongtime?

PostedbyjhronApril26,2015at06:01PMEDT#

Regardingthespinatline47,therevokerneedstowaitforthelocktobereleased.Forthepurposesofexplicationwejustusedtest-and-test-and-setlockswithunboundedspinning.Youwouldn''tdothatinpractice,butit''sfineforthepurposesofexplainingthebiasingmechanism.Regards,-Dave

PostedbyguestonApril27,2015at08:52AMEDT#

献花(0)

(本文系thedust79首藏)

类似文章 更多

发表评论：