garfage / 我的图书馆 / (翻译)《黑客》2010年08月第三期之理解...

0 0

   

(翻译)《黑客》2010年08月第三期之理解和应用操作转换(一)

2010-09-10  garfage

Understanding and Applying Operational Transformation

理解和应用操作转换

译者:hlh59

Almost exactly a year ago, Google made one of the most remarkable press releases in the Web 2.0 era. Of course, by “press release”, I actually mean keynote at their own conference, and by “remarkable” I mean potentially-transformative and groundbreaking. I am referring of course to the announcement of Google Wave, a real-time collaboration tool which has been in open beta for the last several months.

大概一年前,Google发布了Web 2.0时代最引人注目的新闻之一。当然,我说的“新闻发布”真正意思是指他们自己会议的幻灯片演示,而“引人注目”,则指的是具有潜在革命性和开创性的。我所指的就是Google Wave的宣布,一个已经公开测试数月的即时协作工具。

For those of you who don’t know, Google Wave is a collaboration tool based on real-time, simultaneous editing of documents via a mechanism known as “operational transformation”. Entities which appear as messages in the Wave client are actually “waves”. Within each “wave” is a set of “wavelets”, each of which contains a set of documents. Individual documents can represent things like messages, conversation structure (which reply goes where, etc), spell check metadata and so on. Documents are composed of well-formed XML with an implicit root node. Additionally, they carry special metadata known as “annotations” which are (potentially-overlapping) key/value ranges which span across specific regions of the document. In the Wave message schema, annotations are used to represent things like bold/italic/underline/strikethrough formatting, links, caret position, the conversation title and a host of other things. An example document following the Wave message schema might look something like this:

对于那些不明白的人,简单来说Google Wave是即时基础上的一种协作工具,就是通过名为“operational transformation”这种机制同时对文档进行编辑。Wave客户端以信息形式出现的实体就称为“waves”。而每个“wave”是一个“wavelets”的集合,里面包含了一系列文档文件。个人文件可以代表信息、对话结构(什么回复什么之类)、拼写检查元数据等等各种事物。这些文件都是由隐含一个根节点的结构良好的xml文件组成的。此外,他们把特殊的元数据称为“注释”,其实就是扩展到指定文件区域的(可能重叠的)键/值范围。在Wave信息架构里,注释被用来表现粗体/斜体/下划线/删除线格式、链接、插入符号位置,对话标题以及这一类东西。按照Wave信息结构来,一个样例文件格式看起来应该是这样的:

<body>
  <line/>Test message
  <line/>
  <line/>Lorem ipsum dolor sit amet.
</body>
 

(assuming the following annotations):

(假定以下这些注释):

  • style/font-weight -> bold

  • style/font-style -> italic

  • link/manual -> http://www.google.com

You will notice that the annotations for style/font-style and link/manual actually overlap. This is perfectly acceptable in Wave’s document schema. The resulting rendering would be something like this:

你会发现style/font-style和link/manual注释实际上重叠了。在Wave文件结构中,这个是完全可接受的。由上产生的结构应该是这个样子:

Test message

Lorem ipsum dolor sit amet.

The point of all this explaining is to give you at least a passing familiarity with the Wave document schema so that I can safely use its terminology in the article to come. See, Wave itself is not nearly so interesting as the idea upon which it is based. As mentioned, every document in Wave is actually just raw XML with some ancillary annotations. As far as the Wave server is concerned, you can stuff whatever data you want in there, just so long as it’s well-formed. It just so happens that Google chose to implement a communications tool on top of this data backend, but they could have just as easily implemented something more esoteric, like a database or a windowing manager.

上面这些解释主要目的就是让你们熟悉一下Wave的文件结构,这样我就可以在接下来文章里面放心使用一些术语。你们也看到了,Wave本身倒不像基于其上的这个想法这么有趣。如前所述,每一个Wave文件实际上只是一些带有辅助注释的原始XML。至于那些Wave服务器,你可以在那里填塞你想要存放的数据,只要它们结构规范。其实只是偶然让Google决定在其数据库后端上添加一个通讯工具,但他们很容易地就实现了像数据库或者窗口管理器这样更深奥的东西。

The key to Wave is the mechanism by which we interact with these documents: operational transformation. Wave actually doesn’t allow you to get access to a document as raw XML or anything even approaching it. Instead, it demands that all of your access to the document be performed in terms of operations. This has two consequences: first, it allows for some really incredible collaborative tools like the Wave client; second, it makes it really tricky to implement any sort of Wave-compatible service. Given the fact that I’ve been working on Novell Pulse (which is exactly this sort of service), and in light of the fact that Google’s documentation on the subject is sparing at best, I thought I would take some time to clarify this critical piece of the puzzle. Hopefully, the information I’m about to present will make it easier for others attempting to interoperate with Wave, Pulse and the (hopefully) many OT-based systems yet to come.

Wave的关键就是文件互动机制:操作转换。Wave其实不会让你可以访问作为原始XML的文件或者其他试图接近的东西。相反,它要求您按照操作形式访问文档文件。这就产生两个后果:第一,就得考虑像Wave客户端这样的真正的难以置信的合作工具;第二,实施任何Wave兼容服务的形式变得比较棘手。鉴于我一直工作于Novell Pulse(正是这种服务形式),而且Google关于这个问题的文档保留最多,我会花一些时间来阐明这个拼图的关键一块。希望我准备介绍的这些信息,对其他试图在Wave、Pulse或者其他更多还没出现的基于OT系统之间交互操作的人会有所帮助。

Operations

操作

Intuitively enough, the fundamental building block of operational transforms are operations themselves. An operation is exactly what it sounds like: an action which is to be performed on a document. This action could be inserting or deleting characters, opening (and closing!) an XML element, fiddling with annotations, etc. A single operation may actually perform many of these actions. Thus, an operation is actually made up of a sequence of operation components, each of which performs a particular action with respect to the cursor (not to be confused with the caret, which is specific to the client editor and not at all interesting at the level of OT).

够直观的,OT的基本构建骨架就是操作本身。一个操作就像它听起来一样:对文档的操作行为。这个行为可以是插入或者删除字符,开启(关闭)XMl元素,或者摆弄注释这些。一个单一操作可以完成许多以上行为。因此,操作实际上是由一系列操作成分组成的,每一个操作成分完成一个关于光标的特定行为(不要混淆插入符号,它是特定于客户端编辑器,而不是OT级别上的所有插入形式)。

There are a number of possible component types. For example:

  • insertCharacters — Inserts the specified string at the current index

  • deleteCharacters — Deletes the specified string from the current index

  • openElement — Creates a new XML open-tag at the current index

  • deleteOpenElement — Deletes the specified XML open-tag from the current index

  • closeElement — Closes the first currently-open tag at the current index

  • deleteCloseElement — Deletes the XML close-tag at the current index

  • annotationBoundary — Defines the changes to any annotations (starting or ending) at the current index

  • retain — Advances the index a specified number of items

存在很多可能组件类型。比如:

  • insertCharacters — 当前索引处插入指定字符串

  • deleteCharacters —当前索引处删除指定字符串

  • openElement — 当前索引处创建一个新的XML开启标签

  • deleteOpenElement —当前索引处删除指定的XML开启标签

  • closeElement — 当前索引处开闭第一个开启标签

  • deleteCloseElement — 当前索引处删除XML关闭标签

  • annotationBoundary — 当前索引处定义任何更改说明(开始或者结束)

  • retain — 索引前进指定数目项

Wave’s OT implementation actually has even more component types, but these are the important ones. You’ll notice that every component has something to do with the cursor index. This concept is central to Wave’s OT implementation. Operations are effectively a stream of components, each of which defines an action to be performed which affects the content, the cursor or both. For example, we can encode the example document from earlier as follows:

Wave的OT实际实现了更多的组件,但上面列出的都是重要的。你会发现,每一个组件都与光标相关。这一概念的核心是Wave的OT实现。操作就是有效的组件流,每个组件定义了一个影响的内容,光标或两者兼有的执行行为。举例来说,我们编码下面早期样例文档:

  1. openElement(‘body’)

  2. openElement('line')

  3. closeElement()

  4. annotationBoundary(startKeys: ['style/font-weight'], startValues: ['bold'])

  5. insertCharacters('Test message')

  6. annotationBoundary(endKeys: ['style/font-weight'])

  7. openElement('line')

  8. closeElement()

  9. annotationBoundary(startKeys: ['style/font-style'], startValues: ['italic'])

  10. openElement('line')

  11. closeElement()

  12. insertCharacters('Lorem ')

  13. annotationBoundary(startKeys: ['link/manual'], startValues: ['http://www.google.com'])

  14. insertCharacters('ipsum')

  15. annotationBoundary(endKeys: ['style/font-style'])

  16. insertCharacters(' dolor')

  17. annotationBoundary(endKeys: ['link/manual'])

  18. insertCharacters(' sit amet.')

  19. closeElement()

Obviously, this isn’t the most streamlined way of referring to a document’s content for a human, but a stream of discrete components like this is perfect for automated processing. The real utility of this encoding though doesn’t become apparent until we look at operations which only encode a partial document; effectively performing a particular mutation. For example, let’s follow the advice of Strunk and White and capitalize the letter ‘m’ in our title of ‘Test message’. What we want to do (precisely-speaking) is delete the ‘m’ and insert the string ‘M’ at its previous location. We can do that with the following operation:

很显然,对人类来说这不是一个涉及文档内容的最精简方式,但相对于自动处理像这样的离散的组件流形式是完美的。这种编码的实际效用在我们看到编码特定文档的操作前都不明显;特别是执行特殊的突变效果。举例来说,让我们听从StrunkWhite的意见,大写‘Test message’标题中的‘m’字母。我们需要做的是(准确地讲)删除'm',并在它以前的位置插入字符'M'。我们可以做如下操作:

  1. retain(8)

  2. deleteCharacters('m')

  3. insertCharacters('M')

  4. retain(38)

Instead of adding content to the document at ever step, most of this operation actually leaves the underlying document untouched. In practice, retain() tends to be the most commonly used component by a wide margin. The trick is that every operation must span the full width of the document. When evaluating this operation, the cursor will start at index 0 and walk forward through the existing document and the incoming operation one item at a time. Each XML tag (open or close) counts as a single item. Characters are also single items. Thus, the entire document contains 47 items.

这个操作大部分实际上是保持相关文档不动,而不是不断的添加内容到文档中去。实际使用中,retain()往往是最常用的组件。因为呢,每一个操作必须跨越的文档的全宽。评估这次操作,光标开始在索引0位置,然后前进穿过现有文档,一次完成一个项操作。每个XML标记(开启或关闭)作为一项。字符也作为单一项。因此,整个文档包含47项。

Our operation above cursors harmlessly over the first eight items (the <body> tag, the <line/> tag and the string 'Test '). Once it reaches the 'm' in 'message', we stop the cursor and perform a mutation. Specifically, we’re using the deleteCharacters() component to remove the 'm'. This component doesn’t move the cursor, so we’re still sitting at index 8. We then use the insertCharacters() component to add the character 'M' at precisely our currently location. This time, some new characters have been inserted, so the cursor advances to the end of the newly-inserted string (meaning that we are now at index 9). This is intuitive because we don’t want to have to retain() over the text we just inserted. We do however want to retain() over the remainder of the document, seeing as we don’t need to do anything else. The final rendered document looks like the following:

我们上面对光标的操作无恶意越过了前8个项( <body>标签, <line/>标签和字符串'Test ' )。一旦它到达'message'的'm',我们就停止光标和,执行突变。具体来说,我们使用deleteCharacters()组件,以删除'm' 。这个组件不移动光标,所以我们仍然在索引8位置。然后,我们使用insertCharacters()组件在我们在目前的位置添加字符'M'。这一次,一些新的字符被插入了,因此,光标前进到新插入字符串结尾(即我们现在索引9位置 )。这样做比较直观,因为我们不希望去retain()越过我们刚刚插入的文本。鉴于我们不需要进行其他操作了,我们无论如何想retain()越过文档的其余部分。最后提供的文档如下所示:

Test Message

Lorem ipsum dolor sit amet.

Composition

组成

One of Google’s contributions to the (very old) theory behind operational transformation is the idea of operation composition. Because Wave operations are these nice, full-span sequences of discrete components, it’s fairly easy to take two operations which span the same length and merge them together into a single operation. The results of this action are really quite intuitive. For example, if we were to compose our document operation (the first example above) with our ‘m’-changing operation (the second example), the resulting operation would be basically the same as the original document operation, except that instead of inserting the text 'Test message', we would insert 'Test Message'. In composing the two operations together, all of the retains have disappeared and any contradicting components (e.g. a delete and an insert) have been directly merged.

Google对OT支撑理论(非常老旧)的贡献之一就是操作组成的概念。由于Wave这种横跨全部离散组件操作是非常友好的,所以可以很容易实现跨越同样长度的两个操作,并且可以把它们合成为一个操作。这样做的结果是很直观的。例如,如果我们想集成文档操作(上面第一个例子)和编写改变‘m’操作 (第二个例子),那么由此产生的操作会与原先基本相同,只是不是插入文字'Test message' ,而是插入'Test Message' 。两次操作合而为一,所有retain操作消失,任何矛盾的组件(例如删除和插入)也直接合并了。

Composition is extremely important to Wave’s OT as we will see once we start looking at client/server asymmetry. The important thing to notice now is the fact that composed operations must be fundamentally compatible. Primarily, this means that the two operations must span the same number of indexes. It also means that we cannot compose an operation which consists of only a text insert with an operation which attempts to delete an XML element. Obviously, that’s not going to work. Wave’s Composer utility takes care of validating both the left and the right operation to ensure that they are compatible as part of the composition process.

一旦我们考虑客户端/服务器的不对称性,我们就会发现组成形式对Wave的OT是极其重要的。但首先需要注意的重要一点是,组成操作必须从根本上兼容。首先,这意味着两个操作必须跨越相同的索引数。这也意味着,我们不能把只执行文本插入的操作和试图删除XML元素的操作相集成。显然,这是不会起作用的。Wave的设计者小心有效验证两边的操作,以确保。

Please also note that composition is not commutative; ordering is significant. This is also quite intuitive. If you type the character a and then type the character b, the result is quite different than if you type the character b and then type the character a.

同时需要注意的是,不可交换的组成部分;顺序是很重要的。这句话意思也很直观。如果您键入的字符a , 然后键入字符b ,结果是完全不同于您先键入字符b , 然后键入字符a 。

Transformation

转换

Here’s where we get to some of the really interesting stuff and the motivation behind all of this convoluted representational baggage. Operational Transformation, at its core, is an optimistic concurrency control mechanism. It allows two editors to modify the same section of a document at the same time without conflict. Or rather, it provides a mechanism for sanely resolving those conflicts so that neither user intervention nor locking become necessary.

我们会在这节里发现一些真正有趣的东西,以及所有难以理解表象背后的动机。操作转换的核心,在其是一个乐观并发控制机制。它允许两个编辑器在同一时间不冲突的修改一个文档的相同部分。或者更确切地说,它提供了一个稳定解决这些冲突的机制,而不需用户干预或者锁定。

This is actually a harder problem than it sounds. Imagine that we have the following document (represented as an operation):

其实这是个比听起来困难得多的问题。试想一下,我们有以下文档(当作一个操作):

  1. insertCharacters('go')

Now imagine that we have two editors with their cursors positioned at the end of the document. They simultaneously insert a t and a character (respectively). Thus, we will have two operations sent to the server. The first will retain 2 items and insert a t, the second will retain 2 items and insert a. Naturally, the server needs to enforce atomicity of edits at some point (to avoid race conditions during I/O), so one of these operations will be applied first. However, as soon as either one of these operations is applied, the retain for the other will become invalid. Depending on the ordering, the text of the resulting document will either be 'goat' or 'gota'.

现在,假设我们有两个编辑人员,他们光标都在文档结尾处。他们同时插入字母t和a(分别)。因此,我们将两个操作发送到服务器。第一个操作将前进2项,插入字母t ,第二个操作将前进2项,插入字母a 。当然,服务器需要在某些点实现编辑原子性(以避免I / O的竞争),所以两个操作之一将首先得到实行。不过,只要其中任何一个操作实施,另一个的retain将变为无效。根据顺序,该文件的文本所得到的是'goat'或者'gota' 。

In and of itself, this isn’t really a problem. After all, any asynchronous server needs to make decisions about ordering at some point. However, issues start to crop up as soon as we consider relaying operations from one client to the other. Client A has already applied its operation, so its document text will be 'got'. Meanwhile, client B has already applied its operation, and so its document text is 'goa'. Each client needs the operation from the other in order to have any chance of converging to the same document state.

对其本身,这不是一个真正的问题。毕竟,任何异步服务器需要在某些点上作出有关排序的决定。然而,当我们考虑客户端之间操作转发时候问题开始暴露出来了。客户端A已实施操作,因此它的文件文本将是'got' 。同时,客户端B也实施操作 ,那么它的文档中的文本就应该是'goa' 。每个客户端都要求来自对方的操作,以便有机会收敛到同一文档状态。

Unfortunately, if we na?vely send A’s operation to B and B’s operation to A, the results will not converge:

但不幸的是,如果我们天真的发送A的操作给B,发送B的操作给A,那么结果是不会收敛的:

  • 'got' + (retain(2); insertCharacters('a') = 'goat'

  • 'goa' + (retain(2); insertCharacters('t') = 'gota'

Even discounting the fact that we have a document size mismatch (our operations each span 2 indexes, while their target documents have width 3), this is obviously not the desired behavior. Even though our server may have a sane concept of consistent ordering, our clients obviously need some extra hand-holding. Enter OT.

即使不考虑文件大小不匹配的事实(我们的每一个行动跨越2个索引,而他们的目标文件宽度为3),这也显然不是理想的结果。即使服务器对排序有清醒的概念,我们的客户也显然需要一些额外的控制手段。引入OT(操作转换)。

What we have here is a simple one-step diamond problem. In the theoretical study of OT, we generally visualize this situation using diagrams like the following:

我们这里考虑的是一个简单的单步菱形问题。在OT的理论研究中,我们一般使用类似下面的图想象这种情况:

《黑客》中文之理解和应用操作转换

The way you should read diagrams like this is as a graphical representation of operation application on two documents at the same time. Client operations move the document to the left. Server operations move the document to the right. Both client and server operations move the document downward. Thus, diagrams like these let us visualize the application of operations in a literal “state space”. The dark blue line shows the client’s path through state space, while the gray line shows the server’s. The vertices of these paths (not explicitly rendered) are points in state space, representing a particular state of the document. When both the client and the server line pass through the same point, it means that the content of their respective documents were in sync, at least at that particular point in time.

应该这样解释这幅图,它是在同一时间两份文档上的操作应用的图形化代表。客户端操作把文档移动到左侧。服务器操作文档移动到右侧。客户端和服务器操作都使文档向下移动。因此,这样图解让我们把操作应用想象成“状态矢量空间”。深蓝色的线表示状态空间下客户端的路径,而灰色的线显示的是服务器的。这些路径(没有明确地呈现)的顶点是状态空间中的点,代表文档的特定状态。当客户端和服务器端路径通过同一点时,意味着至少在那个特定时间点上,它们各自的文档内容同步。

So, in the diagram above, operation a could be client A’s operation (retain(2); insertCharacters('t')) and operation b could be client B’s operation. This is of course assuming that the server chose B’s operation as the “winner” of the race condition. As we showed earlier, we cannot simply na?vely apply operation a on the server and b on the client, otherwise we could derive differing document states ('goat' vs 'gota'). What we need to do is automatically adjust operation a with respect to b and operation b with respect to a.

因此,在上面的图,操作a代表客户端A的操作(retain(2); insertCharacters('t')),操作b代表被客户端B的操作。这里假设服务器选择客户端B的操作为竞争条件的“胜利者”。正如我们先前表明的那样,我们不能简单天真地在服务器端应用操作a ,在客户端应用操作b,否则,我们会得到不同的文档状态('goat' vs 'gota')。我们需要做的是能自动根据对方操作调整本地操作。

We can do this using an operational transform. Google’s OT is based on the following mathematical identity:

我们可以利用OT做到这一点。Google的OT是基于以下公式的:

 《黑客》中文之理解和应用操作转换

In plain English, this means that the transform function takes two operations, one server and one client, and produces a pair of operations. These operations can be applied to their counterpart’s end state to produce exactly the same state when complete. Graphically, we can represent this by the following:

简单的说,这意味着transform函数需要两个操作,一个服务器和一个客户端,并产生一个操作对。这些操作都可以应用于对方的结束状态,以此产生完全相同的状态。我们可以通过以下图形来表示:

《黑客》中文之理解和应用操作转换

Thus, on the client-side, we receive operation b from the server, pair it with a to produce (a’, b’), and then compose b’ with a to produce our final document state. We perform an analogous process on the server-side. The mathematical definition of the transform function guarantees that this process will produce the exact same document state on both server and client.

因此,在客户端,我们收到服务器的操作b,和操作a配对形成操作对(a’, b’)然后集成操作b’操作a完成最终文档状态。我们在服务器端执行类似的过程。Transform函数的数学定义确保这一过程将使服务器和客户端产生完全相同文档状态

Coming back to our concrete example, we can finally solve the problem of 'goat' vs 'gota'. We start out with the situation where client A has applied operation a, arriving at a document text of 'got'. It now receives operation b from the server, instructing it to retain over 2 items and insert character 'a'. However, before it applies this operation (which would obviously result in the wrong document state), it uses operational transformation to derive operation b’. Google’s OT implementation will resolve the conflict between 't' and 'a' in favor of the server. Thus, b' will consist of the following components:

再说回具体例子,现在我们终于可以解决'goat' vs 'gota'的难题了。我们先设客户端A执行操作a,然后文本内容变为'got'的情况。现在从服务器接收操作b,指示前进2项,插入字符'a' 。但在其应用此操作(这显然会导致错误的文件状态)前,需要利用OT机制获得操作b’ 。Google的OT实现,将按照有利服务器的原则解决't' 和 'a'冲突问题。因此,b'将包括以下内容:

  1. retain(2)

  2. insertCharacters('a')

  3. retain(1)

You will notice that we no longer have a document size mismatch, since that last retain() ensures that the cursor reaches the end of our length-3 document state ('got').

你会发现,我们没有一个文件大小不匹配,因为这最后retain()函数确保光标到达我们长度为3的文档状态('got')的末尾。

Meanwhile, the server has received our operation a and it performs an analogous series of steps to derive operation a’. Once again, Google’s OT must resolve the conflict between 't' and 'a' in the same way as it resolved the conflict for client A. We’re trying to apply operation a (which inserts the 't' character at position 2) to the server document state, which is currently 'goa'. When we’re done, we must have the exact same document content as client A following the application of b’. Specifically, the server document state must be 'goat'. Thus, the OT process will produce the operation a’ consisting of the following components:

与此同时,服务器也收到了操作a,并经过一些列类似步骤后得到操作a’再一次,Google的OT必须以同样的方式解决't' 和 'a'冲突问题。我们要在服务器端现为'goa'的文档状态上执行操作a(在位置2插入字符't')。完成时,我们应该和执行了操作b’后的客户端A有完全相同的文档内容。因此,OT运作过程会产生包括如下组成部分的操作a’

  1. retain(3)

  2. insertCharacters('t')

Client A applies operation b’ to its document state, the server applies operation a’ to its document state, and they both arrive at a document consisting of the text 'goat'. Magic!

客户端A应用操作b’,服务器应用操作a’,然后二者都得到了包含文本'goat'的文档。神奇吧!

It is very important that you really understand this process. OT is all about the transform function and how it behaves in this exact situation. As it turns out, this is all that OT does for us in and of itself. Operational transformation is really just a concurrency primitive. It doesn’t solve every problem with collaborative editing of a shared document (as we will see in a moment), but it does solve this problem very well.

这对你真正理解这个过程是非常重要的。OT全部就是transform函数以及在精确情况下表现。事实证明,这也就是OT本身能为我们做的一切。事实上,操作转换仅仅是一个并发原型。它并不能解决共享文档协作编辑的所有问题(待会我们会看到),但它确实非常好的解决了这个问题。

One way to think of this is to keep in mind the “diamond” shape shown in the above diagram. OT solves a very simple problem: given the top two sides of the diamond, it can derive the bottom two sides. In practice, often times we only want one side of the box (e.g. client A only needs operation b’, it doesn’t need a’). However, OT always gives us both pieces of the puzzle. It “completes” the diamond, so to speak.

思考OT的一种方法,就是记住上图所示的“菱形”形状。OT解决了一个很简单的问题:给出菱形的上两边,可以推导出下两边。在实践中,很多时候我们只想要盒子的一边(比如客户端A只需要一个操作b’ ,它并不需要操作a’)。然而,OT往往给出难题的两边。它“完成”菱形形状,可以这么说。

Compound OT

复合型OT

So far, everything I have presented has come pretty directly from the whitepapers on waveprotocol.org. However, contrary to popular belief, this is not enough information to actually go out and implement your own collaborative editor or Wave-compatible service.

到目前为止,上面我所提到的都直接出自于waveprotocol.org网站上的白皮书。然而,与大家普遍认为的相反,对于实际去应用于自己协作编辑器或者Wave兼容服务,这些信息是不够的。

The problem is that OT doesn’t really do all that much in and of itself. As mentioned above, OT solves for two sides of the diamond in state space. It only solves for two sides of a simple, one-step diamond like the one shown above. Let me say it a third time: the case shown above is the only case which OT handles. As it turns out, there are other cases which arise in a client/server collaborative editor like Google Wave or Novell Pulse. In fact, most cases in practice are much more complex than the one-step diamond.

问题在于,OT本身并没有真的完成一切。如前所述,OT解决了在状态空间菱形两边问题。但它也只是解决了这种简单的、单步的菱形两边问题。让我再说第三次:上述案例中是OT掌握的唯一的案例。事实证明,像Google Wave或者Novell Pulse这种服务器/客户端协同编辑器会出现各种其他情况。实际实践当中多数情况比单步菱形问题复杂的多。

For example, consider the situation where the client performs two operations (say, by typing two characters, one after the other) while at the same time the server performs one operation (originating from another client). We can diagram this situation in the following way:

举例来说,假定这种情况:客户端执行两个操作(一个接一个键入两个字符),与此同时服务器端只执行一个操作(来自另外客户端)。我们可以用以下图示表示这种情况:

《黑客》中文之理解和应用操作转换

So we have two operations in the client history, a and b, and only one operation in the server history, c. The client is going to send operations a and b to the server, presumably one after the other. The first operation (a) is no problem at all. Here we have the simple one-step diamond problem from above, and as well know, OT has no trouble at all in resolving this issue. The server transforms a and c to derive operation a’, which it applies to its current state. The resulting situation looks like the following:

因此,客户端段有两个操作a b,而在服务器段只有一个操作 c。客户端将发送其操作a b给服务器,假定一个接一个。第一次操作a没有任何问题。这是一个简单的单步菱形问题,而我们知道OT解决这类问题完全没有任何困难。服务器把a c转换成适用于目前的状态操作a’。由此产生的情况如下所示:

《黑客》中文之理解和应用操作转换

Ok, so far so good. The server has successfully transformed operation a against c and applied the resulting a’ to its local state. However, the moment we move on to operation b, disaster strikes. The problem is that the server receives operation b, but it has nothing against which to transform it!

好了,到目前为止,一切良好。服务器在操作c对应下成功转换了a,并且应用a’得到本地状态。然后当我们继续进行b操作时,灾难发生了。问题在于服务器收到了操作b,但没有对应操作以实现操作b的转换。

Remember, OT only solves for the bottom two sides of the diamond given the top two sides. In the case of the first operation (a), the server had both top sides (a and c) and thus OT was able to derive the all-important a’. However, in this case, we only have one of the sides of the diamond (b); we don’t have the server’s half of the equation because the server never performed such an operation!

应该记得,OT只有在给出顶部两边的情况下才能解决底部两边的问题。第一个操作a情况服务器有顶部两边(a and c),因此OT能获得重要的操作a’但是,在现在这种情况下,我们只有菱形的一边(b);我们没有服务器方程的另一半,因为服务器根本没有执行这样的操作!

In general, the problem we have here is caused by the client and server diverging by more than one step. Whenever we get into this state, the OT becomes more complicated because we effectively need to transform incoming operations (e.g. b) against operations which never happened! In this case, the phantom operation that we need for the purposes of OT would take us from the tail end of a to the tail end of a’. Think of it like a “bridge” between client state space and server state space. We need this bridge, this second half of the diamond, if we are to apply OT to solve the problem of transforming b into server state space.

一般来说,这个问题是由客户端和服务器不只一步的分歧引起的。每当我们陷入这种情况,OT会变得更加复杂,因为我们需要有效的转换这种对应操作根本没有发生的引入操作(如b在这种情况下,需要用于OT的这种幽灵操作会连接a末端和a’的末端。把它想象成连接客户端状态空间和服务器状态空间之间的一座“桥梁”。 如果我们要应用OT解决把b转换到服务器状态空间问题的话,我们就需要这座桥梁,作为菱形的另外一半。

Operation Parentage

操作派生

In order to do this, we need to add some metadata to our operations. Not only do our operations need to contain their components (retain, etc), they also must maintain some notion of parentage. We need to be able to determine exactly what state an operation requires for successful application. We will then use this information to detect the case where an incoming operation is parented on a state which is not in our history (e.g. b on receipt by the server).

为了做到这一点,需要给我们操作添加一些元数据。我们的操作不仅需要包含本身组件(retain等),它们也必须保持一定的衍生观念。我们需要能够准确确定成功执行一个操作所需要的状态。然后,我们将利用这一信息来检测传入的操作源于不存在状态的情况( 服务器接收到的操作b)。

For the record, Google Wave uses a monotonically-increasing scalar version number to label document states and thus, operation parents. Novell Pulse does the exact same thing for compatibility reasons, and I recommend that anyone attempting to build a Wave-compatible service follow the same model. However, I personally think that compound OT is a lot easier to understand if document states are labeled by a hash of their contents.

根据记录,Google Wave使用了单调递增版本号来标记文档状态,也就是操作双亲。Novell Pulse出于兼容性考虑,使用了和Google完全一致的东西,所以我建议任何希望建立一个Wave兼容服务的朋友都遵循相同的模式。不过,其实我个人认为,如果文档状态用其内容的散列值来标记,就能更容易理解复合型OT。

This scheme has some very nice advantages. Given an operation (and its associated parent hash), we can determine instantly whether or not we have the appropriate document state to apply said operation. Hashes also have the very convenient property of converging exactly when the document states converge. Thus, in our one-step diamond case from earlier, operations a and b would be parented off of the same hash. Operation b’ would be parented off of the hash of the document resulting from applying a to the initial document state (and similarly for a’). Finally, the point in state space where the client and server converge once again (after applying their respective operations) will have a single hash, as the document states will be synchronized. Thus, any further operations applied on either side will be parented off of a correctly-shared hash.

这个原理有一些非常好的优势。根据一个操作(及其关联的父级散列),我们可以立即确定是否有适当的文档状态执行上述操作。而且当文档状态聚合时,散列也有很方便的聚合属性。前面提到的单步菱形的情况,操作a b都是同一个散列派生而来的。而操作b’是对原始状态执行操作a后所得的文档散列派生得到的(操作a’也类似)。最后,随着文档状态的同步,状态空间中客户端和服务器再次相交的一点(在执行对应操作后)将有一个单一的散列。因此,将来应用于任一边的操作都会由正确共享的散列派生得到。

Just a quick terminology note: when I say “parent hash”, I’m referring to the hash of the document state prior to applying a particular operation. When I say “parent operation” (which I probably will from time to time), I’m referring to the hash of the document state which results from applying the “parent operation” to its parent document state. Thus, operation b in the diagram above is parented off of operation a which is parented off of the same hash as operation c.

有个简短术语需要注意下:当我说“父级散列”,我指的是在应用一个特定操作前的文档状态的散列。当我说“父级操作”(我大概会时不时提到),我指的是父级文档状态应用“父级操作”后得到的结果散列。因此,前面图示中操作b源于操作a,而操作a和操作c源于相同的散列。

Compound OT

复合型OT

Now that our operations have parent information, our server is capable of detecting that operation b is not parented off of any state in its history. What we need to do is derive an operation which will take us from the parent of b to some point in server state-space. Graphically, this operation would look something like the following (rendered in dark green):

因为我们操作都拥有父级信息,所以服务器就能检测出操作b不源于结构历史中任一状态。我们需要做的是派生一个操作,连接b的父节点到服务器状态空间中的某点。这个操作图形化表示如下图(深绿色的虚线表示):

《黑客》中文之理解和应用操作转换

Fortunately for us, this operation is fairly easy to derive. In fact, we already derived and subsequently threw it away! Remember, OT solves for two sides of the diamond. Thus, when we transformed a against c, the resulting operation pair consisted of a’ (which we applied to our local state) and another operation which we discarded. That operation is precisely the operation shown in green above. Thus, all we have to do is re-derive this operation and use it as the second top side of the one-step diamond. At this point, we have all of the information we need to apply OT and derive b’, which we can apply to our local state:

非常幸运,这次操作很容易获得。事实上,我们曾经获得过,但把它抛弃了!还应该记得,OT是解决了菱形两边问题的。因此,当我们在操作c的背景下转换a,由此产生的操作对包括a’我们本地空间应用的)和另一个我们抛弃的操作。这一操作正是上面显示绿色的操作。因此,所有我们需要做的就是重新获得这一操作,并以此为第二次单步菱形问题的顶点。在这一点上,我们拥有所有需要的信息,用来应用OT获得用于本地状态的操作b’

《黑客》中文之理解和应用操作转换

At this point, we’re almost done. The only problem we have left to resolve is the application of operation c on the client. Fortunately, this is a fairly easy thing to do; after all, c is parented off of a state which the client has in its history, so it should be able to directly apply OT.

到这一步,我们基本完工了。唯一留待解决的问题,是操作c在客户端上的应用。幸运的是,这是一个相当容易做到的事;毕竟操作c源于客户端上保有的状态,因此应该能够直接应用OT。

The one tricky point here is the fact that the client must transform c against not one but two operations (a and b). Fortunately, this is fairly easy to do. We could apply OT twice, deriving an intermediary operation in the first step (which happens to be exactly equivalent to the green intermediary operation we derived on the server) and then transforming that operation against b. However, this is fairly inefficient. OT is fast, but it’s still O(n log n). The better approach is to first compose a with b and then transform c against the composition of the two operations. Thanks to Google’s careful definition of operation composition, this is guaranteed to produce the same operation as we would have received had we applied OT in two separate steps.

这里棘手的一点是,客户端必须不是对应一个而是对应两个操作(a和b)完成c的转换。幸运的是,这也是相当容易的事。我们可以应用OT两次,第一步推导获得一个中间过度操作(这就类似于在服务器我们获得了绿色媒介),然后再对应b转换前面获得的操作。然而,这是相当没有效率的。OT是快,但它仍然On log n)的 。更好的做法是先合成a b,然后对应这两个操作的组合转换c。多亏Google对操作构成的仔细定义,这样确保了我们能得到和应用两次OT相同的操作。

The final state diagram looks like the following:

最终状态图解如下所示:

《黑客》中文之理解和应用操作转换

Client/Server Asymmetry

客户端/服务器不对称性

Technically, what we have here is enough to implement a fully-functional client/server collaborative editing system. In fact, this is very close to what was presented in the 1995 paper on the Jupiter collaboration system. However, while this approach is quite functional, it isn’t going to work in practice.

从技术上讲,讲到这里就足以实现全功能的客户端/服务器协同编辑系统。事实上,这很接近我们发表于1995年,关于‘the Jupiter collaboration system’的论文。然而,这种做法虽然很实用,但它不会被应用在工作实践中。

The reason for this is in that confusing middle part where the server had to derive an intermediary operation (the green one) in order to handle operation b from the client. In order to do this, the server needed to hold on to operation a in order to use it a second time in deriving the intermediary operation. Either that, or the server would have needed to speculatively retain the intermediary operation when it was derived for the first time during the transformation of a to a’. Now, this may sound like a trivial point, but consider that the server must maintain this sort of information essentially indefinitely for every client which it handles. You begin to see how this could become a serious scalability problem!

原因在于为了处理操作b服务器不得不去生成一个过渡操作(图中绿色线),这个中间过程容易造成混乱。为了做到这一点,要么服务器需要保留操作a,并在生成过渡操作的过程中第二次调用它;要么,或者服务器不得不在转换操作a的时候保留过渡操作。现在,这听起来像微不足道,但考虑到服务器本质上必须无限期保存它处理的每个客户信息。你就会开始了解这将成为一个严重的可扩展性问题!

In order to solve this problem, Wave (and Pulse) imposes a very important constraint on the operations incoming to the server: any operation received by the server must be parented on some point in the server’s history. Thus, the server would have rejected operation b in our example above since it did not branch from any point in server state space. The parent of b was a, but the server didn’t have a, it only had a’ (which is clearly a different point in state space).

为了解决这个问题,Wave(和Pulse)对服务器引入的操作施加了一个非常重要的约束:服务器收到任何操作必须源于服务器历史记录上的某点。因此,在上面我们提到的例子中,服务器会拒绝上述操作b,因为它不是服务器状态空间中任何一点派生而来的。b的父操作是a ,但服务器没有a ,它只有a'这显然是状态空间中的两个不同点)。

Of course, simply rejecting any divergence which doesn’t fit into the narrow, one-step diamond pattern is a bit harsh. Remember that practically, almost all situations arising in collaborative editing will be multi-step divergences like our above example. Thus, if we na?vely rejected anything which didn’t fit into the one-step mold, we would render our collaborative editor all-but useless.

当然,单纯地拒绝任何不符合狭隘单步菱形模式的分歧是有点苛刻。但请记住,实际上,协同编辑所产生的几乎所有的情况都是是像我们上面的例子那样的多步分歧。因此,如果我们天真地拒绝不符合单步模型的所有情况,那只会使我们的编辑器毫无用处。

The solution is to move all of the heavy lifting onto the client. We don’t want the server to have to track every single client as it moves through state space since there could be thousands (or even millions) of clients. But if you think about it, there’s really no problem with the client tracking the server as it moves through state space, since there’s never going to be any more than one (logical) server. Thus, we can offload most of the compound OT work onto the client side.

解决的方案就是将所有重负转移到客户端。我们不希望服务器移动状态空间时必须跟踪每一个客户,因为可能存在成千上万的(甚至几百万)的客户。但是如果你想一想,客户端在移动状态空间时跟踪服务器是没有任何问题的,因为不会存在任何一个以上的(逻辑)服务器。因此,我们可以卸载复合OT的大部分工作到客户端上。

Before it sends any operations to the server, the client will be responsible for ensuring those operations are parented off of some point in the server’s history. Obviously, the server may have applied some operations that the client doesn’t know about yet, but that’s ok. As long as any operations sent by the client are parented off of some point in the server’s history, the server will be able to transform that incoming operation against the composition of anything which has happened since that point without tracking any history other than its own. Thus, the server never does anything more complicated than the simple one-step diamond divergence (modulo some operation composition). In other words, the server can always directly apply OT to incoming operations, deriving the requisite operation extremely efficiently.

在发送任何操作到服务器之前,客户端将负责确保这些操作源于服务器上历史记录的某点。显然,服务器可能应用了一些客户端不知道的操作,但这没关系。只要任何客户端发送的操作在服务器历史记录上都有父节点,服务器就能够依靠节点上发生事件构成转换引入的操作,而不需要跟踪出本身外其他历史记录。因此,服务器永远不用负责比单步菱形分歧更复杂的问题(操作组合的模型)。换句话说,服务器总是可以直接应用OT,更加有效的获得结果操作。

Unfortunately, not all is sunshine and roses. Under this new regime, the client needs to work twice as hard, translating its operations into server state space and (correspondingly) server operations back into its state space. We haven’t seen an example of this “reverse” translation (server to client) yet, but we will in a moment.

不幸的是,并非一切都是美好的。按照新的架构,客户端就需要付出加倍努力,转化本身操作到服务器状态空间,(相应的)还要转换服务器操作到本身状态空间。我们还没有看到 “反向”转换(服务器到客户端)的例子,但马上我们就会看到了。

In order to maintain this guarantee that the client will never send an operation to the server which is not parented on a version in server state space, we need to impose a restriction on the client: we can never send more than one operation at a time to the server. This means that as soon as the client sends an operation (e.g. a in the example above), it must wait on sending b until the server acknowledges a. This is necessary because the client needs to somehow translate b into server state space, but it can’t just “undo” the fact that b is parented on a. Thus, wherever b eventually ends up in server state space, it has to be a descendant of a’, which is the server-transformed version of a. Literally, we don’t know where to translate b into until we know exactly where a fits in the server’s history.

为了维持保证客户端不会发送服务器状态空间不存在其父节点的操作,我们需要对客户端施加一个限制:一次不能发送多个操作到服务器。这意味着,只要客户端发送的操作(如上述例子中的a),它必须等待服务器确认a后才能发送b这是很有必要的,因为客户需要以某种方式转化b到服务器状态空间,但它不能“抵消”一个事实,即b源于a。因此,只要b最终在服务器状态空间结束,它必须是a’a的服务器转换版本一个子操作。从字面上看,我们不知道把b转换到哪里,直到我们确切知道a在服务器历史记录中的对应位置。

To help shed some light into this rather confusing scheme, let’s look at an example:

为了帮助阐明这个相当混乱的架构,举一个例子:

《黑客》中文之理解和应用操作转换

In this situation, the client has performed two operations, a and b. The client immediately sends operation a to the server and buffers operation b for later transmission (the lighter blue line indicates the buffer boundary). Note that this buffering in no way hinders the application of local operations. When the user presses a key, we want the editor to reflect that change immediately, regardless of the buffer state. Meanwhile, the server has applied two other operations, c and d, which presumably come from other clients. The server still hasn’t received our operation a.

这种情况,客户端已执行两次操作ab。客户端立即发送了操作a到服务器,并且为接下来转换缓冲操作b(蓝色线指示缓冲区边界)。请注意,这种方式的缓冲不会干扰本地操作的执行。当用户按下一个键,我们要求编辑器立即响应,而不用考虑缓冲空间。同时,服务器应用了其他两个操作c d,这可能来自于其他客户。图中该服务器还没有收到操作a

Note that we were able to send a immediately because we are preserving every bit of data the server sends us. We still don’t know about c and d, but we do know that the last time we heard from the server, it was at the same point in state space as we were (the parent of a and c). Thus, since a is already parented on a point in server state space, we can just send it off.

请注意,我们之所以能够立即发送操作a,是因为我们保留服务器发送给我们的所有数据。我们仍然不知道cd但我们知道最后一次从服务器收到的信息是我们和服务器处在状态空间中同一点(即ac的父节点)。因此, 既然a派生于服务器状态空间中一点,我们就可以立即发送它。

Now let’s fast-forward just a little bit. The server receives operation a. It looks into its history and retrieves whatever operations have been applied since the parent of a. In this case, those operations are c and d. The server then composes c and d together and transforms a against the result, producing a’.

现在,让我们快进一点点。服务器收到操作a,然后浏览历史记录,检索是否有操作在a的父节点之后被应用。在图示情况下,这些操作就是cd。接着,服务器集合c和d,并对应其结果转换a,得到a’

《黑客》中文之理解和应用操作转换

==================================

偶这篇文章竟然超过了linux.cn的发文长度 只好分为两个部分了 不好意思各位 ^_^

[本话题由 hlh59 于 2010-09-04 01:07:11 编辑]

本文系Linux中国原创文章,版权归作者(hlh59)及Linux中国所有。

    本站是提供个人知识管理的网络存储空间,所有内容均由用户发布,不代表本站观点。如发现有害或侵权内容,请点击这里 或 拨打24小时举报电话:4000070609 与我们联系。

    猜你喜欢

    0条评论

    发表

    请遵守用户 评论公约

    类似文章 更多
    喜欢该文的人也喜欢 更多