Skip to main content

mvcc translate

· 18 min read

5.1 INTRODUCTION 对于多版本并发控制算法来说,对于每个数据实例x写操作都会生成一个X的新的副本(或者叫做版本). 数据管理器因此会保存一个包含数据管理器赋值过给X的所有版本的列表. 对于每个读操作Read(x),调度器不仅把读操作发送给数据管理器,还会告诉数据管理器他想要读x的哪一个版本. In a multiversion concurrency control algorithm, each Write on a data item x produces a new copy (or version) of X. The DM that manages x therefore keeps a list of versions of X, which is the history of values that the DM has assigned to X. For each Read(x), the scheduler not only decides when to send the Read to the DM, but it also tells the DM which one of the versions of x to read. 多版本并发控制的优点在于帮助调度器避免拒绝太晚的操作(也就是说晚一点的操作不会被拒绝).举个例子,(单版本的情况下)调度器会拒绝读他本该读但是被覆盖的数据. 在多版本的情况下,旧的值不会被覆盖,因此可以延迟去读.调度器可以通过读旧版本的值来避免(单版本下)对于读操作的拒绝. 保持多版本并对于并发控制并不会花费太多成本,因为对于故障恢复算法来说也是需要版本的信息. The benefit of multiple versions for concurrency control is to help the scheduler avoid rejecting operations that arrive too late. For example, the scheduler normally rejects a Read because the value it was supposed to read has already been overwritten. With multiversions, such old values are never overwritten and are therefore always available to tardy Reads. The scheduler can avoid rejecting the Read simply by having the Read read an old version.’ Maintaining multiple versions may not add much to the cost of concurrency control, because the versions may be needed anyway by the recovery algorithm. As we’ll see in the next chapter, many recovery algorithms have to maintain some before image information, at least of those data items that have been updated by active transactions; the recovery algorithm needs those before images in case any of the active transactions abort. The before images of a data item are exactly its list of old versions. It is a small step for the DM to make those versions explicitly available to the scheduler. 一个很明显的花费是保存多个版本需要很多存储空间,为了控制这些存储的现在,版本内容必须要定期周期性地清理或者归档. 因为某些特定的版本还会被活跃的事务(也就是没有提交或者没有回滚回滚的事务),清理版本信息需要同时兼顾活跃的事务. 清理动作也是mvcc的另外一个花费. An obvious cost of maintaining multiple versions is storage space. To control this storage requirement, versions must periodically be purged or archived. Since certain versions may be needed by active transactions, purging versions must be synchronized with respect to active transactions. This purging activity is another cost of multiversion concurrency control.

我们假设当事务被抛弃,那些那么这个事务创建的版本也会被销毁.在我们后续的讨论中,词版本描述的是已提交事务或者活跃事务的数据对应的值.因此,当调度器决定分配x的特定版本给操作Read(x),返回的值不会包含被抛弃的事务. 如果版本读产生于活跃的事务,可恢复性(也就是回滚)要求读操作对应的事务的提交必须晚于被读的活跃事务的提交.

We assume that if a transaction is aborted, any versions it created are destroyed. In our subsequent discussion, the term “version” will refer to the value of a data item produced by a transaction that’s either active or committed. Thus, when the scheduler decides to assign a particular version of x to Read(x), the value returned is not one produced by an aborted transaction. If the version read is one produced by an active transaction, recoverability requires that the reader’s commitment be delayed until the transaction that produced the version has committed. 如果被读的事务最后被抛弃了(这个事务对应的版本也无效了),该活跃的事务也需要因此而被抛弃. If that transaction actually aborts (thereby invalidating its version), the reader must also be aborted. 当前存在的多版本内容仅仅对调度器和数据管理器可见,对于用户使用事务是不可见的. The existence of multiple versions is only visible to the scheduler and DM, not to user transactions. 事务只会持有该数据比如x和y.除了数据库处理系统,用户本身看上去只有一个版本,即,在用户的角度看是最后一个会写入 Transactions still reference data items, such as x and y. Users therefore expect the DBS to behave as if there were only one version of each data item, namely, the last one that was written from that user’s perspective. 调度器会通过使用多版本来减少拒绝的操作,从而提升性能. The scheduler may use multiple versions to improve performance by rejecting operations less frequently. 不过最后的结果会和单版本的结果看上去一样. But it must not change the system’s functionality over a single version view of the database. There are many applications of databases in which users do want to explicitly access each of the multiple versions of a data item. For example, a user may wish to maintain several versions of a design database: the last design released for manufacturing, the last design checked for correctness, and the most recent working design. The user may update each version of the design independently. Since the existence of these multiple versions is not transparent to the user, such applications are not appropriate for the multiversion concurrency control algorithms described in this chapter.

Analyzing Correctness 分析mvcc的正确性 为了分析mvcc算法的正确性,我们需要扩展可序列化理论.我们需要扩展两类历史:运行在多版本数据库的多版本历史,在用户看来是单版本的单版本历史.用户会把序列化但版本历史(因为我们把事务看成只有一个版本的,我们所有的目标就是多版本执行的内容和但版本看到的一样,相类似的例子就是编译器经过ssr优化后很多顺序都变了但是看上去都一样) To analyze the correctness of multiversion concurrency control algorithms, we need to extend serializability theory. This extension requires two types of histories: multiversion (MV) histories that represent the DM’s execution of operations on a multiversion database, and single version (IV) histories that represent the interpretation of MV histories in the users’ single version view of the database. Serial 1V histories are the histories that the user regards as correct. 不过实际上系统是多版本的(只是看上去和单版本的一样).所以为了证明这个并发控制算法是正确的,我们必须证明多版本的历史的约束和单版本的是等价的.那么多版本历史单版本历史等价是什么意思?(也就是多版本历史和单版本历史等价的动作语义是什么) But the system actually produces MV histories. So, to prove that a concurrency control aIgorithm is correct, we must prove that each of the MV histories that it can produce is equivalent to a serial 1V history, What does it mean for an MV history to be equivalent to a 1V history? 我们通过拓展单版本历史之间的等价来描述多版本历史单版本历史等价.为了做这个扩展,我们需要引入一些符号. Let’s try to answer this by extending the definition of equivalence of 1V histories that we used in Chapters 2-4. To attempt this extension, we need a little notation. 对于每个数据实例x,我们用xi,xj...来表示x的版本,下标是写这个版本的事务的编号(也就是xi就是表示事务i写了一个版本x), 因此对于多版本历史,永远都是这个下标Wi[Xi],版本的下标和和事务的下标一样.多版本历史的读操作则没有那么特殊,举个例子ri[xj]. For each data item X, we denote the versions of x by xi, xj, . . . , where the subscript is the index of the transaction that wrote the version. Thus, each Write in an MV history is always of the form Wi[Xi], where the version subscript equals the transaction subscript. Reads are denoted in the usual way, such as ri[xj]. 假如我们说多版本历史单版本历史等价的(equivalence)的定义是:如果多版本的每个操作的重复和单版本的冲突都一样. 考虑多版本历史

H1 = w0[x0]c0w1[x1]c1r2[x0]w2[y2]c2

H1这个历史里面,只有w0[x0]r2[x0]是冲突的,写操作w1[x1]w0[x0]以及r2[x0]不冲突,因为他们操作的是不同版本的数据,即x1.现在我们来考虑单版本历史:

H2 = w0[x]c0w1[x]c1r2[x]w2[y]c2 

我们通过去掉操作的版本的下标(也就是去掉版本号)来构造历史H2,比如x1映射成x,x0也映射成x,y2映射成y.这种情况下(H2的单版本历史),w0[x]r2[x0]是冲突的.按照定义(如果他们冲突一样)则他们等价,但是这其实是不合理的(虽然他们都冲突).在历史H2,T2T1x.但是在历史H2,T1读的是T0(也就是说冲突是一样但是读的数据来源是不一样的).因为这两个历史(H1和H2)读的来源不一致,所以他们最后写的操作也不一致 Suppose we adopt a definition of equivalence that says an MV history HM” is equivalent to a 1V history HIV if every pair of conflicting operations in Hbp, is in the same order in HIV. Consider the MV history H, = wobol co WEA cl rz[xol w,[yzl cz. The only two operations in H, that conflict are w,[x,] and r,[x,]. The operation w,[x,] does not conflict with either w,[x,] or r,[x,], because it operates on a different We constructed H, by mapping each operation on versions x0, x,, and yz in H, into the same operation on the corresponding data items x and y. Notice that the two operations in H, that conflict, w,[x,] and r,[x,], are in the same order in H, as in H,. So, according to the definition of equivalence just given, H, is equivalent to H,. But this is not reasonable. In H,, T, reads x from T,, whereas in H,, T, reads x from T,,.’ Since T2 reads a different value of x in H, and H,, it may write a different value in y. 我们的对于(单版本和多版本之间)的冲突之所以有一点问题,是因为多版本历史单版本历史操作的是不同的对象: 一个是对版本的操作,一个是对数据的操作(可以类比一个是"打11岁的你"和"打你"是不同语义的).他们的操作是有不同的冲突属性. 举个例子,多版本情况下w1[x1]r2[x0]不冲突,(相对应的版本历史.怎么对应?当然是把下标去掉)单版本情况下w1[x]r2[x]是冲突的. 因此,如果仅仅通过冲突来定义他们是等价的,这是不精确的. This definition of equivalence based on conflicts runs into trouble because MVand 1V histories have slightly different operations - version operations versus data item operations. These operations have different conflict properties. For example, w,[x,] does not conflict with yz[xo], but their corresponding 1V operations w,[x] and TJX] do conflict. Therefore, a definition of equivalence based on conflicts is inappropriate. 因此,为了解决这个问题(读的来源不一样的问题),我们需要回到2.6节的视图等价. 回想一下,如果两个历史的读的源都一样而且最后写也一样则他们视图等价. 比较历史H1H2.在H1(多版本历史)中,T2从T1读x,但是对于H2(单版本历史),T2从T0读x,因此H1和H2视图不等价 To solve this problem, we need to return to first principles by adopting the more fundamental definition of view equivalence developed in Section 2.6. Recall that two histories are view equivalent if they have the same reads-from relationships and the same final writes. Comparing histories H, and H,, we see that T, reads x from T, in H,, but T, reads x from T, in H,. Thus, H, is not view equivalent to H2. 然后我们获得满足条件等价(单版本和多版本之间的等价,定义是视图等价,那么两者是等价的)的定义. 我们需要通过一个对单版本历史与多版本历史之间等价的方式. 其中一个方式是SG(H)是无环的,所以历史H是等价于序列化多版本历史,但是仅仅是这样是没有太大帮助.因为(无环)序列化历史不是等价于序列化单版本历史. 举个例子: H3 = w0[x0]w0[y0]c0R// todo 如果我们把版本的内容作为单独的数据实例,就会构造一下的序列化图,虽然这个序列化图是无环的,H3还是和单版本的不等价,因为他们映射的读来源不一样

Now that we have a satisfactory definition of equivalence, we need a way of showing that every MV history H produced by a given multiversion concurrency control algorithm is equivalent to a serial 1V history. One way would be to show that SG( H) is acyclic, so H is equivalent to a serial MV history, Unfortunately, this doesn’t help much, because not every serial MV history is equivalent to a serial 1V history. For example, consider the serial MV history 仅仅是多版本历史的子集,也就是l-serial MV histories才会等价于序列化单版本历史,如果所有的read-from关系,要么读自己的事务,要么读最新的事务,这样就说l-serial MV histories, 这样的serial MV histories是可以于单版本的历史等价 Only a subset of serial MV histories, called l-serial MV histories, are equivalent to serial 1V histories. Intuitively, a serial MV history is I-serial if for each reads-from relationship, say T, reads x from T,, T, is the last trdnsaction preceding T, that writes any version of x. Notice that Ii, is not l-serial because TL reads x from T,), not T,, which is the last transaction preceding T2 that writes x.

mv2pl > 1sr > 5.3 Let H be an MV history over ?: C(H) is equivalent to a serial, 1V history over T iff H is 1SR.

我整理的mvcc原理

1SR 定义

An MV history is one-copy serializable (or 1SR) if its committed projection is equivalent to a l-serial MV history.

1-series MV history:

每个事务读先于他的事务里面写版本最大的 1 多版本事务Hi 和事务Hj,如果Hi从事务Hj读x,那么满足约束: Hj的x的版本是先于Hi的所有事务里面写x的版本最新的一个

  • 一个小问题:一个事务先于另外一个事务,这是个偏序关系,这里有个疑问,怎么定义这个先于

MV History Equivalence:

两个历史的事务中,读的来源都一样则两个多版本事务视图等价

mv2pl imply mvsg mvsg imply 1sr 1sr imply 1-serial

相关阅读