2.2. Transactions
It is up to the application programmer to bracket the set of operations that should be executed as part of the same transaction. This section focuses on the semantics that is implied by the transaction brackets. For the most part, we use pseudocode to express this bracketing explicitly, because it is easy to understand and exposes the semantic issues that are at stake. Other styles of programming are described later in this section. Product-specific programming models and APIs for transaction bracketing are presented in Chapter 10. System software components that support TP applications are discussed in Chapter 3.
Transaction Bracketing
Transaction bracketing offers the application programmer commands to Start, Commit, and Abort a transaction. These are expressed explicitly in some programming models and implicitly in others, but in either case these are the commands whose execution begins and terminates a transaction. The commands to bracket a transaction are used to identify which operations execute in the scope of a transaction.
The Start command creates a new transaction. After an application invokes a Start command, all of its operations execute within that transaction until the application invokes Commit or Abort. In particular, if the application calls a procedure, that procedure ordinarily executes within the same transaction as its caller. After invoking Commit or Abort, the application is no longer executing a transaction until it invokes Start again.
Sometimes, a procedure is designed to be executed either as an independent transaction or as a step within a larger transaction. For example, consider the following two procedures:
Each of these procedures could execute as an independent ACID transaction. In that case, you would expect to see Start at the beginning of the body of each of the procedures and Commit at the end. Example procedures for DebitChecking and PayLoan are shown in Figure 2.1. Figure 2.1. Explicit Transaction Brackets. The DebitChecking and PayLoan procedures explicitly bracket their transactions with a Start command and a Commit or Abort command.
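As an illustration, explicit bracketing of this kind can be sketched in Python. This is a minimal sketch under stated assumptions, not a reproduction of Figure 2.1: the TxStore class, its method names, and the account keys are hypothetical, and the store is a toy in-memory dictionary rather than a real resource manager.

```python
class TxStore:
    """Toy in-memory store with Start/Commit/Abort (hypothetical, for illustration)."""

    def __init__(self, data):
        self.data = dict(data)  # committed state
        self._work = None       # working copy of the active transaction, if any

    def start(self):
        if self._work is not None:
            raise RuntimeError("a transaction is already active")
        self._work = dict(self.data)

    def read(self, key):
        return self._work[key]

    def write(self, key, value):
        self._work[key] = value

    def commit(self):
        # Install the working copy as the new committed state.
        self.data, self._work = self._work, None

    def abort(self):
        # Discard the working copy; committed state is untouched.
        self._work = None


def debit_checking(store, acct, amt):
    """Runs as an independent transaction: Start, then Commit or Abort."""
    store.start()
    balance = store.read(acct)
    if balance < amt:
        store.abort()
        return False
    store.write(acct, balance - amt)
    store.commit()
    return True


def pay_loan(store, loan, amt):
    """Runs as an independent transaction that reduces the loan balance."""
    store.start()
    store.write(loan, store.read(loan) - amt)
    store.commit()
    return True
```

Because each procedure issues its own Start and Commit, calling one from inside another active transaction fails in this toy store. That limitation is the composability issue taken up next.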
As long as a transaction executes a single procedure, it is quite straightforward to bracket the transaction using Start, Commit, and Abort. Things get more complicated if a procedure that is running a transaction calls another procedure to do part of the work of the transaction. For example, suppose there is a procedure PayLoanFromChecking(acct, loan, amt) that calls the DebitChecking and PayLoan procedures to withdraw money from a checking account to pay back part of a loan, as shown in Figure 2.2. Figure 2.2. A Composite Transaction. The transaction PayLoanFromChecking is written by composing the DebitChecking and PayLoan procedures.
We would like the PayLoanFromChecking procedure to execute as an ACID transaction. We therefore bracket the body of the procedure with calls to Start and Commit. This PayLoanFromChecking transaction includes its calls to the DebitChecking and PayLoan procedures. However, there is a potential problem with this, namely, that DebitChecking and PayLoan also invoke the Start and Commit commands. Thus, as they’re currently written, DebitChecking and PayLoan would execute separate transactions that commit independently of PayLoanFromChecking, which is not what we want. That is, we cannot compose DebitChecking and PayLoan into a larger transaction. We call this the transaction composability problem. One solution is to have the system ignore invocations of the Start command when it is executed by a program that is already running within a transaction. In this approach, when the PayLoanFromChecking procedure calls the DebitChecking procedure, the Start command in the DebitChecking procedure in Figure 2.1 would not cause a new transaction to be created. However, the system cannot completely ignore this second Start command. It must remember that this second Start command was invoked, so it will know that it should ignore the execution of the corresponding Commit command in DebitChecking. That is, the Commit command in Figure 2.1 should not commit the “outer” transaction created by PayLoanFromChecking. More generally, the system maintains a start-count for each executing application, which is initialized to zero. Each execution of the Start command increments the start-count and each Commit decrements it. Only the last Commit, which decrements the count back to zero, causes the transaction to commit. What if the DebitChecking procedure issues the Abort command? One possible interpretation is that if an inner procedure calls Abort, then the transaction that the procedure is executing really does need to abort. 
Thus, unlike the Commit command, the Abort command in the DebitChecking procedure causes an abort of the outer transaction created by PayLoanFromChecking. In some systems, it is simply an error for a procedure that has executed a second Start command to subsequently invoke an Abort command. In others, the invocation of Abort is ignored. Another possible interpretation is that it is an attempt to abort only the work that was performed since the last Start command executed. This semantics is discussed in the later subsection, Nested Transactions. Another solution to the transaction composability problem is to remove the Start and Commit commands from DebitChecking and PayLoan, so they can be invoked within the transaction bracketed by the PayLoanFromChecking procedure. Using this approach, the DebitChecking procedure would be replaced by the one in Figure 2.3. To enable DebitChecking to execute as an independent transaction, one can write a “wrapper” procedure CallDebitChecking that includes the transaction brackets, also shown in Figure 2.3. This approach avoids the need to rewrite application code when existing procedures are composed in new ways. Another programming model that realizes this benefit is described in a later subsection entitled, Transaction Bracketing in Object-Oriented Programming. Figure 2.3. Enabling Composability. The Start, Commit, and Abort commands are removed in this revised version of the DebitChecking procedure (Figure 2.1), so it can be invoked in a larger transaction, such as PayLoanFromChecking (Figure 2.2). A wrapper procedure CallDebitChecking is added, which includes the transaction brackets needed to execute DebitChecking as an independent transaction.
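The start-count approach described earlier in this subsection can also be sketched in Python. This is a minimal sketch, assuming a toy in-memory store: the StartCountTx class and its method names are hypothetical. It adopts the first interpretation of Abort discussed above, so an inner Abort dooms the outer transaction.

```python
class StartCountTx:
    """Toy store in which nested Start calls join the active transaction."""

    def __init__(self, data):
        self.data = dict(data)  # committed state
        self._work = None       # working copy of the active transaction
        self.start_count = 0    # number of Starts not yet matched by Commit/Abort
        self._doomed = False    # set once any procedure in the transaction aborts

    def start(self):
        if self.start_count == 0:
            # Only the outermost Start really creates a transaction.
            self._work = dict(self.data)
            self._doomed = False
        self.start_count += 1

    def read(self, key):
        return self._work[key]

    def write(self, key, value):
        self._work[key] = value

    def commit(self):
        self.start_count -= 1
        if self.start_count > 0:
            return True  # inner Commit: only decrements the count
        committed = not self._doomed
        if committed:
            self.data = self._work
        self._work = None
        return committed

    def abort(self):
        # An Abort at any level dooms the whole transaction (one possible semantics).
        self.start_count -= 1
        self._doomed = True
        if self.start_count == 0:
            self._work = None


def debit_checking(tx, acct, amt):
    tx.start()
    balance = tx.read(acct)
    if balance < amt:
        tx.abort()
        return False
    tx.write(acct, balance - amt)
    tx.commit()
    return True


def pay_loan(tx, loan, amt):
    tx.start()
    tx.write(loan, tx.read(loan) - amt)
    tx.commit()
    return True


def pay_loan_from_checking(tx, acct, loan, amt):
    tx.start()
    debit_checking(tx, acct, amt)
    pay_loan(tx, loan, amt)
    return tx.commit()  # False if an inner procedure aborted
```

With this scheme the bracketed procedures run unchanged either on their own or inside PayLoanFromChecking. The price is that the system, not the code, decides which Commit is the real one.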
The impact of the transaction composability problem is something that needs to be evaluated and understood in the context of whichever programming model or models you are using.
Transaction Identifiers
As we explained in Chapter 1, each transaction has a unique transaction identifier (transaction ID), which is assigned when the transaction is started. The transaction ID is assigned by whichever component is responsible for creating the transaction in response to the Start command. That component could be a transaction manager (see Section 1.4) or a transactional resource manager such as a database system, file system, or queue manager.
There are two major types of transaction IDs: global and local. The transaction manager assigns a global ID, which is needed when more than one transactional resource participates in a transaction. If the transactional resource managers also assign transaction IDs, then these are local IDs that are correlated with the global transaction ID since they all refer to the same transaction.
Whenever a transaction accesses a transactional resource, it needs to supply its transaction ID, to tell the resource’s manager on which transaction’s behalf the access is being made. The resource manager needs this information to enforce the ACID properties. In particular, it needs it for write accesses, so that it knows which write operations to permanently install or undo when the transaction commits or aborts.
When an application program invokes Commit or Abort, it needs to pass the transaction ID as a parameter. This tells the transaction manager which transaction it is supposed to commit or abort. Since the application needs to supply its transaction ID to resource managers and the transaction manager, it needs to manage its transaction ID. It could do this explicitly. That is, the Start operation could return a transaction ID explicitly to the application, and the application could pass that transaction ID to every resource it accesses.
Most systems hide this complexity from the application programmer. Instead of returning the transaction ID to the program P that invokes Start, the system typically makes the transaction ID part of a hidden context, which is data that is associated with P but is manipulated only by the system, not by P. In particular, using the context the system transparently attaches the transaction ID to all database operations and Commit and Abort operations. This is more convenient for application programmers: it’s one less piece of bookkeeping for them to deal with. It also avoids errors, because if the application passes the wrong transaction identifier, the system could malfunction.
Typically, the hidden context is associated with a thread, which is a sequential flow of control through a program. A thread can have only one transaction ID in its context, so there is no ambiguity about which transaction should be associated with each database operation and Commit or Abort. Threads are discussed in detail in the next section.
Notice that there are no transaction IDs in Figure 2.1 through Figure 2.3. The transaction ID is simply part of the hidden program context. Throughout this chapter, we will assume that transaction IDs are hidden in this way, although as we will see some programming models allow access to this transaction context.
Chained Transactions
In some programming models, an application is assumed to be always executing within a transaction, so there is no need for the developer to start a transaction explicitly. Instead, an application simply specifies the boundary between each pair of transactions. This “boundary operation” commits one transaction and immediately starts another transaction, thereby ensuring that the program is always executing a transaction. In IBM’s CICS product, the verb called syncpoint works in this way. Microsoft SQL Server offers an implicit transaction mode that works this way too.
This programming style is called chained transactions, because the sequence of transactions executed by a program forms a chain, one transaction after the next, with no gaps in between. The alternative is an unchained model, where after a program finishes one transaction, it need not start the execution of another transaction right away. For example, this can be done using the Start and Commit commands for explicit transaction bracketing. Most of today’s programming models use the unchained model, requiring that the developer explicitly define the start of each new transaction.
On the face of it, the unchained model sounds more flexible, since there may be times when you would want an application to do work outside of a transaction. However, in fact there is really very little purpose in it. The only benefit is in systems where a transaction has significant overhead even if it doesn’t access recoverable data. In that case, the unchained model avoids this overhead.
On the other hand, the unchained model has two significant disadvantages. First, if the code that executes outside a transaction updates any transactional resources, then each of those updates in effect executes as a separate transaction. This is usually more expensive than grouping sets of updates into a single transaction. That is, it is sometimes important to group together updates into a single transaction for performance reasons. Second, the unchained model gives the programmer an opportunity to break the consistency property of transactions by accidentally executing a set of updates outside of a transaction. For these reasons, the chained model usually is considered preferable to the unchained model.
Transaction Bracketing in Object-Oriented Programming
With the advent of object-oriented programming for TP applications, a richer style of chained transaction model has become popular.
In this approach each method is tagged with a transaction attribute that indicates its transactional behavior, thereby avoiding explicit transaction bracketing in the application code itself. The transaction attribute can have one of the following values:
This style of programming was introduced in the mid-1990s in Microsoft Transaction Server, which evolved later into COM+ in Microsoft’s .NET Enterprise Services. In that system, a transaction attribute is attached to a component, which is a set of classes, and applies to all classes in the component. In its intended usage, the caller creates an object of the class (rather than calling a method of an existing object), at which time the transaction attribute is interpreted to decide whether it is part of the caller’s transaction, is part of a new transaction, is not part of any transaction, or throws an exception. The called object is destroyed when the transaction ends. The concept of transaction attribute was adopted and extended by OMG’s CORBA standard and Enterprise Java Beans (EJB, now part of Java Enterprise Edition (Java EE)). It is now widely used in transactional middleware products, as well as in Web Services. In EJB, the attributes tag each method and apply per method call, not just when the called object is created. A class can be tagged with a transaction attribute, in which case it applies to all untagged methods. EJB also adds attributes to cover some other transaction options, in particular, Mandatory, where the called method runs in the caller’s transaction if it exists and otherwise throws an exception. Microsoft introduced per-method transaction attributes in Windows Communication Foundation in .NET 3.0. It uses separate attributes to specify whether the method executes as a transaction and whether the caller’s transaction context propagates to the called method (i.e., the difference between Required and Requires New). Let us call a method invocation top-level if it caused a new transaction to be started. That is, it is top-level if it is tagged with Requires New or is tagged with Required and its caller was not executing in a transaction. Generally speaking, a transaction commits when its top-level method terminates without an error. 
If it throws an exception during its execution, then its transaction aborts. A top-level method can call other methods whose transaction attribute is Required, Mandatory, or Supported. This submethod executes in the same transaction as the top-level method. If the submethod terminates without error, the top-level method can assume that it is fine to commit the transaction. However, the top-level method is not obligated to commit, for example, if it encounters an error later in the execution of another submethod. In some execution models, a submethod can continue to execute after announcing that the transaction can be committed as far as it is concerned. If the submethod throws an exception, then the top-level method must abort the transaction. In some execution models, the exception immediately causes the transaction to abort, as if the submethod had issued the Abort command. In other models, it is left to the top-level method to cause the abort to happen. Instead of having a method automatically vote to commit or abort depending on whether it terminates normally or throws an exception, an option is available to give the developer more explicit control. For example, in the .NET Framework, a program can do this by calling SetComplete and SetAbort. Java EE is similar, offering the setRollbackOnly command for a subobject to tell the top-level object to abort. The approach of using transaction attributes is declarative in that the attributes are attached to interface definitions or method implementations. Microsoft’s .NET framework also offers a runtime layer, exposed through the class TransactionScope, that allows a program to invoke the functionality of the transaction bracketing attributes shown previously. A program defines a transaction bracket by creating a TransactionScope object with one of the following options:
In the case of Requires New and Suppress, if the program was running within a transaction T when it created the new transaction scope S, then T remains alive but has no activity until S exits. Additional details of these approaches to transaction bracketing appear in Section 10.3 for .NET and Section 10.4 for Java EE.
Nested Transactions
The nested transaction programming model addresses the transaction composability problem by capturing the program-subprogram structure of an application within the transaction structure itself. In nested transactions, each transaction can have subtransactions. For example, the PayLoanFromChecking transaction can have two subtransactions DebitChecking and PayLoan. Like ordinary “flat” (i.e., non-nested) transactions, subtransactions are bracketed by the Start, Commit, and Abort operations. In fact, the programs of Figure 2.1 and Figure 2.2 could be a nested transaction. What is different about nested transactions is not the bracketing operations but their semantics. They behave as follows:
Consider the properties of subtransactions relative to the ACID properties. Rule (4) means that a subtransaction is atomic (i.e., all-or-nothing) relative to other subtransactions of the same parent. Rule (5) means that a subtransaction is isolated relative to other transactions and subtransactions. However, a subtransaction is not durable. Rule (6) implies that its results become visible once it commits, but by rule (3) the results become permanent only when the top-level transaction that contains it commits.
The nested transaction model provides a nice solution to the transaction composability problem. In our example, DebitChecking and PayLoan in Figure 2.1 can execute as subtransactions within a top-level transaction executed by PayLoanFromChecking or as independent top-level transactions, without writing an artificial wrapper transaction like CallDebitChecking in Figure 2.3.
Although nested transactions are appealing from an application programming perspective, they are not supported in many commercial products.
Exception Handling
An application program that brackets a transaction must say what to do if the transaction fails and therefore aborts. For example, suppose the program divides by zero, or one of the underlying database systems deadlocks and aborts the transaction. The result would be an unsolicited abort, that is, one that the application did not cause directly by calling the Abort command.
Alternatively, the whole computer system could go down. For example, the operating system might crash, in which case all the transactions that were running at the time of the crash are affected. Thus, an application program that brackets a transaction must provide error handling for two types of exceptions: transaction failures and system failures. For each type of exception, the application should specify an exception handler, which is a program that executes after the system recovers from the error.
To write an exception handler, a programmer needs to know exactly what state information is available to the exception handler; that is, the reason for the error and what state was lost due to the error. Two other issues are how the exception handler is called and whether it is running in a transaction.
Information about the cause of the abort should be available to the exception handler, usually as a status variable that the exception handler can read. If the abort was caused by the execution of a program statement, then the program needs to know both the exception that caused the statement to malfunction and the reason for the abort; they might not be the same. For example, it’s possible that there was some error in the assignment statement due to an overflow in some variable, but the real reason for the abort was an unavailable database system. The exception handler must be able to tell the difference between these two kinds of exceptions.
When a transaction aborts, all the transactional resources it accessed are restored to the state they had before the transaction started. This is what an abort means: all the transaction’s effects are undone. Nontransactional resources, such as a local variable in the application program or a communications message sent to another program, are completely unaffected by the abort. In other words, actions on nontransactional resources are not undone as a result of the abort.
It’s generally best if a transaction failure automatically causes the program to branch to an exception handler. Otherwise the application program needs an explicit test, such as an IF-statement, after each and every statement, which checks the status returned by the previous statement and calls the appropriate exception handler in the event of a transaction abort.
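The two points above, automatic branching to a handler and the different fates of transactional and nontransactional state, can be illustrated with Python’s standard sqlite3 module. This is a minimal sketch: the table layout, the function names, and the use of a CHECK constraint to trigger the failure are assumptions made for the example.

```python
import sqlite3


def open_bank():
    # isolation_level=None puts the connection in autocommit mode,
    # so the transaction brackets below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE account ("
        "id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO account VALUES ('chk', 50), ('loan', 500)")
    return conn


def pay_loan_from_checking(conn, amt):
    trace = []  # nontransactional state: a plain local list
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 'loan'", (amt,)
        )
        trace.append("loan reduced")
        # Violates the CHECK constraint if the checking balance would go negative.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 'chk'", (amt,)
        )
        trace.append("checking debited")
        conn.execute("COMMIT")
    except sqlite3.IntegrityError as err:
        # The failure branches here without a status test after each statement.
        conn.execute("ROLLBACK")  # undoes the loan update in the database
        trace.append(f"aborted: {err}")
    return trace
```

After a failed call, the account table is back to its prior state, yet the trace list still contains "loan reduced". The abort does not undo that entry, so the handler has to account for it.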
In the chained model, the exception handler is automatically part of a new transaction, because the previous transaction aborted and, by definition, the chained model is always executing inside of some transaction. In the unchained model, the exception handler is responsible for demarcating a transaction in which the exception handling logic executes. It could execute the handler code outside of a transaction, although as we said earlier this is usually undesirable.
If the whole system goes down, all the transactions that were active at the time of the failure abort. Since a system failure causes the contents of main memory to be lost, transactions cannot resume execution when the system recovers. So the recovery procedure for transaction programs needs to apply to the application as a whole, not to individual transactions. The only state that the recovery procedure can rely on is information that was saved in a database or some other stable storage area before the system failed. A popular way to do this is to save request messages on persistent queues. The technology to do this is described in Chapter 4.
Some applications execute several transactions in response to a user request. This is called a business process or workflow. If the system fails while a business process is executing, then it may be that some but not all of the transactions involved in the business process committed. In this case, the application’s exception handler may execute compensating transactions for the business process’s transactions that already committed. Business process engines typically include this type of functionality. More details appear in Chapter 5.
Savepoints
If a transaction periodically saves its state, then at recovery time the exception handler can restore that state instead of undoing all the transaction’s effects. This idea leads to an abstraction called savepoints.
A savepoint is a point in a program where the application saves all its state, generally by issuing a savepoint command. The savepoint command tells the database system and other resource managers to mark this point in their execution, so they can return their resources to this state later, if asked to do so. This is useful for handling exceptions that only require undoing part of the work of the transaction, as in Figure 2.4. Figure 2.4. Using Savepoints. The program saves its state at savepoint “A.” It can restore the state later if there’s an error.
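A partial rollback of this kind can be sketched with SQLite’s SAVEPOINT and ROLLBACK TO statements through Python’s standard sqlite3 module. This is a minimal sketch, not the program of Figure 2.4: the table, the amounts, and the function name run_with_savepoint are assumptions, and the "error" is simply a balance that would go negative.

```python
import sqlite3


def run_with_savepoint(extra_debit):
    # Autocommit mode, so BEGIN and COMMIT are issued explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('chk', 100)")

    conn.execute("BEGIN")
    # Work done before the savepoint survives a later partial rollback.
    conn.execute("UPDATE account SET balance = balance - 10 WHERE id = 'chk'")
    conn.execute("SAVEPOINT A")
    conn.execute(
        "UPDATE account SET balance = balance - ? WHERE id = 'chk'", (extra_debit,)
    )
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE id = 'chk'"
    ).fetchone()
    if balance < 0:
        # Undo only the work performed since savepoint A.
        conn.execute("ROLLBACK TO SAVEPOINT A")
    conn.execute("COMMIT")

    (final,) = conn.execute(
        "SELECT balance FROM account WHERE id = 'chk'"
    ).fetchone()
    conn.close()
    return final
```

Calling run_with_savepoint(500) commits a balance of 90, because only the second update is undone. Calling run_with_savepoint(20) commits 70, because no rollback is needed.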
A savepoint can be used to handle broken input requests. Suppose a transaction issues a savepoint immediately after receiving an input request, as in the program Application in Figure 2.5. If the system needs to spontaneously abort the transaction, it need not actually abort, but instead can roll back the transaction to its first savepoint, as in ExceptionHandlerForApplication in Figure 2.5. This undoes all the transaction’s updates to transactional resources, but it leaves the exception handler with the opportunity to generate a diagnostic and then commit the transaction. This is useful if the transaction needs to abort because there was incorrect data in the request. If the whole transaction had aborted, then the get-input-request operation would be undone, which implies that the request will be re-executed. Since the request was incorrect, it is better to generate the diagnostic and commit. Among other things, this avoids having the request re-execute incorrectly over and over, forever. Figure 2.5. Using Savepoints for Broken Requests. The application’s savepoint after getting the request enables its exception handler to generate a diagnostic and then commit. If the transaction were to abort, the get-input-request would be undone, so the broken request would be re-executed.
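The broken-request pattern can be sketched in the same way. This is a minimal sketch under stated assumptions, not the program of Figure 2.5: the requests, account, and diagnostics tables and the function names are hypothetical, and a request counts as "broken" here when it names an account that does not exist.

```python
import sqlite3


def open_system():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE requests (id INTEGER PRIMARY KEY, acct TEXT, loan TEXT, amt INTEGER);
        CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER);
        CREATE TABLE diagnostics (request_id INTEGER, message TEXT);
        INSERT INTO account VALUES ('chk', 100), ('loan', 500);
        INSERT INTO requests VALUES (1, 'chk', 'loan', 30), (2, 'chk', 'nosuch', 5);
    """)
    return conn


def process_next_request(conn):
    conn.execute("BEGIN")
    row = conn.execute(
        "SELECT id, acct, loan, amt FROM requests ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    req_id, acct, loan, amt = row
    # Get-input-request: removing the request is part of the transaction.
    conn.execute("DELETE FROM requests WHERE id = ?", (req_id,))
    conn.execute("SAVEPOINT after_input")
    try:
        for target in (acct, loan):
            cur = conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amt, target),
            )
            if cur.rowcount != 1:
                raise LookupError(f"unknown account {target!r}")
        outcome = "done"
    except LookupError as err:
        # Undo the updates but keep the dequeue, then record a diagnostic.
        conn.execute("ROLLBACK TO SAVEPOINT after_input")
        conn.execute("INSERT INTO diagnostics VALUES (?, ?)", (req_id, str(err)))
        outcome = "rejected"
    conn.execute("COMMIT")
    return outcome
```

Because the dequeue happens before the savepoint and the transaction commits in both branches, a rejected request leaves the queue for good and is not retried.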
Unfortunately, in some execution models the exception handler of a transactional application must abort the transaction. In this case, a mechanism outside the transaction needs to recognize that the broken request should not be re-executed time after time. Queuing systems usually offer this function, which is described in Chapter 4.
Some database systems support the savepoint feature. Since the SQL standard requires that each SQL operation be atomic, the database system does its own internal savepoint before executing each SQL update operation. That way, if the SQL operation fails, it can return to its state before executing that operation. Since the database system supports savepoints anyway, only modest additional work is needed to have it make savepoints available to applications.
In general, savepoints seem like a good idea, especially for transactions that execute for a long time, so that not all their work is lost in the event of a failure. Although savepoints are available in some systems, the feature reportedly is not widely used by application programmers.
Using Savepoints to Support Nested Transactions
Since a savepoint can be used to undo part of a transaction but not all of it, it has some of the characteristics of nested transactions. In fact, if a transaction executes a sequential program, then it can use savepoints to obtain the behavior of nested transactions if the system adheres to the following rules:
This implementation works only if the transaction program is sequential. If it has internal concurrency, then it can have concurrently executing subtransactions, each of which can independently commit or abort. Since a savepoint applies to the state of the top-level transaction, there will not always be a savepoint state that can selectively undo only those updates of one of the concurrently executing subtransactions.
Consider the example in Figure 2.6, where functionX starts a transaction and then calls functionY and functionZ concurrently, indicated by the “concurrent block” bracketed by cobegin and coend. Both functionY and functionZ access a resource manager RM_A that supports savepoints. Each of them has transaction brackets and therefore should run as a subtransaction. Since functionY and functionZ are executing concurrently, their operations can be interleaved in any order. Consider the following steps:
Figure 2.6. Savepoints Aren’t Enough for Concurrent Subtransactions. If functionZ commits and functionY aborts, there is no savepoint in RM_A that produces the right state.
According to rule 2 (in the previous list), in step 6 the system should restore the savepoint created on behalf of functionY. However, this will undo the update performed by functionZ, which is incorrect, since functionZ commits in step 5.
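The lost update can be reproduced with SQLite savepoints through Python’s standard sqlite3 module. This is a minimal sketch under stated assumptions: a single connection stands in for RM_A, the interleaving shown is one possible order (functionY’s savepoint is taken before functionZ’s), and the choice to model a subtransaction commit as RELEASE and an abort as ROLLBACK TO is made for this example.

```python
import sqlite3


def interleaved_subtransactions():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE item (name TEXT PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO item VALUES ('y', 0), ('z', 0)")

    conn.execute("BEGIN")            # functionX starts the top-level transaction
    conn.execute("SAVEPOINT sp_y")   # functionY starts and marks its savepoint
    conn.execute("UPDATE item SET value = 1 WHERE name = 'y'")
    conn.execute("SAVEPOINT sp_z")   # functionZ starts and marks its savepoint
    conn.execute("UPDATE item SET value = 1 WHERE name = 'z'")
    conn.execute("RELEASE SAVEPOINT sp_z")      # functionZ commits
    conn.execute("ROLLBACK TO SAVEPOINT sp_y")  # functionY aborts
    conn.execute("COMMIT")           # functionX commits the top-level transaction

    result = dict(conn.execute("SELECT name, value FROM item"))
    conn.close()
    return result
```

The function returns {'y': 0, 'z': 0}. Restoring functionY’s savepoint also discards functionZ’s update, even though functionZ committed, because savepoints are ordered along a single timeline of the top-level transaction.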