[Bernstein09] 2.5. Shared State

Stefen 2010-06-24


2.5. Shared State

There are many situations in which components of a TP system need to share state information about users, activities, and the components themselves. Some examples of state information are the following:

Transaction—the transaction ID of the programs executing a transaction
Users—a user’s authenticated identity or the address of the user’s device
Activities—the identity or contents of the last message that one component sent to another, or temporary information shared between a client and the system, such as the contents of a shopping cart
Components—the identity of transaction managers that need to participate in committing a transaction, or the identity of processes that can handle a certain kind of request

The rest of this section explores these kinds of state and mechanisms to share it.

The kind of shared state we are interested in here is usually short-lived. That is, it is state that can be discarded after a few seconds, minutes, or hours, though in some cases it may last much longer than that. Often it is information that describes a current activity of limited duration, such as a transaction or a shopping session. It is usually shared mostly for convenience or performance, to avoid having to send it repeatedly when components communicate. If this shared state is lost due to a failure, it can be reconstructed in the same way it was created in the first place, which is a nuisance and an expense, but not a catastrophe.

Of course, a TP system also needs to manage long-lived, permanent state. Examples of such state are databases that contain information about accounts, loans, and customers in a bank; or information about products, warehouses, and shipments in a retail business. In a sense, this information describes the state of the enterprise. This is the information that transactions are keeping track of. Unlike the short-lived state, it must not be lost in the event of a failure. This kind of long-lived state information is a very important part of TP systems, but it is not the kind of state that is the subject of this section.

Transaction Context

Earlier in this chapter, we saw that each transaction has a transaction ID, and that each program that executes a transaction has context information that includes its transaction ID. Thus, the transaction ID is state shared by the programs executing a transaction.

There are two design issues for any kind of shared state: how to establish the shared state and how to stop sharing and release the state. For transaction IDs, the first issue is addressed by native transactional RPC and WS-Transactions for SOAP. They propagate transaction context from caller to callee, to ensure that all programs executing the transaction have the same transaction context.

The second issue is addressed in different ways, depending on whether the program is a resource manager that needs to participate in two-phase commit. If so, then it retains the transaction context until after it processes the Commit operation in the second phase of two-phase commit. If not, and if it does not need to retain transaction state across calls, then it can release its transaction state when it returns from the transactional RPC that called it. If it does need to retain transaction state across calls, then it retains the transaction state until some later time, determined either by two-phase commit or by the program itself.
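The two design issues above, establishing and releasing transaction context, can be illustrated with a minimal sketch. The `Program` class, its `call` and `on_commit` methods, and the `is_resource_manager` flag are hypothetical names chosen for illustration; they do not correspond to any real transactional RPC API.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Program:
    """Hypothetical program that executes part of a transaction."""
    name: str
    is_resource_manager: bool = False
    txn_id: Optional[str] = None  # the shared transaction context

    def call(self, callee: "Program") -> None:
        # Transactional RPC: the caller's context travels with the call.
        callee.txn_id = self.txn_id
        callee.do_work()
        if not callee.is_resource_manager:
            # A program that keeps no transaction state across calls
            # releases its context when it returns.
            callee.txn_id = None

    def do_work(self) -> None:
        if self.txn_id is None:
            raise RuntimeError("must run inside a transaction")

    def on_commit(self) -> None:
        # A resource manager releases its context only after it has
        # processed Commit in the second phase of two-phase commit.
        self.txn_id = None


root = Program("root", txn_id=str(uuid.uuid4()))
helper = Program("helper")
database = Program("database", is_resource_manager=True)

root.call(helper)    # helper releases its context on return
root.call(database)  # database retains it until commit
```

In this sketch the resource manager still holds the transaction ID after the call returns, and drops it only when `on_commit` runs.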

For example, in .NET, a program can release its transaction context by calling SetComplete or SetAbort before returning from a call. As we explained earlier, these operations tell the system that the transaction may or may not be committed (respectively) insofar as the caller is concerned. To retain the transaction context, the program calls EnableCommit or DisableCommit. These operations tell the system that the transaction may or may not be committed (respectively) insofar as the caller is concerned, but unlike SetComplete and SetAbort, they do not release the transaction context. These two situations—releasing or retaining transaction context—are special cases of stateless and stateful servers, which are discussed in more detail later in this section.

In Java EE, context is managed using a context object that is created when the transaction is started. The Java APIs to release context are javax.transaction.UserTransaction.commit and rollback—there’s no equivalent for SetComplete but for SetAbort the Java extensions (Javax) API provides setRollbackOnly.

Sessions

A communication session is a lasting connection between two system components, typically two processes, that want to share state. The main reason to establish a session is to avoid having the components send the shared state information in each message. This saves not only the transmission cost, but also the sender’s cost of obtaining the state information when composing the message and the receiver’s cost of validating and saving the state information when receiving the message. The following are some examples of state information that might be shared by a session:

The network address of both components, so they do not need to incur costly address lookups every time they send a message to each other
Access control information, so each party knows that the other one is authorized to be sending it messages, thereby avoiding some security checks on each message
A cryptographic key, so the components can encrypt information that they exchange in later messages
The identity of the last message each component sent and received, so they can resynchronize in case a message is not delivered correctly
The transaction ID of the transaction that both components are currently executing

A session is created between two components by exchanging messages that contain the state to be shared. For example, a component C₁ can send a message REQUEST-SESSION(id, x) to component C₂, which asks it to become a party to a new session that is identified by id and whose initial state is x. C₂ replies with a message ACCEPT-SESSION(id), which tells C₁ that C₂ received the REQUEST-SESSION message, agrees to be a party to the session, and has retained the initial session state x. Usually, this is enough to establish the session. However, sometimes C₂ needs to be sure that C₁ received its ACCEPT-SESSION message before it sends C₁ another message. In that case it should require that C₁ acknowledge C₂’s ACCEPT-SESSION message by sending a message CONFIRM-SESSION(id). In the latter case, the protocol to establish the session is called a three-way handshake (see Figure 2.14).

Figure 2.14. Three-Way Handshake to Create a Session. Component C₁ initiates the protocol by requesting to establish a session. C₂ agrees to be a party to the session. Finally, C₁ acknowledges receipt of that agreement.
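The three-way handshake can be sketched as a pair of in-process objects that exchange the three messages. This is only an illustration under simplifying assumptions: messages are modeled as direct method calls rather than network traffic, and the `Component` class and its method names are hypothetical.

```python
class Component:
    """Hypothetical party to a session; messages are plain method calls."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}      # session id -> shared state
        self.confirmed = set()  # sessions whose ACCEPT was acknowledged

    def request_session(self, peer, session_id, state):
        # Message 1, C1 -> C2: REQUEST-SESSION(id, x)
        reply = peer.on_request(session_id, state)
        if reply != ("ACCEPT-SESSION", session_id):
            return False
        self.sessions[session_id] = state
        # Message 3, C1 -> C2: CONFIRM-SESSION(id)
        peer.on_confirm(session_id)
        return True

    def on_request(self, session_id, state):
        # C2 retains the initial state, then sends message 2:
        # ACCEPT-SESSION(id)
        self.sessions[session_id] = state
        return ("ACCEPT-SESSION", session_id)

    def on_confirm(self, session_id):
        # C2 now knows that C1 received its ACCEPT-SESSION message.
        self.confirmed.add(session_id)


c1 = Component("C1")
c2 = Component("C2")
established = c1.request_session(c2, "s1", {"user": "alice"})
```

After the call, both components hold the same state under the session ID `s1`, and C₂ has recorded the confirmation.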

Sometimes a session is established as a side-effect of another message. For example, it might be a side-effect of the first RPC call from a client to a server, and it stays around until it times out.

Each component that is involved in a session needs to allocate some memory that holds the shared state associated with the session. This is usually a modest cost per session. However, the memory cost can be significant if a component is communicating with a large number of other components, such as a server with sessions to a million clients over the Internet. This is one good reason why HTTP is not a session-oriented protocol.

Most sessions are transient. This means that if one of the components that is involved in a session fails, then the session disappears. Continuing with our example, suppose component C₂ fails and loses the contents of its main memory. Then it loses the state information that comprises the session. The other component C₁ involved in the session may still be operating normally, but it will eventually time out waiting for a message from C₂, at which point it discards the session. If C₂ recovers quickly, before C₁ times out, then C₂ might reply to C₁’s attempt to re-establish contact. However, since C₂ lost the session due to its failure, it no longer has the shared state of the session when it recovers. Therefore, it should reply to C₁’s message with a negative acknowledgment, thereby telling C₁ to discard the session. If C₁ and C₂ want to re-establish their session after C₂ has recovered, then they have to recreate the session from scratch.
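The failure scenario above can be sketched as follows. The `Server` and `Client` classes are hypothetical, the crash is simulated by clearing a dictionary, and the string replies `"ACK"` and `"NACK"` are placeholders for whatever acknowledgment messages a real protocol would use.

```python
class Server:
    """Hypothetical C2: keeps session state only in volatile memory."""

    def __init__(self):
        self.sessions = {}

    def open(self, session_id, state):
        self.sessions[session_id] = state

    def crash_and_recover(self):
        # Main memory is lost, so every transient session disappears.
        self.sessions = {}

    def handle(self, session_id, message):
        if session_id not in self.sessions:
            return "NACK"  # tells the peer to discard the session
        return "ACK"


class Client:
    """Hypothetical C1: discards a session when it receives a NACK."""

    def __init__(self, server):
        self.server = server
        self.sessions = {}

    def open(self, session_id, state):
        self.server.open(session_id, state)
        self.sessions[session_id] = state

    def send(self, session_id, message):
        reply = self.server.handle(session_id, message)
        if reply == "NACK":
            self.sessions.pop(session_id, None)
        return reply


server = Server()
client = Client(server)
client.open("s1", {"last_message": 0})

before = client.send("s1", "hello")
server.crash_and_recover()
after = client.send("s1", "hello again")
```

Once the client has discarded the session, the only way forward is to create a new one from scratch.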

If C₂ had sessions with only a few other components at the time it failed, then re-establishing the sessions does not cost very much. However, if it had a large number of sessions at the time it failed, then re-establishing them all at recovery time can be very time-consuming. During that time, C₂ is still unavailable. If one of the components with which C₂ is re-establishing a session is slow to respond to the REQUEST-SESSION or, even worse, is unavailable, then C₂’s availability may be seriously degraded waiting for that session to be established.

A given pair of components may have more than one session between them. For example, they may have a transport session for the network connection, a session for the application state, and a session for end user information. Although in principle these sessions could be bundled into a single session between the components, in practice they are usually maintained independently, because they have different characteristics. For example, they may be established in different ways, use different recovery strategies, and have different lifetimes.

To summarize, the benefit of using sessions is to avoid resending and reprocessing the same information over and over again in every message exchange between a pair of components. The costs are the time to establish the session and to recover it after a failure, which in turn negatively affects availability.

One common use of sessions in TP is to connect an application component to a database system. The session state typically includes a database name, an authenticated user ID, and the transaction ID of the current transaction being executed by the application component. When the application component creates the session via REQUEST-SESSION, it includes the user ID and password as parameters. They are validated by the database system before it replies with ACCEPT-SESSION. The database system executes all the operations it receives from the application component on behalf of the session’s user. Thus, operations only succeed if the session’s user has privileges for them. All the operations execute within the session’s transaction. After the application commits the transaction, the session either automatically starts a new transaction (i.e., if it uses the chained transaction model) or it no longer is operating in the context of a transaction (i.e., if it uses the unchained transaction model).

Another common use of sessions in TP is to connect transaction managers that participate in the two-phase commit protocol for a given transaction. The protocol for establishing sessions between these participants is a major part of a two-phase commit implementation and is discussed in Chapter 8.

Stateless Servers

Consider a session between a client process and a server process, where the client calls the server using RPC in the context of the session, so both the client and server can use the session’s shared state. There are three problems that arise in this arrangement:

(1) The session ties the client to a particular server process. In a distributed system with multiple server processes that are running the same application, it is desirable for a given client to be able to send different requests to different server processes; for example, to use the most lightly loaded one. However, if the client is relying on the server to retain state information about their past interactions, then it does not have the freedom to send different requests to different servers. All its requests have to go to the same server, namely, the one that is keeping track of their shared state.
(2) If the server fails, then the session is lost. Since the client was depending on the server to remember the state of the session, the server needs to rebuild that state after it recovers. The server can do this either by having the client resend that state or by recovering the state from persistent storage, which in turn requires that the server saved the state in persistent storage before it failed.
(3) If the server is servicing requests from a large number of clients, then it costs a lot of memory for it to retain shared state. Moreover, the problem of rebuilding sessions after a failure becomes more acute.

For these three reasons, it is sometimes recommended that server processes be stateless. That is, there is no session between the client and server processes, and the server retains no application state after it services and replies to a client’s request. Thus, it processes each request message from a clean state. Let us reconsider the preceding three problems for stateless server processes. First, if there are multiple server processes running the same application, then successive calls from a client can go to any of the server processes since none of them retain any state from the client’s previous calls. Second, if a stateless server process fails, then it has no application state that it needs to recover. And third, a stateless server process does not incur the memory cost of retaining shared state.

The recommendation that servers be stateless applies mainly to communication between middle-tier servers and front-end processes associated with an end-user (i.e., clients), such as a browser or other presentation manager on a desktop device. This is a case where these three problems are likely to appear: (1) a client may want to send different requests to different servers, depending on server load; (2) re-establishing client-server sessions may be problematic, because clients can shut down unexpectedly for long periods and because a server would need a large number of these sessions since there is typically a large number of clients; and (3) the server would need to dedicate a lot of memory to retain shared state.

By contrast, this recommendation usually does not apply to communication between a middle-tier server and a back-end server, which are often database systems. As mentioned earlier, there usually are sessions between a middle-tier server and each back-end database system it invokes. Therefore, the back-end server is stateful with respect to the middle-tier servers that call it. Thus, the preceding three problems need to be addressed. We will discuss solutions in the next section.

It may sound a little strange to hear about stateless middle-tier server processes, because of course a TP application needs to store a lot of application state in databases. The point is that this database state is the only state that the stateless server process depends on. The server process itself does not retain state. Thus, if the server fails and subsequently recovers, it doesn’t need to rebuild its internal state, because all the state that it needs is ready and waiting in the databases it can access.
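A minimal sketch of this idea follows. A dictionary named `shared_db` stands in for the shared database, and the `StatelessServer` class and the round-robin dispatch are hypothetical; the point is only that no server object keeps anything between requests, so any of them can serve any request.

```python
import itertools

# Stands in for a shared database that all server processes can access.
shared_db = {}


class StatelessServer:
    """Hypothetical server process that retains no application state."""

    def __init__(self, name):
        self.name = name

    def handle(self, client_id, amount):
        # Everything the request needs is read from the database,
        # and the result is written back before replying.
        balance = shared_db.get(client_id, 0) + amount
        shared_db[client_id] = balance
        return balance


servers = [StatelessServer("s1"), StatelessServer("s2")]
round_robin = itertools.cycle(servers)

# Successive requests from the same client go to different servers.
results = [next(round_robin).handle("alice", 10) for _ in range(3)]

# A failed server can be replaced by a fresh one with nothing to rebuild.
replacement = StatelessServer("s3")
final_balance = replacement.handle("alice", 5)
```

The replacement server produces the correct balance on its first request because the running total lives in the shared store, not in any server.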

A well-known example of a stateless middle-tier process is the use of a web server for HTTP requests for static web pages. All the state needed by the web server is stored in files. After servicing a request, a web server does not need to retain any state about the request or response. Since such web servers are stateless, if there are multiple web server processes, then each request can be serviced by a different web server. And if a web server fails and is then restarted, it has no state that needs to be recovered.

Stateful Applications

Having just explored reasons why stateless applications are beneficial, let us now examine cases where a middle-tier application needs to retain state information across multiple front-end requests. Here are four examples:

(1) A user request requires the execution of several transactions, and the output of one transaction may need to be retained as input to the next.
(2) A middle-tier server wants to retain information about a user’s past interactions, which it will use for customizing the information it displays on later interactions.
(3) A front end establishes a secure connection with a server using authentication information, which requires it to cache a token.
(4) A user wants to accumulate a shopping cart full of merchandise before actually making the purchase.

In each of these scenarios, the state that is retained across client requests has to be stored somewhere. There are several places to put it, such as the following:

Save it in persistent storage, such as a database system. The operation that stores the state should be part of the transaction that produces the state, so that the state is retained if and only if the transaction that produces it commits.
Save it in shared persistent storage, but not within a transaction.
Store it in volatile memory or in a database that is local to one server process. This makes the server stateful. Whether or not there is a communication session, future requests from the same client need to be processed by the server that has the shared state.
Return it to the caller that requested the transaction execution. It is then the caller’s responsibility to save the state and pass it back to the server on its next invocation of that server.

Wherever the state is stored, it must be labeled with the identity of the client and/or server, so that both client and server can find the state when they need it.

Let us explore these ways of managing state and client-server identities in examples (1) to (4) in the previous list. The first scenario is a business process, that is, a user request that requires the execution of multiple transactions. A variety of state information is accumulated during a business process execution. This state includes a list of the business process steps whose transactions have committed and those that have yet to be executed. It may also include results that were returned by the transactions that committed, since these results may be needed to construct input to other transactions in the business process (see Figure 2.15). For example, if a travel reservation executes as a business process, then the arrival time of the flight that is returned by the flight reservation transaction may be needed to construct the input to a car rental reservation transaction, since that input requires a pick-up time. This information also needs to be saved so it can be returned to the client when the business process has finished executing.

Figure 2.15. Retaining State in a Business Process. Each transaction in a business process saves the process state for use by the next transaction in the sequence.

Like any transaction, each transaction that executes as part of a business process should execute at most once. Therefore, the business process state must be maintained in persistent storage. If it were stored in volatile memory instead of persistent storage, and the contents of that memory were lost due to a failure, then it could not be reconstructed by executing the business process’ transactions again (because transactions should execute at most once). For the same reason, the state must be updated by each transaction that executes as part of the business process. Suppose the application is written so that the result of the transaction is stored in the business process state after the transaction committed. If a failure occurs between the time the transaction commits and the time its results are supposed to be written to the business process state, then those results would be lost.
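The requirement that each transaction update the business process state as part of its own work can be sketched with Python's built-in `sqlite3` module. The table names, the `run_step` helper, and the simulated failure are all assumptions made for illustration. The sketch relies on the connection object acting as a context manager that commits on success and rolls back on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (kind TEXT, detail TEXT)")
conn.execute(
    "CREATE TABLE process_state (process_id TEXT, step TEXT, result TEXT)"
)
conn.commit()


def run_step(process_id, step, detail, result, fail_before_commit=False):
    """Run one step and record its result in the same transaction."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO reservations VALUES (?, ?)", (step, detail)
            )
            conn.execute(
                "INSERT INTO process_state VALUES (?, ?, ?)",
                (process_id, step, result),
            )
            if fail_before_commit:
                raise RuntimeError("simulated failure")
    except RuntimeError:
        return False
    return True


flight_ok = run_step("trip-1", "flight", "SEA to BOS", "arrives 17:05")
car_ok = run_step(
    "trip-1", "car", "pick-up 17:30", "confirmed", fail_before_commit=True
)

completed_steps = [
    row[0]
    for row in conn.execute(
        "SELECT step FROM process_state WHERE process_id = ?", ("trip-1",)
    )
]
reservation_count = conn.execute(
    "SELECT COUNT(*) FROM reservations"
).fetchone()[0]
```

Because the reservation and the process state are written together, the failed car rental step leaves neither a reservation nor a process state entry behind, so the two can never disagree.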

In scenario (2) the server keeps track of a user’s interactions over a long period of time. For example, it may remember all the user’s past orders and past window-shopping. It may use this information to suggest new products that are likely to be of interest based on that past behavior. In this case, the shared state needs to be identified by a long-lived name. The user’s e-mail address commonly is used for this purpose. But in some cases it might not be good enough, since the user may access the server both from home and the office, and may switch e-mail providers from time to time. The user’s full name and address might be better, although this too has problems due to variations in spelling and typos. Thus, depending on the requirements, selecting and using long-lived names can be a nontrivial design problem.

In scenario (3) a client browser establishes a secure connection with a server by exchanging authentication information. The connection establishes trust between the client and server so that the authentication information does not have to be passed on each subsequent call. The server caches the authentication token and identifies it with the connection to the browser. This is handy because then the user does not have to log in again and can submit multiple requests during the same session to the same resource. Since the connection is established as secure, the user’s credentials do not have to be presented on each request.

Scenario (4) concerns creating and maintaining a shopping cart. Each item that a user selects to buy is put into the user’s shopping cart. Since a user may be shopping for a while, the shopping cart may be stored in a database or other persistent storage, to avoid the expense of using main memory for information that is infrequently accessed. This need not be written in the context of a transaction. However, the shopping cart is not permanent state. The server system retains the shopping cart until either the user checks out and purchases the items in the cart, or until a time-out has occurred after which the server disposes of the shopping cart. The shopping cart is the shared state between the user and the system. So is the user ID that the system needs to know in order to find the user’s shopping cart while processing each of the user’s operations.

What user ID should be associated with the shopping cart? If the server is stateful, the session ID can be used to identify the user and hence the shopping cart. If the session goes away before the customer purchases the contents of the shopping cart, then the shopping cart can be deleted. If the server is stateless, and the user has not identified herself to the server, then the system must generate a user ID. Since the server is stateless, that user ID must accompany every call by that user to the server. One way to do this is to ensure that all calls from the client to the server, and all return messages, include the server-generated user ID. Since this is rather inconvenient, a different mechanism has been adopted for web browsers, called cookies.

A cookie is a small amount of information sent by a server to a web browser that the web browser then stores persistently and returns to the same server on subsequent calls. For example, when an anonymous user places his or her first item in a shopping cart, the server that performs the action could generate a user ID for that user and return it in a cookie. The user’s subsequent requests to that server would contain the cookie and therefore would tell the server which shopping cart is relevant to those subsequent requests. Thus, the cookie is the shared state between the web browser and the server.

A cookie has a name, domain, and path, which together identify the cookie. It also has a value, which is the content of the cookie, such as a server-generated user ID for the shopping cart. For privacy reasons, the browser should send the cookie with HTTP requests only to the cookie’s domain (e.g., books.elsevier.com). Since cookies are easily sniffed, they are also usually encrypted. Each cookie also has an expiration date, after which the browser should dispose of the cookie.
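As a rough illustration, the cookie fields described above can be produced and parsed with Python's standard `http.cookies` module. The cookie name `cart_user`, the domain `shop.example.com`, the path `/cart`, and the two helper functions are hypothetical. The sketch uses a relative lifetime (`Max-Age`) in place of an absolute expiration date, and it omits encryption.

```python
from http.cookies import SimpleCookie


def make_cart_cookie(user_id):
    """Build the value of a Set-Cookie header for a server-generated ID."""
    cookie = SimpleCookie()
    cookie["cart_user"] = user_id                        # name and value
    cookie["cart_user"]["domain"] = "shop.example.com"   # hypothetical
    cookie["cart_user"]["path"] = "/cart"                # hypothetical
    cookie["cart_user"]["max-age"] = 3600                # expire in an hour
    return cookie["cart_user"].OutputString()


def read_cart_cookie(cookie_header):
    """Extract the user ID from the Cookie header a browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("cart_user")
    return morsel.value if morsel is not None else None


set_cookie_value = make_cart_cookie("u-42")
returned_id = read_cart_cookie("cart_user=u-42")
```

The server would send `set_cookie_value` in a `Set-Cookie` response header; on later requests the browser returns only the name and value, which is all the server needs to find the cart.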

Cookies are sometimes not available, for example, because a user disabled them in the browser. In this case, the server can use a different technique, called URL rewriting. Before the server sends an HTML page back to the browser, it rewrites all the URLs on the page to include the user’s session ID. For example, it could append “;jsessionid=1234” to every URL on the page. That way, any action that the user takes on that page causes the session ID to be sent back to the server.

URL rewriting is less secure than an encrypted cookie, since it can be seen by others. Moreover, an unsuspecting user might copy the rewritten URL into an e-mail to send to a friend, who might thereby have access to the sender’s private session information.
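URL rewriting can be sketched as a small text transformation over an HTML page. The `rewrite_urls` function below is a hypothetical helper, not part of any servlet container. It assumes every link appears as a double-quoted `href` attribute and ignores URL fragments, which a production implementation would have to handle.

```python
import re


def rewrite_urls(html, session_id):
    """Append ;jsessionid=<id> to each href path, before any query string."""

    def add_session_id(match):
        prefix, url, suffix = match.group(1), match.group(2), match.group(3)
        path, separator, query = url.partition("?")
        return f"{prefix}{path};jsessionid={session_id}{separator}{query}{suffix}"

    # Assumes links are written as href="..." with double quotes.
    return re.sub(r'(href=")([^"]*)(")', add_session_id, html)


page = '<a href="/cart">Cart</a> <a href="/item?id=7">Item</a>'
rewritten = rewrite_urls(page, "1234")
```

Every link on the rewritten page now carries the session ID, which is also why a copied URL exposes the session to whoever receives it.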

In summary, maintaining the state across multiple requests requires a fair bit of design effort to choose where and how the state is identified and maintained. For this reason, it is worthwhile to design an application to limit the use of shared state whenever possible.