LevSelector.com |
Enterprise Application Integration.
Main considerations | home - top of the page - |
In big organizations you don't really see "ALL Java" or "ALL C++" or "ALL PERL" or "ALL MICROSOFT" architecture. It is futile to try to design the system as a single application (compiled together, with tightly coupled parts). In the long run it never happens this way. Different departments and groups can never agree on one language or platform. What you end up with is a system consisting of heterogeneous parts (mainframe, Unix, Microsoft, etc.). The challenge is to make them work together. The problem is that these systems use different data formats and protocols internally. They may have different philosophies of how security or session issues are handled, etc.
To make these systems work together, the organization has to define (that is, to limit the choice of) the methods and formats of communication (cooperation) between systems.
These methods should feature:
- low coupling/dependency between parts - to allow different parts of the system to evolve independently, at their own speed and taste;
- simplicity and flexibility - so that they can be used by ALL current and future systems.
An approach we see more and more lately is a shift from Distributed Objects (like CORBA or DCOM) to Distributed Services (XML/SOAP). This is the essence of the whole concept of "Web Services" - the main defining feature of Microsoft's ".NET" architecture, as well as of the competing J2EE. Both approaches use XML-based communication.
Why is using distributed objects (DO) not a good idea?
- 1. It is a "heavy-weight" technology (it needs highly qualified people to implement it and to maintain it through upgrades).
- 2. Some 3rd-party applications don't support it - and it is difficult and expensive to add it.
- 3. Using DO means that you create "high coupling" in your system, that is, a high level of dependency between parts of the system. While developing and testing, you usually have to recompile both sides of the communicating systems. And you have to make sure that you are using the same versions of the compiler and of the distributed-objects system. When the next version of CORBA or Java comes out, you can't upgrade just one part of the system - you usually have to upgrade all parts simultaneously, or none. Otherwise they will not be able to communicate. So in practice you wait and don't upgrade systems which badly need upgrades. This is NOT good.
The basic ideas described below on this page are:
- Low
coupling - use services instead of objects. Allow parts
of the system to evolve independently. Avoid high level of coupling between
parts of the system. Instead of using distributed objects (which usually
requires re-compilation and testing of both sides of communicating systems)
- use distributed services. That is, make small independent services which
can be developed and tested independently. And define how you can request
and receive these services (communication). Also teach those services
a standard way of describing themselves to other services on request.
- Low coupling - use multiple transport mechanisms, but avoid transport-protocol-specific formats and binary formats. Instead use some simple common format (for example, XML and SOAP). Also use messaging middleware such as MQSeries (further decoupling, help with distributed transactions, etc.).
- Why you should avoid using EJBs - instead you can use servlet daemons in a commercial application server, or write your own servers.
- Use XML/SOAP.
Distributed Services vs. Distributed Objects | home - top of the page - |
Low coupling - use services instead of objects. Allow parts of
the system to evolve independently. Avoid high level of coupling between
parts of the system.
Instead of using distributed objects (which usually requires re-compilation
and testing of both sides of communicating systems) - use distributed services.
That is,
make small independent services which can be developed and tested independently.
Two main types of frameworks for distributed computing:
Distributed objects
- CORBA, DCOM and EJB. All distributed-object architectures, in their attempt to provide higher-level services, have become very intrusive and impose severe architectural restrictions on the application services (tight coupling).
Distributed services
- Java Servlets and XML/SOAP.
Idea: decouple the system by using "services" instead of "objects". That is, separate individual components into standalone services which can be changed and debugged independently. Thus you don't have to recompile and test the whole system - just the small part of it.
Object-oriented programming is a proven winner for application design,
but NOT for distributed applications.
Frameworks such as CORBA attempt to hide the fact that an object is remotely located, when in reality that fact should not be hidden.
So use services instead of objects.
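The shift to services can be illustrated with a tiny standalone service that speaks XML over HTTP. The sketch below uses only the JDK's built-in com.sun.net.httpserver; the service name, URL path and XML payload are made-up examples, not part of any standard:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class QuoteService {
    // Build the XML reply for a given symbol (hypothetical payload format).
    static String quoteXml(String symbol) {
        return "<quote><symbol>" + symbol + "</symbol><price>42.50</price></quote>";
    }

    public static void main(String[] args) throws Exception {
        // Port 0 = pick any free port; a real service would use a fixed, published port.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/quote", (HttpExchange ex) -> {
            byte[] body = quoteXml("IBM").getBytes(StandardCharsets.UTF_8);
            ex.getResponseHeaders().set("Content-Type", "text/xml");
            ex.sendResponseHeaders(200, body.length);
            try (OutputStream os = ex.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        System.out.println("service up on port " + server.getAddress().getPort());
        server.stop(0); // stopped at once in this sketch; a real service keeps running
    }
}
```

A client on any platform needs only an HTTP library and an XML parser to use such a service - no shared stubs, no common compiler version, no simultaneous upgrades.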
Transports and Formats | home - top of the page - |
Low-coupling - use multiple transport mechanisms,
but avoid transport-protocol-specific formats and binary formats. Instead
use some simple common format (for
example, XML and SOAP). Also use messaging middleware
such as MQSeries (further decoupling, to assist with distributed transactions,
etc.).
Transports: HTTP(S), MQSeries, IIOP,
and plain TCP, SMTP, FTP and others.
Types of interaction: request-response (like a live chat) or messaging (like e-mail).
Idea: don't use transport-specific data formats. Instead use XML messages, which can be passed over all transports. All you need to communicate via XML is an XML parser, which is available for all languages and platforms. Use SOAP for RPCs and callbacks.
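For example, pulling a field out of an XML message takes only a few lines with the DOM parser that ships with the JDK (javax.xml.parsers); the message format here is an invented example:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlMessage {
    // Extract one element's text from an XML message using the JDK's DOM parser.
    static String textOf(String xml, String tag) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The message format is a made-up example, not any standard.
        String msg = "<transfer><from>A</from><to>B</to><amount>100</amount></transfer>";
        System.out.println(textOf(msg, "amount")); // prints 100
    }
}
```

The same few lines work no matter which transport (HTTP, MQSeries, plain TCP) delivered the message - that is the point of a transport-independent format.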
Idea: decouple systems by using mostly messaging instead of request-response communication. Messaging is asynchronous (and thus puts less load on the network). It also makes broadcasting convenient to implement.
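A minimal in-process sketch of the messaging idea, using a java.util.concurrent queue to stand in for middleware such as MQSeries (the message text is made up):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagingSketch {
    // The queue stands in for messaging middleware such as MQSeries.
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Sender: drops the message into the queue and returns at once;
    // it does NOT wait for the receiver (asynchronous, unlike request-response).
    static void send(String msg) throws InterruptedException {
        queue.put(msg);
    }

    // Receiver: picks messages up whenever it is ready, possibly much later,
    // on another thread (or, with real middleware, on another machine).
    static String receive() throws InterruptedException {
        return queue.take();
    }

    public static void main(String[] args) throws Exception {
        send("<priceUpdate><symbol>IBM</symbol><price>42.50</price></priceUpdate>");
        System.out.println(receive());
    }
}
```

Because the sender never blocks on the receiver, either side can be taken down, upgraded, or slowed without stalling the other - the decoupling this section argues for.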
Note:
Different transports may be required. Sometimes nothing can beat pure sockets. For example, broadcasting real-time updates to thousands of subscribed clients can be done very effectively from one computer using open sockets, whereas providing similar information as an HTTP server responding to periodic update requests from clients would require much more hardware power.
Why EJBs are NOT recommended | home - top of the page - |
Java is a good technology - but EJBs are NOT recommended.
Here is why.
- EJBs don't integrate well with other languages (EJBs use RMI/IIOP, which is similar to OMG's CORBA and is impractical).
- most of the benefits of using EJBs are in fact provided by the container and are also available to servlets.
- EJBs are complex.
- EJBs are very restricted in what they can do. For instance, EJBs are not allowed to start their own threads and therefore cannot run their own event loops; this means that they cannot support other transport protocols such as HTTP or MQSeries.
- EJBs are slow.
- EJB's implementation of object persistence (Entity Beans) leads to very tight coupling between the database and the application, and to unmaintainable database schemas. This is bad, considering that data normally far outlives any application; it is therefore very important that the database schema be independent of the application services. It is also a largely futile exercise to try to hide the database from the application-service developer. Even the EJB specification states that "the overhead of an inter-component call will likely be prohibitive for object interactions that are too fine-grained". It also says that EJBs should be very coarse in granularity and that finer business-object modeling should be done without EJBs. Entity Beans use significant system resources, since each load or store is a separate database operation - in contrast, data access objects would aggregate the database operations.
- Lifecycle Management (activation/passivation of EJBeans) - is still an immature solution. CORBA faces the same problem and has gone through several iterations of trying to solve it. The latest Portable Object Adapter (POA) specification shows that container-managed lifecycles need to be driven by user-specified policies that are applied differently to different sets of objects. At the same time, anybody who has implemented a big POA-based system would concede that the paradigm introduces a lot of complexity. Furthermore, it is apparent that lifecycle management is a problem similar in nature to the caching implementations of relational databases. Decades of research have been poured into optimizing the caching abilities of databases - can we honestly expect to see better caching in EJB containers in the near future? Of course, data caching is a simpler problem than object lifecycle management, but the question becomes: are we really gaining anything from the more complex solution? The rejection of IBM's Component Broker in the marketplace would indicate that object lifecycle management is not a feature that developers and system architects are clamoring for.
- Distributed transaction management - is not needed for most transactions (local transactions can be handled by the database itself). When you do need distributed transactions, you don't have to depend on EJBs. You can use reliable messaging systems (such as IBM's MQSeries). Or use the Java Transaction API (JTA), which is an independent specification and is available to servlets as well.
- Component Interfaces - The EJB framework dictates that Enterprise Beans can only be accessed through their Home or Remote interfaces via a JNDI service, which adds extra overhead. A better approach may be to use regular JavaBeans, which don't impose this limitation.
- Session Management - a good thing provided by the container; it is also available to Servlets (you don't need EJBs for that).
- DB Connection Pooling - a good thing provided by the container; it is also available to Servlets (you don't need EJBs for that).
- Fine-granularity Access Control / Security - a good thing, but fine granularity (at the method level) is not required by most applications. The security provided by the container to servlets is usually enough, and you don't need EJBs for it. When you need more, you can use other ways to enforce security (Kerberos, Netegrity, etc.)
- Rapid Development - a true benefit if you consider a stand-alone application. But EJBs' inflexibility and the difficulty of integrating them with other systems may in fact make development time longer (not shorter!). And further evolution is very difficult (as for all tightly coupled systems). A better approach to RAD (Rapid Application Development) is to use pre-built services instead of pre-built components. IBM's MQSI v2 ( messaging ) and webMethods B2B ( webmethods.com - xml ) are two examples where true rapid application development can be achieved when the required application services are available. The use of XML as a common data representation and XML-RPC (whether SOAP or not) as a common communication mechanism simplifies the evolution of our systems and thus offers significant time-to-market benefits in the long run.
- Portable Deployment - not true. Different containers still don't allow portability of EJBs between them, because they differ in many pretty basic aspects (for example, the find methods on EJB Home interfaces and the O/R mapping tools).
- Third-party Components - not limited to EJBs. Most EJB vendors make their components also available in other forms such as plain class libraries or JavaBeans. Those components can also be built into standalone services providing an XML-based interface.
- Strong Vendor Support - Yes. But it also exists for other technologies.
- Successful Adoption - can't be fully judged until a few years down the road, and we have thus not seen it yet.
If SOAP does emerge as a viable platform for distributed computing,
we will most likely see an effort to facilitate EJB-SOAP interoperability.
Another interesting aspect of container-managed systems is the CORBA 3 specification, which includes support for CORBA components (and promises more bells and whistles than EJB - four types of Components in place of two types of Beans, for starters).
Why Servlets are recommended | home - top of the page - |
There are many ways in which you can offer services.
In many cases you can use a web server model (webserver-script-database).
In others you can have your own server ( C++ or Java servers).
Probably the best way to make a server is to use Servlets - because
they can take advantage of functionality provided by commercial application
servers (such as session management, fail-over, load-balancing).
There are two types of servlets:
- typical servlets (request-response)
- daemon servlets - serve as services.
Unlike EJBs, Java Servlets are free to start their own threads. This allows servlets to manage event loops that can handle requests from other transports such as MQSeries. A servlet can use the init() and destroy() methods of an HTTPServlet to start and stop threads. Thus, a servlet can start a separate thread for each additional transport that it intends to support. If all the transports carry XML/SOAP messages, you can use centralized XML-processing functionality of the servlet for all the transport interfaces.
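A sketch of that lifecycle with plain JDK threads, leaving the servlet API out so the example is self-contained; init() and destroy() here mimic the HTTPServlet methods of the same names:

```java
public class DaemonListener {
    private volatile boolean running;
    private Thread worker;

    // Analogue of HTTPServlet.init(): start a background event loop that could,
    // in a real service, block on an MQSeries queue or a server socket.
    public void init() {
        running = true;
        worker = new Thread(() -> {
            while (running) {
                try {
                    Thread.sleep(50); // placeholder for "wait for the next message"
                } catch (InterruptedException e) {
                    return; // destroy() interrupts us so shutdown is prompt
                }
            }
        });
        worker.start();
    }

    // Analogue of HTTPServlet.destroy(): stop the loop and wait for the thread.
    public void destroy() throws InterruptedException {
        running = false;
        worker.interrupt();
        worker.join();
    }

    public boolean isRunning() {
        return worker != null && worker.isAlive();
    }

    public static void main(String[] args) throws Exception {
        DaemonListener listener = new DaemonListener();
        listener.init();
        System.out.println("running: " + listener.isRunning());
        listener.destroy();
        System.out.println("running: " + listener.isRunning());
    }
}
```

A daemon servlet would start one such thread per extra transport it supports, all funneling messages into the same XML-processing code.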
Session support - 2 types of sessions:
- transient per-client servlet sessions can be persisted
temporarily, although at a fairly high cost.
- per-application context information can be stored through the lifetime of a servlet and can be accessed by other servlets that belong to the same logical application.
The Servlet Container also provides access to a JNDI server where the application service can publish a reference to itself so that clients and other services can locate it through a JNDI lookup instead of a URL.
Developing application services as servlets also has its drawbacks. This model makes the application service dependent on the Servlet Container to provide a runtime environment. This may prove to be unnecessary complexity in the application architecture. So, if a service does not utilize any of the Container's facilities, it may be better to run the service as a standalone process.
XML vs. binary formats | home - top of the page - |
Using XML is always slower than using binary formats - especially with the DOM API (the SAX API has been shown to be pretty fast).
DOM is in fact not the best representation for data-oriented XML, since it supports many intricacies that really apply only to full-featured XML documents. JDOM (http://www.jdom.org/), an emerging API for XML parsing in Java, alleviates some of these issues by making the representation more appropriate for XML data. Another approach may be a HashTree-based XML API optimized for performance.
XML data takes more space and increases bandwidth requirements. But the cost of bandwidth is usually much less than the cost of software development and maintenance.
XML is not a cure-all solution. In some situations it is impossible to reach the required speed with it. And some 3rd-party applications simply don't have XML interfaces.
Distributed transactions | home - top of the page - |
The "traditional" (not distributed) transaction is simply a set of operations which should either be performed together successfully - or not performed at all. The simplest example is a money transfer between 2 accounts. It involves 2 actions: removing money from one account and adding it to the other. Imagine that the computer loses power in the middle of this process. The money was removed from the 1st account, but was never added to the second. This is an error. How do we prevent it? It is simple. We record all steps of the transaction in a log file. This way, after restarting, the system can read the log and either successfully finish the transaction (commit) or cancel all the changes (roll back). This was a simple explanation of something called a "Transaction Protocol" (TP). A TP should comply with 4 fundamental properties, usually denoted ACID: Atomicity, Consistency, Isolation, Durability.
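A toy illustration of the logging idea (in-memory only; a real TP writes the log to durable storage and adds the recovery code, which is omitted here):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniTxn {
    final Map<String, Integer> accounts = new HashMap<>();
    final List<String> log = new ArrayList<>(); // stands in for a durable log file

    MiniTxn() {
        accounts.put("A", 100);
        accounts.put("B", 0);
    }

    // Write-ahead logging: record each step BEFORE applying it, so that after a
    // crash and restart the log tells us whether to finish the transfer or undo it.
    void transfer(String from, String to, int amount) {
        log.add("BEGIN");
        log.add("DEBIT " + from + " " + amount);
        accounts.merge(from, -amount, Integer::sum);
        log.add("CREDIT " + to + " " + amount);
        accounts.merge(to, amount, Integer::sum);
        log.add("COMMIT"); // only now is the transfer considered done
    }

    public static void main(String[] args) {
        MiniTxn t = new MiniTxn();
        t.transfer("A", "B", 30);
        System.out.println(t.accounts.get("A") + " " + t.accounts.get("B")); // 70 30
        System.out.println(t.log.get(t.log.size() - 1)); // COMMIT
    }
}
```

If the process dies between the DEBIT and the CREDIT, the log ends without a COMMIT record, so a recovery pass knows the transaction must be rolled back.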
Transactions may be nested (one big transaction includes several smaller transactions, and the failure of any one of them rolls back the whole big transaction). There are standards and specifications (ISO, OMG, JTA - the Java Transaction API, etc.) for transaction protocols, covering the basic decisions: the nested-transaction model (open/closed subtransactions), the set of service primitives and their roles, etc.
When individual actions of a transaction run on different systems, we are dealing with distributed transactions (DT).
Example 1: money transfers between remote accounts (between different
banks).
Example 2: data replication between corporate directory, Outlook, some
sales CRM package, Web Authorization database, etc.
DT allows individual actions to run simultaneously (in parallel) - for some transactions this can be used to increase speed.
Distributed transactions can be governed by different protocols. One of the simplest and most commonly used is the two-phase commit (2PC) protocol. The 2PC protocol uses a central "transaction monitor" process. It goes like this: first, all changes required by a transaction are stored temporarily by each database. The transaction monitor then issues a "pre-commit" command to each database, which requires an acknowledgment. If the monitor receives the appropriate response from each database, the monitor issues the "commit" command, which causes all databases to make the transaction changes permanent simultaneously.
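The 2PC flow can be sketched in a few lines; the Participant interface and the Db class below are invented for illustration, not any standard API:

```java
import java.util.List;

public class TwoPhaseCommit {
    // What each database must support (invented interface for this sketch).
    interface Participant {
        boolean prepare();   // phase 1: stage the changes, vote yes/no
        void commit();       // phase 2a: make the staged changes permanent
        void rollback();     // phase 2b: discard the staged changes
    }

    // The transaction monitor: commit only if every participant votes yes.
    static boolean run(List<? extends Participant> participants) {
        for (Participant p : participants) {
            if (!p.prepare()) {               // one "no" vote aborts everything
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        participants.forEach(Participant::commit);
        return true;
    }

    // A trivial in-memory "database" for demonstration.
    static class Db implements Participant {
        final boolean votesYes;
        String state = "idle";
        Db(boolean votesYes) { this.votesYes = votesYes; }
        public boolean prepare() { state = "prepared"; return votesYes; }
        public void commit() { state = "committed"; }
        public void rollback() { state = "rolled back"; }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(new Db(true), new Db(true))));  // true
        System.out.println(run(List.of(new Db(true), new Db(false)))); // false
    }
}
```

A real monitor also logs each phase durably (so it can finish or abort in-flight transactions after a crash) and handles time-outs from unresponsive participants; both are omitted here.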
You may define your own transaction protocol (TP) to custom-fit your transactions. Do you need nested transactions? Do you need parallel processing? Some TPs don't have a central monitor; instead they form a truly distributed system. Some TPs also define the communication method they use (for example, XML messages). For instance, a TP for a system whose parts are almost never all available simultaneously should be different from a standard banking two-phase-commit system: different time-outs, probably messaging as a requirement, etc.
Check out the links below. Or, for more reading, search the Internet for "distributed transaction" or "two phase commit".
- http://ei.cs.vt.edu/~cs5204/fall99/distributedDBMS/duckett/tpcp.html
- http://aspn.activestate.com/ASPN/Mail/Message/xml-dev/755432
- http://www.computer.org/proceedings/dexa/7662/76620100abs.htm
- http://www.vermicelli.pasta.cs.uit.no/ipv6/students/andrer/doc/html/
- http://java.sun.com/products/jta/ - The Java Transaction API (JTA) 1.0.1 Specification
Some books:
- Data Replication : Tools and Techniques for Managing Distributed Information - by Marie Buretta
- Principles of Distributed Database Systems - by M. Tamer Ozsu, Patrick Valduriez
- Transaction Management : Managing Complex Transactions and Sharing Distributed Databases - by Dimitris N. Chorafas
- Transaction Processing : Concepts and Techniques (Morgan Kaufmann Series in Data Management Systems) - by Jim Gray, Andreas Reuter
- Distributed Algorithms (Data Management Series) - by Nancy A. Lynch
Transaction:
- Atomicity: All updates are successful, or no updates are successful. Must support commit and rollback; may support savepoints. Do one transaction at a time. Or, for parallel transactions - what should be read: the cache (or rollback segment) or the database?
- Consistency: Each transaction leaves the database in a consistent state. Constraints are satisfied.
- Isolation: Concurrent transactions have the same effect as if they ran one at a time. Uncommitted changes are hidden from other transactions. Changes from other transactions are hidden from the app.
- Durability: Committed changes should stick.
Deadlocks | home - top of the page - |
Example: what NOT to do.
Imagine that you have 2 processes reading/writing data from/between databases A & B. Imagine further that reading and writing put locks on the tables. Imagine that one process has locked A but couldn't get a lock on B, while another process got B and is waiting for A. What you have is a deadlock: each process holds a resource the other one needs, so the processes may wait for each other forever.
One standard way to prevent this is to forbid simultaneous locking of more than one resource. Let's apply this principle to the A/B example above. If we can't lock A and B simultaneously from one job, we need to alternate between A and B. We can first read a little from A and release the lock on A, then get a lock on B and write there, repeating this process as many times as needed. Or we can first read everything from A into a temporary table and then release the A-lock; after that we can work between this temporary table and B. Yet another approach would be to avoid locks on reading by using "dirty reads".
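In-process, the same "never sit on one lock while waiting forever for another" rule can be sketched with the JDK's ReentrantLock.tryLock: if the second lock is busy, release the first and retry instead of blocking (the lock names, timeout and retry count are arbitrary choices for this sketch):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockAvoidance {
    static final ReentrantLock lockA = new ReentrantLock();
    static final ReentrantLock lockB = new ReentrantLock();

    // Take both locks or neither: if B can't be had quickly, drop A and retry,
    // so no thread ever holds one resource while waiting forever for the other.
    static boolean withBothLocks(Runnable work) throws InterruptedException {
        for (int attempt = 0; attempt < 10; attempt++) {
            lockA.lock();
            try {
                if (lockB.tryLock(100, TimeUnit.MILLISECONDS)) {
                    try {
                        work.run();
                        return true;
                    } finally {
                        lockB.unlock();
                    }
                }
            } finally {
                lockA.unlock();
            }
            // we hold neither lock here, so a deadlock is impossible while we retry
        }
        return false; // gave up after 10 attempts
    }

    public static void main(String[] args) throws Exception {
        System.out.println(withBothLocks(() -> System.out.println("copied A -> B")));
    }
}
```

Another common convention with the same effect is to always acquire locks in a fixed global order (A before B, for every process), so the circular wait can never form.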
simple locking with a queue | home - top of the page - |
The idea: every process that wants a lock on a table inserts a row into a lock table; whichever process has the oldest row holds the lock, and stale locks (older than 15 seconds, presumably left by dead processes) are cleaned out. In Perl-style pseudocode (the do_sql and first_lock_pid helpers are schematic):

sub lock {
    my ($tabname) = @_;
    # clean out stale locks left behind by dead processes
    do_sql("delete from locks where timestamp < now() - 15 seconds");
    # register our request
    do_sql("insert into locks (tabname, pid, timestamp) values ('$tabname', $$, now())");
    # poll up to 10 times: we hold the lock once our row is the oldest
    for (1 .. 10) {
        return $success if first_lock_pid($tabname) == $$;
        sleep(1);
    }
    # give up: withdraw our request
    do_sql("delete from locks where tabname = '$tabname' and pid = $$");
    return $error;
}

sub unlock {
    my ($tabname) = @_;
    do_sql("delete from locks where tabname = '$tabname' and pid = $$");
}
misc links | home - top of the page - |
* www.ittoolbox.com - One of its child sites is EAI.Toolbox ( http://eai.ittoolbox.com/ )
--------------------------------------