The distributed architecture of the Distra Universal Payments Platform (UPP) inherently provides a replicated, scalable and fault-tolerant operating environment for enterprise-class payment applications.
The Distra UPP follows a true distributed computing model: each application is deployed as a collection of software services distributed across separate nodes or servers within the platform.
The Distra UPP manages two distinct types of services: peer-to-peer and primary-secondary. In peer-to-peer services, every instance is considered an equal peer, and state is distributed among the peers through the platform's reliable messaging mechanism.
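As an illustration of the peer-to-peer model, the following minimal sketch shows peers converging on the same shared state because every update is delivered to every peer. It assumes an in-memory `MessageBus` standing in for the reliable messaging mechanism; `Peer`, `MessageBus` and the transaction keys are hypothetical names, not part of the Distra UPP API.

```python
# Hypothetical illustration of peer-to-peer state distribution, not the Distra UPP API.
from dataclasses import dataclass, field


@dataclass
class Peer:
    """One instance of a peer-to-peer service holding its own copy of the shared state."""
    name: str
    state: dict = field(default_factory=dict)

    def apply(self, key, value):
        # Every peer applies the same update to its local copy.
        self.state[key] = value


class MessageBus:
    """Stand-in for reliable messaging: delivers each update to every joined peer."""

    def __init__(self):
        self.peers = []

    def join(self, peer):
        self.peers.append(peer)

    def publish(self, key, value):
        # No peer is special: any peer may publish, and all peers (sender included) apply it.
        for peer in self.peers:
            peer.apply(key, value)


bus = MessageBus()
a, b, c = Peer("a"), Peer("b"), Peer("c")
for p in (a, b, c):
    bus.join(p)

bus.publish("txn-1001", "AUTHORIZED")
bus.publish("txn-1002", "SETTLED")
print(a.state == b.state == c.state)  # True
```

Because no instance owns the state, any peer can be lost without losing the data, as long as the messaging layer really does deliver every update to every surviving peer.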
In primary-secondary services, one instance is designated the primary and the others act as ordered replicas (secondaries). Each secondary is kept continuously synchronized with the primary, and if the primary fails, the Distra UPP orchestrates failover to an available secondary.
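The ordered-replica and failover behaviour can be sketched as below. This is a simplified model under stated assumptions: writes are forwarded synchronously to every live member, and failover promotes the next member in the configured order. `PrimarySecondaryGroup` and the node names are hypothetical and do not describe the UPP's actual failover protocol.

```python
# Hypothetical primary-secondary model, not the Distra UPP API.
class PrimarySecondaryGroup:
    def __init__(self, names):
        # Order matters: names[0] starts as primary; failover follows the list order.
        self.members = list(names)
        self.logs = {name: [] for name in names}

    @property
    def primary(self):
        return self.members[0]

    def write(self, entry):
        # The primary applies the write and it is forwarded synchronously to each
        # secondary, so every live member holds an identical log.
        for name in self.members:
            self.logs[name].append(entry)

    def fail(self, name):
        # Remove the failed member; if it was the primary, the next secondary takes over.
        self.members.remove(name)
        if not self.members:
            raise RuntimeError("no secondary available for failover")


group = PrimarySecondaryGroup(["node-1", "node-2", "node-3"])
group.write("debit 100")
group.fail("node-1")          # the primary fails
print(group.primary)          # node-2
group.write("credit 100")
print(group.logs["node-2"])   # ['debit 100', 'credit 100']
```

Keeping the secondaries in a fixed order makes the choice of new primary deterministic, so every surviving member can agree on it without negotiation.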
Real-time distribution delivers high performance through dynamic load balancing, which spreads processing across all resources available to the platform. Synchronization between service instances is managed by a mechanism called Group Communications.
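The source does not say which load-balancing algorithm the UPP uses. The sketch below assumes a common "least-loaded" strategy, in which each request goes to the node with the fewest in-flight requests, to show how newly added resources start taking work immediately. `LeastLoadedBalancer` and its methods are hypothetical.

```python
# Hypothetical least-loaded dispatcher; the UPP's real balancing strategy may differ.
class LeastLoadedBalancer:
    def __init__(self, nodes):
        self.in_flight = {node: 0 for node in nodes}

    def dispatch(self):
        # Pick the node currently doing the least work (ties go to the first-listed node).
        node = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[node] += 1
        return node

    def complete(self, node):
        # Called when a request finishes, freeing capacity on that node.
        self.in_flight[node] -= 1

    def add_node(self, node):
        # A node joining the platform becomes eligible for work straight away.
        self.in_flight[node] = 0


lb = LeastLoadedBalancer(["node-1", "node-2"])
first = [lb.dispatch() for _ in range(4)]
print(first)          # ['node-1', 'node-2', 'node-1', 'node-2']
lb.add_node("node-3")
print(lb.dispatch())  # node-3
```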
Group Communications provides reliable communication between the elements of a distributed system built from inherently unreliable components, such as hardware, operating systems, networks and databases. In practice, the Distra UPP uses Group Communications to ensure reliable messaging and synchronization of its distributed services.
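One building block of reliable messaging over an unreliable network is retransmission with acknowledgements plus duplicate suppression at the receiver. The sketch below shows only that piece; group communication systems generally provide more (membership and ordering, for example), and the source does not describe the UPP's actual protocol. `LossyChannel`, `Receiver` and `reliable_send` are hypothetical, and the channel drops specific attempts deterministically so the behaviour is reproducible.

```python
# Hypothetical sketch of retransmission and de-duplication, not the UPP protocol.
class LossyChannel:
    """Simulated unreliable network that loses chosen messages or acknowledgements."""

    def __init__(self, lose_messages=(), lose_acks=()):
        self.lose_messages = set(lose_messages)  # attempt numbers whose message is dropped
        self.lose_acks = set(lose_acks)          # attempt numbers whose ack is dropped
        self.attempts = 0

    def send(self, receiver, seq, payload):
        self.attempts += 1
        if self.attempts in self.lose_messages:
            return False                          # message never arrives
        acked = receiver.receive(seq, payload)
        if self.attempts in self.lose_acks:
            return False                          # delivered, but the sender never sees the ack
        return acked


class Receiver:
    def __init__(self):
        self.delivered = []
        self.seen = set()

    def receive(self, seq, payload):
        # Retransmitted duplicates are acknowledged again but delivered only once.
        if seq not in self.seen:
            self.seen.add(seq)
            self.delivered.append(payload)
        return True                               # acknowledgement


def reliable_send(channel, receiver, seq, payload, max_retries=5):
    # Retransmit until acknowledged (a real sender would wait for a timeout between tries).
    for _ in range(max_retries):
        if channel.send(receiver, seq, payload):
            return True
    return False


channel = LossyChannel(lose_messages={1}, lose_acks={3})
rx = Receiver()
results = [reliable_send(channel, rx, seq, p)
           for seq, p in enumerate(["auth", "capture", "settle"])]
print(rx.delivered)      # ['auth', 'capture', 'settle']
print(channel.attempts)  # 5
```

Even though one message and one acknowledgement were lost, each payload is delivered exactly once, which is the property the application layer relies on.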
Replication keeps every distributed instance of a service in the same state: the Group Communications mechanism guarantees that modifications to a service's shared state are consistently replicated to all other instances.
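Consistent replication requires every instance to apply the same changes in the same order, which matters when operations do not commute. A common technique for this is sequencer-based total ordering, sketched below as an assumption (the source does not name the UPP's ordering scheme). A hypothetical `Sequencer` stamps each change with a global sequence number, and each `Replica` holds back early arrivals until the gap is filled.

```python
# Hypothetical sketch of sequencer-based total ordering, not the UPP implementation.
import itertools


class Sequencer:
    """Stamps every state change with a global, gap-free sequence number."""

    def __init__(self):
        self.counter = itertools.count()

    def stamp(self, op):
        return next(self.counter), op


class Replica:
    def __init__(self):
        self.balance = 0
        self.next_seq = 0
        self.holdback = {}  # updates that arrived early, keyed by sequence number

    def on_message(self, seq, op):
        self.holdback[seq] = op
        # Apply updates strictly in sequence order, whatever order they arrived in.
        while self.next_seq in self.holdback:
            self.apply(self.holdback.pop(self.next_seq))
            self.next_seq += 1

    def apply(self, op):
        kind, amount = op
        if kind == "deposit":
            self.balance += amount
        elif kind == "apply_interest_pct":
            self.balance += self.balance * amount // 100  # order-sensitive operation


seq = Sequencer()
updates = [seq.stamp(("deposit", 100)), seq.stamp(("apply_interest_pct", 10))]

r1, r2 = Replica(), Replica()
for msg in updates:             # r1 receives the updates in order
    r1.on_message(*msg)
for msg in reversed(updates):   # r2 receives them reordered by the network
    r2.on_message(*msg)
print(r1.balance, r2.balance)   # 110 110
```

Without the hold-back queue, `r2` would apply the interest to a zero balance first and end at 100, diverging from `r1`.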
Failure detection covers blocking, overload and exception conditions within services and within the application server itself. These conditions may be caused by transient problems such as network outages or blocked threads.
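The three conditions named above can be illustrated with a simple threshold-based detector. This sketch assumes heartbeat timeouts for blocking, queue depth for overload and an error count for exception conditions; `FailureDetector`, `ServiceHealth` and all threshold values are hypothetical, not UPP settings. Time is passed in explicitly so the example is deterministic.

```python
# Hypothetical failure detector; thresholds are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class ServiceHealth:
    last_heartbeat: float
    queue_depth: int = 0
    recent_errors: int = 0


class FailureDetector:
    def __init__(self, heartbeat_timeout=3.0, max_queue_depth=1000, max_errors=5):
        self.heartbeat_timeout = heartbeat_timeout
        self.max_queue_depth = max_queue_depth
        self.max_errors = max_errors
        self.services = {}

    def heartbeat(self, name, now, queue_depth=0, recent_errors=0):
        # Each service periodically reports that it is alive, plus basic load/error figures.
        self.services[name] = ServiceHealth(now, queue_depth, recent_errors)

    def check(self, name, now):
        h = self.services[name]
        if now - h.last_heartbeat > self.heartbeat_timeout:
            return "BLOCKED"     # silent: blocked thread, hung process or network outage
        if h.queue_depth > self.max_queue_depth:
            return "OVERLOADED"
        if h.recent_errors > self.max_errors:
            return "FAULTY"      # repeated exceptions
        return "OK"


fd = FailureDetector()
fd.heartbeat("auth-service", now=0.0)
fd.heartbeat("fraud-service", now=0.0, queue_depth=5000)
fd.heartbeat("ledger-service", now=0.0, recent_errors=9)
print(fd.check("auth-service", now=1.0))    # OK
print(fd.check("fraud-service", now=1.0))   # OVERLOADED
print(fd.check("ledger-service", now=1.0))  # FAULTY
print(fd.check("auth-service", now=10.0))   # BLOCKED
```

A missed heartbeat cannot by itself distinguish a blocked thread from a network outage, which is one reason a platform might respond by quarantining rather than immediately discarding a service.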
Under these circumstances, the platform takes evasive action, such as quarantining the problematic service and migrating its processing to a known-good replica on another node. Recovery then returns the failed service automatically and seamlessly to a steady state of operation.
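The quarantine, migration and recovery cycle can be sketched as a router that skips quarantined replicas and readmits them only after a health check passes. This is a simplified model under assumptions: `ServiceRouter`, its methods and the replica names are hypothetical, and migrating in-flight work is reduced to routing new work elsewhere.

```python
# Hypothetical quarantine/recovery router, not the Distra UPP API.
class ServiceRouter:
    def __init__(self, replicas):
        self.replicas = list(replicas)   # the same service running on several nodes
        self.quarantined = set()

    def quarantine(self, replica):
        # Evasive action: stop routing work to the problematic replica.
        self.quarantined.add(replica)

    def recover(self, replica, health_check):
        # Readmit the replica only once it passes a health check.
        if health_check(replica):
            self.quarantined.discard(replica)
            return True
        return False

    def route(self):
        # Send work to the first replica that is not quarantined.
        for replica in self.replicas:
            if replica not in self.quarantined:
                return replica
        raise RuntimeError("no known-good replica available")


router = ServiceRouter(["payments@node-1", "payments@node-2"])
print(router.route())                  # payments@node-1
router.quarantine("payments@node-1")   # e.g. after a failure is detected
print(router.route())                  # payments@node-2
router.recover("payments@node-1", health_check=lambda r: True)
print(router.route())                  # payments@node-1
```

Gating recovery on a health check avoids returning a replica to service while the transient condition that caused the failure is still present.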