Performance is a key feature
Low latency, high throughput and extreme scalability are imprinted into the Tbricks DNA.
The principal tenet of the Tbricks system architecture is what we believe to be one of its most important strengths: "Do the right thing, in the right place, at the right time." The internal protocols and data flow have been designed with a server-based colocated system in mind, with the goal of minimizing the machine resources wasted on performing unnecessary work. This includes extensive use of source-side filtering for data streams whenever possible.
The implementation uses highly efficient data structures and algorithms with the best possible time-complexity characteristics, all the way down to O(1) for performance-sensitive operations such as source-side filtering of data streams. We have minimized, and wherever possible outright eliminated, system calls, database accesses, context switches, mutex locks and other synchronization primitives in latency-sensitive critical paths.
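To illustrate what constant-time source-side filtering can look like, here is a minimal C++ sketch (our own illustration, not Tbricks code): subscriptions are keyed by instrument in a hash table, so deciding whether an update has any recipients is an O(1) average-case lookup, and updates that nobody watches are dropped at the source.

```cpp
#include <unordered_map>
#include <unordered_set>

// Hypothetical sketch: map instrument id -> set of subscriber ids.
// Hash-table lookups are O(1) on average, so deciding who (if anyone)
// should receive an update costs constant time per update.
struct SourceSideFilter {
    std::unordered_map<int, std::unordered_set<int>> subs;

    void subscribe(int instrument, int subscriber) {
        subs[instrument].insert(subscriber);
    }
    void unsubscribe(int instrument, int subscriber) {
        auto it = subs.find(instrument);
        if (it != subs.end()) it->second.erase(subscriber);
    }
    // Returns the subscribers interested in this instrument, or nullptr
    // if there are none, in which case the update is dropped at the source.
    const std::unordered_set<int>* recipients(int instrument) const {
        auto it = subs.find(instrument);
        return (it != subs.end() && !it->second.empty()) ? &it->second : nullptr;
    }
};
```

An update for an instrument without subscribers is discarded before any serialization or transfer work is done, which is the essence of filtering at the source rather than at the receiver.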
Sophisticated caching schemes are used where they have proven beneficial, for example for numerically intensive calculated instrument values such as options prices. Platform-specific performance optimizations are applied whenever beneficial; for example, adopting the scalable libumem slab memory allocator, using separate ZFS storage pools with custom file system record sizes, and tuning kernel, TCP/IP and NIC driver settings for minimum latency. Tbricks also has built-in support for creating and assigning processor sets.
“Do the right thing, in the right place, at the right time.”
Consistently considered throughout the design and implementation process, performance is imprinted into the DNA of the system, and engineering resources are dedicated to improving it further with each release.
All performance-critical services can be run in multiple instances for true horizontal scalability, and transparent multiplexing for market data and trading is built right in. All services have been heavily optimized to perform their designated tasks quickly and robustly, and all apps are built with native development tools for no-compromise performance.
For excellent vertical scalability, the Tbricks services have been carefully multithreaded to ensure they can use all available processor cores. Multiple services running on the same machine will additionally benefit automatically from the multiprocessing provided by the operating system. To ensure efficient use of threading resources, Grand Central Dispatch has been integrated and is used throughout the system, allowing for lock-free operation of critical sections under load.
This consistent work to scale well on multi-core processors ensures excellent performance even when facing the ever-more relevant challenges of Amdahl's law.
“Tbricks inherently supports the fusing of latency-sensitive services into a single process”
Services in Tbricks typically run as separate processes using shared memory for interprocess communication. Tbricks inherently supports the fusing of latency-sensitive services into a single process using our Speedcore® technology. This allows for mimicking the deployment of a typical in-house application, while retaining a clear architectural separation of services. Services can easily be moved into or out from a Speedcore®.
This innovative approach allows you to carefully control how services are deployed to ensure the best possible performance. The benefits of running in a Speedcore® configuration are the removal of the interprocess communication overhead between the services running in the Speedcore®, as well as an improved CPU cache hit rate when dedicated CPU resources are assigned to the Speedcore® using processor sets.
Tbricks includes a blazingly fast embedded transactional database — Oracle Berkeley DB — which vastly outperforms conventional SQL databases. The embedded database resides in the same address space as the service, so there is zero IPC overhead for communicating with a database server. The fact that each service has access to its own private storage also allows for highly parallelized I/O across the system.
Oracle Berkeley DB is consistently used for all storage in the system and requires virtually no configuration. Oracle’s state-of-the-art ZFS file system is used as the underlying storage mechanism for the database, which allows for storage pools that can be grown on demand, as well as support for hybrid storage systems combining SSDs and conventional magnetic disks for the best combination of performance and storage space. Additionally, ZFS provides data integrity checks, avoiding potential silent data corruption.
“Tbricks includes a blazingly fast embedded transactional database”
“A typical front-end only uses 200 Kbit/s on average”
When performing interprocess communication on the same host, Tbricks uses shared memory transport for the best possible latency and throughput. For services running on different machines, TCP/IP is used to allow for the source-side filtering and throttling of data streams.
All interprocess communication in the system uses an efficient binary encoded protocol, which is further compressed for traffic sent across the WAN. Partial message updates are fully supported and are consistently used throughout the system, so that only the actual delta changes are sent over the wire rather than full business objects each time.
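A minimal sketch of the partial-update idea, assuming a hypothetical price object with three fields (the struct layout and field mask are our own illustration, not the actual Tbricks wire format): the sender encodes only the fields that changed, and the receiver merges the delta into its last known full image.

```cpp
#include <cstdint>

// Hypothetical wire-format sketch: a bitmask records which fields
// actually changed; unchanged fields are omitted from the message,
// and the receiver merges the delta into its full image of the object.
enum Field : std::uint8_t { BID = 1 << 0, ASK = 1 << 1, LAST = 1 << 2 };

struct PriceImage { double bid = 0, ask = 0, last = 0; };

struct Delta {
    std::uint8_t mask = 0;      // which fields are present in this message
    double bid = 0, ask = 0, last = 0;
};

// Sender side: build a delta containing only the fields that differ.
Delta diff(const PriceImage& prev, const PriceImage& next) {
    Delta d;
    if (next.bid  != prev.bid)  { d.mask |= BID;  d.bid  = next.bid; }
    if (next.ask  != prev.ask)  { d.mask |= ASK;  d.ask  = next.ask; }
    if (next.last != prev.last) { d.mask |= LAST; d.last = next.last; }
    return d;
}

// Receiver side: apply the delta to the locally held image.
void merge(PriceImage& img, const Delta& d) {
    if (d.mask & BID)  img.bid  = d.bid;
    if (d.mask & ASK)  img.ask  = d.ask;
    if (d.mask & LAST) img.last = d.last;
}
```

When only one field of a large business object changes, only that field (plus the mask) needs to cross the wire, which is where the bandwidth savings come from.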
The extensive use of source-side filtering also avoids superfluous data transfers and unnecessary thread wake-ups, allowing trading apps to simply react when something of note has happened rather than performing repeated, inefficient 'should I do something?' checks.
The Tbricks front-end is carefully implemented to use a minimum amount of bandwidth, as only the exact information that you see on screen is transferred. A typical front-end only uses 200 Kbit/s on average, with a full-fidelity, truly responsive user experience. This removes the need for remote display solutions such as Citrix, which in any case do not solve the problem of connecting a single unified front-end to a fully distributed system running in multiple geographical locations.
It is also possible to further improve performance by dampening quickly oscillating data streams using throttling conditions. This is beneficial when you aren’t interested in, say, market data updates unless they deviate more than a certain amount since the last update you received, or when you don’t need updates more often than at a predefined maximum frequency.
For instance, it’s possible to set up a throttling condition that limits updates to at most one every X milliseconds, or that only sends an update for a currency rate when the bid or ask changes by more than Y% since the last update received.
Such throttling conditions provide an additional performance boost, as trading strategies don’t have to react to smaller price movements, while the specified maximum update frequency still ensures that an up-to-date value is received periodically.
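The throttling condition described above can be sketched as follows (a simplified illustration under our own assumptions, not the actual Tbricks implementation): an update is forwarded only if the value has moved by more than a given percentage since the last forwarded value, or if the maximum update interval has elapsed.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical sketch of a throttling condition: forward an update only
// when the value has moved more than `min_change_pct` percent since the
// last forwarded value, or when `max_interval` has elapsed, so that a
// fresh value is still delivered periodically.
struct Throttle {
    double min_change_pct;
    std::chrono::milliseconds max_interval;
    double last_sent = 0;
    std::chrono::steady_clock::time_point last_time{};
    bool first = true;

    bool should_send(double value, std::chrono::steady_clock::time_point now) {
        bool due = first
            || (now - last_time) >= max_interval
            || std::fabs(value - last_sent) >
                   std::fabs(last_sent) * min_change_pct / 100.0;
        if (due) { last_sent = value; last_time = now; first = false; }
        return due;
    }
};
```

Updates suppressed by the condition are never serialized or transmitted, so both the wire and the consuming strategy are spared the churn of rapidly oscillating values.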
The use of source-side filtering together with data stream throttling is a powerful combination that allows trading strategies as well as internal Tbricks services to eliminate unnecessary updates that are wasting processing power.
The key to understanding performance is through measuring. We have designed Tbricks to make it possible to both measure and monitor many interesting performance aspects of production systems, such as market data and order latency and throughput, as well as trading strategy runtime performance.
Tbricks includes a Quality of Service framework, which makes key performance indicators available both in the front-end as well as to apps. This makes it possible, for example, to allow a ‘fastest execution’ order strategy to consistently pick the market that currently has the lowest average outbound latency.
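As a sketch of how such a key performance indicator could be consumed (the function and data shape are hypothetical, not the actual Quality of Service API), a 'fastest execution' strategy might simply pick the market with the lowest average outbound latency:

```cpp
#include <map>
#include <string>

// Hypothetical sketch: given average outbound latency per market, for
// example as published by a QoS framework, pick the market with the
// lowest value. Market identifiers and units are illustrative.
std::string fastest_market(const std::map<std::string, double>& avg_latency_us) {
    std::string best;
    double best_latency = 0;
    for (const auto& [market, latency] : avg_latency_us) {
        if (best.empty() || latency < best_latency) {
            best = market;
            best_latency = latency;
        }
    }
    return best;
}
```

Because the indicator is refreshed continuously, re-evaluating this choice per order lets routing decisions track current conditions rather than a static configuration.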
It’s challenging to analyze the throughput and latency characteristics of either a system or a specific trading strategy without introducing a probe effect, especially when measuring events in the low microsecond range. Using log statements and comparing timestamps often proves misleading when measuring such short time spans.
To enable such analysis without a significant probe effect, a number of DTrace scripts are bundled with the system, which allow analysis of order and market data throughput rates and strategy processing times, including latency distributions. There are ready-to-use scripts that let you analyze your own strategy’s execution time without any significant probe effect, even on a production system, as well as convenient built-in support for static DTrace probes for custom events.
Tbricks supports live export of latency correlation information, allowing external performance measurement tools to correlate e.g. a given market data update to a specific outbound quote or order.