StreamBase Documentation


Tuning Guide

Welcome — This guide is a start on some design and configuration tips that may help you avoid mistakes and tune your StreamBase applications for better performance. Of course, not all of the suggestions in this guide will apply to your unique application, or result in improved performance. Many variables contribute to the performance of applications, including the nature of the real world data, when it arrives, and dependencies on other data. Our overall tuning advice is that you read these suggestions and try the ones that you feel may help.

Measure Current Performance to Establish Baseline for Comparisons

Although it may be stating the obvious, it's worth mentioning that you should measure the current performance of your StreamBase application before starting a tuning project. Using the current performance numbers as a baseline, you can then observe the positive or negative results of experimenting with different application designs, configurations, and so on.

If possible, create a final test or "staging" environment that contains the same hardware and software configuration you propose to use when the application is deployed into production.

In general, you may notice that applications typically have bottlenecks that are slowing the overall throughput. The bottlenecks can occur with:

  • Enqueuing (input) speed in adapters or clients
  • Dequeuing (output) speed
  • Operators that you defined which are waiting for matching tuples on queues (Merge, Join, Gather), which could benefit from the addition of Heartbeat operators
  • Other "slow" operators such as misuse of Aggregate or Query operators
  • Java operators that are providing custom functionality, but are not performing well

To measure normal and peak performance of a StreamBase application, you can use several tools:

  • The StreamBase Feed Simulator provides a convenient way to quickly send test data to your application during the development and testing phases. It is also a good tool during staging, where you should try to submit "real world" data through a Feed Simulation via *.csv input files, which can be specified in an *.sbfs configuration file. For example, the Moving Average Convergence Divergence (MACD) sample that is installed with StreamBase contains a MACD.sbfs configuration file that references a MACD.csv file of test data. Run the sbfeedsim command from the command line to submit test data to your application. Do not use the Feed Simulator view in StreamBase Studio for performance testing, as it incurs the overhead of the IDE itself, which may skew your results.

    For more information, start in Using the Feed Simulation Editor, and then read the sbfeedsim command reference. As noted in those topics, a Feed Simulation configuration gives you lots of options, including the ability to "throttle up" the data rate to stress test the application closer to expected peak data rates.

  • With your application running, you can use the StreamBase Performance Monitor (sbmonitor) to see which operators are "hot spots," meaning which ones receive the most number of tuples, and which threads are using the most CPU. Once hot spots are identified, you can distribute the processing of hot-spot operators into separate threads, which in turn can be assigned to run on separate physical machines in a StreamBase cluster.

    With the StreamBase server running and test or "real world" streaming data running through the application, the StreamBase Performance Monitor may be invoked via the sbmonitor command from a DOS or terminal window, or from within StreamBase Studio (Monitor View on UNIX, or use the menu commands "Window > Launch Monitor on Windows to open a DOS command window).

    For example, here is a sample Monitor window for the running MACD sample:

    StreamBase monitor sample

    For the running StreamBase Server that is hosting your application, the performance statistics present:

    • The name of each operator
    • IN: The number of tuples enqueued on this operator's input queues during this past second (or the configured snapshot interval; the default is 1000 milliseconds, or 1 second).
    • OUT: The number of tuples output by this operator during the past second.
    • w-us/T: The current wall-clock time in microseconds, per input tuple (EWMA).
    • %TIME: The current percentage of time spent in this operator.
    • SIZE: The current number of tuples in this operator.

    For more information, see:

Filter Early, Filter Wide

In data processing, for traditional and real-time applications, an important guideline is to minimize the amount of data movement for a given task. This simple fact, obvious once you observe it, is critical when processing high-volume, real-time data streams.

We recommend that you avoid defining multiple sets of Filter operators too far "downstream" in your application (where downstream means on the right side of the application diagram). Instead, spit the data as soon as possible after it enters an Input Stream, so that the downstream operators only process or move the data that is appropriate for that path through the application. A common design mistake is to add multiple Filter operators throughout the application diagram, when in fact a single Filter operator with multiple predicates (expressions that result in TRUE or FALSE) would be sufficient to do the work, and not incur the overhead of excessive data movement.

Remember that each operator in a StreamBase application has to move data in, and move data out. Assume for the moment that the cost of performing these steps, per operator in your application, is 1. In reality, the cost is different per operator, but let's start by adding up the costs of moving data through applications using varying designs.

Example 1: Simple View of Just Moving Tuples Through Three Operators

StreamBase operator data movement cost

In the example above, the total cost to simply move tuples through the operators is 3.

In the next example, consider the cost when you add a Filter operator on the right side (downstream) of the application diagram. In this example, assume that the Filter operator's predicates and the inbound data values resulted in 30% of the tuples being moved out port #1, and 70% out port #2, to the next set of downstream components:

Example 2: Filter Operator is Added, But Too Far Downstream for Efficient Processing

StreamBase operator data movement cost

If we can redefine this application so that the Filter operator does the split processing further upstream, closer to when the data first enters the system, we can avoid moving 100% of the data tuples through the operators that preceded the (further downstream) Filter operator shown above.

Example 3: Filter Operator Moved Upstream, Avoiding Expensive Data Movement by Downstream Operators

StreamBase operator data movement cost

In the example above, we pushed our predicates in the Filter operator upstream (to the left in an application diagram), to reduce the flow of tuples in the downstream operators. In addition, you can define your Filter operator so that the predicates cover the conditional cases you want, and also drop any non-matching tuples that do not return TRUE for any of the predicates. In doing so, you can design the application so that "uninteresting" data records do not clog your system, further reducing the cost of data processing and increasing the overall throughput of the application.

Filter operators are efficient. The point in this section is that the more expensive operation is actually moving the data, not determining how to split the data based on the predicate expression(s).

Define or Process Minimal Inbound Tuples

The previous section included the tip to perform data filtering as early as possible in the application diagram, to minimize the amount of data that must be moved as the tuples proceed downstream. The general rule, then, is to optimize performance by reducing the number of tuples processed on each path, or the number of fields in each tuple, as early as possible. Narrowing down the tuples to just the essential fields has the biggest benefit when you do this at the source, before the data is enqueued into StreamBase. So if possible for your application, determine which fields are essential for a tuple, and remove any unnecessary fields.

For data that must be entered into the StreamBase application, you can use the Filter and Map operator to define criteria that set when certain tuples can be dropped from the stream. However, these operators have caveats in terms of their placement in an application, as noted in the previous section about Filters, and the next section about Maps.

Remember: developers who are new StreamBase tend to focus on the relative processing cost of individual operators or operator type. However, the number of tuples processed by the operator dominates the relative cost of the operator.

Consolidate Multiple Map Operators

As an application developer, you have an instinct to design or create code that is modular. In the case of StreamBase applications, this can sometimes result in the use of multiple sets of Map operators. It is understandable why a developer would do this, believing that (for example) Map1 should change the value of FieldA, while Map2 should exist to do something to the value of FieldB, and Map3 could drop FieldC and FieldD. As you may have guessed by now, unless Map2 is dependent on the result of Map1's processing, and unless Map3 is dependent on the results of Map1 and Map2's processing, you can consolidate these Map operators into a single Map, and use multiple expressions to perform the work. Using one Map operator is, of course, much more efficient in terms of doing the work and not forcing the application to move data three times before it is then sent on to the next downstream component.

Moving data has a cost, and so does evaluating expression and changing the schema. As noted earlier, filter streams as far to the left in the application diagram as you can. And mutate (with a Map operator) as far to the right in the application diagram as you can.

The Rule of Thumb is: "Filter on the left, Mutate on the right."

Set Query Table Cache Size If Using Disk-Based Option

If your application must use disk-based Query Tables, your performance may improve if you increase the cache size internally allocated to disk-based query table operations. See this XML element in the <server> section of the sbd.sbconf server configuration file:

<server>
    <param name="tcp-port" value="portno"/>
    <param name="datadir" value="dir"/>	
    <param name="disk-querytable-cache" value="number"/>
</server>

The parameter shown in bold text is commented out, by default. The value="number" attribute sets the amount of main memory that will be allocated to any disk-based query table operations. The value units are in MB of main memory and must be a power of 2. When unspecified, the default value is 1 MB. Use caution when setting this parameter, as too high a value may consume more memory than needed and could negatively impact other resources that require memory during the execution of the StreamBase application or other applications. The memory footprint of the sbd program is increased by the amount specified in this parameter. As with any resource setting, you should establish baseline performance metrics and then test the effect of increasing or decreasing values under normal and peak loads.

In Clients, Narrow Results with Filtered Subscriptions

You can use the StreamBase Java or C++ client API to write a consumer (dequeue) client that narrows the result set, by using a predicate on the subscription to an output stream. The predicate is applied to the data output from the stream before it is delivered to the client. By narrowing the result set that the server must provide to the client, you can increase the efficiency of your application. The predicate language follows the same rules and syntax as implemented in the Filter operator.

We have provided examples with the installed client sample:

STREAMBASE_HOME/sample/client/FilteredSubscribe.cpp 
STREAMBASE_HOME/sample/client/FilteredSubscribe.java

For details, see Narrowing Dequeue Results with Filtered Subscribe in Creating Clients, a topic in the API Guide.

Add Custom Functions to Increase Performance

You may find that the processing of your application can be improved by introducing your own custom functions, instead of performing special calculations in client applications. This idea assumes that the built-in functions provided by StreamBase do not meet your application's needs. Once implemented and configured, the custom functions can be used in the expressions of most operators in your application. To learn how to write your own functions by extending the StreamBase Java or C++ APIs, see these API Guide topics:

Consider Using Concurrency and Data Parallelism Features

On the Concurrency tab of the Properties view for most StreamBase components, you may notice the option to "Run this component in a separate thread," and below it the Data Parallelism options. If this is a compute-intensive component and you know that it can run without data dependencies on other components in the StreamBase application, you may be able to improve performance by enabling the first, or both, options:

StreamBase concurrency and data parallelism options

When a component runs in its own thread, the server processes the component's requests concurrently with other processing in the application. The processing of the threads may be distributed automatically across multiple processors on an SMP machine; or, via configuration settings that you define in an edited sbclusterd.sbconf file, across physical machines in a cluster.

If you checked the first option, StreamBase provides an additional option to enable data parallelism. This feature allows you to run multiple instances of the operator or module reference; each instance runs in its own thread. At runtime, tuples are dispatched to particular instances based on the Key Expression value.

Note: The concurrency features are not suitable for every application. For details, see StreamBase Execution Order, Concurrency Options, and Data Parallelism. It includes important caveats about the use of these features.

Tune Memory Parameters

The maximum size of a tuple in your application cannot exceed the value set by the StreamBase page-size parameter. The default StreamBase page sizes are:

  • 4096 bytes on Windows or Linux machines
  • 8192 bytes on SPARC

If you have larger tuples, increase the page-size parameter's value in the <page-pool> section of the StreamBase Server configuration file for your application. By default, the file is named sbd.sbconf. The StreamBase page size, if increased, must be a multiple of the system's page size. The default maximum size of a schema is 1048576 bytes (1 MB).

In non-HA environments, the initial memory allocation is determined by the JVM heap size for the hosted StreamBase application. If needed for StreamBase applications with large memory demands, you can set a higher initial allocation of heap by specifying -Xms (min) and -Xmx (max) JVM arguments. For your StreamBase Server instance, these values can be set in the <java-vm> section of the sbd.sbconf configuration files. The default maximum for the JVM is 256 MB. For example:

     <param name="jvm-args" value="-Xms64m -Xmx256m"/>

At runtime, this default may be insufficient for your StreamBase application. We recommend that you experiment with the JVM arguments for memory use and, as needed, increase the values. For example:

<java-vm>
    <param name="java-home" value="C:\Program Files\Java\jdk1.5.0_06" />
    <param name="jvm-args" value="-Xms512m -Xmx1024m" />
</java-vm>

The pages-per-chunk and max-chunks parameters are now only used for memory allocation when High Availability (HA) has been enabled for your application. The pages-per-chunk parameter sets the number of pages to allocate at a time in the page pool. Its value defaults to 1024 pages, or 4MB at a time (with 4K page size). The max-chunks parameter sets the maximum number of chunks to allocate. Its value defaults to 128 chunks, or 512MB total memory allocation (4MB * 128).

Also pay attention to the <param name="max-client-pages"> parameter in the <page-pool> section of the sbd server configuration. Its value sets the maximum number of pages that a dequeue client connection can allocate. The sbd process will disconnect clients that try to allocate more. This parameter is designed to protect sbd from a slow or hung dequeue client. Therefore to understand the resulting number of bytes allowed, multiple the max-client-pages value by the page-size value. A default max-client-pages value of 2048 pages results in about 8 megabytes, for a page size of 4096 bytes. If you want to enable a dequeue client connection that would allocate unlimited memory use, change the max-client-pages value to "0".

Increase Computing Resources

If design and configuration changes do not result in sufficient performance, consider increasing your computing resources, such as the processor speed, the number of processors on an SMP machine, the amount of memory, and the number of machines in a cluster (where distributing the processing load across several physical machines may help).

Back to Top ^