This topic is for application developers who need to run legacy feed simulations created with StreamBase 3.5 and earlier. These instructions require you to run your legacy feed simulations from a StreamBase Command Prompt or a terminal window, instead of in StreamBase Studio's Feed Simulations View.
Some background information: StreamBase 3.7 introduced a new version of the StreamBase Feed Simulator. Now feed simulations are functionally equivalent whether they are run in StreamBase Studio or from the command line using sbfeedsim. The new feed simulator offers several advantages, including application independence and the ability in StreamBase Studio to enqueue data to multiple streams from a single feed simulation. In Studio, when you open a legacy feed simulation and then save it, the .sbfs file is automatically upgraded to the new version. The XML elements that comprise the .sbfs file are all new as of StreamBase 3.7.
However, upgrading to the new StreamBase Feed Simulator may
not be possible for all users with a large set of saved test cases in
legacy feed simulation files. In addition, new format feed simulations do
not support the group-by or switch elements that were available in prior releases. Thus,
for compatibility with legacy feed simulations, StreamBase
includes the sbfeedsim-old
command. It can run legacy .sbfs
configurations that contain elements no longer supported by the new
StreamBase Feed Simulator.
The StreamBase Feed Simulator connects to a running StreamBase Server and generates tuples for some or all of its input streams. Its default behavior is to generate uniformly random data on your application's input streams at a rate of one tuple per second, but you can customize the generated data in many ways.
The StreamBase Feed Simulator always talks to a
StreamBase Server process that is hosting a running
application. In this legacy topic, we will use the firstapp sample that came with your
StreamBase software distribution. (This sample is the same
application that you may have built yourself by running the Creating Your
First StreamBase Application tutorial, but with some new
files added to demonstrate the Feed Simulator features described here.)
Start a StreamBase Server by typing
sbd
/opt/streambase/sample/firstapp/firstapp.sbapp. In a
separate window, type sbc dequeue -v
--all to view the data being input and output by the
StreamBase Server.
The simplest way to use sbfeedsim-old is to give it no configuration at all and let it generate "default load." Default load means that the Feed Simulator will generate about one tuple per second for each input stream in your application. Every int and double field will be assigned a random value from 0 to 10000; every boolean field will be assigned true or false; and every string field will be filled with a random set of uppercase characters. Start sbfeedsim-old by simply typing the command, and you will see something like the following, though with slightly different times and field values. (Press Control-C to terminate the Feed Simulator after a few lines have been output.)
t=0.927: ItemsInputStream1 ((time=2007-01-17 17:35:43.801Z)
ITEM_NAME="ZNRNCRCDMW" SKU=6714)
t=2.947: ItemsInputStream1 ((time=2007-01-17 17:35:45.820Z)
ITEM_NAME="CTAHKGLLSI" SKU=4863)
t=3.230: ItemsInputStream1 ((time=2007-01-17 17:35:46.104Z)
ITEM_NAME="GATZCOEVLG" SKU=6013)
t=3.788: ItemsInputStream1 ((time=2007-01-17 17:35:46.661Z)
ITEM_NAME="TWCXQTHKJC" SKU=243)
At 0.927 seconds after it was started, the Feed Simulator generated the first line; about 2 seconds later (at 2.947 seconds after it was started), it generated the second line, and so forth. Note that tuples are not generated exactly one second apart: they are generated one second apart on average, according to an exponential distribution, which is more representative of real-world, randomly arriving data. (You can, however, instruct the Feed Simulator to generate exactly one tuple per second, that is, at t=1.0, t=2.0, and so on; this is explained later.)
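As a rough illustration of the exponential spacing described above, assume a rate of λ = 1 tuple per second (the default) and that the gaps T between consecutive tuples are independent. The symbols λ and T are introduced here only for this sketch:

```latex
f(t) = \lambda e^{-\lambda t}, \qquad
\mathbb{E}[T] = \frac{1}{\lambda} = 1\ \text{s}, \qquad
P(T > 2\ \text{s}) = e^{-2\lambda} = e^{-2} \approx 0.135
```

Under these assumptions, a gap of more than two seconds, like the one between the first two sample lines above, is expected roughly 13.5% of the time.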
Look in the sbc dequeue
terminal you started earlier; you will see each of these ItemsInputStream1 tuples there. Recall that the
firstapp sample separates tuples into
ItemsOutputStream1 (items with SKU > 5000)
and AllTheRest (items with SKU <= 5000);
therefore, for each of the four lines above you will also see a line
containing ItemsOutputStream1 or AllTheRest.
There are a few command-line options you can try to modify the Feed Simulator's behavior. Try running:
- sbfeedsim-old -n: The Feed Simulator displays its output but does not actually send it to the server (nothing appears in your sbc dequeue window). This is useful for debugging more complex Feed Simulator configurations (you will see one later).
- sbfeedsim-old -x 5: The Feed Simulator generates data 5 times as fast as normal, that is, about 5 tuples per second rather than 1 per second. (You could also use sbfeedsim-old -x .2 to generate 0.2 tuples per second, that is, 5 times slower than normal.)
- sbfeedsim-old -z 100: The Feed Simulator uses a different random seed, so it generates similar but slightly different data. In general, any two invocations of sbfeedsim-old with the same command-line arguments and the same configuration file (if any) output exactly the same data, but -z provides a way to change the data. You can also use -z clock to use the system clock as the random seed, which results in different data every time.
- sbfeedsim-old --max-tuples 5: The Feed Simulator stops after generating 5 tuples.
- sbfeedsim-old --max-time 3: The Feed Simulator stops after 3 seconds, regardless of how many tuples have been generated.
- sbfeedsim-old -a ItemsInputStream1: The Feed Simulator generates default load on only ItemsInputStream1, rather than on all input streams in the application. ItemsInputStream1 happens to be the only input stream in this simple application, so this behaves exactly like sbfeedsim-old without any command-line arguments, but it is good to know for more complicated applications.
Often you will want to configure the data generated by the Feed Simulator
in more complex ways than the command-line switches above allow; to this
end the Feed Simulator lets you provide a configuration file. Type
sbfeedsim-old -s >
firstapp.sbfs to generate a (legacy-format) customizable
"skeleton" configuration file for your application into a file named
firstapp.sbfs. (Note that, like any other
sbfeedsim-old invocation,
sbfeedsim-old -s requires
that a StreamBase Server containing your
application is running.) Open firstapp.sbfs
in a text editor and you'll see something like this:
<?xml version="1.0"?>
<!-- FeedSim skeleton generated by ... at ... -->
<feed-simulation>
  <stream name="ItemsInputStream1">
    <rate per-second="1.0"/>
    <field name="ITEM_NAME"> <random-string/> </field>
    <field name="SKU"> <uniform min="0" max="10000"/> </field>
  </stream>
</feed-simulation>
As you can see, the skeleton is tailored for your application; it's
actually the "default load" specification described earlier, i.e., one
tuple per second per input stream. You can run the Feed Simulator using
this configuration with sbfeedsim-old
firstapp.sbfs (although right now, since you haven't
modified the file, it will work exactly like sbfeedsim-old with no arguments).
The configuration file contains one or more stream elements, each describing the data to be
generated for a particular input stream. (There's only one input stream in
our sample application, so there's only one stream section here.) Each stream element contains, in order:
- A rate specification describing how often to generate a tuple for that input stream. The simplest form of rate specification is <rate per-second="n"/>, where n is a rate in tuples per second. Other kinds of rate specifications are discussed below.
- A field element for each field in that stream. Each field element must contain a source, which describes how the Feed Simulator should create each value for that field in the tuples it generates. There are several kinds of sources, including:
  - <random-string/>: fill the field with a series of randomly generated uppercase characters (for example, "RGQWOZ" for a six-character string field). This applies to string fields only.
  - <uniform min="min" max="max"/>: generate a random number greater than or equal to min and less than max.
  - <step min="min"/>: start at min (defaults to 0), incrementing by 1 each time a value is generated. For example, 0, 1, 2, ...
  - <random-walk min="min" max="max"/>: start at a value between min and max, incrementing or decrementing by 1 (chosen at random) each time a value is generated (pinned between min and max). For example, <random-walk min="1" max="3"/> might generate 2, 3, 3, 2, 1, 2, 3, ...
  - <constant value="value"/>: always use value.
Note that this is not a complete reference: there are many other sources (and other parameters for some of the sources that are listed here). Refer to the StreamBase Legacy Feed Simulation XML reference topic for a complete list.
Try replacing the <uniform .../> tag in
the configuration with <step min="10"/>
or <random-walk min="10" max="30"/>, and
type sbfeedsim-old -n --max-tuples=20 -x10
firstapp.sbfs to see what is generated (without sending
tuples to the server [-n], stopping after 20
tuples [--max-tuples=20], and at 10x speed
[-x10]).
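For reference, a minimal sketch of what the edited skeleton might look like with the step source in place. It uses only elements shown in this topic; the comment is added for illustration:

```xml
<?xml version="1.0"?>
<feed-simulation>
  <stream name="ItemsInputStream1">
    <rate per-second="1.0"/>
    <field name="ITEM_NAME"> <random-string/> </field>
    <!-- SKU now counts upward from 10 (10, 11, 12, ...) instead of being random -->
    <field name="SKU"> <step min="10"/> </field>
  </stream>
</feed-simulation>
```

Swapping the step element for a random-walk element with min="10" and max="30" should instead keep SKU drifting up and down within that range.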
We have covered the simplest form of rate specification, <rate per-second="n"/>, but there are two other
forms:
- <interval>source</interval>: Use source to determine the amount of time the Feed Simulator should let pass between tuples. For example, if you want tuples to be exactly one second apart, use:
<interval> <constant value="1.0"/> </interval>
The following would put 1 second before the first tuple, then 2 seconds between the first and second tuples, then 3 seconds between the second and third tuples, and so forth:
<interval> <step min="1.0"/> </interval>
- <timestamp>source</timestamp>: Use source to determine the relative point in time at which the Feed Simulator should generate the tuple. The values generated by source must be non-decreasing (it doesn't make sense to generate a tuple at t=4 seconds, then t=5 seconds, then t=3 seconds). This is mostly useful in combination with trace files, described in the next section.
Try replacing the rate element in the
configuration with one of the interval
snippets above, and type sbfeedsim-old -n
--max-tuples=20 firstapp.sbfs again. (Add -x10 if you are in a hurry, but be aware this will make all
your rate specifications 10 times faster.)
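As a sketch, the skeleton with its rate element swapped for the constant-interval form might look like the following. Only elements described in this topic are used; the comment is illustrative:

```xml
<?xml version="1.0"?>
<feed-simulation>
  <stream name="ItemsInputStream1">
    <!-- Takes the place of the rate element: exactly 1.0 second between tuples -->
    <interval> <constant value="1.0"/> </interval>
    <field name="ITEM_NAME"> <random-string/> </field>
    <field name="SKU"> <uniform min="0" max="10000"/> </field>
  </stream>
</feed-simulation>
```

With this configuration, the t= values printed by the Feed Simulator should fall on whole seconds rather than being exponentially spaced.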
You can also use the Feed Simulator to read tuples (or parts of tuples)
from a trace file rather than
randomly generating data. Let's say that we have a list of item names and
SKUs in a CSV (comma-separated values) file called firstapp-trace1.csv that looks like this:
TEA,96371
COFFEE,785799
EGGS,873904
CHEESE,353728
EGGS,394293
COCOA,575788
(The first column is the item name, and the
second column is the SKU.) We want to send
these tuples to the server at the usual 1 tuple per second. The following
legacy (pre-3.7) configuration will do this:
<feed-simulation>
  <stream name="ItemsInputStream1" trace-file="firstapp-trace1.csv">
    <rate per-second="1.0"/>
    <field name="ITEM_NAME"> <trace column="1"/> </field>
    <field name="SKU"> <trace column="2"/> </field>
  </stream>
</feed-simulation>
This file is included in the firstapp sample
as firstapp-trace1.sbfs; to run it, go into
the /opt/streambase/sample/firstapp directory
and type sbfeedsim-old
firstapp-trace1.sbfs.
Note that the name of the trace file is given in the trace-file attribute of the stream element, and the special <trace column="n"/> source is used to refer to a
particular column in that trace file.
Not all fields in a stream must come from the trace file. You could replace
the SKU field specification with another source, such as uniform as above; the item name for each row would
still be read from the trace file but the SKU would be randomly generated
as before.
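A minimal sketch of such a mixed configuration, assuming the same firstapp-trace1.csv trace file: the item name is read from column 1, while the SKU column of the trace file is ignored in favor of a uniform source.

```xml
<feed-simulation>
  <stream name="ItemsInputStream1" trace-file="firstapp-trace1.csv">
    <rate per-second="1.0"/>
    <!-- Item name still comes from column 1 of the trace file -->
    <field name="ITEM_NAME"> <trace column="1"/> </field>
    <!-- SKU is generated at random; column 2 of the trace file is not used -->
    <field name="SKU"> <uniform min="0" max="10000"/> </field>
  </stream>
</feed-simulation>
```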
Trace files may also include timestamps. firstapp-trace2.csv looks like the trace file above,
except that it has a timestamp in the third column:
TEA,96371,0.171
COFFEE,785799,1.041
EGGS,873904,1.733
CHEESE,353728,2.479
EGGS,394293,3.211
By replacing the rate tag with
<timestamp> <trace column="3"/>
</timestamp>, we can tell the Feed Simulator to generate the given
tuples at the times given in the third column (TEA at t=0.171 seconds,
COFFEE at t=1.041 seconds, and so on).
Timestamps must be non-decreasing: if the timestamp on any line in the file were less than a previous timestamp, the Feed Simulator would abort with an error message. (It wouldn't make sense for the Feed Simulator to generate a tuple at, say, t=2.479 seconds, and then generate a tuple at t=2.211 seconds.)
Try sbfeedsim-old
firstapp-trace2.sbfs to run this example.
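Putting the pieces together, a timestamp-driven configuration for firstapp-trace2.csv might look like the sketch below. This is assembled from the elements described above; the firstapp-trace2.sbfs file shipped with the sample may differ in detail.

```xml
<feed-simulation>
  <stream name="ItemsInputStream1" trace-file="firstapp-trace2.csv">
    <!-- Column 3 gives the time, in seconds from the start, at which to emit each tuple -->
    <timestamp> <trace column="3"/> </timestamp>
    <field name="ITEM_NAME"> <trace column="1"/> </field>
    <field name="SKU"> <trace column="2"/> </field>
  </stream>
</feed-simulation>
```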
If your timestamps don't start at zero, you can use the origin attribute of the timestamp tag to specify an offset. See the
StreamBase Feed Simulation XML reference topic for
more information.
Often you will find it useful to instruct the Feed Simulator to choose at
random from a predefined set of possible values. You can use enumerations
to do this. Define an enumeration with the define-enum tag, and then refer to it later with
the special enum source:
<feed-simulation>
  <define-enum name="item-names">
    <value>CEREAL</value>
    <value>MILK</value>
    <value weight="2">EGGS</value>
  </define-enum>
  <stream name="ItemsInputStream1">
    <rate per-second="1.0"/>
    <field name="ITEM_NAME"> <enum ref="item-names"/> </field>
    <field name="SKU"> <uniform min="0" max="10000"/> </field>
  </stream>
</feed-simulation>
The Feed Simulator will pick an item name for each tuple at random from the
item-names enumeration. Note the weight="2", which makes EGGS twice as likely to appear as each of the other
items (so EGGS is chosen about half the time, and CEREAL and MILK about a quarter of the time each). This configuration can be run by typing sbfeedsim-old firstapp-enum.sbfs.
This section covers advanced topics.
Sometimes you may find it useful for some fields' values to depend on the
contents of a "key field." For instance, let's say that for CEREAL we
want SKUs to be fixed at 1, for MILK we want SKUs chosen at random from
between 1000 and 2000, and for everything else we want SKUs to start at
3000 and increase. We can use the switch construct to do this:
<switch field="ITEM_NAME">
  <case value="CEREAL">
    <field name="SKU"> <constant value="1"/> </field>
  </case>
  <case value="MILK">
    <field name="SKU"> <uniform min="1000" max="2000"/> </field>
  </case>
  <default>
    <field name="SKU"> <step min="3000"/> </field>
  </default>
</switch>
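The fragment above needs a surrounding stream element. The sketch below shows one way it might sit in a complete legacy configuration. Placing the switch after the ITEM_NAME field, and reusing the item-names enumeration from the previous section, are assumptions made for illustration; check the legacy XML reference for the exact ordering rules.

```xml
<feed-simulation>
  <define-enum name="item-names">
    <value>CEREAL</value>
    <value>MILK</value>
    <value weight="2">EGGS</value>
  </define-enum>
  <stream name="ItemsInputStream1">
    <rate per-second="1.0"/>
    <!-- The key field is generated first so the switch can branch on it (assumed ordering) -->
    <field name="ITEM_NAME"> <enum ref="item-names"/> </field>
    <switch field="ITEM_NAME">
      <case value="CEREAL">
        <field name="SKU"> <constant value="1"/> </field>
      </case>
      <case value="MILK">
        <field name="SKU"> <uniform min="1000" max="2000"/> </field>
      </case>
      <!-- Everything else (here, only EGGS) gets SKUs counting up from 3000 -->
      <default>
        <field name="SKU"> <step min="3000"/> </field>
      </default>
    </switch>
  </stream>
</feed-simulation>
```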
Some sources, like random-walk, are
stateful: their value is not generated independently each time but
depends on previous values. Use group-by to
cause a separate state to be used for each value of a key field, such as
one random-walk for CEREAL, one for MILK, and one for EGGS:
<stream name="ItemsInputStream1">
  <rate per-second="1.0"/>
  <field name="ITEM_NAME"> <enum ref="item-names"/> </field>
  <group-by field="ITEM_NAME">
    <field name="SKU"> <random-walk min="1000" max="2000"/> </field>
  </group-by>
</stream>
Of course, this does not make much sense for groceries. But it would be useful, for example, in the case of stock tickers (where each ticker symbol should have its own state) or position information (where each of a series of objects is moving independently).
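To make the stock ticker case concrete, here is a hypothetical sketch. The stream name TradesInputStream, the field names SYMBOL and PRICE, the symbols enumeration, and the price range are all invented for illustration and are not part of the firstapp sample; the configuration would only work against an application that defines such an input stream.

```xml
<feed-simulation>
  <!-- Hypothetical enumeration of ticker symbols -->
  <define-enum name="symbols">
    <value>AAA</value>
    <value>BBB</value>
    <value>CCC</value>
  </define-enum>
  <!-- Hypothetical input stream with a SYMBOL field and a numeric PRICE field -->
  <stream name="TradesInputStream">
    <rate per-second="1.0"/>
    <field name="SYMBOL"> <enum ref="symbols"/> </field>
    <!-- Each symbol keeps its own random-walk state, so prices drift independently -->
    <group-by field="SYMBOL">
      <field name="PRICE"> <random-walk min="90" max="110"/> </field>
    </group-by>
  </stream>
</feed-simulation>
```

Without the group-by element, all three symbols would share a single random walk, and each symbol's price would appear to jump to wherever the shared walk currently stands.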
This topic described legacy StreamBase feed simulations, which are provided for compatibility with pre-3.7 .sbfs configurations. If you need more information, see the StreamBase Legacy Feed Simulation XML reference topic.
