StreamBase Documentation


StreamBase Expression Language and Functions

This topic describes the StreamBase expression language and each of the built-in functions that you can use with StreamBase applications. In application diagrams, you use StreamBase functions in operators (except Heartbeat, Metronome, Split, and Union, which do not take an expression). You can also use StreamBase expressions in statements within StreamSQL applications, as described in the StreamSQL Guide.

Pointer to StreamBase Functions

Are you looking for information about the functions provided by StreamBase? Start in the StreamBase Functions section below. If you want an alphabetized list of all functions, with links to the descriptions, please see Alphabetical Index to All StreamBase Functions.

Evaluating Expressions

A useful tool for understanding and testing expressions is the sbd --eval command, documented in the sbd help topic. Use sbd --eval to test the results of expressions as you read this topic, and before using them in your StreamBase applications. For example:

$ sbd --eval "1e1 * (15 % -4)"
(double) 30.0

Data Types

Expressions have static data types. Supported types:

  • bool
  • int
  • long
  • double
  • timestamp
  • string

Booleans

In the Filter and Join operators, you will notice the term "Predicate," which is a type of an expression that always evaluates to either true or false (a Boolean).

Integers

The valid range of integers (int) is -2147483648 to 2147483647, inclusive. Integers are always signed. So the following expression is invalid, because the number 2147483648 is invalid:

2147483648 + 0

However, integer computations wrap around unchecked, so this next expression is valid and evaluates to -2147483648:

2147483647 + 1

The following expression is valid, because the dot (.) means that the number is treated as a floating-point value, which can have a much greater value:

2147483648. + 0.

Doubles

A double is always 8 bytes. Any numeric literal in scientific notation is considered a double. That is, you do not need a decimal point for a numeric literal to be recognized as a double (instead of an int). For example, 1e1 is considered to be 10.0 (double) instead of 10 (integer).

Strings

StreamBase provides support for large tuples, including large string fields. In releases prior to StreamBase 3.0.0, the maximum for an individual string field in a tuple was 32K. While the theoretical maximum for a string in a single-field tuple is now around 2 gigabytes (2 GB), of course the practical limit is much smaller. Be aware that moving huge amounts of data through any application will negatively impact throughput.

Important: Before you can use tuples that exceed the operating system's page size (such as 4096 bytes on Windows or Linux, or 8192 bytes on Solaris), you must increase the StreamBase memory parameters defined in the sbd.sbconf configuration file. The StreamBase page-size parameter must be a multiple of the system page size. You must allocate sufficient memory for the StreamBase Server process (sbd) so that it can host your application and its largest tuples. However, do so with the caveats that the operating system will be constrained by competing concurrent requests for RAM space from other processes. Thus you will need to estimate the total amount of requests for memory resources on the machine during peak loads. Always keep in mind that you can help your application's throughput by limiting the size of input tuples to the bare minimum. Keeping the input tuples as small as reasonably possible is always more efficient than dropping tuples during the processing of the StreamBase application.

For details about the memory parameters, please see the StreamBase Server Configuration XML: <page-pool> element topic in the Reference Guide. For general ideas about tuning StreamBase applications, please see the Tuning Guide.

Null Literals in Expressions

You can use nulls in the StreamBase expression language. Starting in StreamBase 2.0.0, we introduced five null literals, one for each type:

int(null)
double(null)
bool(null)
string(null)
timestamp(null)

Each null literal is case sensitive; for example, Int(NULL) is invalid. The data type of each null is never implicit: you must specify which type of null you are using in the expression.

In general, when you apply any arithmetic operator or function with <data-type>(null) as one of the arguments, the result is null. Three exceptions are the isnull() and notnull() functions, which test whether an expression evaluates to null, and the coalesce() function, which selects a non-null value from a list of potentially null arguments.

Expression Example... Result
3 + int(null) A null int
int(null) + int(null) A null int
int(null) + bool(null) A typecheck error. You cannot add an int and a bool.
if bool(null) then 3 else 4 A null int
int(null) == int(null) A null bool, because null is not equal to itself
int(null) != int(null) A null bool, because null is not equal to itself
isnull(int(null)) A bool that evaluates to true
notnull(int(null)) A bool that evaluates to false

For related information, please see Using Nulls in StreamBase Applications.

Case Sensitivity

Expressions are case sensitive. Function names should be all lowercase.

Back to Top ^

Identifiers

Only use alphabetic characters, numbers, and underscores. The first character must be alphabetic or an underscore. Do not use hyphens ( - ) or other special characters. Identifier rules apply to the name you assign to any StreamBase component, including schemas, fields, operators, tables, lock sets, and modules.

Note: the StreamBase software creates identifiers that include colons, but you may not use colons in your identifiers.

Reserved Words

When you name fields, do not use the following reserved words:

  • true
  • false
  • if
  • then
  • else
  • null
  • int
  • long
  • bool
  • double
  • timestamp
  • string

In an Input Stream's Properties View, StreamBase Studio will not prevent you from specifying a reserved word for a field name. However, a schema that contains reserved words will cause one or more typecheck errors in the connected downstream components. The error message in the Typecheck View will identify the offending field's name. Rename the field in the Input Stream's Property View.

Back to Top ^

Qualifying Field Names

If a name conflict occurs in a field name that's used in an expression within an operator instance, you can qualify them to clarify the inbound tuple stream to which you are referring.

In the Properties view for a Gather operator, you can qualify the fields as input[port-number].field-name.

Example: input1.is_alarm, input2.is_alarm, input3.is_alarm.

In the Properties view for a Join operator, you can qualify the fields in the two input streams as input1.field-name and input2.field-name. The input1.field-name refers to a field in the stream arriving at the "top" port (#1). The input2.field-name refers to a field in the stream arriving at the "bottom" port (#2).

Examples:

 input1.SKU
 input2.SKU

In the Properties view for a Query operator, you can qualify the fields in the two input streams as input.field-name and old.field-name. The input.field-name refers to the current input tuple, while old.field-name refers to the field's old (prior) value.

Examples:

 Table1_current_Name
 Nasdaq100Table_old_Symbol

Back to Top ^

Arithmetic Operators

You may use +, -, *, /, and %. The ^ expression is not supported.

For example, in the Compliance sample that we provide with the StreamBase kit, the is_alarm field is a bool. In one of the Compliance application's Map operators, we use this expression for the is_alarm check on each tuple:

((((sector_in_fund_value_t + (shares_traded * price_at_trade)) * 100) / fund_value_t) > 25)

This expression is designed to execute the 25% sector test; that is, one sector is not allowed to be more than 25% of the total fund value. is_alarm will be set to true if the test fails. If you are interested in viewing the full sample, use File > Load Samples from the StreamBase Studio top menu, and select Compliance from the list.

Modulus Operators

The modulus operator ( % ) finds the remainder of a division operation. Division and modulus are always related such that a/b*b+(a%b) == a, where a/b is an integer division. For example, 15%4 can be resolved as follows: as:

    15 / 4 * 4 + MOD == 15
    3 * 4 + MOD = 15
    12 + MOD = 15
    MOD = 3

This relationship is preserved with negative divisors. As a result, whenever the quotient is negative, the modulus is negative. For example:

    15 % 4 = 3
    15 % -4 = 3
    -15 % 4 = -3
    -15% -4 = -3

Also note that if you divide a smaller number by a larger number, the modulus is always the smaller number. For example:

    4 % 15 = 4 

This behavior may be different than modulus in some programming languages. Hint: The easiest way to understand StreamBase modulus operations is to try different expressions out using the sbd --eval command.

Back to Top ^

Unary Operators

Unary operators act on only one operand in an expression. +a and -a are valid for integer and double types.

Back to Top ^

Relational Operators

You may use the <, <=, = or ==, >, >=, and != relational operators. The relational operator <> is not supported.

Back to Top ^

Logical Operators

You may use && or AND, || or OR, and !.

Logical Operator Meaning
&& or AND AND
|| or OR OR
! NOT

Note: The evaluation of expressions using "&&" and "||" will short-circuit. This means that after StreamBase encounters the first term that evaluates as false, no other terms in a statement using "&&" will be evaluated. Similarly, evaluation of a statement using "||" will stop after the first true term.

For related information on Boolean logic and nulls, see Effect of null values in Boolean logic.

Back to Top ^

Precedence Order

The precedence order of mathematical operators in expressions, from highest to lowest, is as follows:

  1. Unary operators: ! and -

    The logical negation operator, !, reverses the meaning of its operand.

    The unary negation operator, -, produces the negative of its operand.

  2. Multiplicative operators: * and / and %

    Multiplication: *

    Division: /

    Modulus, the remainder from division: %

  3. Additive operators: + and -

    Addition: +

    Subtraction: -

  4. Relational operators: < and <= and > and >=

    Less than: <

    Less than or equal to: <=

    Greater than: >

    Greater than or equal to: >=

  5. Equality operator: == or =
  6. Not equal: !=
  7. Logical AND: && or AND
  8. Logical OR: || or OR

You can use parentheses in expressions to override the default precedence.

Back to Top ^

Casting

In an expression, to cast a value to a double, use the double() function.

To cast a value to a string, use the string() function.

To cast a value to a Boolean, use the bool() function. The bool() function treats numeric (int or double) 0 as false; all other numeric values are cast to true. A "true" string (case-insensitive) maps to true and a string of "false" maps to false. All other strings are errors.

Back to Top ^

White Space

The StreamBase parser ignores any spaces, new lines, and tab characters in expressions.

Back to Top ^

String Literals

You can use single quotes ( 'string' ) or double quotes ( "string" ) around strings. Escape characters are supported, as follows:

Character Results in...
\" Double quotation mark
\' Single quotation mark
\n Newline
\t Tab
\b Backspace
\r Carriage return
\\ Backslash

Examples:

String Literal Results in...
"He said, \"Hello.\"" He said, "Hello."
'She said, \'Hello.\'' She said, 'Hello.'
"A\tB" A<tab>B
"A\nB" A<newline>B
"C:\\WINDOWS" C:\WINDOWS

Back to Top ^

Boolean Literals

The Boolean literals true and false are supported.

Back to Top ^

Conditional Expressions

In an expression such as: if p then e1 else e2, p must be a valid bool expression, and both e1 and e2 must be expressions that have the same type. If p is true, then e1 will be evaluated and the result returned. If p is false, e2 will be evaluated and the result returned. In either case the other sub-expression is not evaluated.

Note: In the StreamBase expression language, each if clause must have a pair of explicit then and else clauses. Should you create compound if clauses, to avoid ambiguity, remember to specify the then and else clauses for each if clause. For an example, please see the next section.

Back to Top ^

Compound Conditional Expressions

You can use combinations of if...then and else if...then statements to form compound conditional expressions. For example:

if i==0 then "Buy" else if i==1 then "Sell" else if i==2 then "Hold" else "None of the above"

Here is a second example (indented for clarity), where we nest an "if then else" in a then clause:

if p1
    then
        if p2
            then 1
            else 2
    else
        if p2
            then 3
            else 4

Notice how each if clause contains a then clause and an else clause.

Back to Top ^

Concatenation in Expressions

You can enter an expression that performs concatenation. For a string variable, the concatenation expression is stringvar + x. For example: b + 25. For a numeric variable with a static string, the concatenation expression is "staticstring" + numvar. For example: "foo" + a. For two variables, the concatenation expression is stringvar + numvar. For example: b+a. Note that when you use an expression to concatenate a number to a string, the total size becomes the original string size + 12.

Back to Top ^

Timestamp Expressions

Timestamp expressions are represented in the following date/time format:

YYYY-MM-DD hh:mm:ss

The maximum timestamp precision is one millisecond.

StreamBase parses timestamps as follows:

"2005-10-08" Implicit time of "00:00:00", local time zone
"2005-10-08 13:17" Implicit seconds of "00", local time zone
"2005-10-08 13:17:23" Local time zone
"2005-10-08 13:17:23.12355" Fractional seconds
"2005-10-08 13:17:23" == "2005-10-08 13:17:23"
"2005-10-08 24:00:00" == "2005-10-09 00:00:00"
"2005-10-08 13:17:23-0500" Explicit time zone (EST in this case)
"2005-10-08 13:17-05:00" Explicit time zone, implicit seconds

Back to Top ^

When used with comparison operators ( == != > < <= >= ) you must compare time interval-to-interval, or timestamp-to-timestamp. You cannot use the comparison operators with interval-to-timestamp.

To express constant intervals, you can use the seconds() and minutes() functions. For example, if you want to add 60 seconds to a timestamp, you can enter expressions such as:

t + seconds(60)
t + minutes(1)

Back to Top ^

Parameters in Expressions

You can declare operator-parameters in the sbd.sbconf file, and then use them as parameters within StreamBase expressions. StreamBase Server Configuration XML explains how to declare operator-parameters.

To then reference a parameter in an expression, wrap the operator-parameter name in braces and prefix the open brace with a dollar sign. If you are referencing an operator-parameter that is defined as a string, use quotes around the entire reference.

For example, consider an sbd.sbconf file that defines these two operator-parameters:

<operator-parameters>
    <operator-parameter name="MyInt" value="2"/>
    <operator-parameter name="MyString" value="somestring"/>
</operator-parameters>

You could reference the first parameter (for example, in an output field) using an expression like 35* ${myInt}. The expression would evaluate at run time to 70. This StreamSQL statement references the second parameter:

 SELECT * FROM InputStream1 "${MyString}" AS source

Notice the quotation marks: they are needed so that the expression is resolved as a string.

Note: Expression parameters are distinct from both module parameters, which are described in Using Module Parameters, and StreamSQL CREATE PARAMETER statements:

  • A module parameter is defined only for a parameterized module (or, in StreamSQL, a Java operator or an embedded Java adapter), not globally in the sbd configuration.
  • If you use an expression parameter and a module parameter with the same name, the module parameter value takes precedence.
  • Expression parameters can only be used within expressions. StreamSQL parameters are more flexible: they can be used to represent expressions as well as other parts of a statement.

Note: Expression parameters can be referenced in both EventFlows and StreamSQL applications. However, if you convert an EventFlow .sbapp file to StreamSQL, any expression parameters are resolved during the conversion; the parameter itself is not preserved.


StreamBase Functions

This section provides reference information for all the StreamBase functions. StreamBase provides two types of functions:

Simple functions are evaluated over one set of arguments, and return a single result. The strlen() function is an example of a simple function. Simple functions can be used in expressions for any StreamBase operator, except Heartbeat, Metronome, and Union, which do not use an expression.

Aggregate functions are evaluated over multiple sets of arguments and return a single result. In StreamBase applications, aggregate functions are used only in Aggregate operators and in Query operators that do query read operations. The avg() function, which calculates an average value for tuples in an Aggregate window, is an example of an aggregate function.

function_name(arg1,arg2,arg3,...)

Alphabetical Index to All StreamBase Functions

For your convenience, here is a complete, alphabetized index into the available functions provided by StreamBase, with links to the function descriptions in this section. Note that if you do not find a built-in function that you need here, you can use the StreamBase custom function APIs to implement your own function and configure your application to use it. For details, please see these topics in the API Guide:

Function Name & Link to Section Summary Comment
abs function Simple function: math
acos function Simple function: math
alpha function Aggregate function: statistical calculations
avg function Aggregate function: statistical calculations
asin function Simple function: math
atan function Simple function: math
atan2 function Simple function: math
beta function Aggregate function: statistical calculations
black_scholes function Simple function: financial
bool function Simple function: type conversions
There are two versions of callcpp: Simple function: C++ API call

Aggregate function: C++ API call
There are two versions of calljava: Simple function: Java API call

Aggregate function: Java API call
catchexception function Simple function: check for errors in expression
cbrt function Simple function: math
closeval function Aggregate function: for windowing
coalesce function Simple function: working with nulls
compound_interest Simple function: financial
correlation_coefficient function Aggregate function: statistical calculations
correlation_coefficientp function Aggregate function: statistical calculations
cos function Simple function: math
count function Aggregate function: statistical calculations
count_distinct function Aggregate function: statistical calculations
covariance function Aggregate function: statistical calculations
covariancep function Aggregate function: statistical calculations
date function Simple function: creating absolute timestamps
days function Simple function: creating interval timestamps
double function Simple function: type conversions
error function Simple function: error
exp function Simple function: math
exp_moving_avg function Aggregate function: statistical calculations
firstval function Aggregate function: for windowing
floor function Simple function: math
from_gmtime function Simple function: creating absolute timestamps
from_localtime function Simple function: creating absolute timestamps
get_day_of_week function Simple function: working with timestamp fields
get_day_of_month function Simple function: working with timestamp fields
get_hour function Simple function: working with timestamp fields
get_minute function Simple function: working with timestamp fields
get_month function Simple function: working with timestamp fields
get_second function Simple function: working with timestamp fields
get_year function Simple function: working with timestamp fields
hash function Simple function: type conversions
hours function Simple function: creating interval timestamps
indexof function Simple function: strings
int function Simple function: type conversions
intercept function Aggregate function: statistical calculations
interval function Simple function: creating absolute timestamps
isnan function Simple function: NaN (not a number)
isnull function Simple function: detecting null values
lastindexof function Simple function: strings
lastval function Aggregate function: for windowing
ln function Simple function: math
log10 function Simple function: math
There are two versions of max(): Simple function: math

Aggregate function: statistical calculations
There are two versions of min(): Simple function: math

Aggregate function: statistical calculations
minutes function Simple function: creating interval timestamps
notnan function Simple function: detecting NaN (not a number)
notnull function Simple function: detecting null values
now function Simple function: creating absolute timestamps
openval function Aggregate function: for windowing
pow function Simple function: math
product function Aggregate function: statistical calculations
seconds function Simple function: creating interval timestamps
set_day_of_month function Simple function: working with timestamp fields
set_day_of_week function Simple function: timestamp
set_hour function Simple function: working with timestamp fields
set_minute function Simple function: working with timestamp fields
set_month function Simple function: working with timestamp fields
set_second function Simple function: working with timestamp fields
set_year function Simple function: working with timestamp fields
sin function Simple function: math
slope function Aggregate function: statistical calculations
sqrt function Simple function: math
stdev function Aggregate function: statistical calculations
stdevp function Aggregate function: statistical calculations
strftime function Simple function: creating absolute timestamps
string function Simple function: type conversions
strlen function Simple function: strings
strpinterval function Simple function: creating absolute timestamps
strptime function Simple function: creating absolute timestamps
strresize function Simple function: strings
strresizetrunc function Simple function: strings
substr function Simple function: strings
sum function Aggregate function: statistical calculations
tan function Simple function: math
time function Simple function: creating absolute timestamps
timestamp function Simple function: type conversions
today function Simple function: creating absolute timestamps
to_seconds function Simple function: working with timestamp fields
variance function Aggregate function: statistical calculations
variancep function Aggregate function: statistical calculations
vwap function Aggregate function: statistical calculations
weeks function Simple function: creating interval timestamps

Simple Functions

This section describes the simple functions that you can use in expressions for any StreamBase operator, except the Heartbeat, Metronome, and Union operators, which do not accept an expression.

This section is organized according to categories of simple functions:

Simple Functions: Financial

  • black_scholes(type, underlying, strike, dividendYield, riskFreeInterestRate, Volatility, exerciseDate, dealDate)

    The black_scholes function calculates fair-value and risk statistics (delta, gamma, vega...) for European style options on securities with continuous dividend yields. This is known as the "Black-Scholes Generalized" model. It also calculates implied volatility.

    The black_scholes function takes eight input arguments; the first seven arguments are required, while the eighth argument, dealDate, is optional.

    The arguments are:

    1. type: Required. Should be set to one of the following string values:
      • Call  (The price of an option.)
      • Put  (The price of an option.)
      • DeltaCall  (The sensitivity of the price of an option to changes in the price of the stock.)
      • DeltaPut  (The sensitivity of the price of an option to changes in the price of the stock.)
      • ThetaCall  (Theta measures how the price of an option changes with time.)
      • ThetaPut  (Theta measures how the price of an option changes with time.)
      • Gamma  (The sensitivity of a stock's delta to the stock price.)
      • Vega  (The rate of change of the value of an option with respect to the volatility of the stock's price.)
      • RhoCall  (The sensitivity of the price of an option with respect to the risk-free interest rate.)
      • RhoPut  (The sensitivity of the price of an option with respect to the risk-free interest rate.)
      • ImpliedVolatilityCall  (Implied volatility of the underlying stock for a given price.)
      • ImpliedVolatilityPut  (Implied volatility of the underlying stock for a given price.)
    2. underlying: Required. A double. The price of the underlying stock.
    3. strike: Required. A double. The strike price of the stock on the exercise date.
    4. dividendYield: Required. A double. For example, 0.03 for 3.0%.
    5. riskFreeInterestRate: Required. A double. For example, 0.05 for 5.0%.
    6. Volatility (or value): Required. A double. This input argument is overloaded. When computing anything except the implied volatility for a given call or put price this argument is the volatility of the stock. For example, 0.2 for 20%. When computing implied volatility this argument is the option price.
    7. exerciseDate: Required. A timestamp. The option exercise date.
    8. dealDate: Optional. A timestamp. The option deal date. If not provided, the current date is the default.
  • compound_interest(principleValue, matureValue, numberOfPeriods, guess)

    Returns the compound interest given the principle value, mature value, and number of periods. An optional fourth argument, guess, can be used to set the initial compound interest used by the function. The compound_interest function uses the Newton-Raphson algorithm to compute compound interest rate. This algorithm performs the same calculations repetitively. Each iteration results in a compound interest that is closer to the final result. The number of iterations required to get the results within an acceptable inaccuracy (1.0e-4) depends on the number of periods and the initial compound interest used to start the calculation.

    If the guess argument is not provided, the default initial value is 0.01 (1%). If the function returns a NaN (means "not a number"), you can provide the initial compound interest in this fourth argument. If the initial value is closer to the actual rate the computation, it can be sped up significantly.

Simple Functions: Math

  • abs(e)

    Returns the absolute value of the double or integer Expression e. The return type is the same as the expression's type.

  • acos(x)

    Calculate in radians the arc cosine of x, that is the value whose cosine is x. Undefined outside of the range -1 to 1.

  • asin(x)

    Calculate in radians the arc sine of x, that is the value whose sine is x. Undefined outside of the range -1 to 1.

  • atan(x)

    Calculate in radians the arc tangent of x, that is the value whose tangent is x.

  • atan2(x, y)

    Calculate in radians the arc tangent of two variables, x and y. This is similar to calculating the arc tangent of (y/x), except that the signs of both arguments are used to determine the quadrant of the result. tangent is x.

  • cbrt(x)

    Calculate the (real) cube root of x.

  • cos(x)

    Calculate the cosine of x, where x is given in radians.

  • exp(x)

    Calculate the value of e (the base of natural logarithms) raised to the power of x.

  • floor(d)

    Returns the largest (closest to positive infinity) integer value not greater than the double value d.

  • ln(x)

    Calculate the natural logarithm of x.

  • log10(x)

    Calculate the base-10 logarithm of x.

  • max(e1,...,en)

    Returns the maximum value from an expression, where e1 through en must all be either integer or double expressions. The return type is the same as its inputs.

  • min(e1,...,en)

    Returns the minimum value from an expression, where e1 through en must all be either integer or double expressions. The return type is the same as its inputs.

  • pow(x, y)

    Calculate the value of x raised to the power of y.

  • sin(x)

    Calculate the sine of x, where x is given in radians.

  • sqrt(x)

    Calculate the non-negative square root of x.

  • tan(x)

    Calculate the tangent of x, where x is given in radians.

Simple Functions: NaN

The functions in this section detect NaN (not a number) values. They require one input argument and return a bool. The input argument data type can be int, double, timestamp, or null. bool and string are not valid data types.

  • isnan(value)
    • Returns true if the input argument is NaN
    • Returns null if the input argument is null
    • Returns false otherwise
  • notnan(value)
    • Returns false if the input argument is NaN
    • Returns null if the input argument is null
    • Returns true otherwise

    Note that notnan returns the opposite of isnan, except if the argument is null, in which case they both return null.

Simple Functions: Nulls

  • isnull(value)

    Returns true if the argument is a null value.

  • notnull(value)

    Returns true if the argument is not a null value. This function always returns the opposite of isnull(value).

  • coalesce(value1, value2...)

    Returns the first argument that is non-null, or a null value if all arguments are null. All arguments must have the same type.

Simple Functions: Errors

  • catchexception(v1, v2, ...)

    Attempts to evaluate all arguments in order, returning the first one that evaluates without an error, or null if all arguments evaluate to an error. All arguments must have the same type.

    For instance, the following attempts to parse str as an interval, but returns a null timestamp if parsing fails:

    catchexception(interval(str))

    The following attempts to divide a by b, but returns -1 if the division fails:

    catchexception(a/b, -1)
  • error(s)

    Throw an error with message s. In the default configuration, this shuts down the server.

Simple Functions: Strings

In addition to the functions in this section, see the string() function that is in the Type Conversions section.

  • indexof(haystack, needle, [start])

    Returns the index of the first instance of the string needle within the string haystack, starting at position start. Returns -1 if the string is not found. If start is unspecified, it is taken to be 0, the start of the string.

  • lastindexof(haystack, needle, [lastStart])

    Returns the index within haystack of the rightmost instance of needle, for which the index is less than or equal to lastStart. Thus, the return value will always be less than or equal to lastStart. Returns -1 if the needle is not found. If lastStart is omitted, the entire string is searched.

  • strlen(s)

    Returns the length of the string s.

  • strresize(string, length)

    All StreamBase fields have a fixed maximum size; this function lets you change the maximum size of a string field. The function returns the original string, unmodified, but with its maximum size set to the given length. If this function is called with a string whose length exceeds the given length, a runtime error is triggered and evaluation is aborted. The length must be an integer constant. Note that this function does not pad the original string with spaces if it is smaller than the given length; the actual string value is returned unmodified.

  • strresizetrunc(string, length)

    Similar in function to strresize, but strings that are too large are truncated, rather than triggering an evaluation error.

  • substr(s, begin, length)

    Returns a substring of the string s, starting at character begin (zero-indexed) with a size of length characters.

    Note: Prior to StreamBase 2.0.0, the substr function used these arguments: substr(s, begin, end). Please check your StreamBase applications, and if substr() is used, make the appropriate modifications to the third argument's value.

Simple Functions: Timestamps

Creating absolute timestamps:
Creating interval timestamps:
  • seconds(x)

    Returns a timestamp representing an interval of x seconds.

  • minutes(x)

    Returns a timestamp representing an interval of x minutes.

  • hours(x)

    Returns a timestamp representing an interval of x hours.

  • days(x)

    Returns a timestamp representing an interval of x days.

  • weeks(x)

    Returns a timestamp representing an interval of x weeks.

Working with timestamp fields:

Simple Functions: Type Conversions

  • bool(e)

    Converts e to a value of type bool where e is a bool, int, double, or string.

    • An int value of 0 will return the Boolean false. All other int values, positive or negative, will return true.
    • A double value of 0 will return the Boolean false. All other double values, positive or negative, will return true.
    • A string value false will return a Boolean false and the string true will return Boolean true. The string is compared in a case-insensitive manner. For example, bool("FALSE") and bool("false") are equivalent. Any string other than false or true is an error.
    • A bool value will return itself.
  • double(e)

    Converts e to a value of type double where e is a bool, int, double, or string.

    • An int value will return the same value, but with type double. For example, double(3) returns 3.0.
    • A double value will return that same double value.
    • A string value will be parsed as a decimal number. For example, double("123.456") returns 123.456, but double("7abc") is an error. Scientific notation is supported, so double("1.2E4") returns 12000.0.
    • A bool value will return 1.0 if true, or 0.0 if false.
    • For timestamp values, you should use the to_seconds() function instead.

    Also See: The to_seconds() function, used to convert timestamps to double.

  • hash(s)

    Uses a hashing algorithm to convert a string s to an integer. For example, you could use the hash function to convert a stock symbol to an integer, so that the StreamBase data parallelism feature can distribute incoming tuples across the multiple instances of a module. Data parallelism bases the distribution on a tuple field that you designate, and that field must be of type integer.

  • int(e)

    Converts e to a value of type int where e is a bool, int, double, or string.

    • An int value will return that same int value.
    • A double value will have its fractional part truncated. For example, int(3.4) returns 3.
    • A string value will be parsed as a decimal number. For example, int("123") returns 123, but int("7abc") is an error.
    • A bool value will return 1 if true, or 0 if false.
  • string(e)

    Converts e to a value of type string where e is a bool, int, double, string, or timestamp.

    • An int value will return the string representation of that value.
    • A double value will return the string representation of that value.
    • A string value will return itself.
    • A bool value will return the string true if true, or false if false.
    • A timestamp value, if represented as a date/time, will return a string which includes the explicit time zone. A timestamp represented as a numeric value will return the string representation of that value.
  • timestamp(s)

    Return a timestamp parsed from the argument s, a string representation of a date.

    Also See: The strptime() function, used to convert strings to timestamps.

Simple Functions: C++ API Call

  • callcpp("function" [,arg...])

    Runs a C++ custom function directly from a StreamBase operator. Custom C++ functions can be accessed by the simple form of callcpp in any expression except in aggregate expressions. (To use callcpp in aggregate expressions, refer to Aggregate Functions: C++ API Call.)

    To learn about coding custom C++ functions, refer to Creating Custom C++ Functions in the API guide.

Simple Functions: Java API Call

  • calljava("class", "method" [,arg...])

    Runs a Java method directly from the StreamBase expression language.

    The first two arguments must be a public class and public static method in a Java application that you have imported.

    Custom Java functions can be accessed by the simple form of calljava in any expression except in aggregate functions. (To use calljava in aggregate functions, refer to Aggregate Functions:Java API Call.)

    To learn about coding custom Java functions, refer to Creating Custom Java Functions in the API guide.

Note: A StreamBase string is not the same as a java.lang.String. No encoding is used when passing it as a byte[]. The raw data that was transmitted through StreamBase is used.

Return Types and Argument Types

There can be any number of arguments (including none). Each argument associates a StreamBase type with one of the following primitive or object Java types:

StreamBase Type Java Primitive Java Object
bool boolean Boolean
int int Integer
double double Double
timestamp com.streambase.sb.Timestamp None: use the Java primitive
string byte[] String

Notes:

  • If any primitive value is null, the custom function is not called. However, object arguments can be used to pass in null values to the custom function.
  • In general, use the primitive form if speed is important; use objects if you need to specify null values.
Using strresize for Java Methods That Return byte

When using calljava to invoke a Java method that returns a byte[], you may need to use the strresize() function provided by StreamBase, so that the output length of the string is known. For example, if you attempt to use the following expression as a field value in a Map operator:

  calljava('com.mycompany.MyMethod', 'myFunction', abc)

...and myFunction is declared as returning a byte[], you will encounter an "Unable to determine string length" typecheck error. Use strresize to specify the maximum length of the output, so the operator's output schema can be determined. For example:

  strresize(calljava('com.mycompany.MyMethod', 'myFunction', abc)

Note: A StreamBase string is not the same as a java.lang.String. No encoding is used when passing it as a byte[]. The raw data that was transmitted through StreamBase is used.

Aggregate Functions

This section is organized as follows:

Introduction

Unlike simple functions, which operate on a single field, aggregate functions evaluate columns of data from windows or tables, and return a single result. Aggregate functions can be used only within aggregate expressions in Aggregate operators, or within output expressions in Query Operators that do read operations. In StreamSQL applications, aggregate functions can be be used in SELECT statements related to aggregate or query read operations.

Aggregate expressions can contain a single aggregate function, or they can combine multiple aggregate functions together with fields from an input stream or an associated Query Table, and constants. The following screen illustrates:

Example of aggregate expression

Here is the equivalent expression in a fragment of a StreamSQL SELECT statement:

SELECT count() * avg(price) AS cost

Aggregate Functions: Windowing

Windowing functions work on columns of data from either aggregate windows or read operations on tables.

  • firstval(f)

    Returns the first value for field f in a window or table column.

  • lastval(f)

    Returns the last value for field f in a window or table column.

  • openval("dimension")

    Can only be used in an Aggregate operator, not in a Query Operator. Returns the lower limit of the specified dimension for the current window. The dimension name is passed as a quoted string, and must match the name of a dimension in the Aggregate operator. This function works even if the dimension specified is not the one that actually opened the current window; in that case the value returned is the value of the dimension at the time the window was created.

    Note that the value returned by openval might different from any of the the actual tuple values included in the window. For example, consider an Aggregate operator with a dimension CountDim, where the windows created by that dimension include a window with a range of 5 to 10. If the actual tuple values in the window are 6, 7, and 9, then openval(CountDim) will return 5 for that window, not 6.

  • closeval("dimension")

    Can only be used in an Aggregate operator, not in a Query Operator. Returns the upper limit of the specified dimension when the window is closed. The dimension name is passed as a quoted string, and must match the name of a dimension in the Aggregate operator. This function works even if the dimension specified is not the one that actually closed the current window; in that case the value returned is the value of the dimension at the time the window was closed.

    Note that the value returned by closeval might different from any of the actual tuple values included in the window. For example, consider an Aggregate operator with a dimension CountDim, where the windows created by that dimension include a window with a range of 5 to 10. If the actual tuple values in the window are 6, 7, and 9, then closeval(CountDim) returns 10 for that window, not 9.

Aggregate Functions: Statistical Calculations

  • alpha(index, price, dividend)

    Returns a double that may indicate the part of a stock's movement that is independent of the Index's movement.

    Examples of stocks increasing in alpha could be those with take-over rumors, under strong syndicate manipulation, or having strong expectations of good results; that is, factors that make them increasingly move independently off the Index.

    The beta of a stock is defined as the slope of a regression line in a scatter graph of paired data points representing percentage changes of an index (return of an index) and the corresponding change (return of stock) in the price of a stock. The StreamBase beta function also includes the stock dividend in the calculation of the stock return.

    The alpha is the point where this regression line cuts the Y-axis. To reiterate, a stock's beta can be described as that part of a stock's movement that is influenced by the Index, while a stock's alpha can be regarded as that part of a stock's movement that is independent of the Index's movement.

    The alpha function's arguments are:

    1. index: the end-of-period market index (required, double)
    2. price: the end-of-period stock price (required, double)
    3. dividend: the stock dividend of the period (optional, double)

    The stock price must have already figured in any stock split (or reverse split). The alpha is usually calculated over a period of 61 months. This aggregate function assumes that the input data is already normalized. That is, index, price, and dividend are of the same period. If they are of different periods, the yield (return) should be normalized first and then use the intercept function to calculate alpha.

  • avg(f)

    Returns the average value computed for field f for all tuples in the Aggregate's window.

  • beta(index, price, dividend)

    Returns a double that may indicate the tendency of a security's returns to respond to swings in the market. A beta of 1 indicates that the security's price will move with the market. A beta of less than 1 means that the security will be less volatile than the market. A beta value that is greater than 1 indicates that the security's price will be more volatile than the market.

    For related information, see the description of the alpha function.

    Beta is a measure of a security's or portfolio's volatility, or systematic risk, in comparison to the market as a whole. For example, if a stock's beta is 1.2, it is theoretically 20% more volatile than the market.

    The function's arguments are:

    1. index: the end-of-period market index (required, double)
    2. price: the end-of-period stock price (required, double)
    3. dividend: the stock dividend of the period (optional, double)

    The stock price must have already figured in any stock split (or reverse split). The beta is usually calculated over a period of 61 months. This aggregate function assumes that the input data is already normalized. That is, index, price, and dividend are of the same period. If they are of different periods, the yield should be normalized first and then use the slope function to calculate beta.

  • correlation_coefficient(price, index)

    Returns the correlation coefficient for two fields, price and index, for all tuples in the Aggregate's window. This function takes two input arguments (both can be an int or a double) and returns a double. The correlations coefficient is a measure that determines the degree to which two variable's movements are associated. The correlation coefficient will vary from -1.0 to 1.0. The value -1.0 indicates perfect negative correlation, and 1.0 indicates perfect positive correlation. 0 means no correlation.

    Notes:

    • The correlation_coefficient of 0 items is null.
    • The correlation_coefficient of 1 item is 1.0.
    • The correlation_coefficient of N identical items is 1.0.
  • correlation_coefficientp(price, index)

    Returns the correlation coefficient for two fields, price and index, for all tuples in the Aggregate's window. The correlation_coefficientp function is similar to the correlation_coefficient function; however you should use correlation_coefficientp when the data provided is the entire population, while the correlation_coefficient function is used for a random sample. The correlation_coefficientp function is calculated using the "biased" (or "n") method. The correlation_coefficient function is calculated using the "unbiased" (or "n-1") method.

    This function takes two input arguments (both can be an int or a double) and returns a double. The correlations coefficient is a measure that determines the degree to which two variable's movements are associated. The correlation coefficient will vary from -1.0 to 1.0. The value -1.0 indicates perfect negative correlation, and 1.0 indicates perfect positive correlation. 0 means no correlation.

    Notes:

    • The correlation_coefficientp of 0 items is null.
    • The correlation_coefficientp of 1 item is 1.0.
    • The correlation_coefficientp of N identical items is 1.0.
  • count([expr])

    Returns the number of tuples in the Aggregate's window. If this function is called with an argument, then tuples for which the argument evaluates to null will not contribute to the count. The argument can be of any type, and it's actual value is ignored, other than the fact that it is non-null. If no argument is specified, then all tuples are included in the count.

  • count_distinct(e1[,...])

    Returns the number of tuples in the Aggregate's window with distinct values for the parameters. This function must be called with at least one argument. Use commas to separate subsequent arguments. For the purpose of this function, null values of a field are considered equivalent to each other.

  • covariance(price, index)

    Returns the covariance for two fields, price and index, for all tuples in the Aggregate's window. This function takes two input arguments (both can be an int or a double) and returns a double. In financial applications, covariance can be used to measure the degree to which returns on two risky assets move in tandem. A positive covariance means that asset returns move together. A negative covariance means returns vary inversely.

    In the arguments, the price field is a stock price, and the index is the industry segment index.

    One method of calculating covariance is by looking at return deviations from the expected return in each scenario. Another method is to multiply correlation between the two variables by the standard deviation of each variable.

    For example, if you owned one asset that had a high covariance with another asset that you did not own, then you would receive very little increased diversification by adding the second asset. Of course, the opposite is true as well, adding assets with low covariance to your portfolio would lower the overall portfolio risk.

  • covariancep(price, index)

    Returns the covariance for two fields, price and index, for all tuples in the Aggregate's window. The covariancep function is similar to the covariance function; however you should use covariancep when the data provided is the entire population, while the covariance function is used for a random sample. The covariancep function is calculated using the "biased" (or "n") method. The covariance function is calculated using the "unbiased" (or "n-1") method.

    This function takes two input arguments (both can be an int or a double) and returns a double. In financial applications, covariance can be used to measure the degree to which returns on two risky assets move in tandem. A positive covariance means that asset returns move together. A negative covariance means returns vary inversely.

    In the arguments, the price field is a stock price, and the index is the industry segment index.

    One method of calculating covariance is by looking at return deviations from the expected return in each scenario. Another method is to multiply correlation between the two variables by the standard deviation of each variable.

    For example, if you owned one asset that had a high covariance with another asset that you did not own, then you would receive very little increased diversification by adding the second asset. Of course, the opposite is true as well, adding assets with low covariance to your portfolio would lower the overall portfolio risk.

  • exp_moving_avg(price, count, weight)

    Returns a type of moving average that is similar to a simple moving average, except that more weight is given to the most recent data. Also known as an exponentially weighted moving average, this type of moving average reacts faster to recent price changes than a simple moving average.

    The function's arguments are:

    1. price: the price of a stock (required, int or double)
    2. count: the number of periods over which the exponential moving average is calculated (required, int)
    3. weight: the weight (0 < weight < 1) of the price of the last tuple. (optional, double)

    The value of the third argument must between 0 and 1. If the value is not between 0 and 1 or if the argument is not provided, the weight is calculated from the following formula:

    weight = 2 / (count + 1)
    

    This function returns a double. This Exponential Moving Average function is implemented using an infinite aggregate window size. When configuring a StreamBase Aggregate operator that will use this exp_moving_avg(price, count, weight) function, do not configure the Advance and Size fields in the IDE, which results in infinite window size. Instead, set the Emit to 1 so that the aggregate emits for every tuple.

    The exp_moving_avg function is calculated using the following formula:

    EMA = (CDV * weight) + (PDA * ( 1 - weight))
    
    • EMA: exponential moving average of current period
    • PDA: exponential moving average of previous period
    • CDV: value of current period

    For the first EMA, there is no PDA. A simple moving average (SMA) is computed instead. When computing the SMA, the Null value is emitted for each input tuple until the aggregate operator receives enough tuples to compute SMA. For example, if you want to compute a 20-day exponential moving average of the prices of a securities, a Null is returned by the exp_moving_avg function for the first 19-days. On the 20th day, the average of the prices of the first 20 days is returned. Starting from the 21st day, the above formula is used to calculate EMA of each day.

  • intercept(x,y)

    For the data set of fields x and y for all tuples in the Aggregate's window, a "best-fit" linear regression line is calculated, and then the function returns the value where the line intersects the y-axis. The method that is used to find the line that best fits a group of points is called "least squares" (or "linear least squares"). This function takes two input arguments (both can be an int or a double) and returns a double.

  • max(f)

    Returns the maximum value computed for field f for all tuples in the Aggregate's window.

  • min(f)

    Returns the minimum value computed for field f for all tuples in the Aggregate's window.

  • product(f)

    Returns the product (multiplication) computed for field f for all tuples in the Aggregate's window. Supports datatypes int and double.

  • slope(x,y)

    For the data set of fields x and y for all tuples in the Aggregate's window, a "best-fit" linear regression line is calculated, and then the function returns the line's slope. The method that is used to find the line that best fits a group of points is called "least squares" (or "linear least squares"). The slope function takes two input arguments (both can be an int or a double) and returns a double.

  • stdev(f)

    Returns the standard deviation for field f for all tuples in the Aggregate's window. The function takes one input argument (an int or a double) and returns a double. Standard deviation is a measure of the dispersion of a set of data from its mean. The more spread apart the data is, the higher the deviation. For example, in financial applications, standard deviation could be applied to the annual rate of return of an investment to measure the investment's volatility (risk). A volatile stock would have a high standard deviation. In mutual funds, the standard deviation indicates how much the return on the fund is deviating from the expected normal returns.

  • stdevp(f)

    Returns the standard deviation for field f for all tuples in the Aggregate's window. The stdevp function is similar to the stdev function; however you should use stdevp when the data provided is the entire population, while the stdev function is used for a random sample. The stdevp function is calculated using the "biased" (or "n") method. The stdev function is calculated using the "unbiased" (or "n-1") method.

    The function takes one input argument (an int or a double) and returns a double. Standard deviation is a measure of the dispersion of a set of data from its mean. The more spread apart the data is, the higher the deviation. For example, in financial applications, standard deviation could be applied to the annual rate of return of an investment to measure the investment's volatility (risk). A volatile stock would have a high standard deviation. In mutual funds, the standard deviation indicates how much the return on the fund is deviating from the expected normal returns.

  • sum(f)

    Returns the sum value computed for field f for all tuples in the Aggregate's window.

  • variance(f)

    Returns the variance for field f for all tuples in the Aggregate's window. The function takes one input argument (an int or a double) and returns a double. Variance is a measure of the dispersion of a set of data points around their mean value. It is a mathematical expectation of the average squared deviations from the mean. Variance measures the variability (volatility) from an average. Volatility is a measure of risk. So for example, this statistic can help determine the risk an investor might take on when purchasing a specific security.

  • variancep(f)

    Returns the variance for field f for all tuples in the Aggregate's window. The variancep function is similar to the variance function; however you should use variancep when the data provided is the entire population, while the variance function is used for a random sample. The variancep function is calculated using the "biased" (or "n") method. The variance function is calculated using the "unbiased" (or "n-1") method. The function takes one input argument (an int or a double) and returns a double. Variance is a measure of the dispersion of a set of data points around their mean value. It is a mathematical expectation of the average squared deviations from the mean. Variance measures the variability (volatility) from an average. Volatility is a measure of risk. So for example, this statistic can help determine the risk an investor might take on when purchasing a specific security.

  • vwap(price, volume)

    Returns the volume-weighted average-price value from (typically) the price and volume values, for all tuples in the Aggregate's window.

Back to Top ^

Aggregate Functions: C++ API Call

  • callcpp("function" [,arg...])

    Runs a C++ custom aggregate function directly from a StreamBase operator. Custom C++ functions can be accessed by the aggregate form of callcpp in any aggregate expression. To use callcpp in simple expressions, refer to Simple Functions: C++ API Call.)

    To learn about coding custom C++ functions, refer to Creating Custom C++ Functions in the API guide.

Aggregate Functions: Java API Call

  • calljava("class" [,arg...])

    In an aggregate function, runs a custom Java method directly from the StreamBase expression language.

    The function's first argument is a public class, extending the StreamBase AggregateWindow class, in a Java application that you have imported.

    Note that you do not specify the method name, as you do with the simple calljava function, because the method names are already known.

    For the valid argument types, please refer to the table in Simple Functions: Java API Call.

    To learn about coding custom Java functions, refer to Creating Custom Java Functions in the API guide.

Using strresize for Java Methods That Return byte

When using calljava to invoke a Java method that returns a byte[], you may need to use the strresize() function provided by StreamBase, so that the output length of the string is known. For example, if you attempt to use the following expression as a field value in a Map operator:

  calljava("com.mycompany.MyClass", "myFunction", "abc")

...and myFunction is declared as returning a byte[], you will encounter an "Unable to determine string length" typecheck error. Use strresize to specify the maximum length of the output, so the operator's output schema can be determined. For example:

  strresize(calljava("com.mycompany.MyClass", "myFunction", "abc"), 20)

Note: A StreamBase string is not the same as a java.lang.String. No encoding is used when passing it as a byte[]. The raw data that was transmitted through StreamBase is used.

Back to Top ^