Scoring functions are defined by mathematical formulas that combine data from the document, the query, and the textual relevance to assign a score to each document matching a query. These scores are used when searching the index to order the results.
You can modify these formulas in real time, and they can be as complex as you need.
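To illustrate the idea, the Python below is a minimal sketch, not Searchify's implementation. It gives each matching document a score from a formula and then orders the results by descending score. `Doc`, `rank`, and the sample values are hypothetical. The rule that higher scores rank first is an assumption, inferred from the samples in this section: `-age`, for example, is used to favor newer documents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    docid: str
    relevance: float  # textual relevance for the current query
    age: float        # seconds since the document's timestamp


def rank(docs: List[Doc], score: Callable[[Doc], float]) -> List[str]:
    """Return document ids ordered from highest to lowest score (assumed ordering)."""
    return [d.docid for d in sorted(docs, key=score, reverse=True)]


def by_relevance(d: Doc) -> float:
    return d.relevance  # formula: relevance


def by_recency(d: Doc) -> float:
    return -d.age  # formula: -age (newer documents get higher scores)


matches = [
    Doc("a", relevance=0.9, age=30 * 86_400),  # very relevant, a month old
    Doc("b", relevance=0.2, age=3_600),        # barely relevant, an hour old
    Doc("c", relevance=0.5, age=86_400),       # in between, a day old
]

print(rank(matches, by_relevance))  # ['a', 'c', 'b']
print(rank(matches, by_recency))    # ['b', 'c', 'a']
```

The matching set is identical in both calls. Only the formula changes, and that alone is enough to produce a different ordering.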
When writing a formula, keep in mind that:
Formulas support the following operators for combining expressions: +, -, *, /
All of them are binary operators, except "-", which can also be used as a unary operator to negate an expression.
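This section does not spell out how precedence or associativity work. As an assumption, the sketch below follows the usual arithmetic conventions: unary minus binds tightest, `*` and `/` bind tighter than `+` and `-`, and operators of equal precedence associate left to right. It is a small recursive-descent evaluator. `Parser`, `evaluate`, and the plain identifiers it accepts (such as `age`) are hypothetical, and it deliberately does not support `doc.var[n]` or the functions described below.

```python
import re
from typing import Dict, List, Optional

# A token is a number, an identifier (e.g. "age"), or any single non-space character.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d*)?|\.\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(src: str) -> List[str]:
    return [number or name or op for number, name, op in TOKEN.findall(src)]


class Parser:
    """Recursive-descent evaluator: unary "-" first, then "*" and "/", then "+" and "-"."""

    def __init__(self, src: str, env: Dict[str, float]) -> None:
        self.tokens = tokenize(src)
        self.pos = 0
        self.env = env

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> float:  # expr := term (("+" | "-") term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self) -> float:  # term := unary (("*" | "/") unary)*
        value = self.unary()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.unary()
            else:
                value /= self.unary()  # Python raises ZeroDivisionError on x / 0
        return value

    def unary(self) -> float:  # unary := "-" unary | atom
        if self.peek() == "-":
            self.take()
            return -self.unary()
        return self.atom()

    def atom(self) -> float:  # atom := number | variable | "(" expr ")"
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of formula")
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok in self.env:
            return self.env[tok]
        try:
            return float(tok)
        except ValueError:
            raise SyntaxError(f"unexpected token {tok!r}") from None


def evaluate(src: str, env: Dict[str, float]) -> float:
    parser = Parser(src, env)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("1 + 2 * 3", {}))     # 7.0
print(evaluate("2 - 3 - 4", {}))     # -5.0 (left to right)
print(evaluate("-(1 + 2) / 4", {}))  # -0.75
print(evaluate("-age * relevance", {"age": 10.0, "relevance": 0.5}))  # -5.0
```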
Scoring function formulas are computed for each document matching a given query. The following variables, related to the document or the query, can therefore be used in a formula:
Textual Relevance | |
---|---|
Description | For each document matching a query, a textual relevance (how relevant the document's text is to the query) is calculated. You may use this value in your formula or ignore it, and you decide how much weight it carries in the final calculation (e.g., if you want to sort your results by creation time alone, you can leave this variable out of your scoring function). |
Syntax | relevance |
Short syntax | rel or r or R |
Values | Relevance is a positive float. Because of precision issues, it CAN also be zero. |
Sample | -age * relevance (sorts documents weighing equally how new the document is and how relevant it is to the query) |
Document's Age | |
---|---|
Description | When indexed, every document is assigned a timestamp, an integer value that usually represents its creation time: the larger the value, the newer the document. The timestamp can be provided when adding the document. Otherwise, Searchify automatically assigns the number of seconds elapsed from the Unix Epoch (00:00:00 UTC on 1 January 1970) until the moment the document was indexed. In formulas you can use the document's age, which is the number of seconds since the Unix Epoch at the moment the query is executed minus the document's timestamp. When the documents' timestamps are in UNIX time, this variable is the age of the document in seconds. |
Syntax | doc.age |
Short syntax | age or a or A |
Values | Since the timestamps provided are not restricted, age can be negative (for example, when a timestamp lies in the future). |
Sample | -age * relevance (sorts documents weighing equally how new the document is and how relevant it is to the query) |
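The age definition above reduces to a single subtraction. The sketch below shows it, with the query time pinned to a hypothetical fixed value so the output is reproducible. `doc_age` and `DAY` are illustrative names. `int(time.time())` produces a value of the same kind as the default timestamp described above.

```python
import time
from typing import Optional

DAY = 86_400  # seconds in a day


def doc_age(timestamp: float, now: Optional[float] = None) -> float:
    """Age as defined above: query time (seconds since the Unix Epoch) minus the timestamp."""
    if now is None:
        now = time.time()  # seconds since the Unix Epoch, as a float
    return now - timestamp


query_time = 1_700_000_000  # hypothetical fixed "now" so the output is reproducible
print(doc_age(query_time - 3 * DAY, query_time) / DAY)  # 3.0: indexed three days before the query
print(doc_age(query_time + DAY, query_time))            # -86400: a timestamp one day in the future
```

Dividing by 86400, as the samples in this section do with `age/86400`, expresses the age in days.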
Document's Variables | |
---|---|
Description | When a document is indexed, numeric (float) variables can be assigned to it. These variables typically hold rapidly changing values that affect how the document should rank in a scoring function (number of positive and negative votes, number of comments, user-generated score, review score, number of visits, etc.). Once a document is indexed, its variables can be updated whenever necessary, at no cost beyond that of the HTTP request. Variables are identified by an integer from zero to the variables' limit minus one. The maximum number of variables per document depends on the account's package. |
Syntax | doc.var[n] (where n is the variable's integer identifier) |
Short syntax | d[n] or D[n] |
Values | Any float set when the document was indexed or afterwards. When a variable is passed to log(), as in the sample below, a negative value yields NaN (not a number) and a zero value yields negative infinity. |
Sample | log(doc.var[0]) - age/86400 (sorts documents considering the natural logarithm of the document's variable #0 minus its age in days) |
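The sample formula trades popularity against freshness. The sketch below evaluates it for two documents and makes the trade-off concrete. As an assumption for illustration, variable #0 holds a vote count. `score` and the data are hypothetical, and a plain list stands in for `doc.var[n]`.

```python
import math
from typing import List

DAY = 86_400  # seconds in a day


def score(doc_vars: List[float], age: float) -> float:
    """Equivalent of the sample formula: log(doc.var[0]) - age/86400.

    doc_vars[n] stands in for doc.var[n]; var #0 is assumed to hold a vote count.
    Note: Python's math.log raises ValueError for values <= 0.
    """
    return math.log(doc_vars[0]) - age / DAY


popular_old = score([1000.0], age=5 * DAY)  # 1000 votes, five days old
niche_new = score([10.0], age=1 * DAY)      # 10 votes, one day old
print(popular_old > niche_new)  # True: ln(1000) - 5 ≈ 1.91 beats ln(10) - 1 ≈ 1.30

# Each extra day of age costs exactly 1 point, which is what multiplying
# the vote count by e (≈ 2.718) adds back.
print(math.isclose(score([1000.0 * math.e], age=6 * DAY), popular_old))  # True
```

Under this formula, a document therefore needs its vote count to grow by a factor of about e per day to keep its score steady.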
Query Variables | |
---|---|
Description | When searching the index, float variables can be passed along with the query (check the searching documentation). These variables can then be used in the scoring function's formula. |
Syntax | query.var[n] (where n is the variable's integer identifier) |
Short syntax | q[n] or Q[n] |
Values | Any float passed as a query variable. |
The following mathematical functions are available for writing formulas:
Natural logarithm | |
---|---|
Description | Calculates the natural logarithm of an expression. The logarithm is useful when you need to consider the order of magnitude of an expression rather than its actual value (roughly comparable to considering the number of digits of the value). Mathematically, it is the inverse of the exponential function. |
Syntax | log(val) |
Arguments | val: a float expression to which the logarithm is applied. For negative values, NaN (not a number) is returned. For zero, negative infinity is returned. |
Sample | log(doc.var[0]) - age/86400 (sorts documents considering the natural logarithm of the document's variable #0 minus its age in days) |
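The "order of magnitude" behavior and the edge cases of `log()` are easy to check in a sketch. Python's `math.log` raises `ValueError` for zero and negative input instead of returning -inf or NaN, so the hypothetical helper `formula_log` below handles those cases explicitly to match the behavior documented above.

```python
import math


def formula_log(val: float) -> float:
    """Mimics the documented log(val): NaN for negative input, -inf for zero."""
    if val < 0:
        return float("nan")
    if val == 0:
        return float("-inf")
    return math.log(val)


# Each additional decimal digit adds the same amount, ln(10) ≈ 2.303, to the result,
# so 10,000 visits do not score ten times higher than 1,000 visits.
for visits in (10, 100, 1_000, 10_000):
    print(visits, round(formula_log(visits), 3))

print(formula_log(0.0))               # -inf
print(math.isnan(formula_log(-5.0)))  # True
```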
Power | |
---|---|
Description | Raises a float expression to an integer power. The power function can be used to create exponential functions or to weight different factors (making one more important than another in a product). |
Syntax | pow(base, exponent) |
Arguments | base: a float expression, the base. exponent: an integer expression, the exponent. Zero and negative values are allowed. Float expressions (variables, function results) can be used, but they are truncated toward zero (e.g., 2.7 becomes 2 and -2.7 becomes -2). |
Sample | pow(doc.var[0], 3) * doc.var[1] (sorts documents considering variable #0 three times as important as variable #1) |
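Two points from the entry above can be checked directly: a float exponent is truncated toward zero, and the cube makes variable #0 dominate the product. On a log scale, log(pow(v0, 3) * v1) = 3·log(v0) + log(v1). In the sketch below, `formula_pow` and `score` are hypothetical names, and `math.trunc` models the documented truncation.

```python
import math


def formula_pow(base: float, exponent: float) -> float:
    """Sketch of pow(base, exponent): a float exponent is truncated toward zero.

    Python raises ZeroDivisionError for 0 ** negative; the engine's behaviour there
    is not documented here.
    """
    return base ** math.trunc(exponent)


print(formula_pow(2.0, 2.9))   # 4.0: exponent truncated to 2
print(formula_pow(2.0, -1.7))  # 0.5: exponent truncated to -1


def score(var0: float, var1: float) -> float:
    # Equivalent of the sample formula: pow(doc.var[0], 3) * doc.var[1]
    return formula_pow(var0, 3) * var1


baseline = score(2.0, 2.0)
print(score(4.0, 2.0) / baseline)  # 8.0: doubling var #0 multiplies the score by 2**3
print(score(2.0, 4.0) / baseline)  # 2.0: doubling var #1 only multiplies it by 2
```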
Max | |
---|---|
Description | Returns the greater of two values. |
Syntax | max(a, b) |
Arguments | a: a float expression. b: a float expression. |
Sample | max(doc.var[0], doc.var[1]) (sorts documents considering variable #0 or variable #1, whichever is greater) |
Min | |
---|---|
Description | Returns the smaller of two values. |
Syntax | min(a, b) |
Arguments | a: a float expression. b: a float expression. |
Sample | min(doc.var[0], doc.var[1]) (sorts documents considering variable #0 or variable #1, whichever is smaller) |
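Nesting `max()` inside `min()` bounds a value to a range, for example `min(max(doc.var[0], 0), 5)`. The sketch below shows the effect. The use case, a user-generated rating kept within 0 to 5 so that outliers cannot dominate the ranking, is hypothetical, and so is the helper name `clamp`. It assumes `low <= high`.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Equivalent formula: min(max(value, low), high).

    max() lifts values below `low` up to it; min() then caps values above `high`.
    """
    return min(max(value, low), high)


# Hypothetical: a rating stored in doc.var[0], bounded to 0..5 before it enters a score.
for rating in (-2.0, 3.5, 12.0):
    print(rating, clamp(rating, 0.0, 5.0))
```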
Absolute | |
---|---|
Description | Returns the absolute value of a float: positive values are returned unchanged and negative values are negated. |
Syntax | abs(value) |
Arguments | value: a float expression (zero, positive, or negative). |
Sample | abs(doc.var[0]) (sorts documents treating values of variable #0 such as 1 and -1 equally) |
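One common use of `abs()` together with query variables is to rank documents by how close one of their values is to a target sent with the query. The hypothetical formula for this is `-abs(doc.var[0] - query.var[0])`. The sketch below evaluates it. The price data, the helper name `closeness`, and the rule that higher scores rank first are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def closeness(doc_vars: Sequence[float], query_vars: Sequence[float]) -> float:
    # Equivalent formula: -abs(doc.var[0] - query.var[0])
    # The score is 0 for an exact match and more negative the further apart the values are.
    return -abs(doc_vars[0] - query_vars[0])


# Hypothetical data: var #0 of each document holds a price,
# and query.var[0] carries the price the user is looking for.
doc_vars: Dict[str, List[float]] = {"a": [45.0], "b": [52.0], "c": [80.0]}
query_vars = [50.0]

ranking = sorted(doc_vars, key=lambda docid: closeness(doc_vars[docid], query_vars), reverse=True)
print(ranking)  # ['b', 'a', 'c']
```

Because of `abs()`, a document 5 below the target and one 5 above it receive the same score.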
Square root | |
---|---|
Description | Calculates the square root of a float value. It is a special case of the power function with the non-integer exponent 1/2, which pow() cannot express because it truncates its exponent. |
Syntax | sqrt(value) |
Arguments | value: a float expression. For negative values, NaN (not a number) is returned. |
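Because `pow()` truncates its exponent, `pow(x, 0.5)` does not compute a square root. The sketch below makes that concrete. `formula_sqrt` and `formula_pow` are hypothetical helpers that model the documented behavior. Python's `math.sqrt` raises `ValueError` for negative input, so that case is mapped to NaN explicitly.

```python
import math


def formula_sqrt(value: float) -> float:
    """Mimics the documented sqrt(): NaN for negative input."""
    return float("nan") if value < 0 else math.sqrt(value)


def formula_pow(base: float, exponent: float) -> float:
    """pow() truncates a float exponent toward zero, as described in the Power entry."""
    return base ** math.trunc(exponent)


print(formula_sqrt(16.0))              # 4.0
print(formula_pow(16.0, 0.5))          # 1.0: 0.5 truncates to 0, so pow() cannot take square roots
print(math.isnan(formula_sqrt(-4.0)))  # True
```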
If clause | |
---|---|
Description | Evaluates a condition and returns the corresponding expression. This function takes three arguments: a boolean condition, the expression to return when the condition is met, and the expression to return when it is not. The expressions are regular float expressions (a variable, the result of a function, the result of an operation, a literal). The boolean condition compares two float expressions (boolean operations are not allowed) using one of these comparators: |
Syntax | if(cond, true, false) |
Arguments | cond: the boolean condition comparing two expressions. true: the expression to evaluate when cond is met. false: the expression to evaluate when cond is not met. |
Sample | if(doc.var[0] < 1, doc.var[0], rel) (sorts documents by variable #0 while its value is less than 1, and by textual relevance otherwise) |
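The sample `if()` formula maps directly onto a conditional expression. The sketch below evaluates it across the threshold, using a hypothetical relevance of 0.4. Python evaluates only the selected branch. This section does not say whether the engine does the same.

```python
def score(var0: float, relevance: float) -> float:
    # Equivalent of the sample formula: if(doc.var[0] < 1, doc.var[0], rel)
    return var0 if var0 < 1 else relevance


for var0 in (0.25, 0.99, 1.0, 3.0):
    print(var0, score(var0, relevance=0.4))
```

Because the two branches can be on different scales, the score jumps at the threshold. Here it falls from 0.99 to 0.4 as variable #0 reaches 1, which may or may not be the intended ranking.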
Kms/miles calculator | |
---|---|
Description | Calculates the distance between two geographical points expressed as latitude/longitude coordinates, in kilometers or in miles. |
Syntax | km(lat1, long1, lat2, long2) or miles(lat1, long1, lat2, long2) |
Arguments | lat1: latitude of point 1. long1: longitude of point 1. lat2: latitude of point 2. long2: longitude of point 2. All coordinates are floats expressed in degrees (fractional values ARE considered). |
Sample | miles(query.var[0], query.var[1], doc.var[0], doc.var[1]) (sorts documents considering the distance between the document and a point passed in the query) |
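This section does not state which distance formula Searchify uses. As an assumption, the sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km. The sample formula returns a distance, and elsewhere higher scores appear to rank first, so a nearest-first ordering would presumably negate it, as in `-miles(...)`. The sketch does exactly that. The store names and coordinates are hypothetical points on the equator, chosen so that the ordering is easy to verify.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not a documented Searchify constant
KM_PER_MILE = 1.609344


def km(lat1: float, long1: float, lat2: float, long2: float) -> float:
    """Great-circle distance in kilometers using the haversine formula (inputs in degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(long2 - long1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing the argument of asin slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def miles(lat1: float, long1: float, lat2: float, long2: float) -> float:
    return km(lat1, long1, lat2, long2) / KM_PER_MILE


# query.var[0], query.var[1]: the user's position; doc.var[0], doc.var[1]: each document's.
user = (0.0, 0.0)
stores: Dict[str, Tuple[float, float]] = {"far": (0.0, 3.0), "near": (0.0, 1.0), "mid": (0.0, 2.0)}

# Scoring with -miles(...) makes nearer documents score higher.
nearest_first = sorted(stores, key=lambda s: -miles(*user, *stores[s]), reverse=True)
print(nearest_first)              # ['near', 'mid', 'far']
print(round(km(0, 0, 0, 1), 1))   # 111.2: one degree of longitude at the equator
```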