TTASE

From The Traditional Tune Archive

Boolean query syntax

  • Boolean queries allow the following special operators to be used:
    • explicit operator AND: hello & world
    • operator OR: hello | world
    • operator NOT:
      • hello -world
      • hello !world
    • grouping: ( hello world )

Here's an example query which uses all these operators:
( cat -dog ) | ( cat -mouse)
There is always an implicit AND operator, so the query "hello world" actually means "hello & world".
OR operator precedence is higher than AND, so "looking for cat | dog | mouse" means "looking for ( cat | dog | mouse )" and not "(looking for cat) | dog | mouse".
Queries like "-dog", which implicitly include all documents from the collection, cannot be evaluated. This is for both technical and performance reasons.
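The precedence rule above can be illustrated with a small Python sketch. It is not how the search engine parses queries; it only handles flat queries in which "|" is surrounded by spaces, and the function name group_terms is a hypothetical name chosen for this illustration.

```python
def group_terms(query: str) -> list[list[str]]:
    """Split a flat query into AND-ed groups of OR-ed alternatives.

    Illustration only: no brackets, negation, or quoting is handled.
    """
    groups: list[list[str]] = []
    join_next = False  # True right after a "|" token
    for token in query.split():
        if token == "|":
            join_next = True
        elif join_next and groups:
            # OR binds tighter than AND: attach to the previous group
            groups[-1].append(token)
            join_next = False
        else:
            # implicit AND: start a new group
            groups.append([token])
            join_next = False
    return groups


print(group_terms("looking for cat | dog | mouse"))
# [['looking'], ['for'], ['cat', 'dog', 'mouse']]
```

Each inner list is a set of alternatives, and all inner lists must be satisfied, which mirrors "looking for ( cat | dog | mouse )".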




Extended query syntax


  • The following special operators and modifiers can be used:
    • operator OR: hello | world
    • operator NOT:
      • hello -world
      • hello !world
  • Fields available in TTA: page_title and old_text
    • field search operator: @page_title hello @old_text world
    • field position limit modifier: @old_text[50] hello
    • multiple-field search operator:
      • @(page_title,old_text) hello world
    • all-field search operator:
      • @* hello
    • phrase search operator: "hello world"
    • proximity search operator: "hello world"~10
    • quorum matching operator: "the world is a wonderful place"/3
    • strict order operator (aka operator "before"): aaa << bbb << ccc
    • exact form modifier: raining =cats and =dogs
    • field-start and field-end modifier: ^hello world$

Query example: "hello world" @page_title "example program"~5 @old_text python -(php|perl) @* code

The full meaning of this search is:

Find the words 'hello' and 'world' adjacently in any field in a document;

  • Additionally, the same document must also contain the words 'example' and 'program' in the title field, with fewer than 5 words between them (e.g. "example PHP program" would match, whereas "example script to introduce outside data into the correct context for your program" would not, because the two terms are separated by 5 or more words);
  • Additionally, the same document must contain the word 'python' in the body field, but not contain either 'php' or 'perl';
  • Additionally, the same document must contain the word 'code' in any field.

There is always an implicit AND operator, so "hello world" means that both "hello" and "world" must be present in a matching document.
OR operator precedence is higher than AND, so "looking for cat | dog | mouse" means "looking for ( cat | dog | mouse )" and not "(looking for cat) | dog | mouse".
The field limit operator limits subsequent searching to a given field. Normally, the query will fail with an error message if the given field name does not exist in the searched index. However, that can be suppressed by specifying the "@@relaxed" option at the very beginning of the query:

@@relaxed @nosuchfield my query

This can be helpful when searching through heterogeneous indexes with different schemas.
The field position limit additionally restricts the search to the first N positions within the given field (or fields). For example, "@old_text[50] hello" will not match documents where the keyword 'hello' occurs only at position 51 or later in that field.
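As a rough illustration of the position limit, the following Python sketch checks whether a keyword occurs among the first N words of a field. It assumes simple whitespace tokenisation and case-insensitive comparison, which may differ from what the search engine does; matches_within_limit is a hypothetical helper, not part of any real API.

```python
def matches_within_limit(field_text: str, keyword: str, limit: int) -> bool:
    """Return True if keyword occurs in the first `limit` word positions."""
    words = field_text.lower().split()
    # positions are counted from 1, so the first `limit` words are words[:limit]
    return keyword.lower() in words[:limit]


early = " ".join(["filler"] * 49 + ["hello"])  # 'hello' at position 50
late = " ".join(["filler"] * 50 + ["hello"])   # 'hello' at position 51

print(matches_within_limit(early, "hello", 50))  # True
print(matches_within_limit(late, "hello", 50))   # False
```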
Proximity distance is specified in words, adjusted for word count, and applies to all words within quotes. For instance, the query "cat dog mouse"~5 means that there must be a span of fewer than 8 words which contains all 3 words, i.e. the document "CAT aaa bbb ccc DOG eee fff MOUSE" will not match this query, because this span is exactly 8 words long.
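The span arithmetic can be checked with a short Python sketch. It is a brute-force illustration of the rule as stated here (span shorter than keyword count plus distance), assuming whitespace tokenisation and distinct keywords; it is not the engine's actual algorithm, and proximity_match is a hypothetical name.

```python
from itertools import product


def proximity_match(doc: str, phrase: str, distance: int) -> bool:
    """Return True if all keywords fit in a span shorter than len(keywords) + distance."""
    tokens = doc.lower().split()
    keywords = phrase.lower().split()
    # every position at which each keyword occurs
    positions = [[i for i, t in enumerate(tokens) if t == k] for k in keywords]
    if any(not p for p in positions):
        return False  # some keyword is missing entirely
    limit = len(keywords) + distance
    # try every combination of one position per keyword
    return any(max(c) - min(c) + 1 < limit for c in product(*positions))


print(proximity_match("CAT aaa bbb ccc DOG eee fff MOUSE", "cat dog mouse", 5))  # False
print(proximity_match("CAT aaa bbb DOG eee fff MOUSE", "cat dog mouse", 5))      # True
```

The first document has a span of exactly 8 words, which is not less than 3 + 5, so it is rejected; removing one filler word brings the span down to 7.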
Quorum matching operator introduces a kind of fuzzy matching. It will only match those documents that pass a given threshold of given words. The example above ("the world is a wonderful place"/3) will match all documents that have at least 3 of the 6 specified words.
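A minimal Python sketch of the quorum idea follows. It counts how many distinct query words appear in the document, assuming whitespace tokenisation and no stemming; quorum_match is a hypothetical helper used only for illustration.

```python
def quorum_match(doc: str, phrase: str, threshold: int) -> bool:
    """Return True if at least `threshold` distinct query words occur in doc."""
    tokens = set(doc.lower().split())
    keywords = set(phrase.lower().split())
    return len(keywords & tokens) >= threshold


query = "the world is a wonderful place"
print(quorum_match("what a wonderful world", query, 3))  # True: a, wonderful, world
print(quorum_match("hello world", query, 3))             # False: only world
```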
The strict order operator (aka operator "before"), introduced in version 0.9.9-rc2, will match the document only if its argument keywords occur in the document exactly in the query order. For instance, the query "black << cat" (without quotes) will match the document "black and white cat" but not the document "that cat was black". The order operator has the lowest priority. It can be applied both to plain keywords and to more complex expressions, i.e. this is a valid query:

(bag of words) << "exact phrase" << red|green|blue
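For plain keywords, the ordering requirement amounts to a subsequence check, sketched below in Python. The sketch assumes whitespace tokenisation and supports only single keywords separated by "<<", not phrases or bracketed expressions; strict_order_match is a hypothetical name.

```python
def strict_order_match(doc: str, query: str) -> bool:
    """Return True if the keywords occur in doc in the same order as in query."""
    tokens = iter(doc.lower().split())
    keywords = [k.strip() for k in query.lower().split("<<")]
    # `k in tokens` consumes the iterator up to the match, so each
    # keyword must be found after the previous one
    return all(k in tokens for k in keywords)


print(strict_order_match("black and white cat", "black << cat"))  # True
print(strict_order_match("that cat was black", "black << cat"))   # False
```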


Exact form keyword modifier will match the document only if the keyword occurred in exactly the specified form. The default behaviour is to match the document if the stemmed keyword matches. For instance, "runs" query will match both the document that contains "runs" and the document that contains "running", because both forms stem to just "run" - while "=runs" query will only match the first document. Exact form operator requires index_exact_words option to be enabled. This is a modifier that affects the keyword and thus can be used within operators such as phrase, proximity, and quorum operators.
Field-start and field-end keyword modifiers will make the keyword match only if it occurred at the very start or the very end of a fulltext field, respectively. For instance, the query "^hello world$" (with quotes and thus combining phrase operator and start/end modifiers) will only match documents that contain at least one field that has exactly these two keywords.
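The combined effect of the quoted "^hello world$" query can be sketched in Python as a whole-field comparison. The dictionary of fields and the helper name anchored_phrase_match are assumptions made for this illustration, and whitespace tokenisation is assumed.

```python
def anchored_phrase_match(fields: dict[str, str], phrase: str) -> bool:
    """Return True if some field consists of exactly the phrase keywords."""
    keywords = phrase.lower().split()
    # the phrase must begin at the field start and end at the field end,
    # so the field's word list must equal the phrase's word list
    return any(text.lower().split() == keywords for text in fields.values())


doc_a = {"page_title": "Hello World", "old_text": "some longer body text"}
doc_b = {"page_title": "Hello World Again", "old_text": "say hello world"}

print(anchored_phrase_match(doc_a, "hello world"))  # True
print(anchored_phrase_match(doc_b, "hello world"))  # False
```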
Arbitrarily nested brackets and negations are allowed. However, the query must be possible to compute without involving an implicit list of all documents:

// correct query
aaa -(bbb -(ccc ddd))

// queries that are non-computable
-aaa
aaa | -bbb
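
The computability rule can be approximated with a small Python sketch that works on an already-parsed query tree. The tree representation (nested tuples) and the rule itself are simplifications inferred from the examples above, not the search engine's actual check: a negation is only allowed next to at least one non-negated term inside an AND, and never as a whole query or as an OR branch.

```python
def is_negation(node) -> bool:
    return isinstance(node, tuple) and node[0] == "not"


def is_computable(node) -> bool:
    """Return True if the query tree can be evaluated without listing all documents.

    A node is a keyword string, ("and", [children]), ("or", [children]),
    or ("not", child).
    """
    if isinstance(node, str):
        return True
    op, args = node
    if op == "not":
        return False  # a bare negation needs the list of all documents
    if op == "or":
        return all(is_computable(a) for a in args)
    if op == "and":
        positives = [a for a in args if not is_negation(a)]
        negated = [a[1] for a in args if is_negation(a)]
        return (
            bool(positives)
            and all(is_computable(a) for a in positives)
            and all(is_computable(n) for n in negated)
        )
    raise ValueError(f"unknown operator: {op}")


# aaa -(bbb -(ccc ddd))
ok = ("and", ["aaa", ("not", ("and", ["bbb", ("not", ("and", ["ccc", "ddd"]))]))])
print(is_computable(ok))                                # True
print(is_computable(("not", "aaa")))                    # False: -aaa
print(is_computable(("or", ["aaa", ("not", "bbb")])))   # False: aaa | -bbb
```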