Search Syntax

About Search Syntax

The search engine allows users to locate uploads via tags that are associated with an upload and some other metadata such as uploader and user count. It also permits chaining together specified tags and metadata to search for specific logical combinations, to allow more precise filtering. This guide explains the syntax and features of individual terms and then shows how they are combined into more complex queries.

  1. About Search Syntax
  2. Search Terms
    1. Tag Search Behavior
    2. Searching Through Other Fields
    3. Numeric Range Queries
    4. Date/Time Range Queries
    5. Supported Fields
    6. Tag Categories
  3. Special Characters and Suffixes
    1. Wildcards
    2. Escaping Special Characters
    3. Approximate (Fuzzy) String Matching
  4. Search Grammar: Term Operators and Combinations
    1. Expressions
    2. Summary Table
    3. Negation
    4. Commas and AND Expressions
    5. OR Expressions
    6. Compound Expressions
      1. Operator Precedence
      2. Defining Subexpressions with Parentheses
      3. Automatic Parentheses Escaping
  5. Boosting Terms

Search Terms

Specific searches require the inclusion of search terms, which individually define the criteria expected of each upload result to be returned by the search engine.

Tag Search Behavior

Searching a single term is obvious: merely type in the term you want. By default, the term you use will be searched among the indexed image tags and aliases. Thus, a search for pinkie pie would, as you may surmise, result in all appropriately tagged and indexed pictures of Pinkie Pie. Aliases are also indexed, so a search for the tag alias ts is the same as one for twilight sparkle.

The default tag search has particular aspects associated with it for your convenience. For tag searches, case is insensitive. This means capitalization is irrelevant for queries. For example, the search queries pinkie pie and Pinkie Pie will share the same result set.
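
The behavior described above can be pictured as a normalization step applied to a term before lookup. The following is a minimal sketch in Python, not the site's actual implementation: the ALIASES table and the normalize_tag function are hypothetical names, and only the ts alias is taken from this guide.

```python
# Hypothetical alias table; only the "ts" entry comes from this guide.
ALIASES = {"ts": "twilight sparkle"}


def normalize_tag(term: str) -> str:
    """Lowercase a tag term and resolve it through the alias table."""
    key = term.strip().lower()
    return ALIASES.get(key, key)


print(normalize_tag("Pinkie Pie"))  # same tag as "pinkie pie"
print(normalize_tag("TS"))          # resolves through the alias table
```

Under these assumptions, pinkie pie and Pinkie Pie normalize to the same tag, and ts resolves to twilight sparkle.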

Searching Through Other Fields

Other fields are also indexed, and you can search them using the namespace convention that is also used by tags. Namely, one enters the field name followed by a colon, and finally, the target value. For example, to search for images with a width of 1920, we would search within the width field and so construct the query width:1920. If a tag with namespace were to share the namespace with a given field, it can still be queried via quoting or escaping.

Numeric Range Queries

Numeric fields in particular support queries for ranges of possible values. A qualifier, appended to the field name after a single period, requests results that are greater than or less than the supplied value; the supplied value itself can optionally be included in the range. To find images with a score greater than 100, we would enter score.gt:100. For an inclusive search of scores greater than or equal to 100, we would instead enter score.gte:100. The following table enumerates the supported qualifiers.

Qualifier | Meaning | Example
gt | Values greater than specified, and not including the specified value | score.gt:100
gte | Values greater than or equal to specified | score.gte:100
lt | Values less than specified, and not including the specified value | width.lt:100
lte | Values less than or equal to specified | width.lte:100
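
To make the qualifier semantics concrete, here is a minimal Python sketch that splits a numeric term into field, qualifier, and value, and checks it against an upload. It is an illustration only: parse_numeric_term, matches, and the dictionary representation of an upload are hypothetical, and only the qualifier names come from the table above.

```python
import operator

# Qualifier names come from the table above; None means an exact match.
QUALIFIERS = {
    None: operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def parse_numeric_term(term: str):
    """Split a term such as 'score.gt:100' into (field, qualifier, value)."""
    selector, _, raw_value = term.partition(":")
    field, _, qualifier = selector.partition(".")
    qualifier = qualifier or None
    if qualifier not in QUALIFIERS:
        raise ValueError(f"unknown qualifier: {qualifier!r}")
    return field, qualifier, float(raw_value)


def matches(term: str, upload: dict) -> bool:
    """Check whether an upload (a plain dict in this sketch) satisfies the term."""
    field, qualifier, bound = parse_numeric_term(term)
    return QUALIFIERS[qualifier](upload[field], bound)
```

With an upload whose score is exactly 100, score.gt:100 does not match while score.gte:100 does, which is the difference between the exclusive and inclusive qualifiers.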

Date/Time Range Queries

Date and time values are specified using a tweaked subset of the ISO 8601 standard. A full date is specified by a four-digit year, followed by a two-digit month and a two-digit day, with each value delimited by a hyphen, i.e., "YYYY-MM-DD". As in ISO 8601, one can specify just the year and month, or even just the year, as long as the less precise information is given in left-to-right order without dangling hyphens. Such a value is interpreted as the range of the entire period (not just the first day of the month, etc.). For example, 2015-04 represents the entire month of April 2015.

Given a full date, a specification for the time of day can be added. To do so, separate the time from the date with a T or a space, followed by the hours, minutes, and seconds, each specified with two digits and separated by colons, i.e., "HH:MM:SS". The hours follow a 24-hour clock. As with date values, one may alternatively specify entire minutes and even hours by truncating the value, without dangling colons. The value 2014-04-20 16 represents the entire hour of 4 PM on 20 April 2014 (UTC). The entire first minute of that hour can be specified with 2014-04-20 16:00.

By default, time follows international UTC ("Zulu") time. (In terms of the ISO 8601 standard, a Z suffix is implied.) One may specify an offset for local time by affixing a plus or minus sign, followed by the offset hours as two digits, a colon, and the offset minutes (usually 00), e.g., -04:00 for US Eastern Daylight Time (EDT). Note that unlike ISO 8601, this can be attached to dates as well as times, to ensure date boundaries fit the locale of interest. For example, 2015-05:00 represents the year of 2015 with an offset of minus five hours (US Eastern Standard Time).
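
The rules above (truncated values standing for whole periods, an optional offset, UTC by default) can be sketched in Python as follows. This is an illustrative reading of the format rather than the site's parser: PATTERN, UNITS, and parse_range are hypothetical names, and representing each value as a half-open interval [start, end) is an assumption made for the sketch.

```python
import re
from datetime import datetime, timedelta, timezone

# A date, an optional time, and an optional offset; each finer unit is optional.
PATTERN = re.compile(
    r"^(?P<year>\d{4})(?:-(?P<month>\d{2})(?:-(?P<day>\d{2})"
    r"(?:[T ](?P<hour>\d{2})(?::(?P<minute>\d{2})(?::(?P<second>\d{2}))?)?)?)?)?"
    r"(?P<offset>Z|[+-]\d{2}:\d{2})?$"
)
UNITS = ("year", "month", "day", "hour", "minute", "second")


def parse_range(value: str):
    """Return the half-open interval [start, end) covered by a date/time value."""
    m = PATTERN.match(value)
    if m is None:
        raise ValueError(f"unrecognized date/time: {value!r}")
    offset = m["offset"]
    if offset in (None, "Z"):
        tz = timezone.utc  # UTC is implied when no offset is given
    else:
        sign = -1 if offset[0] == "-" else 1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    parts = [int(m[unit]) for unit in UNITS if m[unit] is not None]
    n = len(parts)
    # Pad the missing units with their smallest values to get the range start.
    start = datetime(*(parts + [1, 1, 0, 0, 0][n - 1:]), tzinfo=tz)
    if n == 1:    # whole year
        end = start.replace(year=start.year + 1)
    elif n == 2:  # whole month, rolling over into January after December
        end = start.replace(year=start.year + start.month // 12, month=start.month % 12 + 1)
    else:         # whole day, hour, minute, or second
        step = ("days", "hours", "minutes", "seconds")[n - 3]
        end = start + timedelta(**{step: 1})
    return start, end
```

Note how 2015-05:00 is read as the year 2015 with a minus-five-hour offset rather than as May 2015: the trailing :00 can only belong to an offset, so the pattern backs off from treating 05 as a month.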

Date/time range queries also accept range qualifiers. The gt and lt qualifiers omit everything matching the implied time range of the specified value, whereas gte and lte include the entirety of said time range.
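
The interaction between qualifiers and implied time ranges can be sketched in Python as below. This assumes each date/time value stands for a half-open interval [start, end); the in_range function and that representation are hypothetical and serve only to illustrate the paragraph above.

```python
from datetime import datetime, timezone


def in_range(moment, start, end, qualifier=None):
    """Test a timestamp against the interval [start, end) implied by a value."""
    if qualifier is None:
        return start <= moment < end  # anywhere inside the period
    if qualifier == "gt":
        return moment >= end          # strictly after the whole period
    if qualifier == "gte":
        return moment >= start        # the period itself or later
    if qualifier == "lt":
        return moment < start         # strictly before the whole period
    if qualifier == "lte":
        return moment < end           # the period itself or earlier
    raise ValueError(f"unknown qualifier: {qualifier!r}")


# The value 2015 (UTC) implies this interval:
START = datetime(2015, 1, 1, tzinfo=timezone.utc)
END = datetime(2016, 1, 1, tzinfo=timezone.utc)
```

Under this reading, an upload from June 2015 satisfies created_at.gte:2015 and created_at.lte:2015 but neither created_at.gt:2015 nor created_at.lt:2015.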

The following examples are valid search queries.

Example | Explanation
created_at:2015 | Returns all uploads made in 2015 (UTC).
created_at:2015+08:00 | Returns all uploads made in 2015 (SGT).
created_at:2015-04 | Returns all uploads made in April 2015 (UTC).
created_at:2015-04-03:00 | Returns all uploads made in April 2015 (BRT).
created_at:2015-04-01 | Returns all uploads made on 1 April 2015 (UTC).
created_at:2015-04-01+08:00 | Returns all uploads made on 1 April 2015 (SGT).
created_at:2015-04-01 01 | Returns all uploads made in the hour of 1 AM on 1 April 2015 (UTC).
created_at:2015-04-01 01Z | Returns all uploads made in the hour of 1 AM on 1 April 2015 (UTC). The zero UTC offset designator ("Zulu") is explicit.
created_at:2015-04-01T01Z | Returns all uploads made in the hour of 1 AM on 1 April 2015 (UTC). This uses the standard "T" separator associated with ISO 8601.
created_at:2015-04-01 01-04:00 | Returns all uploads made in the hour of 1 AM on 1 April 2015 (EDT).
created_at:2015-04-01 01:00 | Returns all uploads made sometime in the minute of 1:00 AM on 1 April 2015 (UTC).
created_at:2015-04-01 01:00Z | Returns all uploads made sometime in the minute of 1:00 AM on 1 April 2015 (UTC). The zero UTC offset designator ("Zulu") is explicit.
created_at:2015-04-01 00:00:00 | Returns all uploads made exactly at midnight on 1 April 2015 (UTC).
created_at:2015-04-01 00:00:00+08:00 | Returns all uploads made exactly at midnight on 1 April 2015 (SGT).
created_at.lt:2015 | Returns all uploads before the start of 2015 (UTC).
created_at.gte:2015-04-04 | Returns all uploads since and including the entire day of 4 April 2015 (season 5 premiere, UTC).

Supported Fields

The following table enumerates all of the supported fields, with examples.

Field Selector | Type | Description | Example
animated | Boolean | Returns images that are animated or not. | animated:true
aspect_ratio | Numeric Range | Matches any image with the specified aspect ratio. | aspect_ratio:1
comment_count | Numeric Range | Matches any image with the specified number of comments. | comment_count.gt:50
created_at | Date/Time Range | Matches any image posted at the specified date and/or time. | created_at:2015-04-01
description | Full Text | Full-text search against image descriptions with the specified string. | description:derp
downvotes | Numeric Range | Matches any image with the specified downvote count. | downvotes:0
duration | Numeric Range | Matches any image/video with the specified duration in seconds. | duration.gte:1800
faved_by | Literal | Matches any image favorited by the specified user. Case-insensitive. | faved_by:roboshi
faves | Numeric Range | Matches any image with the specified number of favorites. | faves:20
file_name | Literal | Matches any image with the specified original filename. | file_name:tumblr*
first_seen_at | Date/Time Range | Matches any image approved at the specified date and/or time. Merges take the oldest date. | first_seen_at.gte:2 days ago
gallery_id | Literal | Matches any image that is saved in the gallery with the specified ID. | gallery_id:24064
height | Numeric Range | Matches any image with the specified height. | height:1080
id | Numeric Range | Matches any image with the specified number. | id:111111
mime_type | Literal | Returns images with the specified IANA media type. | mime_type:video/webm
orig_sha512_hash | Literal | Matches the original SHA-512 checksum of an uploaded image. |
original_format | Literal | Returns images with the specified image format. | original_format:png
pixels | Numeric Range | Matches any image with the specified number of pixels. | pixels.gte:5000000
score | Numeric Range | Matches any image with the specified net score. | score.gt:200
sha512_hash | Literal | Matches any image with the specified SHA-512 checksum. N.B.: Image optimization usually alters the original checksum! |
size | Numeric Range | Matches any image with the specified file size in bytes. | size.lt:1048576
source_count | Numeric Range | Matches any image with the specified number of sources. | source_count:3
source_url | Literal | Matches image source URLs. Case-insensitive. | source_url:*deviantart.com*
tag_count | Numeric Range | Matches any image with the specified number of tags. | tag_count.gt:10
updated_at | Date/Time Range | Matches any image updated (image, source, description, or tag change) at the specified time. | updated_at.gte:1 days ago
uploader | Literal | Matches any image with the specified uploader account. Case-insensitive. | uploader:k_a
upvotes | Numeric Range | Matches any image with the specified upvote count. | upvotes.gt:200
width | Numeric Range | Matches any image with the specified width. | width:1920
wilson_score | Numeric Range | Matches any image with the specified lower bound of a 99.5% Wilson CI. | wilson_score.gt:0.9

It is worth noting the absence of certain “fields” such as artist and spoiler. These are tag namespaces, not metadata, but they are functionally the same. Thus, a search for spoiler:s04 performs as expected.

Tag Categories

Additionally, you can search by the number of tags an image has in a specific tag category. Supported categories are body_type_tag_count, error_tag_count, character_tag_count, content_fanmade_tag_count, content_official_tag_count, oc_tag_count, origin_tag_count, rating_tag_count, species_tag_count, and spoiler_tag_count. For example, origin_tag_count:2 matches images with exactly two tags in the origin category.

Note that tag categories may not return exactly the results you expect. With the example above, tags like alternate version or edit are currently under the origin_tag category.

Special Characters and Suffixes

Wildcards

Wildcards allow for matching with terms that begin with, end with, or contain a given string of characters, like wildcards used in file management. Two wildcards are recognized: the asterisk (or star) and the question mark.

An asterisk "expands" to, or matches, any number of characters in its place, including none. For example, apple* matches uploads with any of the tags apple bloom, applejack, and simply apple.

A question mark matches exactly one character in its place. For example, t?ixie can match either trixie or twixie.

Wildcard Character | Match
* | Zero or more characters
? | A single character
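
One way to picture wildcard matching is as a translation into a regular expression. The Python sketch below is a hypothetical illustration, not the engine's implementation: wildcard_to_regex and wildcard_match are made-up names, and backslash-escaped wildcards are deliberately not handled.

```python
import re


def wildcard_to_regex(pattern: str):
    """Translate * and ? into regex equivalents and escape everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    # Tag searches are case-insensitive, so the regex is too.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def wildcard_match(pattern: str, tag: str) -> bool:
    """True if the whole tag matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(tag) is not None
```

Matching against the whole tag matters: apple* matches applejack, but it does not match pineapple, because nothing in the pattern allows characters before apple.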

Escaping Special Characters

The use of special characters that modify search terms or exist outside search terms mandates a facility for “escaping” those characters, so that they are not excluded from search terms themselves. To use special characters within a search term, both of the conventional string escaping mechanisms are used: the backslash and quoting. The following are special characters and sequences that may need to be escaped:

A backslash is placed in front of a special character (and can also be placed in front of a sequence like the ones in the preceding list). This forces the character to be counted as part of the preceding or following term. In front of any other character, it has no effect. For example, \-_- forces a search for the emoticon -_-, which would otherwise be parsed as a negation because of its leading dash. Also consider the search term rose \(flower\), although parentheses have intuitive rules that make escaping them unnecessary in most cases. The backslash is itself a special character and thus must also be escaped; a literal backslash is written as \\.

The alternative to escaping is to simply surround the search term in double quotes ("), e.g., "rose (flower)". When searching with a specified field, the quotes must surround the field and colon as well, e.g., "width:1920". Everything in quotes is treated together as a verbatim search term, with one exception: the double quote character itself bounds the search term, so if it appears inside, it must be escaped with a backslash. All other uses of the backslash inside quotes are treated literally.
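
For anyone building queries programmatically, the quoting rule can be sketched as a small Python helper. The function name quote_term is hypothetical, and the sketch relies only on the rule stated above: inside quotes, just the double quote itself needs a backslash.

```python
def quote_term(term: str) -> str:
    """Wrap a term in double quotes, escaping any embedded double quotes."""
    return '"' + term.replace('"', '\\"') + '"'


print(quote_term("rose (flower)"))
print(quote_term('say "hi"'))
```

The first call wraps rose (flower) in quotes unchanged; the second places a backslash before each inner double quote so that it does not end the term early.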

Approximate (Fuzzy) String Matching

The search engine backend, Apache Lucene, also enables so-called “fuzzy” string matching. Fuzzy string matching can be used with any literal search term, including the default tags field. To request a fuzzy match, follow the term with a tilde and then either a whole-number edit distance or a similarity factor between 0 and 1.0.

The whole number specifies an optimal string alignment edit distance: the maximum number of edits that may be applied to a string to match a given target, where an edit is a deletion, an insertion, a replacement, or a swap of two adjacent characters. A similarity factor instead ranges from 0 to 1.0, with 1.0 being the least “fuzzy”. The edit distance derived from it is the length of the term (without any field name prefix) multiplied by one minus the similarity factor, rounded down.

In both cases, note that Lucene caps the maximum edit distance at 2 as an optimization. Therefore, very large edit distances or very small similarity factors will not behave as expected.
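
For reference, the optimal string alignment distance mentioned above can be computed with a standard dynamic-programming table. The Python sketch below is a textbook formulation, not Lucene's implementation (which is considerably more optimized); the function name osa_distance is an illustrative choice.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two strings."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement (or match)
            )
            # Swap of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For instance, fluttersho and fluttershy differ by one replacement, and tirxie is one adjacent swap away from trixie, so both pairs are within an edit distance of 1.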

For example, fluttersho~0.8 searches for uploads with tags that approximately match fluttersho, with a similarity of 0.8. Since fluttersho has 10 characters, this corresponds to an edit distance of ⌊(1 − 0.8) × 10⌋ = 2. Note that uploads tagged fluttershy are included in the result set. The utility of this is obvious: if you are unsure of a character or tag's exact spelling, you can use this as an aid, like a more manual and controlled version of Google's (in)famous spelling correction features.
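
The arithmetic above can be written out as a short Python sketch. Several details are assumptions made for illustration: the function name fuzzy_edit_distance, the choice to treat a value containing a decimal point as a similarity factor and anything else as a whole-number edit distance, and the use of Decimal to avoid binary floating-point error (with plain floats, (1 − 0.8) × 10 can come out slightly below 2 and would round down to 1).

```python
import math
from decimal import Decimal

MAX_EDITS = 2  # the cap on edit distance described in this guide


def fuzzy_edit_distance(term: str, fuzz: str) -> int:
    """Edit distance implied by the text after the tilde, e.g. '0.8' or '1'."""
    text = term.split(":", 1)[-1]  # drop a field name prefix, if present
    if "." in fuzz:
        # Similarity factor: length times (1 - similarity), rounded down.
        distance = math.floor((1 - Decimal(fuzz)) * len(text))
    else:
        # Whole number: used directly as the edit distance.
        distance = int(fuzz)
    return min(distance, MAX_EDITS)


print(fuzzy_edit_distance("fluttersho", "0.8"))
```

The cap explains why very small similarity factors behave no differently from moderate ones: fluttersho~0.1 would nominally allow nine edits, but only two are honored.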

Fuzziness can also be applied to numeric queries to specify a range. In this case, the fuzziness parameter is the magnitude above and below the specified number that will be included in the result set. For example, width:800~200 specifies images with a width ranging from 600 (800 − 200) to 1000 (800 + 200), inclusive.
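
As a quick illustration of the numeric case, the following Python sketch expands a fuzzy numeric term into its inclusive bounds. The function name fuzzy_numeric_range is hypothetical; the arithmetic follows the paragraph above.

```python
def fuzzy_numeric_range(term: str):
    """Expand 'width:800~200' into (field, lower bound, upper bound)."""
    field, _, rest = term.partition(":")
    center, _, spread = rest.partition("~")
    center, spread = float(center), float(spread)
    return field, center - spread, center + spread


print(fuzzy_numeric_range("width:800~200"))
```

Both bounds are inclusive, so under this reading width:800~200 covers the same images as combining width.gte:600 with width.lte:1000.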

Fuzzy matching can be freely applied to any term inside an expression.

Search Grammar: Term Operators and Combinations

Expressions

Terms can be combined to define a search query corresponding to a specific result set. These combinations are formalized as expressions, which are constructed from terms, operators, and even other expressions (then called subexpressions). The search frontend recognizes three kinds of expression: the negation of a term or subexpression (NOT), the requirement that at least one of several terms or subexpressions match (OR), and the requirement that all of several terms or subexpressions match (AND).

At its core, a search expression is either binary or unary. A binary expression consists of a term or subexpression, an operator indicating the type of expression, and another term or subexpression. Binary expressions can be “chained” by adding the operator followed by another term. A unary expression consists of the operator followed by a single term or subexpression. Both expression types and how to use subexpressions will be covered in the following sections.

Summary Table

Operator: Negation (NOT)
Symbols:
  • NOT
  • -
  • !
Comments: Applied in front of a single term or parenthesized subexpression. The minus sign does not require padding to the right. Specifies that the term or subexpression must not match.

Operator: Conjunction (AND)
Symbols:
  • ,
  • &&
  • AND
Comments: Applied between two terms. The comma may be optionally padded with space on either side; the other forms must be padded. Specifies that both terms match. Can be chained to more terms.

Operator: Disjunction (OR)
Symbols:
  • ||
  • OR
Comments: Applied between two terms, with surrounding space. Specifies that either of the terms match. Can be chained to more terms.

Negation

Negation of a term or expression specifies that the original term or subexpression must not match. The corresponding negation operator is unary, that is, applied to either a single term or to a subexpression. It is specified with the all-capitalized word NOT, a dash of the non-multi-chromatic variety (-), or an exclamation point (!). For example, -fluttershy or NOT fluttershy matches pictures that are not tagged with fluttershy. In set theory terms, this is taking the complement of the original result set, that is, all uploads outside it.

Commas and AND Expressions

An expression that queries for images that meet all specified terms is a conjunction or AND expression. As in the past, you can query images that meet a list of terms by hooking the terms together with commas. For example, fluttershy,pinkie pie results in pictures that contain both the fluttershy and pinkie pie tags. In set theory terms, the result set is the intersection of uploads tagged fluttershy and uploads tagged pinkie pie.

Commas can be padded with spaces however you like. Unlike the past, commas are now plain AND operators, so they are more versatile. As will be discussed, they can be used in subexpressions and alongside the OR operator.

AND operators can also be expressed using && (derived from typical programming notation) or the all-capitalized word AND, e.g., rarity && pinkie pie or rarity AND pinkie pie. These forms, unlike the comma, require padding space on either side.

OR Expressions

A disjunction or OR expression requests uploads that meet any of the specified search terms. This is markedly different from the aforementioned AND expression, which, to reiterate, mandates that all terms match. OR operators are expressed either with || (also a programming notation) or the all-capitalized word OR, e.g., rarity || pinkie pie or rarity OR pinkie pie. In set theory terms, the result set is the union of uploads tagged rarity and uploads tagged pinkie pie. All forms of the OR operator require padding on either side.
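
The set theory reading of the three operators maps directly onto Python's set operations. The sketch below uses a tiny made-up index; the upload IDs, their tags, and the tagged helper are purely illustrative.

```python
# A made-up index mapping upload IDs to their tags.
UPLOADS = {
    1: {"fluttershy"},
    2: {"fluttershy", "pinkie pie"},
    3: {"rarity"},
    4: {"rarity", "pinkie pie"},
}
ALL_IDS = set(UPLOADS)


def tagged(tag: str) -> set:
    """IDs of all uploads carrying the given tag."""
    return {upload_id for upload_id, tags in UPLOADS.items() if tag in tags}


not_fluttershy = ALL_IDS - tagged("fluttershy")       # -fluttershy (complement)
both = tagged("fluttershy") & tagged("pinkie pie")    # fluttershy, pinkie pie (intersection)
either = tagged("rarity") | tagged("pinkie pie")      # rarity || pinkie pie (union)
```

In this toy index, the negation keeps uploads 3 and 4, the AND expression keeps only upload 2, and the OR expression keeps uploads 2, 3, and 4.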

Compound Expressions

Complex combinations of terms, and therefore search criteria, are possible by combining expressions together. Doing so effectively is analogous to arithmetic. Consider multiplication and addition (which in so-called Boolean algebra are respectively analogous to AND and OR operations). We can write an algebraic expression with multiplication and addition in several ways. For three terms, A, B, and C, consider the expression A × B + C. Multiplication is evaluated before addition, so this expression is equivalent to (A × B) + C, in which case the order of operations is explicit.

Operator Precedence

Likewise, precedence is applied to determine the order in which chained OR, AND, and NOT operations are evaluated. The order of operations in the search syntax is as follows:

  1. negation (NOT)
  2. conjunction (AND)
  3. disjunction (OR)

Consider the query twilight sparkle || fluttershy && pinkie pie. In this example, fluttershy && pinkie pie is evaluated first, as an implicit subexpression. Then, that result is OR'd together with twilight sparkle. Thus, the query instructs the engine to return uploads either tagged with twilight sparkle or tagged with both fluttershy and pinkie pie. Note how if the OR expression twilight sparkle || fluttershy were evaluated first, the result set would differ.

Defining Subexpressions with Parentheses

Returning to the earlier arithmetic example, we can override the order of operations using explicit subexpressions. This requires delimiters that act as boundaries, and parentheses are most often used for this purpose. Hence, A × (B + C) forces B + C to be evaluated first and then multiplied by A, contrary to the order otherwise followed. Likewise, (twilight sparkle || fluttershy) && pinkie pie instructs the search engine to return results that are tagged with either twilight sparkle or fluttershy and, in every case, with pinkie pie.

As was mentioned earlier, the unary NOT operator can be applied to parenthesized subexpressions. The semantics of this is analogous to applying it to a single term: a negated subexpression specifies uploads that do not adhere to what the subexpression specifies. For example, the query -(pinkamena diane pie, grimdark) returns all uploads that are not tagged with both pinkamena diane pie and grimdark. Uploads tagged with either of the two would be returned as long as they do not have both. Thus light-hearted Pinkamena images and grimdark material not involving Pinkamena would be included, yet the intersection of those two sets of images would be excluded, that is, images that are grimdark and feature Pinkamena.
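
The negated subexpression above can be checked against a toy index with Python sets. The upload IDs and the tagged helper are made up for illustration; the sketch also shows that negating an AND expression gives the same result as OR-ing the individual negations (De Morgan's law).

```python
# A made-up index mapping upload IDs to their tags.
UPLOADS = {
    1: {"pinkamena diane pie"},              # light-hearted Pinkamena
    2: {"grimdark"},                         # grimdark without Pinkamena
    3: {"pinkamena diane pie", "grimdark"},  # both tags
    4: {"fluttershy"},                       # neither tag
}
ALL_IDS = set(UPLOADS)


def tagged(tag: str) -> set:
    """IDs of all uploads carrying the given tag."""
    return {upload_id for upload_id, tags in UPLOADS.items() if tag in tags}


# -(pinkamena diane pie, grimdark)
negated = ALL_IDS - (tagged("pinkamena diane pie") & tagged("grimdark"))

# -pinkamena diane pie || -grimdark
equivalent = (ALL_IDS - tagged("pinkamena diane pie")) | (ALL_IDS - tagged("grimdark"))
```

Only upload 3, which carries both tags, is excluded; both formulations return uploads 1, 2, and 4.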

Explicit subexpressions with parentheses allow for complex queries as they can be arbitrarily nested inside other subexpressions, to fine-tune the result set even more.
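
To show how precedence and parentheses interact, here is a minimal recursive-descent parser sketch in Python. It is not the site's parser: the names TOKEN and parse are hypothetical, only the symbolic operators (-, !, the comma, && and ||) are recognized, and escaping, quoting, the word operators, and the automatic handling of parentheses inside terms are all left out.

```python
import re

# Symbolic operators, parentheses, or a run of other characters (a term).
TOKEN = re.compile(r"\s*(\|\||&&|,|\(|\)|!|[^,()|&!]+)")


def parse(query: str):
    """Parse a query into nested tuples, binding NOT tightest, then AND, then OR."""
    tokens = [t.strip() for t in TOKEN.findall(query) if t.strip()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "||":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() in ("&&", ","):
            take()
            node = ("AND", node, parse_not())
        return node

    def parse_not():  # highest precedence: negation, parentheses, plain terms
        tok = take()
        if tok in ("!", "-"):
            return ("NOT", parse_not())
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected closing parenthesis")
            return node
        if tok in (")", "&&", "||", ","):
            raise ValueError(f"unexpected token: {tok!r}")
        if tok.startswith("-"):
            return ("NOT", tok[1:].strip())
        return tok

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return tree
```

Parsing twilight sparkle || fluttershy && pinkie pie nests the AND inside the OR, whereas (twilight sparkle || fluttershy) && pinkie pie nests the OR inside the AND, mirroring the two readings discussed above.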

Automatic Parentheses Escaping

Finally, a footnote about parentheses is warranted. Traditionally, if an expression parser encounters an open parenthesis without a closing parenthesis, or if parentheses are swapped, an error is raised. This is indeed the case with the search engine, as highlighted in the search parsing error page. However, to a limited extent, a term can contain parentheses within. Parentheses are accepted within search terms as long as they are closed and do not cover the entire expression. The first limit is a heuristic to address the typical use of parentheses, and the latter arises from the legal use of parentheses to single out a term. Thus, the search rose (flower) searches for uploads tagged with rose (flower); however, the emoticon query ))B-( raises an error, while (q) effectively searches for q instead. For the latter two examples, simply surround the term with double quotes to clarify your meaning to the search engine.

Boosting Terms

The search engine also allows the boosting of specific terms when sorting by relevance, so that uploads including or not including the term occur earlier or later in the results. Boosting is done by modifying a term's relevance score with a positive or negative value. This value is affixed to a term with a preceding caret (^) followed by a positive or negative decimal number. For example, pinkie pie^1 || tara strong returns uploads tagged either with pinkie pie or tara strong, but when sorting by relevance descending, uploads with pinkie pie are prioritized. A negative value meanwhile reduces the relevance score and deprioritizes the affected term when sorting by relevance, e.g., pinkie pie^-1 || tara strong. In both cases, pictures with both tags will still appear first. Sorting options are found below the search box on this page and must be set to sort by relevance for boosting to take proper effect.