About Search Syntax
Derpibooru's search engine allows users to locate uploads via tags that are associated with an upload and some other metadata such as uploader and user count. It also permits chaining together specified tags and metadata to search for specific logical combinations, to allow more precise filtering. This guide explains the syntax and features of individual terms and then shows how they are combined into more complex queries.
- About Search Syntax
- Search Terms
- Special Characters and Suffixes
- Search Grammar: Term Operators and Combinations
- Boosting Terms
Search Terms
Specific searches require the inclusion of search terms, which individually define the criteria expected of each upload result to be returned by the search engine.
Tag Search Behavior
Searching a single term is obvious: merely type in the term you want. By default, the term you use will be searched among the indexed image tags and aliases. Thus, a search for
pinkie pie would, as you may surmise, result in all appropriately tagged and indexed pictures of Pinkie Pie. Aliases are also indexed, so a search for the tag alias
ts is the same as one for
The default tag search has particular aspects associated with it for your convenience. For tag searches, case is insensitive. This means capitalization is irrelevant for queries. For example, the search queries
pinkie pie and
Pinkie Pie will share the same result set.
Searching Through Other Fields
Other fields are also indexed, and you can search them using the namespace convention that is also used by tags. Namely, one enters the field name followed by a colon, and finally, the target value. For example, to search for images with a width of 1920, we would search within the
width field and so construct the query
width:1920. If a namespaced tag shares its namespace with a given field, it can still be queried via quoting or escaping.
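The field convention above can be sketched in a few lines of Python. This is only an illustration: `KNOWN_FIELDS` and `split_term` are hypothetical names, the field set is a small subset chosen for the example, and the real parser may behave differently (range qualifiers and backslash escaping are ignored here).

```python
# Illustrative subset of indexed fields; the real list is longer.
KNOWN_FIELDS = {"width", "height", "score"}

def split_term(term):
    """Return (field, value); field is None for a plain tag search."""
    term = term.strip()
    # A fully quoted term is taken verbatim as a tag, even if it looks like a field.
    if len(term) >= 2 and term[0] == '"' and term[-1] == '"':
        return None, term[1:-1]
    name, sep, value = term.partition(":")
    if sep and name.lower() in KNOWN_FIELDS:
        return name.lower(), value
    # Anything else, including tag namespaces such as spoiler:, stays a tag.
    return None, term

print(split_term("width:1920"))
print(split_term('"width:1920"'))
```

Under these assumptions, the unquoted term is routed to the width field, while the quoted form is treated as a tag that happens to contain a colon.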
Numeric Range Queries
Numeric fields in particular support queries for ranges of possible values. A qualifier can be appended to the field name, separated by a single period, to request results that are greater than or less than the supplied value; the supplied value itself can optionally be included in the range. To find images with a score greater than 100, we would enter
score.gt:100. For an inclusive search of scores greater than or equal to 100, we would instead enter
score.gte:100. The following table enumerates the supported qualifiers.
|gt|Values greater than specified, and not including the specified value|
|gte|Values greater than or equal to specified|
|lt|Values less than specified, and not including the specified value|
|lte|Values less than or equal to specified|
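To make the qualifiers concrete, here is a minimal Python sketch that maps each one to a comparison and checks it against a single image record. The function names and the dictionary-based image record are assumptions made for illustration; they are not part of the site.

```python
import operator

# Qualifier name -> comparison applied as compare(image_value, target).
QUALIFIERS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

def parse_numeric_term(term):
    """Split 'score.gt:100' into ('score', 'gt', 100.0)."""
    name, _, raw = term.partition(":")
    field, _, qualifier = name.partition(".")
    return field, qualifier or None, float(raw)

def matches(term, image):
    """Check one numeric term against an image given as a dict of field values."""
    field, qualifier, target = parse_numeric_term(term)
    # Without a qualifier the value must match exactly.
    compare = operator.eq if qualifier is None else QUALIFIERS[qualifier]
    return compare(image[field], target)

print(matches("score.gt:100", {"score": 100}))
print(matches("score.gte:100", {"score": 100}))
```

The two calls differ only in whether the boundary value 100 itself is accepted.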
Date/Time Range Queries
Date and time values are specified using a tweaked subset of the ISO 8601 standard. A full date is specified by a four-digit year, followed by a two-digit month and day, with each value delimited by a hyphen, i.e., "YYYY-MM-DD". As in ISO 8601, one can specify just the month or even just the year, as long as the less precise information is included in left-to-right order without dangling hyphens. This is semantically interpreted as the range of the entire period (not just the first day of the month, etc.). For example,
2015-04 represents the entire month of April 2015.
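The "entire period" interpretation can be sketched as a function that expands a partial date into a half-open range. This is an illustrative sketch that assumes UTC and handles dates only; `date_range` is a hypothetical helper, not the site's implementation.

```python
from datetime import datetime, timedelta, timezone

def date_range(value):
    """Expand 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' into a half-open UTC range."""
    # int('') raises ValueError, so a dangling hyphen is rejected here.
    parts = [int(p) for p in value.split("-")]
    if len(parts) > 3:
        raise ValueError(f"too many date components: {value!r}")
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    start = datetime(year, month, day, tzinfo=timezone.utc)
    if len(parts) == 1:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    elif len(parts) == 2:
        if month == 12:
            end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
        else:
            end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    else:
        end = start + timedelta(days=1)
    return start, end

start, end = date_range("2015-04")
print(start.isoformat(), end.isoformat())
```

An upload matches when its timestamp is at or after `start` and strictly before `end`.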
Given a full date, a specification for the time of day can be added. To do so, separate the time with a
T or space, followed by the hours, minutes, and seconds, each specified with two digits and separated by a colon, i.e., "HH:MM:SS". The hours follow a 24-hour clock. As with date values, one may alternatively specify entire minutes and even hours by truncating the value without dangling colons. The value
2014-04-20 16 represents the entire hour of 4 PM on 20 April 2014 (UTC). The entire first minute can be specified with
By default, time follows international UTC ("Zulu") time. (In terms of the ISO 8601 standard, a
Z suffix is implied.) One may specify an offset for local time by affixing a plus or minus sign, followed by the offset hours as two digits, a colon, and the offset minutes (usually
-04:00 for US Eastern Daylight Time (EDT). Note that unlike ISO 8601, this can be attached to dates as well as times, to ensure date boundaries fit the locale of interest. For example,
2015-05:00 represents the year of 2015 with an offset of minus five hours (US Eastern Standard Time).
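One way to picture the offset suffix is to peel it off the end of the value before interpreting the rest. The sketch below assumes the suffix is either a literal Z or a sign followed by "HH:MM"; `split_offset` and the regular expression are illustrative and not taken from the site.

```python
import re
from datetime import datetime, timedelta, timezone

# Trailing 'Z' or a signed 'HH:MM' offset; everything before it is the date/time body.
OFFSET_RE = re.compile(
    r"^(?P<body>.+?)(?:(?P<zulu>Z)|(?P<sign>[+-])(?P<hours>\d{2}):(?P<minutes>\d{2}))$"
)

def split_offset(value):
    """Return (body, tzinfo); UTC is implied when no offset is present."""
    match = OFFSET_RE.match(value)
    if match is None or match["zulu"]:
        return (match["body"] if match else value), timezone.utc
    delta = timedelta(hours=int(match["hours"]), minutes=int(match["minutes"]))
    if match["sign"] == "-":
        delta = -delta
    return match["body"], timezone(delta)

body, tz = split_offset("2015-05:00")
local_start = datetime(int(body), 1, 1, tzinfo=tz)
print(local_start.astimezone(timezone.utc).isoformat())
```

Here the year 2015 with a minus-five-hour offset begins at 05:00 UTC on 1 January, which is why attaching an offset to a bare date shifts the boundaries of the matched period.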
Date/time range queries also accept range qualifiers. The
lt qualifiers omit everything matching the implied time range of the specified value, whereas
lte include the entirety of said time range.
The following examples are valid search queries.
|Returns all uploads made in 2015 (UTC).|
|Returns all uploads made in 2015 (SGT).|
|Returns all uploads made in April 2015 (UTC).|
|Returns all uploads made in April 2015 (BRT).|
|Returns all uploads made on 1 April 2015 (UTC).|
|Returns all uploads made on 1 April 2015 (SGT).|
|Returns all uploads made in the hour of 1 AM on 1 April 2015 (UTC).|
|Returns all uploads made in the hour of 1 AM on 1 April 2015 (UTC). The zero UTC offset designator ("Zulu") is explicit.|
|Returns all uploads made in the hour of 1 AM on 1 April 2015 (UTC). This uses the standard "T" separator associated with ISO 8601.|
|Returns all uploads made in the hour of 1 AM on 1 April 2015 (EDT).|
|Returns all uploads made sometime in the minute of 1:00 AM on 1 April 2015 (UTC).|
|Returns all uploads made sometime in the minute of 1:00 AM on 1 April 2015 (UTC). The zero UTC offset designator ("Zulu") is explicit.|
|Returns all uploads made exactly at midnight on 1 April 2015 (UTC).|
|Returns all uploads made exactly at midnight on 1 April 2015 (SGT).|
|Returns all uploads before the start of 2015 (UTC).|
|Returns all uploads since and including the entire day of 4 April 2015 (season 5 premiere, UTC).|
The following table enumerates all of the supported fields, with examples.
|Numeric Range||Matches any image with the specified aspect ratio.|
|Numeric Range||Matches any image with the specified number of comments.|
|Date/Time Range||Matches any image posted at the specified date and/or time.|
|Full Text||Full-text search against image descriptions with the specified string.|
|Numeric Range||Matches any image with the specified downvote count.|
|Literal||Matches any image favorited by the specified user. Case-insensitive.|
|Numeric Range||Matches any image with the specified number of favorites.|
|Numeric Range||Matches any image with the specified height.|
|Numeric Range||Matches any image with the specified number.|
|Literal||Matches the original SHA-512 checksum of an uploaded image.|
|Numeric Range||Matches any image with the specified net score.|
|Literal||Matches any image with the specified SHA-512 checksum. N.B.: Image optimization usually alters the original checksum!|
|Literal||Matches image source URLs. Case-insensitive.|
|Numeric Range||Matches any image with the specified number of tags.|
|Literal||Matches any image with the specified uploader account. Case-insensitive.|
|Numeric Range||Matches any image with the specified upvote count.|
|Numeric Range||Matches any image with the specified width.|
It is worth noting the absence of certain “fields” such as
spoiler. These are tag namespaces, not metadata, but they are functionally the same. Thus, a search for
spoiler:s04 performs as expected.
Special Characters and Suffixes
Wildcards allow for matching with terms that begin with, end with, or contain a given string of characters, like wildcards used in file management. Two wildcards are recognized: the asterisk (or star) and the question mark.
An asterisk "expands" or matches to any number of characters in its place, including 0. For example,
apple* matches to uploads with any of the tags
apple bloom,
applejack, and simply
A question mark matches to a single character in its place. For example,
t?ixie can match to either
|*|Zero or more characters|
|?|A single character|
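The two wildcards behave much like a small subset of regular expressions. The following Python sketch translates a wildcard term into a regular expression and tests it against individual tags. It is an approximation for illustration only: it ignores backslash escaping and assumes a pattern must match a whole tag.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * and ? into a case-insensitive regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def tag_matches(pattern, tag):
    """True when the whole tag matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(tag) is not None

for tag in ["apple bloom", "applejack", "apple", "pineapple"]:
    print(tag, tag_matches("apple*", tag))
```

Because the pattern has no leading asterisk, a tag such as pineapple is not matched by apple*.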
Escaping Special Characters
The use of special characters that modify search terms or exist outside search terms mandates a facility for “escaping” those characters, so that they are not excluded from search terms themselves. To use special characters within a search term, both of the conventional string escaping mechanisms are used: the backslash and quoting. The following are special characters and sequences that may need to be escaped:
-(when placed in front of a term)
!(when placed in front of a term)
~(with fuzzy matching syntax)
A backslash is placed in front of a special character (and can also be placed in front of a sequence like the ones in the preceding list). This forces the character to be counted as part of the preceding or following term. In front of any other character, it has no effect. For example,
\-_- forces a search for the emoticon
-_-, despite it following the syntax for
negation when written without the backslash. Also consider the search term
rose \(flower\), although parentheses have intuitive rules that do not make escaping them necessary in most cases. The backslash is a special character and thus must also be escaped; a literal backslash is indicated with
The alternative to escaping is to simply surround the search query in double quotes (
"rose (flower)". When searching with a specified field, quotes must surround the field and colon as well, e.g.,
"width:1920". Everything in quotes is together treated as a verbatim search term, with one exception: the double quote character itself bounds the search term, so if it appears inside, it must be escaped with a backslash. All other uses of the backslash are treated literally.
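When building queries programmatically, the quoting rule can be applied with a one-line helper. The sketch below follows the rule as described above (only an embedded double quote needs a backslash); `quote_term` is a hypothetical name, and a term that ends in a backslash is an edge case this sketch does not attempt to handle.

```python
def quote_term(term):
    """Wrap a term in double quotes, escaping any double quotes inside it."""
    # Other characters, including parentheses and colons, are left untouched.
    return '"' + term.replace('"', '\\"') + '"'

print(quote_term("rose (flower)"))
print(quote_term("width:1920"))
```

The second call produces a verbatim tag search rather than a search of the width field.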
Approximate (Fuzzy) String Matching
The search engine backend, Apache Lucene, also enables so-called “fuzzy” string matching. Fuzzy string matching can be used with any literal search term, including the default tags field. A fuzzy match is specified using either a whole number or a similarity factor ranging from 0 to 1.0. The whole number specifies an optimal string alignment edit distance, which is the maximum number of edits that may be made to a string to match a given target, with an edit defined as a deletion, insertion, replacement, or transposition of two adjacent characters. One may alternatively define a similarity factor ranging from 0 to 1.0, with 1.0 being the least “fuzzy”. The derived edit distance is the length of the term, excluding any field name prefix, multiplied by one minus the similarity factor, rounded down. To specify either, the term is followed by a tilde and then the edit distance or similarity factor. Note in both cases that Lucene caps the maximum edit distance at 2, as an optimization. Therefore, very large edit distances or very small similarities will not behave as expected.
fluttersho~0.8 searches for uploads with tags that approximately match
fluttersho, with a similarity of 0.8. This is an edit distance of ⌊(1 − 0.8)(10)⌋ = 2. Note that uploads tagged
fluttershy are included in the result set. The utility of this is obvious: if you are unsure of a character or tag's exact spelling, you can use this as an aid, like a more manual and controlled version of Google's (in)famous spelling correction features.
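The arithmetic above can be checked with a short Python sketch. It assumes that a value of 1 or more after the tilde is read as an edit distance and anything below 1 as a similarity factor (the text does not say how the boundary value is treated), and it uses exact fractions because (1 − 0.8) × 10 falls just short of 2 in binary floating point. The function names are illustrative.

```python
import math
from fractions import Fraction

MAX_EDITS = 2  # cap on the edit distance, as described above

def derived_edit_distance(term, fuzz):
    """term excludes any field prefix; fuzz is the text after the tilde."""
    value = Fraction(fuzz)
    if value >= 1:
        edits = math.floor(value)                      # explicit edit distance
    else:
        edits = math.floor((1 - value) * len(term))    # from similarity factor
    return min(edits, MAX_EDITS)

def osa_distance(a, b):
    """Optimal string alignment distance between two strings."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # replacement
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]

limit = derived_edit_distance("fluttersho", "0.8")
print(limit, osa_distance("fluttersho", "fluttershy") <= limit)
```

Since fluttersho and fluttershy differ by a single replacement, the tag falls within the allowed distance of 2.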
Fuzziness can also be applied to numeric queries to specify a range. In this case, the fuzziness parameter is the magnitude above and below the specified number that will be included in the result set. For example,
width:800~200 specifies images with a width ranging from 600 (800 − 200) to 1000 (800 + 200), inclusive.
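For numeric terms, the same tilde syntax reduces to simple arithmetic, as this small sketch shows. `fuzzy_numeric_range` is a hypothetical helper written for illustration.

```python
def fuzzy_numeric_range(term):
    """Turn 'width:800~200' into ('width', 600.0, 1000.0), bounds inclusive."""
    name, _, rest = term.partition(":")
    target, _, spread = rest.partition("~")
    center, delta = float(target), float(spread)
    return name, center - delta, center + delta

print(fuzzy_numeric_range("width:800~200"))
```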
Fuzzy matching can be freely applied to any term inside an
Search Grammar: Term Operators and Combinations
Terms can be combined to define a search query corresponding to a specific result set. These combinations are formalized as expressions that are constructed from terms, operators, and even other expressions, which are then called subexpressions. The expressions recognized by the search frontend are the negation of a term or subexpression (NOT), the requirement that any of several terms or subexpressions match (OR), and the requirement that all of them match (AND).
At its core, a search expression is either binary or unary. A binary expression consists of a term or subexpression, an operator indicating the type of expression, and another term or subexpression. Binary expressions can be “chained” by adding the operator followed by another term. A unary expression consists of the operator followed by a single term or subexpression. Both expression types and how to use subexpressions will be covered in the following sections.
|Negation (NOT)||Applied in front of a single term or parenthesized subexpression. The minus sign does not require padding to the right. Specifies that the term or subexpression must not match.|
|Conjunction (AND)||Applied between two terms. The comma may be optionally padded with space on either side; the other forms must be padded. Specifies that both terms match. Can be chained to more terms.|
|Disjunction (OR)||Applied between two terms, with surrounding space. Specifies that either of the terms match. Can be chained to more terms.|
Negation of a term or expression specifies that the original term or subexpression must not match. The corresponding negation operator is unary, that is, applied to either a single term or to a subexpression. It is specified with the all-capitalized word
NOT, a dash of the non-multi-chromatic variety (
-), or an exclamation point (
!). For example,
NOT fluttershy matches pictures that are not tagged with
fluttershy. In set theory terms, this is taking the complement of the original result set, that is, all uploads outside it.
Commas and AND Expressions
An expression that queries for images that meet all specified terms is a conjunction or AND expression. As in the past, you can query images that meet a list of terms by joining the terms with commas. For example,
fluttershy,pinkie pie results in pictures that contain both the fluttershy and
pinkie pie tags. In set theory terms, the result set is the intersection of uploads tagged
fluttershy and uploads tagged
Commas can be padded with spaces however you like. Unlike the past, commas are now plain AND operators, so they are more versatile. As will be discussed, they can be used in subexpressions and alongside the OR operator.
AND operators can also be expressed using
&& (derived from typical programming notation) or the all-capitalized word
rarity && pinkie pie or
rarity AND pinkie pie. These forms, unlike the comma, require padding space on either side.
A disjunction or OR expression requests uploads that meet any of the specified search terms. This is markedly different from the aforementioned AND expression, which, to reiterate, mandates that all terms match. OR operators are expressed either with
|| (also a programming notation) or the all-capitalized word
rarity || pinkie pie or
rarity OR pinkie pie. In set theory terms, the result set is the union of uploads tagged
rarity and uploads tagged
pinkie pie. All forms of the OR operator require padding on either side.
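The set theory reading of the three operators maps directly onto Python's set operations. The upload IDs and tags below are made up for illustration.

```python
# Hypothetical uploads: ID -> set of tags.
uploads = {
    1: {"rarity"},
    2: {"pinkie pie"},
    3: {"rarity", "pinkie pie"},
    4: {"fluttershy"},
}

def tagged(tag):
    """Set of upload IDs carrying the given tag."""
    return {upload_id for upload_id, tags in uploads.items() if tag in tags}

everything = set(uploads)

both = tagged("rarity") & tagged("pinkie pie")      # rarity AND pinkie pie
either = tagged("rarity") | tagged("pinkie pie")    # rarity OR pinkie pie
not_fluttershy = everything - tagged("fluttershy")  # NOT fluttershy

print(both, either, not_fluttershy)
```

Intersection narrows the result set, union widens it, and negation is the complement relative to all uploads.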
Complex combinations of terms, and therefore search criteria, are possible by combining expressions together. Doing so effectively is analogous to arithmetic. Consider multiplication and addition (which in so-called Boolean algebra are respectively analogous to AND and OR operations). We can express an algebraic expression with multiplication and addition several ways. For three terms, A, B, and C, consider the expression A × B + C. Multiplication is evaluated before addition, so this expression is equivalent to (A × B) + C, in which case the order of operations is explicit.
Likewise, precedence is applied to determine the order in which chained OR, AND, and NOT operations are evaluated. The order of operations in the search syntax is as follows:
- negation (NOT)
- conjunction (AND)
- disjunction (OR)
Consider the query
twilight sparkle || fluttershy && pinkie pie. In this example,
fluttershy && pinkie pie is evaluated first, as an implicit subexpression. Then, that result is OR'd together with
twilight sparkle. Thus, the query instructs the engine to return uploads either tagged with
twilight sparkle or tagged with both
pinkie pie. Note how if the OR expression
twilight sparkle || fluttershy were evaluated first, the result set would differ.
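Python's set operators happen to follow the same relative precedence (intersection binds more tightly than union), so the difference can be demonstrated directly. The upload IDs are invented for the example.

```python
# Hypothetical result sets for each tag (upload IDs).
twilight = {1, 4}
fluttershy = {2, 3}
pinkie = {2, 4, 5}

# twilight sparkle || fluttershy && pinkie pie, with AND evaluated first
implicit = twilight | fluttershy & pinkie
explicit = twilight | (fluttershy & pinkie)

# What the result would be if the OR were evaluated first instead
or_first = (twilight | fluttershy) & pinkie

print(implicit == explicit, implicit, or_first)
```

With AND first, every upload tagged twilight sparkle is kept; with OR first, only uploads that also carry pinkie pie survive.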
Defining Subexpressions with Parentheses
Returning to the earlier arithmetic example, we can override the order of operations using explicit subexpressions. This requires the use of delimiters that act as boundaries, and most often parentheses are used for this purpose. Hence, A × (B + C) forces B + C to be evaluated first and then multiplied by A, which is contrary to the order otherwise followed. Likewise,
(twilight sparkle || fluttershy) && pinkie pie instructs the search engine to return results that have either
twilight sparkle or
fluttershy and always match the tag
As was mentioned earlier, the unary NOT operator can be applied to parenthesized subexpressions. The semantics of this is analogous to applying it to a single term: a negated subexpression specifies uploads that do not adhere to what the subexpression specifies. For example, the query
-(pinkamena diane pie, grimdark) returns all uploads that are not tagged with both
pinkamena diane pie and
grimdark. Uploads tagged with either of the two would be returned as long as they do not have both. Thus light-hearted Pinkamena images and grimdark material not involving Pinkamena would be included, yet the intersection of those two sets of images would be excluded, that is, images that are grimdark and feature Pinkamena.
Explicit subexpressions with parentheses allow for complex queries as they can be arbitrarily nested inside other subexpressions, to fine-tune the result set even more.
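To tie the grammar together, the following is a minimal recursive-descent evaluator written for illustration. It is not the site's parser: it supports only the operators, precedence, and parenthesized subexpressions described in this section, and it deliberately ignores quoting, escaping, wildcards, fields, and parentheses inside terms. All names are hypothetical.

```python
import re

OPERATORS = {",": "AND", "&&": "AND", "AND": "AND",
             "||": "OR", "OR": "OR", "NOT": "NOT", "(": "(", ")": ")"}

def tokenize(query):
    """Split a query into operator tokens and multi-word terms."""
    words = re.sub(r"([(),])", r" \1 ", query).split()
    tokens, term = [], []

    def flush():
        if term:
            tokens.append(("TERM", " ".join(term).lower()))
            term.clear()

    for word in words:
        if word in OPERATORS:
            flush()
            tokens.append((OPERATORS[word], word))
        elif not term and word[0] in "-!":
            # A leading - or ! negates the term or subexpression that follows.
            tokens.append(("NOT", word[0]))
            if len(word) > 1:
                term.append(word[1:])
        else:
            term.append(word)
    flush()
    return tokens

def evaluate(query, tags):
    """Return True if an upload with the given lowercase tags matches the query."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():        # lowest precedence
        result = parse_and()
        while peek() == "OR":
            advance()
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        result = parse_not()
        while peek() == "AND":
            advance()
            right = parse_not()
            result = result and right
        return result

    def parse_not():       # highest precedence; also handles terms and parentheses
        if peek() == "NOT":
            advance()
            return not parse_not()
        if peek() == "(":
            advance()
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        if peek() == "TERM":
            return advance()[1] in tags
        raise ValueError("expected a term")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token")
    return result

print(evaluate("twilight sparkle || fluttershy && pinkie pie", {"twilight sparkle"}))
print(evaluate("(twilight sparkle || fluttershy) && pinkie pie", {"twilight sparkle"}))
```

Each precedence level gets its own function, and the lower-precedence functions call the higher-precedence ones, which is what makes AND bind more tightly than OR without any explicit parentheses.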
Automatic Parentheses Escaping
Finally, a footnote about parentheses is warranted. Traditionally, if an expression parser encounters an open parenthesis without a closing parenthesis, or if parentheses are swapped, an error is raised. This is indeed the case with the search engine, as highlighted in the search parsing error page. However, to a limited extent, a term can contain parentheses within. Parentheses are accepted within search terms as long as they are closed and do not cover the entire expression. The first limit is a heuristic to address the typical use of parentheses, and the second arises from the legal use of parentheses to single out a term. Thus, the search
rose (flower) searches for uploads tagged with
rose (flower); however, the emoticon query
))B-( raises an error, while
(q) effectively searches for
q, instead. For the latter two examples, simply surround with double quotes to clarify your meaning to the search engine.
Boosting Terms
The search engine also allows the boosting of specific terms when sorting by relevance, so that uploads including or not including the term occur earlier or later in the results. Boosting is done by modifying a term's relevance score with a positive or negative value. This value is affixed to a term with a preceding caret (
^) and with a positive or negative decimal number. For example,
pinkie pie^1 || tara strong returns uploads tagged either with
pinkie pie or
tara strong, but when sorting by relevance descending, uploads with
pinkie pie are prioritized. A negative value meanwhile reduces the relevance score and deprioritizes the affected term when sorting by relevance, e.g.,
pinkie pie^-1 || tara strong. Sorting options are found below the search box on this page and must be set to sort by relevance for boosting to take proper effect. Thus, in both cases, pictures with both tags will still appear first.