Elasticsearch search API example

Elasticsearch API queries

Indexing, storing, and retrieving logs in the Deep Log Inspection system is handled by the Elasticsearch backend. Indices, mappings, and documents are accessible via its REST API (for a complete reference, see the Elasticsearch Reference guide). Indices are patterns under which documents are grouped and made searchable; mappings associate fields with data types; documents are individual storage entries, each corresponding to a log event.

Because the Deep Log Inspection system receives logs from outside via the Monasca Log API, exploring how to send logs is of little use from the user's perspective. However, the user should know, for example, how to query Elasticsearch's API for existing indices, for documents matching a certain index, and even for the mappings inside documents.

The cat APIs are a powerful querying tool, useful for finding relationships in the data and getting useful information from Elasticsearch.

Here follow a few example queries. By default, Elasticsearch's REST API listens on port 9200.

For a newcomer who wants to retrieve documents from Elasticsearch, the first step should be querying the existing indices. This can be done with a request to the _cat/indices API endpoint:
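For example, a minimal request listing all indices (the v parameter adds column headers) looks like:

GET /_cat/indices?v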

Once the existing indices are known, documents matching an index can be queried via search API. Note that indices can be matched exactly, i.e. by name, or multiple indices can be queried using wildcards.

The simplest query matches all documents. If, for example, the index pattern ends with a wildcard (i.e. it matches all indices that start with a given prefix), then:
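A minimal sketch of such a query, assuming a hypothetical index pattern named mylogs-*:

GET /mylogs-*/_search { "query": { "match_all": {} } }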

This query will return all documents whose index matches the index pattern.

Querying a particular index is useful to view relevant information about it, like mappings and settings.
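For example, the following requests (the index name is hypothetical) return an index's settings and its mappings:

GET /mylogs-2024.01.01
GET /mylogs-2024.01.01/_mapping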

More details on querying an index are here.

Source: https://deep-log-inspection.readthedocs.io/en/latest/user/elasticsearch/

Returns search hits that match the query defined in the request.

GET /my-index-000001/_search
(Optional, string) Comma-separated list of data streams, indices, and aliases to search. Supports wildcards (*). To search all data streams and indices, omit this parameter or use * or _all.

Several options for this API can be specified using a query parameter or a request body parameter. If both parameters are specified, only the query parameter is used.

(Optional, Boolean) If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

Defaults to true.

(Optional, Boolean) If true, returns partial results if there are shard request timeouts or shard failures. If false, returns an error with no partial results. Defaults to true.

To override the default for this field, set the search.default_allow_partial_results cluster setting to false.

(Optional, integer) The number of shard results that should be reduced at once on the coordinating node. This value should be used as a protection mechanism to reduce the memory overhead per search request if the potential number of shards in the request can be large. Defaults to 512.
(Optional, Boolean) If true, network round-trips between the coordinating node and the remote clusters are minimized when executing cross-cluster search (CCS) requests. See How cross-cluster search handles network delays. Defaults to true.
(Optional, string) A comma-separated list of fields to return as the docvalue representation of a field for each hit.

(Optional, string) Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden. Valid values are:

all — Match any data stream or index, including hidden ones.
open — Match open, non-hidden indices. Also matches any non-hidden data stream.
closed — Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
hidden — Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
none — Wildcard patterns are not accepted.

Defaults to open.

(Optional, Boolean) If true, returns detailed information about score computation as part of a hit. Defaults to false.

(Optional, integer) Starting document offset. Defaults to 0.

By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.

(Optional, Boolean) If true, concrete, expanded or aliased indices will be ignored when frozen. Defaults to true.
(Optional, Boolean) If true, missing or closed indices are not included in the response. Defaults to false.
(Optional, integer) Defines the number of concurrent shard requests per node this search executes concurrently. This value should be used to limit the impact of the search on the cluster in order to limit the number of concurrent shard requests. Defaults to 5.

(Optional, integer) Defines a threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if for instance a shard can not match any documents based on its rewrite method ie. if date filters are mandatory to match but the shard bounds and the query are disjoint. When unspecified, the pre-filter phase is executed if any of these conditions is met:

  • The request targets more than 128 shards.
  • The request targets one or more read-only index.
  • The primary sort of the query targets an indexed field.

(Optional, string) Nodes and shards used for the search. By default, Elasticsearch selects from eligible nodes and shards using adaptive replica selection, accounting for allocation awareness.

(Optional, string) Query in the Lucene query string syntax.

You can use the q parameter to run a query parameter search. Query parameter searches do not support the full Elasticsearch Query DSL but are handy for testing.

The q parameter overrides the query parameter in the request body. If both parameters are specified, documents matching the request body's query parameter are not returned.
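For example, a minimal query parameter search against the example index used later in this reference:

GET /my-index-000001/_search?q=user.id:kimchy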

(Optional, Boolean) If true, the caching of search results is enabled for requests where size is 0. See Shard request cache settings. Defaults to index level settings.
(Optional, Boolean) Indicates whether hits.total should be rendered as an integer or an object in the rest search response. Defaults to false.
(Optional, string) Custom value used to route operations to a specific shard.

(Optional, time value) Period to retain the search context for scrolling. See Scroll search results.

By default, this value cannot exceed 1d (24 hours). You can change this limit using the search.max_keep_alive cluster-level setting.

(Optional, string) How distributed term frequencies are calculated for relevance scoring.

(Optional, Boolean) If , returns sequence number and primary term of the last modification of each hit. See Optimistic concurrency control.

(Optional, integer) Defines the number of hits to return. Defaults to 10.

By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.

(Optional, string) A comma-separated list of <field>:<direction> pairs.

(Optional) Indicates which source fields are returned for matching documents. These fields are returned in the hits._source property of the search response. Defaults to true.

(Optional, string) A comma-separated list of source fields to exclude from the response.

You can also use this parameter to exclude fields from the subset specified in the _source_includes query parameter.

If the _source parameter is false, this parameter is ignored.

(Optional, string) A comma-separated list of source fields to include in the response.

If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter.

If the _source parameter is false, this parameter is ignored.

(Optional, string) Specific of the request for logging and statistical purposes.

(Optional, string) A comma-separated list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response.

If this field is specified, the _source parameter defaults to false. You can pass _source: true to return both source fields and stored fields in the search response.

(Optional, string) Specifies which field to use for suggestions.
(Optional, string) The source text for which the suggestions should be returned.

(Optional, integer) Maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting.

Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers.

Defaults to 0, which does not terminate query execution early.

(Optional, time units) Specifies the period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. Defaults to no timeout.
(Optional, Boolean) If , calculate and return document scores, even if the scores are not used for sorting. Defaults to .

(Optional, integer or Boolean) Number of hits matching the query to count accurately. Defaults to 10000.

If true, the exact number of hits is returned at the cost of some performance. If false, the response does not include the total number of hits matching the query.

(Optional, Boolean) If true, aggregation and suggester names are prefixed by their respective types in the response. Defaults to true.
(Optional, Boolean) If true, returns document version as part of a hit. Defaults to false.

(Optional, array of strings and objects) Array of wildcard () patterns. The request returns doc values for field names matching these patterns in the property of the response.

You can specify items in the array as a string or object. See Doc value fields.

(Optional, array of strings and objects) Array of wildcard () patterns. The request returns values for field names matching these patterns in the property of the response.

You can specify items in the array as a string or object.

(Optional, Boolean) If , returns detailed information about score computation as part of a hit. Defaults to .

(Optional, integer) Starting document offset. Defaults to 0.

By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.

(Optional, array of objects) Boosts the of documents from specified indices.

(Optional, float) Minimum for matching documents. Documents with a lower are not included in the search results.

(Optional, object) Limits the search to a point in time (PIT). If you provide a , you cannot specify a in the request path.

(Optional, object of objects) Defines one or more runtime fields in the search request. These fields take precedence over mapped fields with the same name.

(Optional, Boolean) If , returns sequence number and primary term of the last modification of each hit. See Optimistic concurrency control.

(Optional, integer) The number of hits to return. Defaults to 10.

By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.

(Optional) Indicates which source fields are returned for matching documents. These fields are returned in the property of the search response. Defaults to .

(Optional, array of strings) Stats groups to associate with the search. Each group maintains a statistics aggregation for its associated searches. You can retrieve these stats using the indices stats API.

(Optional, integer) Maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting.

Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers.

Defaults to , which does not terminate query execution early.

(Optional, time units) Specifies the period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. Defaults to no timeout.
(Optional, Boolean) If , returns document version as part of a hit. Defaults to .

(integer) Milliseconds it took Elasticsearch to execute the request.

This value is calculated by measuring the time elapsed between receipt of a request on the coordinating node and the time at which the coordinating node is ready to send the response.

Took time includes:

  • Communication time between the coordinating node and data nodes
  • Time the request spends in the thread pool, queued for execution
  • Actual execution time

Took time does not include:

  • Time needed to send the request to Elasticsearch
  • Time needed to serialize the JSON response
  • Time needed to send the response to a client
(Boolean) If true, the request timed out before completion; returned results may be partial or empty.

(object) Contains a count of shards used for the request.

(object) Contains returned documents and metadata.

GET /my-index-000001/_search?from=40&size=20 { "query": { "term": { "user.id": "kimchy" } } }

The API returns the following response:

{ "took": 5, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 20, "relation": "eq" }, "max_score": 1.3862942, "hits": [ { "_index": "my-index-000001", "_type" : "_doc", "_id": "0", "_score": 1.3862942, "_source": { "@timestamp": "2099-11-15T14:12:12", "http": { "request": { "method": "get" }, "response": { "status_code": 200, "bytes": 1070000 }, "version": "1.1" }, "source": { "ip": "127.0.0.1" }, "message": "GET /search HTTP/1.1 200 1070000", "user": { "id": "kimchy" } } }, ... ] } }
Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html

42 Elasticsearch Query Examples – Hands-on Tutorial

Elasticsearch provides a powerful set of options for querying documents for various use cases so it’s useful to know which query to apply to a specific case. The following is a hands-on tutorial to help you take advantage of the most important queries that Elasticsearch has to offer.

In this guide, you’ll learn 42 popular query examples with detailed explanations. Each query covered here will fall into 2 types:

  1. Structured Queries: queries that are used to retrieve structured data such as dates, numbers, pin codes, etc.
  2. Full-text Queries: queries that are used to query plain text.

Note: For this article and the related operations, we’re using Elasticsearch and Kibana version 7.4.0.

Here are the primary query examples covered in the guide, for quick reference:

Query | Type | Match Criteria | Example | Will Match | Will Not Match
match | full-text | Matches if any one of the search keywords is present in the field (the search keywords are analyzed too) | "search better" | 1. can I search for better results 2. search better please 3. you know, for SEARCH 4. there is a better place out there | 1. sear for the box 2. I won the bet 3. there are some things 4. some people are good at everything
multi_match | full-text | Applies a match query to multiple fields | key1: "search" key2: "better" | if key1 has the word "search" OR if key2 has the word "better" | N/A
match_phrase | full-text | Tries to match the exact phrase, in the same order | search better | 1. let me search better | 1. can I search for better results 2. this is for search betterment
match_phrase_prefix | full-text | Tries to match the exact phrase in order, but the last term matches as a prefix | search better | 1. let me search better 2. this is for search betterment | 1. can I search for better results
term | term | The query is applied to the generated tokens; since no analysis is performed, the keyword is searched as an exact match | tasty | 1. the food was tasty | 1. the food was Tasty 2. the food was TASTY
exists | term | Returns documents that contain an indexed value for a field | exists: { "field": "name" } | returns all the documents that have the field called "name" | N/A
range | term | Returns documents containing values within the range specified for the given field | age: { "gte": 20, "lte": 30 } | returns all the documents with the value of the "age" field falling between 20 and 30 (including 20 and 30) | N/A
ids | term | Returns the documents that have the specified document ids | N/A | N/A | N/A
prefix | term | Searches for the exact term (including the casing) at the start of a word | Mult | 1. Multi 2. Multiple 3. Multiply 4. Multiplication | 1. mult
wildcard | term | Matches all the terms with the given wildcard pattern | c*a | 1. china 2. canada 3. cambodia | 1. cabbage
regexp | term | Matches the terms with the given regex pattern | res[a-z]* | 1. restaurant 2. research | 1. res123
fuzzy | term | Returns documents that contain terms similar to the search term | Sao Paulo | São Paulo | Chennai
bool | compound | Applies a combination of queries and logical operators | must: key1:"search"; should: key2:"better"; must_not: key3:"silk" | 1. search will be better 2. search will be there | 1. search better for silk 2. search for silk
function_score: weight | compound | Gives higher scores for higher weights | search clause 1 - weight 50, search clause 2 - weight 25 | documents matching search clause 1 get a higher score than documents matching search clause 2 | N/A
function_score: script_score | compound | Modifies the score using custom scripts | N/A | N/A | N/A
function_score: field_value_factor | compound | Modifies the score based on a specific field | N/A | N/A | N/A
has_child | joining | Queries child documents and returns the corresponding parent documents (of the matching children) | N/A | N/A | N/A
has_parent | joining | Queries parent documents and returns the corresponding child documents (of the matching parents) | N/A | N/A | N/A
query_string | full-text | Multi-purpose query that can combine the usage of other queries like "match", "multi_match", "regexp", "wildcard", etc. It has strict formatting | (position:engineer) OR (salary:(>=10000 AND <=52000)) | documents with the text 'engineer' in the field 'position' OR documents with a salary between 10,000 and 52,000 (inclusive) | N/A
simple_query_string | full-text | Same as query_string, but non-strict | "(position:engineer) | (country:china)" | documents with 'engineer' in the field 'position' OR 'china' in the field 'country' | N/A

Before we dive in and get our hands dirty with the query examples below, remember that with Coralogix you can use any query syntax to explore your data – including these very examples!

Setup The Demo Index

Let’s start by first creating a new index with some sample data so that you can follow along for each search example.
Create an index named “employees”

Define a mapping (schema) for one of the fields (date_of_birth) that will be contained in the ingested document (the following step after this):
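A minimal sketch of such a request, creating the index and mapping date_of_birth as a date (the dd/MM/yyyy format is an assumption based on the sample data used later), could be:

PUT employees { "mappings": { "properties": { "date_of_birth": { "type": "date", "format": "dd/MM/yyyy" } } } }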

Now let’s ingest a few documents into our newly created index, as shown in the example below using Elasticsearch’s _bulk API:
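A sketch of such a bulk request, showing two of the four sample employees (field values are taken from the search responses later in this guide; the email and ip_address fields are omitted here):

POST employees/_bulk
{ "index": { "_id": 2 } }
{ "id": 2, "name": "Othilia Cathel", "gender": "Female", "date_of_birth": "22/07/1987", "company": "Edgepulse", "position": "Structural Engineer", "experience": 11, "country": "China", "phrase": "Grass-roots heuristic help-desk", "salary": 193530 }
{ "index": { "_id": 4 } }
{ "id": 4, "name": "Alan Thomas", "gender": "Male", "date_of_birth": "11/12/1985", "company": "Yamaha", "position": "Resources Manager", "experience": 12, "country": "China", "phrase": "Emulation of roots heuristic coherent systems", "salary": 300000 }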

Now that we have an index with documents and a mapping specified, we’re ready to get started with the example searches.

 

1. Match Query

The “match” query is one of the most basic and commonly used queries in Elasticsearch and functions as a full-text query. We can use this query to search for text, numbers or boolean values.

Let us search for the word “heuristic” contained in the field called “phrase” in the documents we ingested earlier.

POST employees/_search { "query": { "match": { "phrase": { "query" : "heuristic" } } } }

Out of the 4 documents in our index, only 2 documents are returned, each containing the word “heuristic” in the “phrase” field:

{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 0.6785374, "hits" : [ { "_index" : "employees", "_type" : "_doc", "_id" : "2", "_score" : 0.6785374, "_source" : { "id" : 2, "name" : "Othilia Cathel", "email" : "[email protected]", "gender" : "Female", "ip_address" : "3.164.153.228", "date_of_birth" : "22/07/1987", "company" : "Edgepulse", "position" : "Structural Engineer", "experience" : 11, "country" : "China", "phrase" : "Grass-roots heuristic help-desk", "salary" : 193530 } }, { "_index" : "employees", "_type" : "_doc", "_id" : "4", "_score" : 0.6257787, "_source" : { "id" : 4, "name" : "Alan Thomas", "email" : "[email protected]", "gender" : "Male", "ip_address" : "200.47.210.95", "date_of_birth" : "11/12/1985", "company" : "Yamaha", "position" : "Resources Manager", "experience" : 12, "country" : "China", "phrase" : "Emulation of roots heuristic coherent systems", "salary" : 300000 } } ] } }

What happens if we want to search for more than one word? Using the same query we just performed, let’s search for “heuristic roots help”:

POST employees/_search { "query": { "match": { "phrase": { "query" : "heuristic roots help" } } } }

This returns the same documents as before because, by default, Elasticsearch treats each word in the search query with an OR operator. In our case, the query will match any document which contains “heuristic” OR “roots” OR “help”.

Changing The Operator Parameter
The default behavior of the OR operator being applied to multi-word searches can be changed using the “operator” parameter passed along with the “match” query.
We can specify the operator parameter with “OR” or “AND” values.
Let’s see what happens when we provide the operator parameter “AND” in the same query we performed earlier.

POST employees/_search { "query": { "match": { "phrase": { "query" : "heuristic roots help", "operator" : "AND" } } } }

Now the results will return only one document (document id=2) since that is the only document containing all three search keywords in the “phrase” field.

minimum_should_match

Taking things a bit further, we can set a threshold for a minimum amount of matching words that the document must contain. For example, if we set this parameter to 1, the query will check for any documents with a minimum of 1 matching word.
Now if we set the “minimum_should_match” parameter to 3, then all three words must appear in the document in order to be classified as a match.

In our case, the following query would return only 1 document (with id=2) as that is the only one matching our criteria

POST employees/_search { "query": { "match": { "phrase": { "query" : "heuristic roots help", "minimum_should_match": 3 } } } }

 

1.1 Multi-Match Query

So far we’ve been dealing with matches on a single field – that is we searched for the keywords inside a single field named “phrase”.
But what if we needed to search keywords across multiple fields in a document? This is where the multi-match query comes into play.
Let’s try an example search for the keyword “research help” in the “position” and “phrase” fields contained in the documents.

POST employees/_search { "query": { "multi_match": { "query" : "research help" , "fields": ["position","phrase"] } } }

This will result in the following response:

{ { "took" : 104, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 1.2613049, "hits" : [ { "_index" : "employees", "_type" : "_doc", "_id" : "1", "_score" : 1.2613049, "_source" : { "id" : 1, "name" : "Huntlee Dargavel", "email" : "[email protected]", "gender" : "Male", "ip_address" : "58.11.89.193", "date_of_birth" : "11/09/1990", "company" : "Talane", "position" : "Research Associate", "experience" : 7, "country" : "China", "phrase" : "Multi-channelled coherent leverage", "salary" : 180025 } }, { "_index" : "employees", "_type" : "_doc", "_id" : "2", "_score" : 1.1785963, "_source" : { "id" : 2, "name" : "Othilia Cathel", "email" : "[email protected]", "gender" : "Female", "ip_address" : "3.164.153.228", "date_of_birth" : "22/07/1987", "company" : "Edgepulse", "position" : "Structural Engineer", "experience" : 11, "country" : "China", "phrase" : "Grass-roots heuristic help-desk", "salary" : 193530 } } ] } }

 

1.2 Match Phrase

Match_phrase is another commonly used query which, like its name indicates, matches phrases in a field.
If we needed to search for the phrase “roots heuristic coherent” in the “phrase” field in the employee index, we can use the “match_phrase” query:

GET employees/_search { "query": { "match_phrase": { "phrase": { "query": "roots heuristic coherent" } } } }

This will return the documents with the exact phrase “roots heuristic coherent”, including the order of the words. In our case, we have only one result matching the above criteria, as shown in the below response

{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 1.8773359, "hits" : [ { "_index" : "employees", "_type" : "_doc", "_id" : "4", "_score" : 1.8773359, "_source" : { "id" : 4, "name" : "Alan Thomas", "email" : "[email protected]", "gender" : "male", "ip_address" : "200.47.210.95", "date_of_birth" : "11/12/1985", "company" : "Yamaha", "position" : "Resources Manager", "experience" : 12, "country" : "China", "phrase" : "Emulation of roots heuristic coherent systems", "salary" : 300000 } } ] } }

 

Slop Parameter

A useful feature we can make use of in the match_phrase query is the “slop” parameter which allows us to create more flexible searches.
Suppose we searched for “roots coherent” with the match_phrase query. We wouldn’t receive any documents returned from the employee index. This is because for match_phrase to match, the terms need to be in the exact order.
Now, let’s use the slop parameter and see what happens:

GET employees/_search { "query": { "match_phrase": { "phrase": { "query": "roots coherent", "slop": 1 } } } }

With slop=1, the query is indicating that it is okay to move one word for a match, and therefore we’ll receive the following response. In the below response, you can see that the “roots coherent” matched the “roots heuristic coherent” document. This is because the slop parameter allows skipping 1 term.

{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 0.78732485, "hits" : [ { "_index" : "employees", "_type" : "_doc", "_id" : "4", "_score" : 0.78732485, "_source" : { "id" : 4, "name" : "Alan Thomas", "email" : "[email protected]", "gender" : "male", "ip_address" : "200.47.210.95", "date_of_birth" : "11/12/1985", "company" : "Yamaha", "position" : "Resources Manager", "experience" : 12, "country" : "China", "phrase" : "Emulation of roots heuristic coherent systems", "salary" : 300000 } } ] } }

 

1.3 Match Phrase Prefix

The match_phrase_prefix query is similar to the match_phrase query, but here the last term of the search keyword is considered as a prefix and is used to match any term starting with that prefix term.
First, let’s insert a document into our index to better understand the match_phrase_prefix query:

PUT employees/_doc/5 { "id": 4, "name": "Jennifer Lawrence", "email": "[email protected]", "gender": "female", "ip_address": "100.37.110.59", "date_of_birth": "17/05/1995", "company": "Monsnto", "position": "Resources Manager", "experience": 10, "country": "Germany", "phrase": "Emulation of roots heuristic complete systems", "salary": 300000 }

Now let’s apply the match_phrase_prefix:

GET employees/_search { "_source": [ "phrase" ], "query": { "match_phrase_prefix": { "phrase": { "query": "roots heuristic co" } } } }

In the results below, we can see that the documents with coherent and complete matched the query. We can also use the slop parameter in the “match_phrase” query.

{ { "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 3.0871696, "hits" : [ { "_index" : "employees", "_type" : "_doc", "_id" : "4", "_score" : 3.0871696, "_source" : { "phrase" : "Emulation of roots heuristic coherent systems" } }, { "_index" : "employees", "_type" : "_doc", "_id" : "5", "_score" : 3.0871696, "_source" : { "phrase" : "Emulation of roots heuristic complete systems" } } ] } }

Note: the “match_phrase_prefix” query tries to match 50 expansions (by default) of the last provided keyword (“co” in our example). This can be increased or decreased by specifying the “max_expansions” parameter.
Due to this prefix behavior and how easy it is to set up, the match_phrase_prefix query is often used for autocomplete functionality.
Now let’s delete the document we just added with id=5:

DELETE employees/_doc/5

 

2. Term Level Queries

Term level queries are used to query structured data, which would usually be the exact values.

 

2.1. Term Query/Terms Query

This is the simplest of the term level queries. This query searches for the exact match of the search keyword against the field in the documents.
For example, if we search for the word “Male” using the term query against the field “gender”, it will search exactly as the word is, even with the casing.
This can be demonstrated by the below two queries:

In the above case, the only difference between the two queries is that of the casing of the search keyword. Case 1 had all lowercase, which was matched because that is how it was saved against the field. But for Case 2, the search didn’t get any result, because there was no such token against the field “gender” with a capitalized “F”

We can also pass multiple terms to be searched on the same field, by using the terms query. Let us search for “female” and “male” in the gender field. For that, we can use the terms query as below:

POST employees/_search { "query": { "terms": { "gender": [ "female", "male" ] } } }

 

2.2 Exists Queries

Sometimes there is no indexed value for a field, or the field does not exist in the document. In such cases, the exists query helps identify those documents and analyze the impact.
For example, let us index a document like below to the “employees” index

PUT employees/_doc/5 { "id": 5, "name": "Michael Bordon", "email": "[email protected]", "gender": "male", "ip_address": "10.47.210.65", "date_of_birth": "12/12/1995", "position": "Resources Manager", "experience": 12, "country": null, "phrase": "Emulation of roots heuristic coherent systems", "salary": 300000 }

This document has no field named “company” and the value of the “country” field is null.

Now if we want to find the documents with the field “company”, we can use the exist query as below:

GET employees/_search { "query": { "exists": { "field": "company" } } }

The above query will list all the documents which have the field “company”.
Perhaps a more useful solution would be to list all the documents without the “company” field. This can also be achieved by using the exist query as below

GET employees/_search { "query": { "bool": { "must_not": [ { "exists": { "field": "company" } } ] } } }

The bool query is explained in detail in the following sections.
Let us delete the document we just inserted, for convenience and uniformity, with the request below:

DELETE employees/_doc/5

 

2.3 Range Queries

Another commonly used query in the Elasticsearch world is the range query. The range query allows us to get the documents that contain terms within a specified range. The range query is a term level query (meaning it is used to query structured data) and can be used against numerical fields, date fields, etc.

 

Range query on numeric fields

For example, in the data set, we have created, if we need to filter out the people who have experience level between 5 to 10 years, we can apply the following range query for the same:

POST employees/_search { "query": { "range" : { "experience" : { "gte" : 5, "lte" : 10 } } } }

What are gte, gt, lte and lt?

gte — Greater than or equal to. gte: 5 means greater than or equal to 5, which includes 5.

gt — Greater than. gt: 5 means greater than 5, which does not include 5.

lte — Less than or equal to. lte: 5 means less than or equal to 5, which includes 5.

lt — Less than. lt: 5 means less than 5, which does not include 5.

 

Range query on date fields

Similarly, range queries can be applied to the date fields as well. If we need to find out those who were born after 1986, we can fire a query like the one given below:

GET employees/_search { "query": { "range" : { "date_of_birth" : { "gte" : "01/01/1986" } } } }

This will fetch the documents whose date_of_birth field falls after 01/01/1986.

 

2.4 Ids Queries

The ids query is a relatively less used query but is one of the most useful ones and hence qualifies to be in this list. There are occasions when we need to retrieve documents based on their IDs. This can be achieved using a single get request as below:

GET indexname/typename/documentId

This can be a good solution if there is only one document to be fetched by an ID, but what if we have many more?

That is where the ids query comes in very handy. With the Ids query, we can do this in a single request.
In the below example we are fetching documents with ids 1 and 4 from the employee index with a single request.

POST employees/_search { "query": { "ids" : { "values" : ["1", "4"] } } }

 

2.5 Prefix Queries

The prefix query is used to fetch documents that contain the given search string as the prefix in the specified field.
Suppose we need to fetch all documents which contain “al” as the prefix in the field “name”, then we can use the prefix query as below:

GET employees/_search { "query": { "prefix": { "name": "al" } } }

This would result in the below response

{ "hits" : [ { "_index" : "employees", "_type" : "_doc", "_id" : "4", "_score" : 1.0, "_source" : { "id" : 4, "name" : "Alan Thomas", "email" : "[email protected]", "gender" : "male", "ip_address" : "200.47.210.95", "date_of_birth" : "11/12/1985", "company" : "Yamaha", "position" : "Resources Manager", "experience" : 12, "country" : "China", "phrase" : "Emulation of roots heuristic coherent systems", "salary" : 300000 } } ] }

Since the prefix query is a term query, it will pass the search string as it is. That is searching for “al” and “Al” is different. If in the above example, we search for “Al”, we will get 0 results as there is no token starting with “Al” in the inverted index of the field “name”. But if we query on the field “name.keyword”, with “Al” we will get the above result and in this case, querying for “al” will result in zero hits.

 

2.6 Wildcard Queries

The wildcard query fetches documents that have terms matching the given wildcard pattern.
For example, let us search for “c*a” using the wildcard query on the field “country” like below:

GET employees/_search { "query": { "wildcard": { "country": { "value": "c*a" } } } }

The above query will fetch all the documents with the “country” name starting with “c” and ending with “a” (eg: China, Canada, Cambodia, etc).

Here the * operator can match zero or more characters.

 

2.7 Regexp

This is similar to the “wildcard” query we saw above but will accept regular expressions as input and fetch documents matching those.

GET employees/_search { "query": { "regexp": { "position": "res[a-z]*" } } }

The above query will get us the documents matching the words that match the regular expression res[a-z]*

 

2.8 Fuzzy

The Fuzzy query can be used to return documents containing terms similar to that of the search term. This is especially good when dealing with spelling mistakes.
We can get results even if we search for “Chnia” instead of “China”, using the fuzzy query.
Let us have a look at an example:

GET employees/_search { "query": { "fuzzy": { "country": { "value": "Chnia", "fuzziness": "2" } } } }

Here fuzziness is the maximum edit distance allowed for matching. The parameters like “max_expansions” etc, which we saw in the “match_phrase” query can also be used. More documentation on the same can be found here

Fuzzy queries can also come in with the “match” query types. The following example shows the fuzziness being used in a multi_match query

POST employees/_search { "query": { "multi_match" : { "query" : "heursitic reserch", "fields": ["phrase","position"], "fuzziness": 2 } }, "size": 10 }

The above query will return the documents matching either “heuristic” or “research” despite the spelling mistakes in the query.

 

3. Boosting

While querying, it is often helpful to get the more favored results first. The simplest way of doing this is called boosting in Elasticsearch. And this comes in handy when we query multiple fields. For example, consider the following query:

POST employees/_search { "query": { "multi_match" : { "query" : "versatile Engineer", "fields": ["position^3", "phrase"] } } }

This will return a response in which documents matching the “position” field rank higher than those matching only the “phrase” field, because the “position” field is boosted by a factor of 3.

4. Sorting

4.1 Default Sorting

When there is no sort parameter specified in the search request, Elasticsearch returns the document based on the descending values of the “_score” field. This “_score” is computed by how well the query has matched using the default scoring methodologies of Elasticsearch. In all the examples we have discussed above you can see the same behavior in the results.
Only when we use the “filter” context is no score computed, which makes returning the results faster.

4.2 How to Sort by a Field

Elasticsearch gives us the option to sort on the basis of a field. Say we need to sort the employees in descending order of experience. We can use the below query with the sort option to achieve that:

GET employees/_search { "_source": ["name","experience","salary"], "sort": [ { "experience": { "order": "desc" } } ] }

The results of the above query are given below:

As you can see from the above response, the results are ordered based on the descending values of the employee experience.
Also, there are two employees, with the same experience level as 12.

 

4.3 How to Sort by Multiple Fields

In the above example, we saw that there are two employees with the same experience level of 12, but we need to sort again based on the descending order of the salary. We can provide multiple fields for sorting too, as shown in the query demonstrated below:

GET employees/_search { "_source": [ "name", "experience", "salary" ], "sort": [ { "experience": { "order": "desc" } }, { "salary": { "order": "desc" } } ] }

Now we get the below results:

In the above results, you can see that within the employees having same experience levels, the one with the highest salary was promoted early in the order (Alan and Winston had same experience levels, but unlike the previous search results, here Alan was promoted as he had higher salary).

Note: If we change the order of sort parameters in the sorted array, that is if we keep the “salary” parameter first and then the “experience” parameter, then the search results would also change. The results will first be sorted on the basis of the salary parameter and then the experience parameter would be considered, without impacting the salary based sorting.

Let us invert the order of sort of the above query, that is “salary” is kept first and the “experience” as shown below:

GET employees/_search { "_source": [ "name", "experience", "salary" ], "sort": [ { "salary": { "order": "desc" } }, { "experience": { "order": "desc" } } ] }

The results would be like below:

You can see that the candidate with experience value 12 came below the candidate with experience value 7, as the latter had more salary than the former.

 

5. Compound Queries

So far, in the tutorial, we have seen that we fired single queries, like finding a text match or finding the age ranges, etc. But more often in the real world, we need multiple conditions to be checked and documents to be returned based on that. Also, we might need to modify the relevance or score parameter of the queries or to change the behavior of the individual queries, etc. Compound queries are the queries which help us to achieve the above scenarios. In this section, let us have a look into a few of the most helpful compound queries.

5.1. The Bool Query

Bool query provides a way to combine multiple queries in a boolean manner. That is for example if we want to retrieve all the documents with the keyword “researcher” in the field “position” and those who have more than 12 years of experience we need to use the combination of the match query and that of the range query. This kind of query can be formulated using the bool query. The bool query has mainly 4 types of occurrences defined:

must — The conditions or queries in this clause must occur in the documents for them to be considered a match. They also contribute to the score value. E.g. if we keep query A and query B in the must section, each document in the result satisfies both queries, i.e. query A AND query B.

should — The conditions/queries should match. Result = query A OR query B.

filter — Same as the must clause, but the score is ignored.

must_not — The conditions/queries specified must not occur in the documents. Scoring is ignored (kept as 0).

A typical bool query structure would be like the below:

POST _search { "query": { "bool" : { "must" : [], "filter": [], "must_not" : [], "should" : [] } } }

Now let’s explore how we can use the bool query for different use cases.

Bool Query Example 1 – Must

In our example, let us say we need to find all employees who have 12 or more years of experience AND also have the word “manager” in the “position” field. We can do that with the following bool query:

POST employees/_search { "query": { "bool": { "must": [ { "match": { "position": "manager" } }, { "range": { "experience": { "gte": 12 } } } ] } } }

The response for the above query will have documents matching both the queries in the “must” array, and is shown below:

Bool Query Example 2 – Filter

The previous example demonstrated the “must” parameter in the bool query. You can see in the results of the previous example that the results had values in the “_score” field. Now let us use the same query, but this time let us replace the “must” with “filter” and see what happens:
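A sketch of that variant, with the same two clauses moved into the filter array:

POST employees/_search { "query": { "bool": { "filter": [ { "match": { "position": "manager" } }, { "range": { "experience": { "gte": 12 } } } ] } } }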

From the above screenshot, it can be seen that the score value is zero for the search results. This is because when using the filter context, the score is not computed by Elasticsearch in order to make the search faster.

If we use a must condition with a filter condition, the scores are calculated for the clauses in must, but no scores are computed for the filter side.

 

Bool Query Example 3 – Should

Now, let us see the effect of the “should” section in the bool query. Let us add a should clause in the above example’s query. This “should” condition is to match documents that contain the text “versatile” in the “phrase” fields of the documents. The query for this would look like below:

POST employees/_search { "query": { "bool": { "must": [ { "match": { "position": "manager" } }, { "range": { "experience": { "gte": 12 } } } ], "should": [ { "match": { "phrase": "versatile" } } ] } } }

Now the results will be the same 2 documents that we received in the previous example, but the document with id=3, which was shown as the last result is shown as the first result. This is because the clause in the “should” array is occurring in that document and hence the score has increased, and so it was promoted as the first document.

Bool Query Example 4 – Multiple Conditions

A real-world example of a bool query might be more complex than the above simple ones. What if users want to get employees who are from the companies “Yamaha” or “Talane”, have the title “manager” or “associate”, and have a salary greater than 100,000?
The above-stated condition can be shortened as below:

(company = Yamaha OR company = Talane) AND (position = manager OR position = associate) AND (salary >= 100000)

This can be achieved using multiple bool queries inside a single must clause, as shown in the below query:
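A sketch of such a query, nesting bool/should queries inside the outer must clause:

POST employees/_search { "query": { "bool": { "must": [ { "bool": { "should": [ { "match": { "company": "Yamaha" } }, { "match": { "company": "Talane" } } ] } }, { "bool": { "should": [ { "match": { "position": "manager" } }, { "match": { "position": "associate" } } ] } }, { "range": { "salary": { "gte": 100000 } } } ] } } }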

 

5.2. Boosting Queries

Sometimes, there are requirements in the search criteria where we need to demote certain search results but do not want to omit them from the search results altogether. In such cases, boosting the query would become handy. Let us go through a simple example to demonstrate this.
Let us search for all the employees from China and then demote the employees from the company “Talane” in the search results. We can use the boosting query like the below:

POST employees/_search { "query": { "boosting" : { "positive" : { "match": { "country": "china" } }, "negative" : { "match": { "company": "Talane" } }, "negative_boost" : 0.5 } } }

Now the response of the above query would be as given below, where you can see that the employee of the company “Talane” is ranked the last and has a difference of 0.5 in score with the previous result.

We can apply any query to the “positive” and “negative” sections of the boosting query. This is good when we need to apply multiple conditions with a bool query. An example of such a query is given below:

GET employees/_search { "query": { "boosting": { "positive": { "bool": { "should": [ { "match": { "country": { "query": "china" } } }, { "range": { "experience": { "gte": 10 } } } ] } }, "negative": { "match": { "gender": "female" } }, "negative_boost": 0.5 } } }

 

5.3 Function Score Queries

The function_score query enables us to change the score of the documents that are returned by a query. The function_score query requires a query and one or more functions to compute the score. If no functions are mentioned, the query is executed as normal.

The most simple case of the function score, without any function, is demonstrated below:
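A minimal sketch (the inner match query is just an example) looks like:

GET employees/_search { "query": { "function_score": { "query": { "match": { "position": "manager" } } } } }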

 

5.3.1 function_score: weight

As said in the earlier sections, we can use one or more score functions in the “functions” array of the “function_score” query. One of the simplest, yet important functions being the “weight” score function.
According to the documentation,
The weight score allows you to multiply the score by the provided weight. The weight can be defined per function in the functions array (example above) and is multiplied with the score computed by the respective function
Let us demonstrate the example using a simple modification of the above query. Let us include two filters in the “functions” part of the query. The first one would search for the term “coherent” in the “phrase” field of the document and if found will boost the score by a weight of 2. The second clause would search for the term “emulation” in the field “phrase” and will boost by a factor of 10, for such documents. Here is the query for the same:

GET employees/_search { "_source": ["position","phrase"], "query": { "function_score": { "query": { "match": { "position": "manager" } }, "functions": [ { "filter": { "match": { "phrase": "coherent" } }, "weight": 2 }, { "filter": { "match": { "phrase": "emulation" } }, "weight": 10 } ], "score_mode": "multiply", "boost": "5", "boost_mode": "multiply" } } }

The response of the above query is as below:

{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 72.61542, "hits" : [ { "_index" : "employees", "_type" : "_doc", "_id" : "4", "_score" : 72.61542, "_source" : { "phrase" : "Emulation of roots heuristic coherent systems", "position" : "Resources Manager" } }, { "_index" : "employees", "_type" : "_doc", "_id" : "3", "_score" : 30.498476, "_source" : { "phrase" : "Versatile object-oriented emulation", "position" : "Human Resources Manager" } } ] } }

The simple match part of the query on the position field yielded a score of 3.63 and 3.04 for the two documents. When the first function in the functions array was applied (match for the “coherent” keyword), there was only one match, and that was for the document with id = 4.

The current score of that document was multiplied by the weight factor for the match “coherent”, which is 2. Now the new score for the document becomes 3.63*2 = 7.26.

After that, the second condition (match for “emulation”) matched for both documents.

So the current score of the document with id=4 is 7.26*10 ≈ 72.6, where 10 is the weight factor for the second clause.

The document with id=3 matched only for the second clause and hence its score = 3.0*10 = 30.

 

5.3.2  function_score: script_score

It often occurs that we need to compute the score based on one or more fields, and for that the default scoring mechanism is not sufficient. Elasticsearch provides us with the “script_score” score function to compute the score based on custom requirements. Here we can provide a script, which will return the score for each document based on custom logic on the fields.
Say, for example, we need to compute the scores as a function of salary and experience, ie the employees with the highest salary to experience ratio should score more. We can use the following function_score query for the same:

GET employees/_search { "_source": [ "name", "experience", "salary" ], "query": { "function_score": { "query": { "match_all": {} }, "functions": [ { "script_score": { "script": { "source": "(doc['salary'].value/doc['experience'].value)/1000" } } } ], "boost_mode": "replace" } } }

In the above query, the script part:

(doc['salary'].value/doc['experience'].value)/1000

The script part above will generate the scores for the search results. For example, for an employee with a salary = 180025 and experience = 7 the score generated would be:

(180025/7)/1000 = 25

Since we are using boost_mode: replace, the score computed by the script is used as the final score for each document. The results for the above query are given in the screenshot below:

 

5.3.3 function_score: field_value_factor

We can make use of a field from the document to influence the score by using the “field_value_factor” function. This is in some ways a simple alternative to “script_score”. In our example, let us make use of the “experience” field value to influence our score as below

GET employees/_search { "_source": ["name","experience"], "query": { "function_score": { "field_value_factor": { "field": "experience", "factor": 0.5, "modifier": "square", "missing": 1 } } } }

The response for the above query is as shown below:

{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : 36.0, "hits" : [ { "_index" : "employees", "_type" : "_doc", "_id" : "3", "_score" : 36.0, "_source" : { "name" : "Winston Waren", "experience" : 12 } }, { "_index" : "employees", "_type" : "_doc", "_id" : "4", "_score" : 36.0, "_source" : { "name" : "Alan Thomas", "experience" : 12 } }, { "_index" : "employees", "_type" : "_doc", "_id" : "2", "_score" : 30.25, "_source" : { "name" : "Othilia Cathel", "experience" : 11 } }, { "_index" : "employees", "_type" : "_doc", "_id" : "1", "_score" : 12.25, "_source" : { "name" : "Huntlee Dargavel", "experience" : 7 } } ] } }

The score computation for the above would be like below:

Square of (factor*doc[experience].value)

For a document with “experience” containing the value of 12, the score will be:

square of (0.5*12) = square of (6) = 36

 

5.3.4 function_score: Decay Functions

Consider the use case of searching for hotels near a location. For this use case, the nearer the hotel is, the more relevant the search results are, but when it is farther, the search becomes insignificant. Or to refine it further, if the hotel is farther than, say a walkable distance of 1km from the location, the search results should show a rapid decline in the score. Whereas the ones inside the 1km radius should be scored higher.
For this kind of use case, a decaying mode of scoring is the best choice, ie the score will start to decay from the point of interest. We have score functions in Elasticsearch for this purpose and they are called the decay functions. There are three types of decay functions, namely “gauss”, “linear” and “exponential” or “exp”.
Let us take an example of a use case from our scenario. We need to score the employees based on their salaries. The ones near to 200000 and between the ranges 170000 to 230000 should get higher scoring, and the ones below and above the range should have the scores significantly lower.

GET employees/_search { "_source": [ "name", "salary" ], "query": { "function_score": { "query": { "match_all": {} }, "functions": [ { "gauss": { "salary": { "origin": 200000, "scale": 30000 } } } ], "boost_mode": "replace" } } }

Here the ‘origin’ represents the point to start calculating the distance. The scale represents the distance from the origin, up to which the priority should be given for scoring. There are additional parameters that are optional and can be viewed in Elastic’s documentation

The above query results are shown in the image below:

6. Parent-Child Queries

One to many relationships can be handled using the parent-child method (now called the join operation) in Elasticsearch. Let us demonstrate this with an example scenario. Consider we have a forum, where anyone can post any topic (say posts). Users can comment on individual posts. So in this scenario, we can consider that the individual posts as the parent documents and the comments to them as their children. This is best explained in the below figure:

For this operation, we will have a separate index created, with special mapping (schema) applied.
Create the index with join data type with the below request

PUT post-comments { "mappings": { "properties": { "document_type": { "type": "join", "relations": { "post": "comment" } } } } }

In the above schema, you can see there is a type named “join”, which indicates, that this index is going to have parent-child-related documents. Also, the ‘relations’ object has the names of the parent and child identifiers defined.

That is post:comment refers to parent:child relation. Each document will consist of a field named “document_type” which will have the value “post” or “comment”. The value “post” will indicate that the document is a parent and the value “comment” will indicate the document is a “child”.

Let us index some documents for this:

PUT post-comments/_doc/1 { "document_type": { "name": "post" }, "post_title" : "Angel Has Fallen" } PUT post-comments/_doc/2 { "document_type": { "name": "post" }, "post_title" : "Beauty and the beast - a nice movie" }

Indexing child documents for the document with id=1

PUT post-comments/_doc/A?routing=1 { "document_type": { "name": "comment", "parent": "1" }, "comment_author": "Neil Soans", "comment_description": "'Angel has Fallen' has some redeeming qualities, but they're too few and far in between to justify its existence" } PUT post-comments/_doc/B?routing=1 { "document_type": { "name": "comment", "parent": "1" }, "comment_author": "Exiled Universe", "comment_description": "Best in the trilogy! This movie wasn't better than the Rambo movie but it was very very close." }

Indexing child documents for the document with id=2

PUT post-comments/_doc/D?routing=2 { "document_type": { "name": "comment", "parent": "2" }, "comment_author": "Emma Cochrane", "comment_description": "There's the sublime beauty of a forgotten world and the promise of happily-ever-after to draw you to one of your favourite fairy tales, once again. Give it an encore." } PUT post-comments/_doc/E?routing=2 { "document_type": { "name": "comment", "parent": "2" }, "comment_author": "Common Sense Media Editors", "comment_description": "Stellar music, brisk storytelling, delightful animation, and compelling characters make this both a great animated feature for kids and a great movie for anyone" }

 

6.1 The has_child Query

This queries the child documents and returns the corresponding parent documents as the results. Suppose we need to query for the term “music” in the field “comment_description” of the child documents, and get the parent documents corresponding to the matching children; we can use the has_child query as below:

GET post-comments/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": {
          "comment_description": "music"
        }
      }
    }
  }
}

For the above query, the only child document that matched the search was the one with id=E, whose parent is the document with id=2. That parent document is returned as the search result.

6.2 The has_parent Query

The has_parent query performs the opposite of the has_child query: it returns the child documents of the parent documents that matched the query.
Let us search for the word "Beauty" in the parent documents and return the child documents of the matched parents. We can use the below query for that:

GET post-comments/_search
{
  "query": {
    "has_parent": {
      "parent_type": "post",
      "query": {
        "match": {
          "post_title": "Beauty"
        }
      }
    }
  }
}

The matched parent document for the above query is the one with id=2. The child documents corresponding to the id=2 document are returned by the above query.

6.3 Fetching Child Documents with Parents

Sometimes we require both the parent and child documents in the search results. For example, when listing the posts, it would be nice to also display a few comments below each one, as that makes for a better listing.
Elasticsearch allows this too. Let us use the has_child query to return parents, and this time we will fetch the corresponding child documents as well.
The following query contains a parameter called "inner_hits" which allows us to do exactly that:
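The original query body was not carried over into this copy, so the following is a minimal sketch that reuses the post-comments index from this section; an empty inner_hits object is enough to attach the matching comments to each returned post.

GET post-comments/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": {
          "comment_description": "music"
        }
      },
      "inner_hits": {}
    }
  }
}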

Source: https://coralogix.com/blog/42-elasticsearch-query-examples-hands-on-tutorial/

To illustrate the different query types in Elasticsearch, we will be searching a collection of book documents with the following fields: title, authors, summary, release date, and number of reviews.

But first, let’s create a new index and index some documents using the bulk API:
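The bulk request itself is missing from this copy, so here is an illustrative sketch. The index name (bookdb_index) and the concrete field names (publish_date for the release date, num_reviews for the number of reviews, publisher) are assumptions made for these examples rather than anything mandated by the original data set.

POST /bookdb_index/_bulk
{ "index": { "_id": 1 } }
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary": "A distributed real-time search and analytics engine", "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 } }
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary": "Organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date": "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 } }
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary": "Build scalable search applications using Elasticsearch", "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 } }
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning" }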

Examples

Basic Match Query

There are two ways of executing a basic full-text (match) query: using the Search Lite API, which expects all the search parameters to be passed in as part of the URL, or using the full JSON request body, which allows you to use the full Elasticsearch DSL.

Here is a basic match query that searches for the string “guide” in all the fields:
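The query itself is missing from this copy; a minimal search lite sketch, assuming the bookdb_index sample index created above, looks like this:

GET /bookdb_index/_search?q=guide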

The full body version of this query is shown below and produces the same results as the above search lite.
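A sketch of that request body form; the legacy _all field no longer exists on recent Elasticsearch versions, so a wildcard field list is used here to hit every field:

POST /bookdb_index/_search
{
  "query": {
    "multi_match": {
      "query": "guide",
      "fields": ["*"]
    }
  }
}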

The multi_match keyword is used in place of the match keyword as a convenient shorthand way of running the same query against multiple fields. The fields property specifies which fields to query against and, in this case, we want to query against all the fields in the document.

Note: Prior to Elasticsearch 6 you could use the "_all" field to find a match in all the fields instead of having to specify each field. The "_all" field works by concatenating all the fields into one big field, using space as a delimiter, and then analyzing and indexing the field. In ES6, this functionality has been deprecated and disabled by default. ES6 provides the "copy_to" parameter if you are interested in creating a custom "_all"-style field. See the Elasticsearch Guide for more info.

The SearchLite API also allows you to specify what fields you want to search on. For example, to search for books with the words “in Action” in the title field:

However, the full body DSL gives you more flexibility in creating more complicated queries (as we will see later) and in specifying how you want the results back. In the example below, we specify the number of results we want back, the offset to start from (useful for pagination), the document fields we want to be returned, and term highlighting. Note that we use a "match" query instead of a "multi_match" query because we only care about searching in the title field.
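A hedged sketch of such a request, again against the assumed bookdb_index:

POST /bookdb_index/_search
{
  "query": {
    "match": {
      "title": "in action"
    }
  },
  "size": 2,
  "from": 0,
  "_source": ["title", "summary", "publish_date"],
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}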

Note: For multi-word queries, the match query lets you specify whether to use the "and" operator instead of the default "or" operator. You can also specify the minimum_should_match option to tweak the relevance of the returned results. Details can be found in the Elasticsearch guide.

Boosting

Since we are searching across multiple fields, we may want to boost the scores in a certain field. In the contrived example below, we boost scores from the summary field by a factor of 3 in order to increase the importance of the summary field, which will, in turn, increase the relevance of documents that match on the summary.
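A sketch of a boosted multi_match; the ^3 suffix applies the boost to the summary field:

POST /bookdb_index/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide",
      "fields": ["title", "summary^3"]
    }
  },
  "_source": ["title", "summary", "publish_date"]
}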

Note: Boosting does not merely imply that the calculated score gets multiplied by the boost factor. The actual boost value that is applied goes through normalization and some internal optimization. More information on how boosting works can be found in the Elasticsearch guide.

Bool Query

The AND/OR/NOT operators can be used to fine-tune our search queries in order to provide more relevant or specific results. This is implemented in the search API as a bool query. The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR). For example, if I want to search for a book with the word "Elasticsearch" OR "Solr" in the title, AND is authored by "clinton gormley" but NOT authored by "radu gheorge":
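A sketch of that bool query, with the OR expressed as an inner bool/should:

POST /bookdb_index/_search
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            { "match": { "title": "Elasticsearch" } },
            { "match": { "title": "Solr" } }
          ],
          "must": { "match": { "authors": "clinton gormley" } }
        }
      },
      "must_not": { "match": { "authors": "radu gheorge" } }
    }
  }
}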

Note: As you can see, a bool query can wrap any other query type including other bool queries to create arbitrarily complex or deeply nested queries.

Fuzzy Queries

Fuzzy matching can be enabled on Match and Multi-Match queries to catch spelling errors. The degree of fuzziness is specified based on the Levenshtein distance from the original word, i.e. the number of one-character changes that need to be made to one string to make it the same as another string.
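For example, a misspelled query can still find relevant books if fuzziness is enabled; a sketch against the assumed sample index:

POST /bookdb_index/_search
{
  "query": {
    "multi_match": {
      "query": "comprihensive guide",
      "fields": ["title", "summary"],
      "fuzziness": "AUTO"
    }
  },
  "_source": ["title", "summary"]
}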

Note: Instead of specifying "AUTO" you can specify the numbers 0, 1, or 2 to indicate the maximum number of edits that can be made to the string to find a match. The benefit of using "AUTO" is that it takes into account the length of the string. For strings that are only 3 characters long, allowing a fuzziness of 2 will result in poor search performance. Therefore it's recommended to stick to "AUTO" in most cases.

Wildcard Query

Wildcard queries allow you to specify a pattern to match instead of the entire term. ? matches any single character and * matches zero or more characters. For example, to find all records that have an author whose name begins with the letter ‘t’:
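A sketch of that wildcard query:

POST /bookdb_index/_search
{
  "query": {
    "wildcard": {
      "authors": {
        "value": "t*"
      }
    }
  },
  "_source": ["title", "authors"]
}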

Regexp Query

Regexp queries allow you to specify more complex patterns than wildcard queries.

Match Phrase Query

The match phrase query requires that all the terms in the query string be present in the document, be in the order specified in the query string, and be close to each other. By default, the terms are required to be exactly beside each other, but you can specify the slop value, which indicates how far apart terms are allowed to be while still considering the document a match.
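A sketch of a phrase query with a slop of 3, so "search engine" can still match a summary where other words sit between the two terms:

POST /bookdb_index/_search
{
  "query": {
    "match_phrase": {
      "summary": {
        "query": "search engine",
        "slop": 3
      }
    }
  },
  "_source": ["title", "summary"]
}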

Note: in the example above, for a non-phrase type query, document  would normally have a higher score and appear ahead of document  because its field length is shorter. However, as a phrase query the proximity of the terms is factored in, so document  scores better.

Note: Also note that, if the slop parameter was reduced to 1 document  would no longer appear in the result set.

Match Phrase Prefix

Match phrase prefix queries provide search-as-you-type or a poor man’s version of autocomplete at query time without needing to prepare your data in any way. Like the match_phrase query, it accepts a slop parameter to make the word order and relative positions somewhat less rigid. It also accepts the max_expansions parameter to limit the number of terms matched in order to reduce resource intensity.

Note: Query-time search-as-you-type has a performance cost. A better solution is index-time search-as-you-type. Check out the Completion Suggester API or the use of Edge-Ngram filters for more information.

Query String

The query_string query provides a means of executing multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries in a concise shorthand syntax. In the following example, we execute a fuzzy search for the terms “search algorithm” in which one of the book authors is “grant ingersoll” or “tom morton.” We search all fields but apply a boost of 2 to the summary field.
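A sketch of such a query_string request; explicit field names are listed here instead of the old _all field, and the exact terms are only illustrative:

POST /bookdb_index/_search
{
  "query": {
    "query_string": {
      "query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
      "fields": ["summary^2", "title", "authors", "publisher"]
    }
  }
}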

Simple Query String

The simple_query_string query is a version of the query_string query that is more suitable for use in a single search box that is exposed to users because it replaces the use of AND/OR/NOT with +/|/-, respectively, and it discards invalid parts of a query instead of throwing an exception if a user makes a mistake.

Term/Terms Query

The above examples have been examples of full-text search. Sometimes we are more interested in a structured search in which we want to find an exact match and return the results. The term and terms queries help us here. In the example below, we are searching for all books in our index published by Manning Publications.
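A sketch, assuming the publisher value was indexed in lowercase in the assumed publisher field:

POST /bookdb_index/_search
{
  "query": {
    "term": {
      "publisher": "manning"
    }
  },
  "_source": ["title", "publish_date", "publisher"]
}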

Multiple terms can be specified by using the terms keyword instead and passing in an array of search terms.

Term Query - Sorted

Term queries results (like any other query results) can easily be sorted. Multi-level sorting is also allowed.

Note: In ES6, to sort or aggregate by a text field, like a title, for example, you would need to enable fielddata on that field. More details on this can be found in the ElasticSearch Guide. 

Range Query

Another structured query example is the range query. In this example, we search for books published in 2015.
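A sketch of that range query, assuming a date field named publish_date:

POST /bookdb_index/_search
{
  "query": {
    "range": {
      "publish_date": {
        "gte": "2015-01-01",
        "lte": "2015-12-31"
      }
    }
  },
  "_source": ["title", "publish_date"]
}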

Note: Range queries work on date, number, and string type fields.

Filtered Bool Query

When using a bool query, you can use a filter clause to filter down the results of a query. For our example, we are querying for books with the term “Elasticsearch” in the title or summary but we want to filter our results to only those with 20 or more reviews.
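A sketch of a bool query with a filter clause, assuming the review count lives in a num_reviews field:

POST /bookdb_index/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "elasticsearch",
          "fields": ["title", "summary"]
        }
      },
      "filter": {
        "range": {
          "num_reviews": { "gte": 20 }
        }
      }
    }
  }
}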

Multiple filters can be combined through the use of the bool filter. In the next example, the filter determines that the returned results must have at least 20 reviews, must not be published before 2015, and should be published by O'Reilly.

Function Score: Field Value Factor

There may be a case where you want to factor the value of a particular field in your document into the calculation of the relevance score. This is typical in scenarios where you want to boost the relevance of a document based on its popularity. In our example, we would like the more popular books (as judged by the number of reviews) to be boosted. This is possible using the field_value_factor function score.
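A sketch of a field_value_factor function score over the assumed num_reviews field; the log1p modifier keeps heavily reviewed books from completely drowning out the text relevance:

POST /bookdb_index/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "field_value_factor": {
        "field": "num_reviews",
        "modifier": "log1p",
        "factor": 2
      }
    }
  },
  "_source": ["title", "summary", "num_reviews"]
}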

Note 1: We could have just run a regular query and sorted by the number-of-reviews field, but then we would lose the benefits of relevance scoring.

Note 2: There are a number of additional parameters that tweak the extent of the boosting effect on the original relevance score such as “modifier”, “factor”, “boost_mode”, etc. These are explored in detail in the Elasticsearch guide.

Function Score: Decay Functions

Suppose that instead of wanting to boost incrementally by the value of a field, you have an ideal value you want to target and you want the boost factor to decay the further away you move from the value. This is typically useful in boosts based on lat/long, numeric fields like price, or dates. In our contrived example, we are searching for books on “search engines” ideally published around June 2014.

Function Score: Script Scoring

In the case where the built-in scoring functions do not meet your needs, there is the option to specify a Groovy script to use for scoring. In our example, we want to specify a script that takes the publication date into consideration before deciding how much to factor in the number of reviews. Newer books may not have as many reviews yet, so they should not be penalized for that.

The scoring script looks like this:

To use a scoring script dynamically, we pass it in through the script_score parameter.

Note 1: To use dynamic scripting, it must be enabled for your Elasticsearch instance in the elasticsearch.yml file. It’s also possible to use scripts that have been stored on the Elasticsearch server. Check out the Elasticsearch reference docs for more information.

Note 2: JSON cannot include embedded newline characters so the semicolon is used to separate statements.



Source: https://dzone.com/articles/23-useful-elasticsearch-example-queries


Search your data

A search query, or query, is a request for information about data in Elasticsearch data streams or indices.

You can think of a query as a question, written in a way Elasticsearch understands. Depending on your data, you can use a query to get answers to questions like:

  • What processes on my server take longer than 500 milliseconds to respond?
  • What users on my network ran a particular program within the last week?
  • What pages on my website contain a specific word or phrase?

A search consists of one or more queries that are combined and sent to Elasticsearch. Documents that match a search’s queries are returned in the hits, or search results, of the response.

A search may also contain additional information used to better process its queries. For example, a search may be limited to a specific index or only return a specific number of results.

Run a search

You can use the search API to search and aggregate data stored in Elasticsearch data streams or indices. The API’s request body parameter accepts queries written in Query DSL.

The following request searches my-index-000001 using a match query. This query matches documents with a user.id value of kimchy.

GET /my-index-000001/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}

The API response returns the top 10 documents matching the query in the hits.hits property.

{ "took": 5, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 1.3862942, "hits": [ { "_index": "my-index-000001", "_type": "_doc", "_id": "kxWFcnMByiguvud1Z8vC", "_score": 1.3862942, "_source": { "@timestamp": "2099-11-15T14:12:12", "http": { "request": { "method": "get" }, "response": { "bytes": 1070000, "status_code": 200 }, "version": "1.1" }, "message": "GET /search HTTP/1.1 200 1070000", "source": { "ip": "127.0.0.1" }, "user": { "id": "kimchy" } } } ] } }

Define fields that exist only in a query

Instead of indexing your data and then searching it, you can define runtime fields that only exist as part of your search query. You specify a runtime_mappings section in your search request to define the runtime field, which can optionally include a Painless script.

For example, the following query defines a runtime field called day_of_week. The included script calculates the day of the week based on the value of the @timestamp field, and uses emit to return the calculated value.

The query also includes a terms aggregation that operates on day_of_week.

GET /my-index-000001/_search
{
  "runtime_mappings": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": """emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"""
      }
    }
  },
  "aggs": {
    "day_of_week": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}

The response includes an aggregation based on the runtime field. Under day_of_week is a bucket with a key of Sunday. The query dynamically calculated this value based on the script defined in the runtime field without ever indexing the field.

{
  ...
  "aggregations" : {
    "day_of_week" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Sunday",
          "doc_count" : 5
        }
      ]
    }
  }
}

Common search options

You can use the following options to customize your searches.

Query DSL
Query DSL supports a variety of query types you can mix and match to get the results you want, including full-text queries, term-level queries, and compound queries such as bool.

Aggregations
You can use search aggregations to get statistics and other analytics for your search results. Aggregations help you answer questions like:

  • What’s the average response time for my servers?
  • What are the top IP addresses hit by users on my network?
  • What is the total transaction revenue by customer?

Search multiple data streams and indices
You can use comma-separated values and grep-like index patterns to search several data streams and indices in the same request. You can even boost search results from specific indices. See Search multiple data streams and indices.

Paginate search results
By default, searches return only the top 10 matching hits. To retrieve more or fewer documents, see Paginate search results.

Retrieve selected fields
The search response’s hits.hits property includes the full document _source for each hit. To retrieve only a subset of the _source or other fields, see Retrieve selected fields.

Sort search results
By default, search hits are sorted by _score, a relevance score that measures how well each document matches the query. To customize the calculation of these scores, use the script_score query. To sort search hits by other field values, see Sort search results.

Run an async search
Elasticsearch searches are designed to run on large volumes of data quickly, often returning results in milliseconds. For this reason, searches are synchronous by default. The search request waits for complete results before returning a response.

However, complete results can take longer for searches across frozen indices or multiple clusters.

To avoid long waits, you can run an asynchronous, or async, search instead. An async search lets you retrieve partial results for a long-running search now and get complete results later.

Search timeout

By default, search requests don’t time out. The request waits for complete results from each shard before returning a response.

While async search is designed for long-running searches, you can also use the timeout parameter to specify a duration you’d like to wait on each shard to complete. Each shard collects hits within the specified time period. If collection isn’t finished when the period ends, Elasticsearch uses only the hits accumulated up to that point. The overall latency of a search request depends on the number of shards needed for the search and the number of concurrent shard requests.

GET /my-index-000001/_search
{
  "timeout": "2s",
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}

To set a cluster-wide default timeout for all search requests, configure search.default_search_timeout using the cluster settings API. This global timeout duration is used if no timeout argument is passed in the request. If the global search timeout expires before the search request finishes, the request is cancelled using task cancellation. The search.default_search_timeout setting defaults to -1 (no timeout).

Search cancellation

You can cancel a search request using the task management API. Elasticsearch also automatically cancels a search request when your client’s HTTP connection closes. We recommend you set up your client to close HTTP connections when a search request is aborted or times out.

Track total hits

Generally the total hit count can’t be computed accurately without visiting all matches, which is costly for queries that match lots of documents. The track_total_hits parameter allows you to control how the total number of hits should be tracked. Given that it is often enough to have a lower bound of the number of hits, such as "there are at least 10000 hits", the default is set to 10,000. This means that requests will count the total hits accurately up to 10,000 hits. It is a good trade-off to speed up searches if you don’t need the accurate number of hits after a certain threshold.

When set to true, the search response will always track the number of hits that match the query accurately (e.g. total.relation will always be equal to "eq" when track_total_hits is set to true). Otherwise the "relation" returned in the "total" object in the search response determines how the "total.value" should be interpreted. A value of "gte" means that "total.value" is a lower bound of the total hits that match the query, and a value of "eq" indicates that "total.value" is the accurate count.

GET my-index-000001/_search
{
  "track_total_hits": true,
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  }
}

... returns:

{ "_shards": ... "timed_out": false, "took": 100, "hits": { "max_score": 1.0, "total" : { "value": 2048, "relation": "eq" }, "hits": ... } }

The total number of hits that match the query.

The count is accurate (a "relation" of "eq" means the value equals the real total).

It is also possible to set track_total_hits to an integer. For instance the following query will accurately track the total hit count that matches the query up to 100 documents:

GET my-index-000001/_search
{
  "track_total_hits": 100,
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  }
}

The hits.total.relation in the response will indicate if the value returned in hits.total.value is accurate ("eq") or a lower bound of the total ("gte").

For instance the following response:

{ "_shards": ... "timed_out": false, "took": 30, "hits": { "max_score": 1.0, "total": { "value": 42, "relation": "eq" }, "hits": ... } }

42 documents match the query

and the count is accurate ("eq")

... indicates that the number of hits returned in total.value is accurate.

If the total number of hits that match the query is greater than the value set in track_total_hits, the total hits in the response will indicate that the returned value is a lower bound:

{ "_shards": ... "hits": { "max_score": 1.0, "total": { "value": 100, "relation": "gte" }, "hits": ... } }

There are at least 100 documents that match the query

This is a lower bound ("gte").

If you don’t need to track the total number of hits at all you can improve query times by setting this option to false:

GET my-index-000001/_search
{
  "track_total_hits": false,
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  }
}

... returns:

{ "_shards": ... "timed_out": false, "took": 10, "hits": { "max_score": 1.0, "hits": ... } }

The total number of hits is unknown.

Finally you can force an accurate count by setting track_total_hits to true in the request.

Quickly check for matching docs

If you only want to know whether there are any documents matching a specific query, you can set the size to 0 to indicate that we are not interested in the search results. You can also set terminate_after to 1 to indicate that the query execution can be terminated whenever the first matching document has been found (per shard).

GET /_search?q=user.id:elkbee&size=0&terminate_after=1

terminate_after is always applied after the post_filter and stops the query as well as the aggregation executions when enough hits have been collected on the shard. Though the doc count on aggregations may not reflect the hits.total in the response since aggregations are applied before the post filtering.

The response will not contain any hits as the size was set to 0. The hits.total will be either equal to 0, indicating that there were no matching documents, or greater than 0, meaning that there were at least as many documents matching the query when it was terminated early. Also, if the query was terminated early, the terminated_early flag will be set to true in the response.

{ "took": 3, "timed_out": false, "terminated_early": true, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": null, "hits": [] } }

The took time in the response contains the milliseconds that this request took for processing, beginning quickly after the node received the query, up until all search-related work is done and before the above JSON is returned to the client. This means it includes the time spent waiting in thread pools, executing a distributed search across the whole cluster and gathering all the results.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html

One of the great things about Elasticsearch is its extensive REST API which allows you to integrate, manage and query the indexed data in countless different ways. Examples of using this API to integrate with Elasticsearch are abundant, spanning different companies and use cases.

Documentation on the various API calls is extensive, and for some, this wealth of information can be somewhat daunting:


This article will try and provide an overview of the main API calls that you should get acquainted with as you get started with Elasticsearch, and will add some usage examples and corresponding cURL commands. The API examples detailed below are Document API, Search API, Indices API, cat API and Cluster API.

This is by no means a full API guide — this would be impossible and is covered in Elastic’s official documentation. Advanced users might find this cheat sheet we put together helpful as it contains some useful tips and best practices on the Elasticsearch Cluster API.

Document API

This category of APIs are used for handling documents in Elasticsearch. Using these APIs, for example, you will create documents in an index, update them, move them to another index, or remove them.
The APIs detailed below are for handling single documents, but you can also make use of certain multi-document APIs for performing bulk actions (e.g. multi get).

Category | Query | cURL
Index API – Add (or update) a document | PUT /<<indexname>> | curl -XPUT 'localhost:9200/my_index/my_type/1?pretty' -H 'Content-Type: application/json' -d' { "field": "value", ... } '
Get API – Retrieve a specific existing document | GET /<<indexname>> | curl -XGET 'localhost:9200/my_index/my_type/0?pretty'
Delete API – Delete a document | DELETE /<<indexname>> | curl -XDELETE 'localhost:9200/my_index/my_type/0?pretty'
Reindex API – Copy a document from one index to another | POST /_reindex | curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d' { "source": { "index": "some_old_index" }, "dest": { "index": "some_new_index" } } '
Multi Get API (_mget) – Pull documents from multiple indices, specifying as many docs as necessary per index | GET /<<targetindex>>/_mget | curl -X GET "localhost:9200/_mget?pretty" -H 'Content-Type: application/json' -d' { "docs": [ { "_index": "index1", "_id": "1" }, { "_index": "index1", "_id": "2" } ] } '
Bulk API – Perform multiple types of requests at once | POST /<<targetindex>>/_bulk | curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d' { "index" : { "_index" : "test", "_id" : "1" } } { "field1" : "value1" } { "delete" : { "_index" : "test", "_id" : "2" } } { "create" : { "_index" : "test", "_id" : "3" } } { "field1" : "value1" } '
Delete By Query API – Delete all documents matching a query | POST /<<targetindex>>/_delete_by_query | curl -X POST "localhost:9200/index1/_delete_by_query?pretty" -H 'Content-Type: application/json' -d' { "query": { "match": { "user.id": "gedalyahreback" } } } '
Update By Query API – Update documents matching a query; the conflicts=proceed parameter at the end tells the query to proceed in the event there is a conflict between versions of a document | POST /<<targetindex>>/_update_by_query | curl -X POST "localhost:9200/myindex1/_update_by_query?conflicts=proceed"

Search API

As its name implies, these API calls can be used to query indexed data for specific information. Search APIs can be applied globally, across all available indices and types, or more specifically within an index. Responses will contain matches to the specific query.  The Search API sometimes depends on usage of the Mustache language, which is implemented within Elasticsearch as a scripting language.

Category | Query | cURL
Search API – Enter a search query and return hits matching the query | GET /<<targetindex>>/_search, POST /<<targetindex>>/_search | curl -XGET 'localhost:9200/my_index/my_type/_search?q=field:value&pretty'
Validate API – Validate a potentially heavy query without actually executing it | GET /<<targetindex>>/_validate/<<query>> | curl -XGET 'localhost:9200/my_index/my_type/_validate?q=field:value'
Explain API – Calculate a score for a query, to get feedback on whether a document matches the query or not | GET /<<targetindex>>/_explain/<<id>>, POST /<<targetindex>>/_explain/<<id>> | curl -XGET 'localhost:9200/my_index/my_type/0/_explain?q=message:search'
Scroll API – Page through large result sets | GET /_search/scroll, POST /_search/scroll, DELETE /_search/scroll | curl -X GET "localhost:9200/_search/scroll?pretty" -H 'Content-Type: application/json' -d' {} '
Search Template API – Run a search from a template | GET /_search/template | curl -X GET "localhost:9200/_search/template?pretty"
Stored Templates – Create or delete stored search templates (Mustache scripts) | POST /_scripts/<<templateid>>, DELETE /_scripts/<<templateid>> | curl -X POST "localhost:9200/_scripts/<<templateid>>?pretty" -H 'Content-Type: application/json' -d' { "script": { "lang": "mustache", "source": { "query": "{{some_template}}" } } } '

 

 

Indices API

This type of Elasticsearch API allows users to manage indices, mappings, and templates. For example, you can use this API to create or delete a new index, check if a specific index exists or not, and define new mapping for an index.

Index Management

Category | Query | cURL
Create a new Elasticsearch index | PUT /<<indexname>> | curl -XPUT 'localhost:9200/indexname?pretty' -H 'Content-Type: application/json' -d' { "settings" : { "index" : { ... } } } '
Delete an index | DELETE /<<indexname>> | curl -XDELETE 'localhost:9200/<<indexname>>?pretty'
Open or close an index | POST /<<indexname>>/_open, POST /<<indexname>>/_close | curl -XPOST 'localhost:9200/<<indexname>>/_close?pretty' and curl -XPOST 'localhost:9200/<<indexname>>/_open?pretty'
Shrink an index | POST /<<indexname>>/_shrink/<<targetindexname>>, PUT /<<indexname>>/_shrink/<<targetindexname>> | curl -XPOST "localhost:9200/<<indexname>>/_shrink/shrunken-indexname"
Split an index | POST /<<indexname>>/_split/<<targetindexname>>, PUT /<<indexname>>/_split/<<targetindexname>> | curl -XPOST "localhost:9200/indexname/_split/split-indexname" -H 'Content-Type: application/json' -d' { "settings": { "index.number_of_shards": 4 } } '
Clone an index | POST /<<indexname>>/_clone/<<clonedindexname>>, PUT /<<indexname>>/_clone/<<clonedindexname>> | curl -X POST "localhost:9200/indexname/_clone/clonedindex"
Resolve an index | GET /_resolve/index/<<indexname>> | curl -X GET "localhost:9200/_resolve/index/indexname"
Rollover an index | POST /<<indextoroll>>/_rollover/<<newindex>>, POST /<<indextoroll>>/_rollover/ | curl -X POST "localhost:9200/indextoroll/_rollover/newindex" -H 'Content-Type: application/json' -d' { "conditions": { "max_age": "14d", "max_docs": 5000, "max_size": "15gb" } } '

Mapping Management

Add a new type to an existing mapping | PUT /<<indexname>>/_mapping, PUT /_mapping | curl -XPUT 'localhost:9200/indexname/_mapping/user?pretty' -H 'Content-Type: application/json' -d' { "properties": { "name": { "type": "text" } } } '
Retrieve mapping for a specific field | GET /<<indexname>>/_mapping, GET /_mapping | curl -XGET 'localhost:9200/indexname/_mapping/my_type/field/my_field?pretty'

 

cat API

I personally love the cat API and use it whenever possible. The idea is to return data in a more user-friendly format as opposed to the normal JSON response. You can read about the various string parameters you can add to the cat commands here.

Cat Indices – Gives us access to info & metrics regarding our indices (GET /_cat/indices)
Cat Health – Overview of cluster health (GET /_cat/health)
Cat Nodes – Info on Elasticsearch nodes (GET /_cat/nodes)
#Tip: You can use headers to retrieve only relevant details on the nodes. Read here for more info.

Besides the ones above, the other cat API options include allocation, shards, master, segments, count, recovery, fielddata, pending_tasks, thread_pool, plugins, nodeattrs, repositories, snapshots, templates, and aliases.

Ingest APIs

Manage Pipelines – Create, retrieve, or delete ingest pipelines | PUT/GET/DELETE /_ingest/pipeline/<<pipelineid>>
Simulate Pipelines – Run a pipeline against sample documents without indexing them | POST /_ingest/pipeline/_simulate

 

Cluster API

These are cluster-specific API calls that allow you to manage and monitor your Elasticsearch cluster. Most of the APIs allow you to define which Elasticsearch node to call using either the internal node ID, its name or its address.

Cluster Health – GET /_cluster/health
Cluster State – GET /_cluster/state; filter with parameters in the call URL
Cluster Stats – Basic index metrics and node info – GET /_cluster/stats
Cluster Reroute – Manual changes to shard allocation – POST /_cluster/reroute
Cluster Settings – GET /_cluster/settings, PUT /_cluster/settings
Parameters such as flat_settings, include_defaults, master_timeout, and timeout can be appended to these calls.

For advanced usage of cluster APIs, read this blog post.

Ending with some tips

It’s time to get your hands dirty! The best way to learn your way around these APIs is experimentation. There are plenty of resources which can help you with this, and a bunch of open source tools as well.

First, read through the API conventions before you start here. These will help you learn about the different options that can be applied to the calls, how to construct the APIs and how to filter responses.

I also recommend using the built-in console for playing around with the APIs — just enter your API in the editor on the left, and see the response from Elasticsearch on the right.

A good thing to remember is that some APIs change and get deprecated from version to version, and it’s a good best practice to keep tabs on breaking changes.


The gradual removal of mapping types will affect the indexing and search APIs — you can see the effect of this change in the different versions here.

The REST API is one of the main reasons why Elasticsearch, and the ELK stack as a whole, is so popular. The list above is merely the tip of the iceberg, but also a good reference point for getting started.

Logz.io API

Despite being a fully managed and hosted ELK solution, Logz.io provides a public API that is based on the Elasticsearch search API, albeit with some limitations. If you are using Logz.io, you can use this API to run search queries on the data you are shipping to your account. The query language used is Elasticsearch Search API DSL.

In addition, the Alerts API allows Logz.io users to create, delete and manage alerts. Again, there are some limitations that you should be aware of pertaining to the amount of concurrent APIs called.

Source: https://logz.io/blog/elasticsearch-api/


Getting the Elasticsearch query right down to its syntax can be tough and confounding, even though search is the primary function of Elastic…umm…search. To help, this guide will take you through the ins and outs of common search queries for Elasticsearch and set you up for future querying success.

Lucene Query Syntax

Elasticsearch is part of the ELK Stack and is built on Lucene, the search library from Apache, and exposes Lucene’s query syntax. It’s such an integral part of Elasticsearch that when you query the root of an Elasticsearch cluster, it will tell you the Lucene version:

Knowing the Lucene syntax and operators will go a long way in helping you build queries. It is used in both the simple and the standard query string query.

The Query DSL

The Query DSL can be invoked using most of Elasticsearch’s search APIs. For simplicity, we’ll look only at the Search API that uses the _search endpoint. When calling the search API, you can specify the index and/or type on which you want to search. You can even search on multiple indices and types by separating their names with commas or using wildcards to match multiple indices and types:

Search on all the Logstash indices:

Or search in the current and legacy indices, in the documents type:

Search in the clients indices, in the and types:

We’ll be using Request Body Searches, so searches should be invoked as follows:

URI Search

The easiest way to search your Elasticsearch cluster is through URI search. You can pass a simple query to Elasticsearch using the q query parameter. The following query will search your whole cluster for documents with a name field equal to “travis”:
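For example, with cURL (quote the URL so the shell leaves the query string alone):

curl "localhost:9200/_search?q=name:travis"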

With the Lucene syntax, you can build quite impressive searches. Usually you’ll have to URL-encode characters such as spaces (we omitted it in these examples for clarity):

A number of options are available that allow you to customize the URI search, specifically in terms of which analyzer to use (analyzer), whether the query should be fault-tolerant (lenient), and whether an explanation of the scoring should be provided (explain).

A number of options are available that allow you to customize the URI search, specifically in terms of which analyzer to use (analyzer), whether the query should be fault-tolerant (lenient), and whether an explanation of the scoring should be provided (explain).

The Request Body Search

Request Body Search uses a JSON document that contains various elements to create a search on your Elasticsearch cluster. Not only can you specify search criteria, you can also specify the range and number of documents that you expect back, the fields that you want, and various other options.

The first element of a search is the query element that uses Query DSL. Using Query DSL can sometimes be confusing because the DSL can be used to combine and build up query clauses into a query that can be nested deeply. Since most of the Elasticsearch documentation only refers to clauses in isolation, it’s easy to lose sight of where clauses should be placed.

To use the Query DSL, you need to include a “query” element in your search body and populate it with a query built using the DSL:
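A minimal sketch of that search body; the _all field shown here only exists on pre-6.x clusters, so on newer versions you would name a specific field or use a multi_match instead:

POST /_search
{
  "query": {
    "match": {
      "_all": "meaning"
    }
  }
}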

In this case, the “query” element contains a “match” query clause that looks for the term “meaning” in all of the fields in all of the documents in your cluster.

The query element is used along with other elements in the search body:
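A sketch of such a request; the field names are illustrative, and the top-level fields option shown here is supported on recent Elasticsearch versions (older releases used stored_fields or _source filtering instead):

POST /_search
{
  "query": {
    "match": {
      "_all": "meaning"
    }
  },
  "fields": ["name", "title"],
  "from": 100,
  "size": 20
}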

Here, we’re using the “fields” element to restrict which fields should be returned and the “from” and “size” elements to tell Elasticsearch we’re looking for documents 100 to 119 (starting at 100 and counting 20 documents).

Fields

You might be looking for events where a specific field contains certain terms. You specify the field, type a colon, then a space, then the string in quotation marks or the value without quotes. Here are some Lucene field examples:

  • name: “Ned Stark”
  • status: 404

Be careful with values with spaces such as “Ned Stark.” You’ll need to enclose it in double quotes to ensure that the whole value is used.

Filters vs. Queries

People who have used Elasticsearch before version 2 will be familiar with filters and queries. You used to build up a query body using both filters and queries. The difference between the two was that filters were generally faster because they check only if a document matches at all and not whether it matches well. In other words, filters give a boolean answer whereas queries return a calculated score of how well a document matches a query.

Scoring

We have mentioned the fact that Elasticsearch returns a score along with all of the matching documents from a query:

This score is calculated against the documents in Elasticsearch based on the provided queries. Factors such as the length of a field, how often the specified term appears in the field, and (in the case of wildcard and fuzzy searches) how closely the term matches the specified value all influence the score. The calculated score is then used to order documents, usually from the highest score to lowest, and the highest scoring documents are then returned to the client. There are various ways to influence the scores of different queries such as the boost parameter. This is especially useful if you want certain queries in a complex query to carry more weight than others and you are looking for the most significant documents.

When using a query in a filter context (as explained earlier), no score is calculated. This provides the enhanced performance usually associated with using filters but does not provide the ordering and significance features that come with scoring.

Term Level Queries

1. Range Queries

You can search for fields within a specific range, using square brackets for inclusive range searches and curly braces for exclusive range searches:

  • age:[3 TO 10] — Will return events with age between 3 and 10
  • price:{100 TO 400} — Will return events with prices between 101 and 399
  • name: [Adam TO Ziggy] — Will return names between and including Adam and Ziggy

As you can see in the examples above, you can use ranges in non-numerical fields like strings and dates as well.

2. Wildcard Queries

The search would not be a search without wildcards. You can use the * character for multiple character wildcards or the ? character for single character wildcards:

  • Ma?s — Will match Mars, Mass, and Maps
  • Ma*s — Will match Mars, Matches, and Massachusetts

3. Regex Queries (regexp)

Regex queries () give you even more power. Just place your regex between forward slashes (/):

  • /p[ea]n/ — Will match both pen and pan
  • /<.+>/ — Will match text that resembles an HTML tag

4. Fuzzy Queries

Fuzzy searching uses the Damerau-Levenshtein Distance to match terms that are similar in spelling. This is great when your data set has misspelled words.

Use the tilde (~) to find similar terms:

This will return results like “blew,” “brow,” and “glow.”

Use the tilde (~) along with a number to specify the how big the distance between words can be:

This will match, among other things: “jean,” “johns,” “jhon,” and “horn”

5. Free Text

It’s as simple as it sounds. Just type in the term or value you want to find. This can be a field, a string within a field, etc.

6. Elasticsearch Terms Query

Also just called a term query, this will return an exact match for a given term. Take this example from a database of baseball statistics:

Make sure you are using the term query here, NOT a full-text (match) query. The term query searches for the exact value; a full-text query analyzes the input and will, for example, strip punctuation.

7. Elasticsearch Terms Set Query

Similar to the term query, the terms_set query can hunt down multiple values based on certain conditions defined in the PUT request. To further the baseball example:

Compound Queries

Boolean Operators and the Bool Query

As with most computer languages, Elasticsearch supports the AND, OR, and NOT operators:

  • jack AND jill — Will return events that contain both jack and jill
  • ahab NOT moby — Will return events that contain ahab but not moby
  • tom OR jerry — Will return events that contain tom or jerry, or both

Although there are multiple query clause types, the one you’ll use the most is Compound Queries because it’s used to combine multiple clauses to build up complex queries.

The Bool Query is probably used the most because it can combine the features of some of the other compound query clauses such as the And, Or, Filter, and Not clauses. It is used so much that these four clauses have been deprecated in various versions in favor of using the Bool query. Using it is best explained with an example:
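The example body was not carried over into this copy, so here is a sketch reconstructed from the clause-by-clause description that follows; the fuzzy value "john" and the _all field are assumptions (on recent versions you would replace _all with a concrete field or a multi_match):

POST /_search
{
  "query": {
    "bool": {
      "must": {
        "fuzzy": {
          "name": {
            "value": "john",
            "fuzziness": 2
          }
        }
      },
      "must_not": {
        "match": {
          "_all": "city"
        }
      },
      "should": [
        {
          "range": {
            "age": { "gte": 30, "lte": 40 }
          }
        },
        {
          "wildcard": {
            "surname": "K*"
          }
        }
      ]
    }
  }
}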

Within the query element, we’ve added the bool clause that indicates that this will be a boolean query. There’s quite a lot going in there, so let’s cover it clause-by-clause, starting at the top:

must

All queries within this clause must match a document in order for ES to return it. Think of this as your AND queries. The query that we used here is the fuzzy query, and it will match any documents that have a name field that fuzzily matches the value we supply. The extra "fuzziness" parameter tells Elasticsearch that it should be using a Damerau-Levenshtein distance of 2 to determine the fuzziness.

must_not

Any documents that match the query within this clause will be outside of the result set. This is the NOT or minus (-) operator of the query DSL. In this case, we do a simple match query, looking for documents that contain the term "city." Using "_all" as the field name indicates that the term can appear in any of the document's fields. This is the must_not clause, so matching documents will be excluded.

should

Up until now, we have been dealing with absolutes: must and must_not. Should is not absolute and is equivalent to the OR operator. Elasticsearch will return any documents that match one or more of the queries in the should clause.

The first query that we provided looks for documents where the age field is between 30 and 40. The second query does a wildcard search on the surname field, looking for values that start with “K.”

The query contained three different clauses, so Elasticsearch will only return documents that match the criteria in all of them. These queries can be nested, so you can build up very complex queries by specifying a bool query as a must, must_not, or should query.

filter

One clause type we haven’t discussed for a compound query is the filter clause. Here is an example where we use one:
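A sketch of that filtered query:

POST /_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "email": "joe@blogs.com"
        }
      }
    }
  }
}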

The match_all query in the must clause tells Elasticsearch that it should return all of the documents. This might not seem to be a very useful search, but it comes in handy when you use it in conjunction with a filter as we have done here. The filter we have specified is a term query, asking for all documents that contain an email field with the value "joe@blogs.com".

We have used a filter to specify which documents we want, so they will all be returned with a score of 1. Filters don’t factor into the calculation of scores, so the match_all query gives all documents a score of 1.


One thing to note is that this query won’t work if the email field is analyzed, which is the default for fields in Elasticsearch. The reason is best discussed in another blog post, but it comes down to the fact that Elasticsearch analyzes both fields and queries when they come in. In this case, the email field will be broken up into three parts: joe, blogs, and com. This means that it will match searches and documents for any of those three terms.

Boosting Queries

There are three parts to a boosting query in Elasticsearch: positive, negative, and negative_boost. The positive query is the main query, the one from which you want to accumulate relevance score points.

The negative_boost is a value between 0 and 1 by which the score of anything matching the negative query is multiplied (if you set the negative_boost at 0.25, it reduces the value of a negative match to a quarter of a positive one; 0.5 to half the value; 0.1 to a tenth, and so on). This gives you a lot of flexibility in grading your queries.

Constant Score Queries

This is a valuable tool for segmenting certain queries that you want to give a boost in score. The code wrap isolates certain search terms and pairs them with a separate boost value:

So in this instance, you are giving any NGINX logs a greater value than others (presumably than other server logs like apache2 logs or IIS logs).

Disjunction Max Queries

Imagine if your search results could distinguish between documents that include many of the things you’re searching for and those that include only a few. That’s what this does.

You can group queries together as nested clauses within the queries parameter.

function_score Queries

Function score queries, as their name suggests, exist to make it easier to use a function to compute a score. Define a query and set the rules for how to boost a result’s score.

Conclusion

The hardest thing about Elasticsearch is the depth and breadth of the available features. We have tried to cover the essential elements in as much detail as possible without drowning you in information. Ask any questions you might have in the comments, and look out for more in-depth posts covering some of the features we have mentioned. You can also read my prior Elasticsearch tutorial to learn more.

Source: https://logz.io/blog/elasticsearch-queries/

