July 10, 2017 • Technology

Talk about Elastic Search

Elasticsearch is an open-source search engine. It is written in Java and available on all major platforms. Elasticsearch provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It focuses heavily on scalability and is designed to take data from any source, analyze it, and make it searchable.

  • Communication with the server is done through an HTTP REST API
    • curl -X <REST Verb> <NODE>:<PORT>/<Index>/<Type>/<ID>
    • curl -X GET http://localhost:9200/person/employee/123
  • Schema-less JSON documents (like NoSQL database)
  • Near real-time search
  • Developed by Elasticsearch BV
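
The URL pattern in the curl example above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names document_url and get_document_request are hypothetical, and the request is only prepared, never sent, so no running node is needed.

```python
from urllib.parse import quote
from urllib.request import Request


def document_url(node, port, index, doc_type, doc_id):
    """Build http://<NODE>:<PORT>/<Index>/<Type>/<ID> (hypothetical helper)."""
    # Percent-encode each path segment so unusual IDs cannot break the URL.
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    return f"http://{node}:{port}/{path}"


def get_document_request(node, port, index, doc_type, doc_id):
    """Prepare, but do not send, the GET request from the curl example."""
    return Request(document_url(node, port, index, doc_type, doc_id), method="GET")


req = get_document_request("localhost", 9200, "person", "employee", 123)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would actually send it, assuming a node is listening on localhost:9200.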

NEAR REALTIME (NRT)
  • Elasticsearch is a near-realtime search engine
  • There is only a short delay between the time a document is indexed and the time it becomes searchable
  • The delay is usually about one second (the default refresh interval)
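
The near-realtime behaviour can be pictured with a toy model: newly indexed documents sit in a buffer and only become visible to search after a periodic refresh. The class below is a simplified sketch for intuition only. ToyIndex and its methods are hypothetical names, the clock is passed in by hand, and none of this reflects how Elasticsearch is implemented internally.

```python
class ToyIndex:
    """Toy model of near-realtime search (illustrative, not Elasticsearch code)."""

    def __init__(self, refresh_interval=1.0):
        self.refresh_interval = refresh_interval  # seconds between refreshes
        self._buffer = []        # indexed but not yet searchable
        self._searchable = []    # visible to search
        self._last_refresh = 0.0

    def index(self, doc):
        # New documents are buffered; they are not searchable yet.
        self._buffer.append(doc)

    def tick(self, now):
        # Advance the clock; refresh once the interval has elapsed.
        if now - self._last_refresh >= self.refresh_interval:
            self._searchable.extend(self._buffer)
            self._buffer.clear()
            self._last_refresh = now

    def search(self, term):
        return [doc for doc in self._searchable if term in doc]


idx = ToyIndex(refresh_interval=1.0)
idx.index("hello world")
before = idx.search("hello")   # document is still in the buffer
idx.tick(now=1.0)              # one second later, a refresh happens
after = idx.search("hello")    # document is now searchable
print(before, after)
```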
CLUSTER
  • A cluster is a collection of nodes (servers)
  • Consists of one or more nodes, depending on the scale
    • Can contain as many nodes as you want
  • Together, these nodes contain all data
  • A cluster provides indexing and search capability across all nodes
  • Identified by a unique name (defaults to “elasticsearch”)
NODE
  • A single server that is part of a cluster
  • Stores searchable data
    • Stores all data if there is only one node in the cluster, or part of the data if there are multiple nodes
  • Participates in a cluster’s indexing and search capabilities
  • Identified by a name (in versions before 5.0, defaults to the name of a random Marvel character)
  • A node joins a cluster named “elasticsearch” by default
  • Starting a single node on a network will by default create a new single-node cluster named “elasticsearch”
INDEX
  • A collection of documents (e.g. product, account, movie)
    • Each of the above examples would be a type
  • Corresponds to a database within a relational database system
  • Identified by a name, which must be lowercase
    • Used when indexing, searching, updating and deleting documents within the index
  • You can define as many indexes as you want within a cluster
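
As a rough sketch of how the index name is used, the snippet below checks the lowercase rule and lists one HTTP verb and path per operation. It assumes the typed <Index>/<Type>/<ID> URL scheme used in this article, which matches older Elasticsearch releases; newer releases use different paths. The function names are hypothetical, and real index names have further restrictions that are not checked here.

```python
def require_lowercase(index):
    """Reject index names containing uppercase characters (only rule checked here)."""
    if index != index.lower():
        raise ValueError(f"index name must be lowercase: {index!r}")
    return index


def operations(index, doc_type, doc_id):
    """Map each operation to an (HTTP verb, path) pair in the typed URL scheme."""
    index = require_lowercase(index)
    doc_path = f"/{index}/{doc_type}/{doc_id}"
    return {
        "index": ("PUT", doc_path),
        "search": ("GET", f"/{index}/_search"),
        "update": ("POST", f"{doc_path}/_update"),
        "delete": ("DELETE", doc_path),
    }


ops = operations("person", "employee", 123)
for name, (verb, path) in ops.items():
    print(f"{name:7s} {verb:6s} {path}")
```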
TYPE
  • Represents a class/category of similar documents, e.g. “user”
  • Consists of a name and a mapping
  • Put simply, you can think of a type as a table within a relational database
  • An index can have one or more types defined, each with their own mapping
  • Stored within a metadata field named _type, because Lucene has no concept of document types
    • Searching for specific document types applies a filter on this field
MAPPING
  • Similar to a database schema for a table in a relational database
  • Describes the fields that a document of a given type may have
    • Includes the datatype for each field, e.g. string, integer, date, etc.
    • Also includes information on how fields should be indexed and stored by Lucene
  • Dynamic mapping means that it is optional to define a mapping explicitly
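
To make the idea concrete, here is a hedged sketch of an explicit mapping next to a toy version of dynamic mapping. The field names are made up for illustration, and "text" is used where the list above says "string", because releases from 5.0 onward split the string type into text and keyword. The infer_type function is a deliberately simplified stand-in: it ignores date detection and other rules that real dynamic mapping applies.

```python
import json

# An explicit mapping for a hypothetical "user" type: one datatype per field.
user_mapping = {
    "properties": {
        "name": {"type": "text"},
        "age": {"type": "integer"},
        "signup_date": {"type": "date"},
    }
}


def infer_type(value):
    """Toy type inference, loosely modelled on dynamic mapping."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    return "text"


def infer_mapping(document):
    """Derive a mapping from a document when none was defined explicitly."""
    return {"properties": {k: {"type": infer_type(v)} for k, v in document.items()}}


inferred = infer_mapping({"name": "Ann", "age": 31, "active": True})
print(json.dumps(user_mapping, indent=2))
print(json.dumps(inferred, indent=2))
```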
DOCUMENT
  • A basic unit of information that can be indexed
  • Consists of fields, which are key/value pairs
    • A value can be a string, date, object, etc.
  • Corresponds to an object in an object-oriented programming language
    • A document can be a single user, order, product, etc.
  • Documents are expressed in JSON
  • You can store as many documents within an index as you want
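
A document of this kind might look like the following. The order and its field names are invented for illustration; the point is only that fields are key/value pairs whose values can be strings, dates (written as strings in JSON), numbers, or nested objects.

```python
import json

# A hypothetical "order" document: fields are key/value pairs.
order = {
    "customer": "Jane Doe",        # string
    "placed_on": "2017-07-10",     # date, written as a string in JSON
    "total": 42.5,                 # number
    "shipping": {                  # nested object
        "city": "Oslo",
        "express": True,
    },
}

body = json.dumps(order)           # the JSON text that would be sent for indexing
restored = json.loads(body)        # parsing it back gives the same structure
print(body)
```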
SHARDS
  • An index can be divided into multiple pieces called shards
    • Useful if an index contains more data than the hardware of a node can store (e.g. 1 TB data on 500 GB disk)
  • A shard is a fully functional and independent index
    • Can be stored on any node in a cluster
  • The number of shards can be specified when creating an index
  • Allows horizontal scaling by content volume (index space)
  • Allows operations to be distributed and parallelized across shards, which increases performance
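
How a document ends up on a particular shard can be sketched as a hash of a routing key (by default the document ID) taken modulo the number of primary shards. The snippet below is only an approximation of that idea: it uses zlib.crc32 as a stand-in hash, which is an assumption made for illustration and not the hash function Elasticsearch uses, and pick_shard is a hypothetical name.

```python
import zlib
from collections import defaultdict


def pick_shard(routing_key, number_of_primary_shards):
    """Choose a shard as hash(routing_key) mod shard count (crc32 is a stand-in)."""
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


# Spread some hypothetical document IDs over 5 primary shards.
doc_ids = [f"doc-{n}" for n in range(20)]
by_shard = defaultdict(list)
for doc_id in doc_ids:
    by_shard[pick_shard(doc_id, 5)].append(doc_id)

for shard in sorted(by_shard):
    print(shard, by_shard[shard])
```

Because the shard number depends on the shard count, changing that count would send the same IDs to different shards, which is one reason the number of shards is chosen when the index is created.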
 
REPLICAS
  • A replica is a copy of a shard
  • Provides high availability in case a shard or node fails
    • A replica never resides on the same node as the original shard
  • Allows scaling search volume, because search queries can be executed on all replicas in parallel
  • By default, Elasticsearch versions before 7.0 create 5 primary shards and 1 replica of each shard for every index
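
The rule that a replica never shares a node with its primary can be illustrated with a toy round-robin allocator. This is a sketch under simple assumptions, not Elasticsearch's allocation algorithm: the allocate function and the node names are hypothetical, and the toy raises an error when there are too few nodes, whereas a real cluster would leave the extra replicas unassigned.

```python
def allocate(nodes, primaries, replicas):
    """Place shard copies so that no two copies of one shard share a node."""
    if replicas >= len(nodes):
        raise ValueError("need more nodes than replicas per shard")
    layout = {node: [] for node in nodes}
    for shard in range(primaries):
        for copy in range(replicas + 1):
            # Consecutive copies of the same shard go to consecutive nodes.
            node = nodes[(shard + copy) % len(nodes)]
            role = "P" if copy == 0 else "R"   # P = primary, R = replica
            layout[node].append(f"{role}{shard}")
    return layout


# 5 primary shards with 1 replica each, on a two-node cluster.
layout = allocate(["node-a", "node-b"], primaries=5, replicas=1)
for node, copies in layout.items():
    print(node, copies)
```

With 5 primaries and 1 replica each there are 10 shard copies in total, and losing either node still leaves one copy of every shard.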
WHY USE ELASTIC SEARCH
  • Can replace document stores such as MongoDB or RavenDB
  • Blazingly fast search performance
  • Highly scalable
  • Denormalized data store