📚 Migrating Elasticsearch’s warm & cold data to object storage with JuiceFS


💡 Category: Programming
🔗 Source: dev.to

With the development of cloud computing, object storage has gained popularity thanks to its low price and elastically scalable capacity, and more and more enterprises are moving their warm and cold data onto it. However, pointing indexing and analytics components directly at object storage hurts query performance and causes compatibility issues.

This article explains the fundamentals of hot and cold data tiering in Elasticsearch (ES) and introduces how JuiceFS can be used to cope with the problems that arise on object storage.

1 Elasticsearch’s data tier architecture

Three concepts need to be understood before diving into how ES implements a hot and cold data tiering strategy: Data Stream, Index Lifecycle Management (ILM), and Node Role.

Data Stream

Data Stream is an important concept in ES with the following characteristics:

  • Streaming writes. A data stream is a data set written as a continuous stream rather than in blocks.
  • Append-only writes. A data stream updates data via append writes; existing data is not modified.
  • Timestamp. Each new piece of data is given a timestamp recording when it was created.
  • Multiple indexes. In ES, every piece of data resides in an index. A data stream is a higher-level concept: one data stream may be composed of many indexes, which are generated according to different rules. Only the latest index is writable; the historical indexes are read-only.

Log data is a typical data stream: it is append-only and timestamped, and users generate new indexes along different dimensions, for example one index per day.

The scheme below is a simple example of index creation for a data stream. While a data stream is in use, ES writes directly to the latest index; as more data arrives, that index eventually becomes an old, read-only index.

[Figure: index creation and rollover for a data stream]
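As a sketch, a data stream like the one above can be created through the ES API. The template and stream names below (logs-template, logs-myapp) are illustrative, not taken from the article:

```json
PUT _index_template/logs-template
{
  "index_patterns": ["logs-myapp*"],
  "data_stream": {}
}

PUT _data_stream/logs-myapp
```

Writes to the stream always go to its latest backing index; a rollover creates a new backing index and the previous one becomes read-only, matching the behavior described above.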

The following graph illustrates how data is written to ES, in two stages.

  • Stage 1: the data is first written to an in-memory buffer.
  • Stage 2: according to certain rules and time intervals, the buffer is flushed to local disk, shown in green in the graph (persistent data) and known as a segment in ES.

There can be a time lag in this process: if a query arrives while a segment is still being persisted, the newly created segment cannot be searched. Once the segment is persisted, it immediately becomes searchable by the upper-level query engine.

[Figure: two-stage write path from in-memory buffer to on-disk segments]
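The delay before a new segment becomes searchable is governed by the index refresh interval, which can be tuned per index; the index name my-index below is illustrative:

```json
PUT my-index/_settings
{
  "index": { "refresh_interval": "30s" }
}
```

A longer interval produces fewer, larger segments at the cost of newly written documents taking longer to appear in query results.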

Index Lifecycle Management

Index Lifecycle Management (ILM) is the lifecycle management of an index. ILM defines the lifecycle of an index as five phases:

  • Hot data: needs to be updated or queried frequently.
  • Warm data: no longer updated, but still frequently queried.
  • Cold data: no longer updated and queried less frequently.
  • Frozen data: no longer updated and hardly ever queried; it is safe to put this kind of data on a relatively slow, cheap storage medium.
  • Deleted data: no longer needed and can be safely deleted.

All ES index data eventually goes through these phases, and users can manage their data by setting different ILM policies.
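These phases map onto an ILM policy definition. The following is a minimal sketch; the policy name and all thresholds are illustrative, not from the article:

```json
PUT _ilm/policy/my-tiering-policy
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_size": "50gb", "max_age": "30d" } } },
      "warm":   { "min_age": "30d",  "actions": {} },
      "cold":   { "min_age": "60d",  "actions": {} },
      "delete": { "min_age": "180d", "actions": { "delete": {} } }
    }
  }
}
```

Each phase's min_age is measured from rollover, and the actions object in each phase controls what ES does to an index when it enters that phase.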

Node Role

In ES, each deployed node has a node role. Different roles, such as master, data, and ingest, are assigned to ES nodes. Users can combine node roles with the lifecycle phases described above to manage their data.

Data nodes correspond to different stages: a node can store hot data, warm data, cold data, or even frozen data. A node is assigned roles based on its tasks, and nodes with different roles are often configured with different hardware.

For example, hot-data nodes need to be configured with high-performance CPUs and disks, whereas warm- and cold-data nodes, whose data is queried less frequently, have lower hardware requirements.

Node roles are defined based on the stages of the data lifecycle. It is important to note that each ES node can have multiple roles; roles and nodes do not need to have a one-to-one relationship. Here's an example where node.roles is configured in the ES YAML file; you can list every role the node should have.

node.roles: ["data_hot", "data_content"] 
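In a tiered cluster, each node type would carry the role matching its phase. A sketch of three elasticsearch.yml fragments, one per node type (the hardware comments are illustrative):

```yaml
# hot node: fast local disks, also serves regular (non-stream) content
node.roles: ["data_hot", "data_content"]

# warm node: data directory backed by a JuiceFS mount
node.roles: ["data_warm"]

# cold node: data directory backed by a JuiceFS mount
node.roles: ["data_cold"]
```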

Lifecycle Policy

After understanding the concepts of Data Stream, Index Lifecycle Management, and Node Role, you can customize lifecycle policies for your data.

Based on index characteristics defined in the policy, such as the size of the index, the number of documents in it, and the time when it was created, ES can automatically move data from one lifecycle stage to the next. This is known as rollover in ES.

For example, the user can define a rule based on index size to roll hot data over to the warm phase, or roll warm data over to cold according to other rules. ES performs the rollover automatically, but the lifecycle policy must be defined by the user.

The screenshot below shows Kibana's administration interface, which lets users configure lifecycle policies graphically. You can see three phases: hot data, warm data, and cold data.

[Screenshot: Kibana lifecycle policy configuration with hot, warm, and cold phases]

Expanding the advanced settings reveals more configuration options based on different characteristics, listed on the right side of the screenshot below.

[Screenshot: advanced rollover settings in Kibana]

Maximum index size. In the screenshot above it is set to 50 GB: when the index grows beyond 50 GB, the data rolls over from the hot phase to the warm phase.

Maximum documents. The basic storage unit of an ES index is the document, and user data is written to ES in the form of documents, so the document count is a measurable indicator.

Maximum age. For example, with a value of 30 days, an index that has existed for 30 days triggers the rollover from the hot phase to the warm phase, as described previously.
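The three criteria above correspond to the rollover conditions in the hot phase of an ILM policy. A sketch (the document count is illustrative; the size and age mirror the examples in the text):

```json
"rollover": {
  "max_size": "50gb",
  "max_docs": 100000000,
  "max_age": "30d"
}
```

Rollover fires as soon as any one of the configured conditions is met.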

However, running Elasticsearch directly on object storage causes poor write performance, compatibility problems, and other issues. Companies that also want to preserve query performance have therefore started looking for solutions in the cloud, and in this context JuiceFS is increasingly used in data tiering architectures.

2 Practice of ES + JuiceFS

Step 1: Prepare multiple types of nodes and assign different roles. Each ES node can be assigned different roles, such as storing hot data, warm data, cold data, etc. Users need to prepare different types of nodes to match the needs of different roles.

Step 2: Mount the JuiceFS file system. JuiceFS is generally used for warm and cold data storage, so mount the JuiceFS file system locally on the ES warm- or cold-data nodes. The mount point can be wired into ES through symbolic links or other means, so that ES believes its data is stored in a local directory while that directory is actually a JuiceFS file system.
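A sketch of step 2, assuming a Redis metadata engine; the metadata URL and all paths are chosen for illustration:

```shell
# mount the JuiceFS file system in the background
juicefs mount -d redis://meta-host:6379/1 /mnt/jfs

# expose it to ES as a local-looking data directory via a symlink
mkdir -p /mnt/jfs/es-warm
ln -s /mnt/jfs/es-warm /var/lib/elasticsearch/warm
```

The ES path.data setting on the warm node would then point at the symlinked directory.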

Step 3: Create a lifecycle policy. This needs to be customized by each user, either through the ES API or through Kibana, which provides some relatively easy ways to create and manage lifecycle policies.

Step 4: Set a lifecycle policy for the index. After creating a lifecycle policy, you need to apply it to the index, that is, set the policy you just created on the index. You can do this with index templates, which can be created in Kibana, or configure it explicitly through the API via the index.lifecycle.name setting.
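A sketch of step 4, binding a policy to new indexes through an index template (the template, pattern, and policy names are illustrative):

```json
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "my-tiering-policy"
    }
  }
}
```

Every backing index created for a matching data stream then inherits the lifecycle policy automatically.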

Here are a few tips.

Tip 1: The number of copies (replicas) on warm or cold nodes can be set to 1. All data is placed on JuiceFS and eventually uploaded to the underlying object storage, so data reliability is already high enough. Accordingly, the number of replicas can be reduced on the ES side to save storage space.


Tip 2: Force merge keeps node CPUs busy, so turn it off if appropriate. When an index moves from hot to warm, ES merges all the segments underlying the hot index. If force merge is enabled, ES first merges these segments and then writes them to the warm-data storage. However, merging segments is very CPU-intensive; if the warm-data nodes also need to serve query requests, you can turn this feature off as appropriate, i.e., keep the segments intact and write them to the underlying storage directly.

Tip 3: Indexes in the warm or cold phase can be set to read-only. In the warm and cold phases we can basically assume the data is read-only and the indexes will not be modified. Setting the index to read-only reduces resource usage on the warm- and cold-data nodes, so you can scale those nodes down and save hardware resources.
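The three tips can be expressed together in the warm phase of an ILM policy: reduce replicas, mark the index read-only, and simply omit the forcemerge action. A sketch (min_age is illustrative; number_of_replicas 0 here assumes a single copy on JuiceFS is sufficient, since durability comes from the object storage underneath):

```json
"warm": {
  "min_age": "30d",
  "actions": {
    "allocate": { "number_of_replicas": 0 },
    "readonly": {}
  }
}
```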

From Juicedata/JuiceFS.
