How to bring custom ML Models into OpenMetadata



Build custom CICD pipelines to put your ML assets on the map

OpenMetadata is more than a data catalog. Built on standard definitions and APIs, the catalog is just one of many applications exploiting the metadata of your platform. Since the beginning, the goal of OpenMetadata has been to solve the metadata problem for the industry. By not having to figure out essential components such as metadata ingestion, or how to bring collaboration back into data, teams can focus on improving their processes and automation.

This post aims to showcase how we can integrate multiple metadata sources, both from existing services and from in-house solutions. Since every action is powered by APIs, there is no difference between metadata coming from featured connectors such as Postgres and metadata sent via the Python SDK. This high degree of flexibility allows us to explore the metadata of custom-built ML Models and of the tables feeding their features.

Postgres and Custom ML Model ingestion schema. Image by the author.
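
To give a flavour of that API-first design, here is a minimal sketch of how the Python SDK opens a connection to the server. The host, token, and exact import paths are assumptions and may differ between versions of the openmetadata-ingestion package.

```python
# Minimal sketch: connecting to the OpenMetadata server with the Python SDK.
# The host, JWT token, and module paths are assumptions and may differ
# between versions of the openmetadata-ingestion package.
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
    OpenMetadataConnection,
)
from metadata.generated.schema.security.client.openMetadataJWTClientConfig import (
    OpenMetadataJWTClientConfig,
)
from metadata.ingestion.ometa.ometa_api import OpenMetadata

server_config = OpenMetadataConnection(
    hostPort="http://localhost:8585/api",
    authProvider="openmetadata",
    securityConfig=OpenMetadataJWTClientConfig(jwtToken="<jwt-token>"),
)
metadata = OpenMetadata(server_config)

# Ping the server and validate the credentials before doing anything else.
assert metadata.health_check()
```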

If this sounds interesting, follow the steps with the material in this repository.

OpenMetadata and ML

One of the main challenges of the ML model lifecycle is closing the gap between the ML model and the Data Platform. We have tools that help us train, test, tune, and deploy ML, but those tools rarely put ML Models in the context of the platform they live in.

How all the pieces fit together is information that is usually held by Data Scientists or ML Engineers but hardly ever shared. Typical causes are:

  1. No generic approach to how to define and maintain the metadata.
  2. No central place to publish the results for users to explore.
  3. This lack of clarity and the work involved in the previous two tasks make it hard to justify the benefits and measure the impact.

In this demo, we’ll follow a use case where:

  1. We have an ML model using features from Postgres,
  2. The model is regularly updated and deployed,
  3. The documentation of the model is hosted as code,
  4. We’ll use OpenMetadata’s Python SDK to create the ML Model assets and push them to OpenMetadata.

Getting our models into OpenMetadata helps us share the documentation, keep track of metadata changes and versions, discover lineage to the sources, and drive discussions and collaboration. A few quick wins from bringing this holistic view to ML and AI assets are:

  • Teams can quickly start to collaborate instead of trying to reach similar outcomes in different ways.
  • Knowledge about the most-used features can be gathered to start building a Feature Store with the highest possible value for the whole organization.
  • ML teams can start building Data Quality tests and alerts directly in OpenMetadata to prevent feature drift and performance degradation.

Ingesting Postgres metadata

The first step is ingesting the Postgres metadata, since that is where the sources of the ML features live. You can follow these steps to configure and deploy the Postgres metadata ingestion.

Creating the Postgres service in OpenMetadata. Image by the author.

The OpenMetadata UI will guide us through the two main steps:

  1. Creating the Database Service: A service represents the source system we want to ingest. Here is where we will define the connection to Postgres, and this service will hold the assets that will be sent to OpenMetadata: databases, schemas, and tables.
  2. Creating and deploying the Ingestion Pipelines, which are internally handled by OpenMetadata using the Ingestion Framework: a Python library holding the logic to connect to multiple sources, translate their original metadata into the OpenMetadata standard, and send it to the server through the APIs.

Managing the ingestion pipeline in OpenMetadata. Image by the author.

What’s interesting here is that the Ingestion Framework package can be used directly to configure and host the ingestion processes. Moreover, any operation in the UI or in the Ingestion Framework is entirely open and supported by the server APIs. This means full automation possibilities for any metadata-related activity, which can be achieved directly via REST or through the OpenMetadata SDKs.
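
For instance, the same Postgres ingestion that the UI deploys could be run as a small Python process on top of the Ingestion Framework. The sketch below uses placeholder connection details and the classic Workflow entrypoint; class names and configuration keys may vary between package versions.

```python
# Sketch: running the Postgres metadata ingestion with the Ingestion Framework.
# Connection details are placeholders, and the Workflow entrypoint plus the
# config keys follow the classic openmetadata-ingestion layout, which may
# differ in newer releases.
from metadata.ingestion.api.workflow import Workflow

config = {
    "source": {
        "type": "postgres",
        "serviceName": "postgres_svc",
        "serviceConnection": {
            "config": {
                "type": "Postgres",
                "username": "openmetadata_user",
                "password": "<password>",
                "hostPort": "localhost:5432",
                "database": "ecommerce",
            }
        },
        "sourceConfig": {"config": {"type": "DatabaseMetadata"}},
    },
    "sink": {"type": "metadata-rest", "config": {}},
    "workflowConfig": {
        "openMetadataServerConfig": {
            "hostPort": "http://localhost:8585/api",
            "authProvider": "openmetadata",
            "securityConfig": {"jwtToken": "<jwt-token>"},
        }
    },
}

workflow = Workflow.create(config)
workflow.execute()
workflow.raise_from_status()
workflow.stop()
```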

These are the capabilities we will exploit next when creating the CICD process.

Building a CICD pipeline

In the discussion above, we highlighted two pain points that usually block keeping ML model metadata up to date: no generic metadata definition and no single place to publish it. Thankfully, OpenMetadata takes care of both of these aspects.

The missing piece for building a successful process? It should be simple to maintain and evolve. That’s why we base our example on a YAML file checked into the code repository. Data Scientists and ML Engineers can then rely on their deployment pipelines to also update the metadata of their freshly released production model.

Example YAML file with the ML model metadata. Image by the author.
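
As a rough idea of what such a file could contain, here is a hedged sketch loaded through PyYAML. The field names and the Postgres fully qualified names are illustrative assumptions, not the exact schema used in the repository.

```python
import yaml

# Hypothetical content of the checked-in ml_model.yaml. Field names and
# Postgres fully qualified names are illustrative assumptions.
ML_MODEL_YAML = """
name: revenue_predictions
displayName: Revenue Predictions
description: Forecasts monthly revenue per customer segment.
algorithm: GradientBoostingRegressor
mlFeatures:
  - name: avg_order_value
    dataType: numerical
    source: postgres_svc.ecommerce.public.orders.order_total
  - name: customer_tier
    dataType: categorical
    source: postgres_svc.ecommerce.public.customers.tier
"""

spec = yaml.safe_load(ML_MODEL_YAML)
print(spec["name"], len(spec["mlFeatures"]))  # revenue_predictions 2
```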

The CICD process will then have a specific step (sketched after the list) that will:

  1. Read the YAML file with the metadata,
  2. Translate the structure of the YAML to the ML Model Entity definition from the OpenMetadata standard,
  3. Push the ML Model asset into OpenMetadata using the Python SDK.
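
A minimal sketch of what that step could look like with the Python SDK is shown below. The file name, the YAML fields, and the request classes are assumptions based on the openmetadata-ingestion package, and depending on the OpenMetadata version the request may also require an ML Model service, so treat it as a starting point rather than the repository's exact pipeline.

```python
# Sketch of the CICD step: read the YAML, map it to the MlModel entity, and
# push it with the Python SDK. File name, YAML fields, and auth details are
# assumptions; newer OpenMetadata versions may also require a `service` field.
import yaml

from metadata.generated.schema.api.data.createMlModel import CreateMlModelRequest
from metadata.generated.schema.entity.data.mlmodel import MlFeature
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
    OpenMetadataConnection,
)
from metadata.generated.schema.security.client.openMetadataJWTClientConfig import (
    OpenMetadataJWTClientConfig,
)
from metadata.ingestion.ometa.ometa_api import OpenMetadata

# 1. Read the YAML file with the metadata (hypothetical path).
with open("ml_model.yaml") as yaml_file:
    spec = yaml.safe_load(yaml_file)

# 2. Translate the YAML structure into the MlModel entity definition.
create_request = CreateMlModelRequest(
    name=spec["name"],
    displayName=spec.get("displayName"),
    description=spec.get("description"),
    algorithm=spec["algorithm"],
    mlFeatures=[
        MlFeature(name=feature["name"], dataType=feature["dataType"])
        for feature in spec.get("mlFeatures", [])
    ],
)

# 3. Push the ML Model asset into OpenMetadata using the Python SDK.
server_config = OpenMetadataConnection(
    hostPort="http://localhost:8585/api",
    authProvider="openmetadata",
    securityConfig=OpenMetadataJWTClientConfig(jwtToken="<jwt-token>"),
)
metadata = OpenMetadata(server_config)
metadata.create_or_update(data=create_request)
```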

Here you will find an example of such a pipeline. Hopefully, that will help you start putting your ML assets on the map!

Example CICD pipelines. GIF by the author.

After the script has finished running, we’ll see our Revenue Predictions model in OpenMetadata as:

Revenue Predictions model in OpenMetadata. Image by the author.

One key benefit of having the metadata available in the platform is being able to see the lineage between our models and the sources containing their features. In our example, we have already ingested the Postgres metadata, so if we check the Lineage tab, we’ll see all of our model’s dependencies.

Revenue Predictions lineage with Postgres tables. Image by the author.
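
If the YAML also records which table and column feed each feature, the CICD step can attach that information as feature sources, which is what lets the lineage edges be drawn against the ingested Postgres tables. Below is a hedged sketch with an assumed table fully qualified name and column; the lineage helper mentioned at the end is how recent openmetadata-ingestion releases expose it, so double-check it against your version.

```python
# Sketch: pointing a feature at its Postgres source so lineage can be drawn.
# The table FQN and column name are assumptions; add securityConfig to the
# connection as in the earlier sketches when the server requires auth.
from metadata.generated.schema.entity.data.mlmodel import MlFeature, MlFeatureSource
from metadata.generated.schema.entity.data.table import Table
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
    OpenMetadataConnection,
)
from metadata.generated.schema.type.entityReference import EntityReference
from metadata.ingestion.ometa.ometa_api import OpenMetadata

metadata = OpenMetadata(OpenMetadataConnection(hostPort="http://localhost:8585/api"))

# Resolve the Postgres table (ingested earlier) that feeds the feature.
# Assumes the table exists; a real pipeline should handle a missing entity.
orders = metadata.get_by_name(entity=Table, fqn="postgres_svc.ecommerce.public.orders")

feature = MlFeature(
    name="avg_order_value",
    dataType="numerical",
    featureSources=[
        MlFeatureSource(
            name="order_total",
            dataType="number",
            dataSource=EntityReference(id=orders.id, type="table"),
        )
    ],
)

# Features built this way can be included in the CreateMlModelRequest; the SDK
# can then build the lineage edges between the model and its source tables,
# e.g. metadata.add_mlmodel_lineage(model=...) in recent releases.
```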

Summary

In this post, we have:

  • Discussed the industry’s need for a common approach to defining, ingesting, and exploiting metadata, and how OpenMetadata covers it.
  • Ingested Postgres metadata directly from the UI.
  • Built a CICD process that pushes the metadata of custom-built ML Models during the release.

Putting ML Models into the context of the Data Platform has essential benefits, such as exploring dependencies and fueling collaboration. If you need a simple approach to putting your ML assets on the map, OpenMetadata has you covered.


How to bring custom ML Models into OpenMetadata was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
