📚 Maintaining the Quality of Your Feature Store


💡 News category: AI news
🔗 Source: towardsdatascience.com

Image by author

The fundamentals of feature stores and a few tips on how and why you should monitor them

Since Uber first introduced the concept in 2017, the feature store has been steadily gaining popularity as a tool to support data scientists and machine learning engineers with the ability to define, discover, and access high-quality data for their machine learning projects.

Diagram by author

From Feature Engineering To a Feature Store

In machine learning projects, raw data is collected, cleaned, formatted and mathematically transformed into data called a “feature.” Features are required in many different phases of the model lifecycle including experimentation, model training, and model serving to get predictions from the model deployed in production pipelines.
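As a rough illustration of the step from raw data to features, here is a minimal sketch in plain Python. The event layout, the user IDs, and the feature names (`purchase_count`, `avg_purchase_amount`) are hypothetical and chosen only for this example; a real pipeline would read from actual data sources and usually apply far more cleaning.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw events: (user_id, purchase_amount). Illustrative data only.
raw_events = [
    ("u1", 20.0),
    ("u1", 40.0),
    ("u2", 5.0),
    ("u2", None),  # incomplete record, dropped during cleaning
]

def build_features(events):
    """Clean raw events and aggregate them into per-user features."""
    amounts = defaultdict(list)
    for user_id, amount in events:
        if amount is None:  # cleaning step: skip records with no amount
            continue
        amounts[user_id].append(float(amount))
    # Transformation step: aggregate the cleaned values per user.
    return {
        user_id: {
            "purchase_count": len(values),
            "avg_purchase_amount": mean(values),
        }
        for user_id, values in amounts.items()
    }

features = build_features(raw_events)
```

The resulting dictionary is the kind of per-entity feature table that experimentation, training, and serving would all read from.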

Once the features are calculated, we can begin to develop our model by experimenting with different modeling techniques and feature sets.

When models are trained, they automatically discover patterns in the feature data, encode these patterns mathematically, and then use this information to make informed predictions.

When the model is finalized, it is deployed into production where it consumes feature data to both retrain and produce predictions. Model predictions are often either produced in batch, or in some cases in real time.

Feature Store Fundamentals

Feature stores can be thought of as a central store of precomputed features. This data store serves features for every step in a machine learning project.

Diagram by author

Organizations use feature stores to streamline several parts of the data and ML lifecycle.

Centralize data

  • Feature stores provide a one-stop-shop for data that has already been collected from different sources and stored in a central location.
  • Without a feature store, the raw data for an ML project often needs to be collected from multiple data sources across the company, or even from a vendor or third party. This means data scientists have to identify and access each of those sources themselves.

Clean data

  • Personally identifiable information (PII) or sensitive data that is not required for the ML workflow can be removed before storing it in the feature store.
  • Without a feature store, data scientists would have to access sensitive data directly, or build their own method to remove it when it is not required for their use case.
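The idea of removing sensitive fields before data reaches the feature store can be sketched as follows. The field names in `PII_FIELDS` are assumptions made for this example; which fields count as PII depends on the dataset and on the applicable regulations.

```python
# Hypothetical field names; adjust to whatever counts as PII in your data.
PII_FIELDS = {"name", "email", "phone"}

def strip_pii(record, pii_fields=PII_FIELDS):
    """Return a copy of the record without the listed PII fields."""
    return {key: value for key, value in record.items() if key not in pii_fields}

raw_record = {
    "user_id": "u1",
    "name": "Ada",
    "email": "a@example.com",
    "avg_purchase_amount": 30.0,
}
clean_record = strip_pii(raw_record)  # only this version is written to the store
```

Because `strip_pii` returns a new dictionary, the raw record is left untouched in the upstream system, and downstream users only ever see the cleaned copy.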

Share features across models

  • The same features can often be used in multiple models across different use cases. The feature store calculates these features once and makes them available for all ML projects. Data scientists can add to this feature bank over time to build up a store of features for other teams to reuse.
  • Without a feature store, many of the same feature computations will be rewritten in different models and pipelines. This forces data scientists to perform wasteful rework to recalculate features that may already exist in similar pipelines, and it makes it difficult to keep the feature calculations consistent.

Provide a common interface to the features

  • Part of the feature store is a standardized interface to the data itself, along with shared feature encoders for ML pipelines, to help enforce consistent results across online and offline applications.
  • Without a feature store, different code or transformations could produce unexpected or erroneous results in the model. This often manifests as skew between online and offline features.

Reduce feature latency

  • An online feature store provides precomputed features to support serving real-time predictions with low latency.
  • Without a feature store, features must be calculated at the time of the inference request, which adds computation to every request and increases application latency.

Navigate feature versioning

  • Feature stores apply versioning to the data. Time-travel snapshots of the data allow point-in-time analysis for backtesting a model or finding the root cause of a data bug.
  • Without a feature store, it can be difficult to trace back the exact state or value of a feature for a given point in time. This can make debugging and experimentation challenging if not impossible.
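A point-in-time lookup can be sketched with a sorted history of feature values and a binary search. The history layout (integer timestamps paired with values) is an assumption for illustration; real feature stores expose this through their own query interfaces.

```python
from bisect import bisect_right

# Hypothetical history of one feature: (timestamp, value), sorted by timestamp.
history = [(1, 10.0), (5, 12.5), (9, 11.0)]

def value_as_of(history, ts):
    """Return the latest value recorded at or before ts, or None if none exists."""
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)  # number of entries with timestamp <= ts
    return history[idx - 1][1] if idx > 0 else None
```

Looking up the value "as of" a label's timestamp, and never a later one, is what keeps future information out of a backtest.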

Types of Feature Stores

Generally, feature stores are offered either as standalone, third-party tools or as part of broader cloud offerings. Most Arize customers with feature stores in place use a purpose-built tool like Tecton, but teams that want to build on top of open-source solutions have several options (e.g., Feast, Feathr). Cloud offerings are also available as easy add-ons to existing stacks.

Examples of third-party tools:

  • Feast (Open Source)
  • Feathr (Open Source)
  • Tecton
  • Hopsworks

Examples of cloud tools:

  • Databricks Feature Store
  • SageMaker Feature Store

Maintaining the Quality of Your Feature Store

Feature stores can fail silently. When a model breaks or produces poor results, the root cause is often traced back to the data itself. Many common machine learning issues can be solved by applying the right monitoring and quality checks to the data.

If that data is centralized in a single feature store, it is easier to maintain. By applying data quality monitoring to the feature store, practitioners can automatically catch data issues before they impact model performance.

There are several common data issues where monitoring can make a big difference (full disclosure: I work for Arize, which offers monitoring tools, but these best practices draw from my real-world experience and apply equally to a monitoring platform built in-house or elsewhere).

Data quality monitors can catch issues such as missing values, changes in data format, or unexpected values (changes in data cardinality). An ML observability platform can be used to automatically detect and alert on these types of data quality issues, which are common with feature data.

This example shows a triggered data quality monitor for the %-empty metric on a model feature (image by author)
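The checks described above (a %-empty metric and unexpected categorical values) can be sketched in a few lines. This is a minimal in-house style illustration, not the implementation of any monitoring product; the function names and the 5% default threshold are assumptions.

```python
def percent_empty(values):
    """Share of values that are missing (None or empty string), in percent."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or v == "")
    return 100.0 * missing / len(values)

def unexpected_values(values, allowed):
    """Non-missing values that fall outside the expected set of categories."""
    return {v for v in values if v not in (None, "") and v not in allowed}

def check_feature(values, allowed, max_percent_empty=5.0):
    """Return a list of alert messages; an empty list means all checks passed."""
    alerts = []
    pct = percent_empty(values)
    if pct > max_percent_empty:
        alerts.append(f"%-empty {pct:.1f} exceeds threshold {max_percent_empty:.1f}")
    extra = unexpected_values(values, allowed)
    if extra:
        alerts.append(f"unexpected values: {sorted(extra)}")
    return alerts
```

Running `check_feature` on each batch written to the feature store gives a simple hook for alerting before the data reaches a model.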

Data drift monitors can catch shifts in statistical distribution caused by natural changes over time. Drift can be measured using metrics such as the population stability index (PSI), Kullback-Leibler (KL) divergence, and more. An ML observability platform can be used to automatically detect and alert on the kind of statistical drift that is common in feature data.

This example shows prediction drift between production and training data distributions (image by author)
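For readers who want to see the drift metrics concretely, here is a minimal sketch of PSI and KL divergence computed over pre-binned counts. It assumes both distributions use the same bins; the `eps` floor that avoids taking the log of zero is a common workaround rather than part of either definition, and it means the floored proportions may not sum exactly to 1.

```python
import math

def to_proportions(counts, eps=1e-6):
    """Convert bin counts to proportions, flooring at eps to avoid log(0)."""
    total = sum(counts)
    return [max(c / total, eps) for c in counts]

def psi(expected_counts, actual_counts):
    """Population stability index: sum of (a - e) * ln(a / e) over bins."""
    e = to_proportions(expected_counts)
    a = to_proportions(actual_counts)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def kl_divergence(p_counts, q_counts):
    """KL divergence D(P || Q): sum of p * ln(p / q) over bins."""
    p = to_proportions(p_counts)
    q = to_proportions(q_counts)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical bin counts for one feature in training vs. production.
training_counts = [50, 30, 20]
production_counts = [20, 30, 50]
drift_score = psi(training_counts, production_counts)
```

Identical distributions give a PSI of zero, and the score grows as the production distribution moves away from the training baseline. The alerting threshold is a judgment call that depends on the feature and the model.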

Training-serving skew can also be identified by checking that feature values, and the code that calculates them, are consistent between the offline and online paths.
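A basic consistency check between offline and online feature values might look like the sketch below. The dictionary layout (entity ID mapped to one numeric feature value) and the tolerance are assumptions for illustration.

```python
import math

def find_skew(offline, online, rel_tol=1e-6):
    """Return entity IDs whose offline and online feature values disagree.

    Only entities present in both inputs are compared.
    """
    mismatched = []
    for entity_id in sorted(offline.keys() & online.keys()):
        if not math.isclose(offline[entity_id], online[entity_id], rel_tol=rel_tol):
            mismatched.append(entity_id)
    return mismatched

# Hypothetical values of the same feature from the two paths.
offline_values = {"u1": 1.0, "u2": 2.0, "u3": 3.0}
online_values = {"u1": 1.0, "u2": 2.5}
skewed = find_skew(offline_values, online_values)
```

Entities that appear in only one of the two sources are skipped here. In practice, a missing entity may itself be worth flagging.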

Conclusion

As feature stores begin to enter the mainstream, teams are creating best practices to enhance their integration into ML workflows. One key area of focus is managing upstream data quality issues. By implementing data quality monitoring and data drift detection, teams can efficiently maintain their feature stores while proactively staying ahead of model performance degradation.


Maintaining the Quality of Your Feature Store was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
