Study Notes dlt Fundamentals Course: Lesson 3 & 4 - Pagination, Authentication, dlt Configuration, Sources & Destinations


Source: dev.to

Lesson 3 Pagination & Authentication & dlt Configuration

Introduction to Pagination

  • Pagination is a technique used to retrieve data in pages, especially when an endpoint limits the amount of data that can be fetched at once.
  • The GitHub API returns data in pages, and pagination allows us to retrieve all the data.

GitHub API Pagination

  • The GitHub API provides the per_page and page query parameters to control pagination.
  • The Link header in the response contains URLs for fetching additional pages of data.
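
To make the Link header concrete, here is a small sketch that splits such a header into a mapping from relation name to URL. The sample header is illustrative (OWNER/REPO is a placeholder), and the parser only handles the simple `<url>; rel="name"` form.

```python
import re
from typing import Dict

# Matches entries of the form: <https://...>; rel="next"
LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(header: str) -> Dict[str, str]:
    """Return {rel: url} for every entry found in a Link header value."""
    return {rel: url for url, rel in LINK_PATTERN.findall(header)}


# Illustrative header value; OWNER/REPO is a placeholder.
sample = (
    '<https://api.github.com/repos/OWNER/REPO/stargazers?per_page=100&page=2>; rel="next", '
    '<https://api.github.com/repos/OWNER/REPO/stargazers?per_page=100&page=5>; rel="last"'
)
links = parse_link_header(sample)
```

A client would keep requesting links["next"] until the header no longer contains a "next" entry.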

Implementing Pagination with dlt's RESTClient

  • dlt's RESTClient can handle pagination seamlessly when working with REST APIs like GitHub.
  • The RESTClient is part of dlt's helpers, which makes it easier to interact with REST APIs by managing repetitive tasks.

Authentication with GitHub API

  • Unauthenticated requests to the GitHub API are subject to a much lower rate limit, so authenticate to avoid rate limit errors when fetching data.
  • To authenticate, create an environment variable for your access token or use dlt's secrets configuration.
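
The environment-variable approach can be sketched with the standard library alone. The variable name ACCESS_TOKEN matches the one used in the exercise; the helper name github_headers is hypothetical.

```python
import os
from typing import Dict


def github_headers(env_var: str = "ACCESS_TOKEN") -> Dict[str, str]:
    """Build request headers, adding a bearer token when the variable is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get(env_var)
    if token:
        # The token is read at call time and never hard-coded in the script.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Reading the token from the environment keeps it out of the source file and out of version control.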

dlt Configuration and Secrets

  • Configurations are non-sensitive settings that define the behavior of a data pipeline.
  • Secrets are sensitive data like passwords, API keys, and private keys, which should be kept secure.
  • dlt automatically extracts configuration settings and secrets based on flexible naming conventions.
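
The naming conventions can be illustrated with a simplified model of the lookup order described in the dlt documentation: the most specific name is tried first, then sections are dropped from the right. This is a sketch of the convention, not dlt's actual implementation, and the module and function names in the example call are hypothetical.

```python
from typing import List


def candidate_env_vars(module: str, function: str, key: str) -> List[str]:
    """List environment variable names from most to least specific."""
    sections = ["sources", module, function]
    names = []
    while sections:
        # Sections are joined with double underscores and upper-cased.
        names.append("__".join(sections + [key]).upper())
        sections.pop()
    names.append(key.upper())  # finally, the bare key
    return names


# Hypothetical module "github" containing a function "stargazers".
candidates = candidate_env_vars("github", "stargazers", "access_token")
```

Under this model a plain ACCESS_TOKEN variable is the last fallback, which is why setting only that variable is enough in a small script.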

Exercise 1: Pagination with RESTClient

  • Use dlt's RESTClient to fetch paginated data from the GitHub API.
  • The full list of available paginators can be found in the official dlt documentation.

Exercise 2: Run pipeline with dlt.secrets.value

  • Use the sql_client to query the stargazers table and find the user with id 17202864.
  • Use environment variables to set the ACCESS_TOKEN variable.

Key Takeaways

  • Pagination is essential when working with APIs that return data in pages.
  • dlt's RESTClient can handle pagination seamlessly and manage repetitive tasks.
  • Authenticating with an access token avoids the rate limit errors that unauthenticated GitHub API requests quickly run into.
  • dlt configuration and secrets are essential for setting up data pipelines securely.

Lesson 4 Using Pre-built Sources and Destinations

Pre-built Sources

Overview

Pre-built sources are the simplest way to get started with building your stack. They are fully customizable and come with a set of pre-defined configurations.

Types of Pre-built Sources

  • Existing Verified Sources: Use an existing verified source by running the dlt init command.
  • SQL Databases: Load data from SQL databases (PostgreSQL, MySQL, SQLite, Oracle, IBM DB2, etc.) into a destination.
  • Filesystem: Load data from the filesystem, including CSV, Parquet, and JSONL files.
  • REST API: Load data from a REST API using a declarative configuration.

Steps to Use Pre-built Sources

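  1. Install dlt: Install dlt with pip (pip install dlt); this also makes the dlt command-line tool available.
  2. List all verified sources: Run dlt init with its list option (--list-sources in recent dlt releases) to see the available verified sources and their short descriptions.
  3. Initialize the source: Run dlt init <source> <destination> to scaffold the pipeline script and configuration files for the chosen source.
  4. Add credentials: Add credentials through environment variables or the .dlt/secrets.toml file.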
  5. Run the pipeline: Run the pipeline to load data into the destination.

Pre-built Destinations

Overview

Pre-built destinations are used to load data into a specific location. They are customizable and come with a set of pre-defined configurations.

Types of Pre-built Destinations

  • Filesystem destination: Load data into files stored locally or in cloud storage solutions.
  • Delta tables: Write Delta tables using the deltalake library.
  • Iceberg tables: Write Iceberg tables using the pyiceberg library.

Steps to Use Pre-built Destinations

  1. Choose a destination: Choose a destination based on your needs.
  2. Modify the destination parameter: Modify the destination parameter in your pipeline configuration.
  3. Run the pipeline: Run the pipeline to load data into the destination.

Example Use Cases

  • Loading data from a SQL database: Use the sql_database source to load data from a SQL database into a destination.
  • Loading data from a REST API: Use the rest_api source to load data from a REST API into a destination.
  • Loading data from the filesystem: Use the filesystem source to load data from the filesystem into a destination.

Exercise

  • rest_api: Run the rest_api source to load data from a REST API into a destination.
  • sql_database: Run the sql_database source to load data from a SQL database into a destination.
  • filesystem: Run the filesystem source to load data from the filesystem into a destination.

Next Steps

  • Proceed to the next lesson to learn more about custom sources and destinations.
  • Explore the dlt documentation to learn more about pre-built sources and destinations.