
🔧 How to be Test Driven with Spark: 2 - CI


Section: 🔧 Programming
🔗 Source: dev.to

The goal of this tutorial is to provide a way to easily practice test-driven development with Spark on your local setup, without using cloud resources.

This is a series of tutorials, and the initial chapters can be found in:

Chapter 2: Continuous Integration (CI)

Having a CI pipeline is mandatory for any project that aims to have multiple contributors. In the following chapter, a proposed CI will be implemented.

As CI implementation is specific to a collaboration platform (GitHub, GitLab, Bitbucket, Azure DevOps, etc.), the following chapter will try to stay as technology-agnostic as possible.

Similar concepts are available in every CI system; you will have to transpose the concepts used here to your platform.
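As an illustration of such a transposition, here is a hedged sketch of how the workflow built in this chapter might look in GitLab CI. The mise installation commands are assumptions, since GitLab has no equivalent of the jdx/mise-action used in this chapter:

```yaml
# .gitlab-ci.yml — hypothetical transposition; commands are illustrative
continuous-integration:
  image: ubuntu:latest
  script:
    - curl https://mise.run | sh          # install mise manually (assumption)
    - ~/.local/bin/mise install           # install tools from mise.toml
    - uv run ruff check                   # formatting gate
    - uv run pytest                       # test gate
```

The structure is the same: a job, a runner image, and a sequence of steps that fail the pipeline on error.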

Content of the CI

The CI here will be very minimal, but it showcases the concepts you implemented in Chapter 1, namely:

  • Python setup
  • Project setup
  • Code Formatting
  • Test automation

There are many more additions to a continuous integration pipeline that will not be tackled here. A minimal CI is required to guarantee non-regression in terms of:

  • code styling rules, guaranteeing that no individual contributor diverges from the coding style
  • tests, namely that all tests must pass

Implementation

GitHub provides extensive documentation for you to tweak your CI.

GitHub expects CI files at a specific location; you can therefore create a file at .github/workflows/ci.yaml.

In this file, you can add:

name: Continuous Integration
run-name: Continuous Integration
on: [push]
jobs:
  Continuous-Integration:
    runs-on: ubuntu-latest
  • The name and run-name define the names of the pipeline that will run.
  • The on key defines the events that trigger the pipeline; push means the pipeline runs for every pushed commit.
  • The jobs key defines the list of jobs; for the sake of simplicity, this CI is made of a single job with multiple steps.
  • The runs-on key defines the runner image the job executes on, picked from the list of images maintained by GitHub.

Now into the steps section we can add:

steps:
  - name: Check out repository code
    uses: actions/checkout@v4
  - uses: jdx/mise-action@v2
  - name: Run Formatting
    run: |
      uv run ruff check
  - name: Run Tests
    run: |
      uv run pytest
  • actions/checkout@v4 is the GitHub action that checks out the current branch of the repository.
  • jdx/mise-action@v2 is the GitHub action that reads the mise.toml and installs everything for us.
  • The Run Formatting step installs the dependencies and runs the formatting check. If there is an error, the command fails and so does the pipeline.
  • The Run Tests step runs the tests. If there is an error, the command fails and so does the pipeline.
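For reference, the mise.toml that jdx/mise-action reads could look like the following sketch; the tool versions are assumptions and should match your Chapter 1 setup:

```toml
# mise.toml — tools the CI runner installs (versions are illustrative)
[tools]
python = "3.12"
uv = "latest"
```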

CI as documentation

As stated earlier, the CI is the single source of truth. If it passes in CI, it should pass on your local setup; if not, there are discrepancies between the CI setup and yours.

Going through the CI implementation will help you with reproducibility. Maybe you are not installing your Python version the same way, or not using the same dependency-management tool. You need to align your tools; the ones presented in Chapter 1 were chosen so as not to conflict with your local setup. You might have installed Python packages globally, or manually changed PYTHON_HOME or your PATH, and this can easily become a mess.

To help with reproducibility, a dev container approach can be used: the CI runs inside a container, and that same container can be reused as the development environment. This will not be implemented for the moment.
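As a sketch of that approach (not implemented in this chapter; the image and feature references are assumptions), a minimal .devcontainer/devcontainer.json could look like:

```json
{
  "name": "spark-tdd",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    "ghcr.io/devcontainers-extra/features/mise:1": {}
  },
  "postCreateCommand": "mise install && uv sync"
}
```

Running the CI against this same image would then guarantee that the local and CI environments match.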

A better CI structure

To improve readability and separate code formatting from testing, GitHub Actions jobs can be declared with interdependencies. The workflow then becomes:

name: Continuous Integration
run-name: Continuous Integration
on: [push]
jobs:
  Formatting:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4
      - uses: jdx/mise-action@v2
      - name: Run Formatting
        run: |
          uv run ruff check
  Tests:
    runs-on: ubuntu-latest
    needs: [Formatting]
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4
      - uses: jdx/mise-action@v2
      - name: Run Tests
        run: |
          uv run pytest

Here we added needs: [Formatting] to create a dependency between CI jobs. It means the tests will not run until the code style is compliant, which saves time and resources: if the code is not formatted, don't even bother running the tests. The execution graph will look like:

CI execution graph: Formatting → Tests

We can see some duplication here, which is not ideal: for future improvements, you will have to change the same code in two places at once. This is technical debt that one would typically tackle with a composite action. We will consider it acceptable for now.
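As a sketch of how such a composite action could look (the file path and action name are assumptions), the shared setup step can be extracted into a local action. Note that actions/checkout must still run first in each job, since the action file itself lives in the repository:

```yaml
# .github/actions/setup/action.yaml — shared toolchain setup (sketch)
name: Setup toolchain
description: Install tools via mise (run actions/checkout before this action)
runs:
  using: composite
  steps:
    - uses: jdx/mise-action@v2
```

Each job would then replace its jdx/mise-action@v2 step with uses: ./.github/actions/setup, so future setup changes happen in one place.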

Caching dependency resolution

You will see additional steps in the ci.yaml related to caching:

      - name: Restore uv cache
        uses: actions/cache@v4
        with:
          path: /tmp/.uv-cache
          key: uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
          restore-keys: |
            uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
            uv-${{ runner.os }}

These steps cache uv's cache directory and reuse it when there are no changes to the uv.lock. The intent is to speed up CI execution, as dependency resolution and installation can be time-consuming.

An extra step minimizes the cache size, as uv offers such a feature; in addition, an environment variable, set at the job level, configures the location of the cache.

      - name: Minimize uv cache
        run: uv cache prune --ci
    env:
      UV_CACHE_DIR: /tmp/.uv-cache
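Putting the pieces together, the Tests job with caching could look like the following sketch; the step ordering and the job-level env placement follow the snippets above:

```yaml
  Tests:
    runs-on: ubuntu-latest
    needs: [Formatting]
    env:
      UV_CACHE_DIR: /tmp/.uv-cache      # tell uv where to keep its cache
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4
      - uses: jdx/mise-action@v2
      - name: Restore uv cache
        uses: actions/cache@v4
        with:
          path: /tmp/.uv-cache
          key: uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
          restore-keys: |
            uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
            uv-${{ runner.os }}
      - name: Run Tests
        run: uv run pytest
      - name: Minimize uv cache
        run: uv cache prune --ci         # shrink the cache before it is saved
```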

What's next

In the next chapter, you will implement your first Spark code, along with a way to guarantee its test automation. This is long overdue, as we spent 3 chapters on setup...

You can find the original materials in spark_tdd. This repository shows the expected repository layout at the end of each chapter, one branch per chapter:

...
