Buildkite Agent Bug - fork/exec ... operation not permitted




News category: Programming
Source: dev.to

The Problem

Over the past few months, I have been tracking down a bug caused by the PTY package that the Buildkite agent depends on. These are the symptoms:

Symptom 1 - operation not permitted for git checkout:

2024-05-20 11:15:52 PDT $ git clone -v -- ssh...
...
...
2024-05-20 11:15:52 PDT 🚨 Error: Failed to checkout plugin docker: error running "/usr/bin/git checkout -f e9efccb": error starting pty: fork/exec /usr/bin/git: operation not permitted
2024-05-20 11:15:52 PDT Running global pre-exit hook

Symptom 2 - operation not permitted for git clean:

2024-05-21 09:28:12 PDT ⚠️ Warning: Checkout failed! cleaning repository post-checkout: error running "/usr/bin/git clean -ffxdq": error starting pty: fork/exec /usr/bin/git: operation not permitted (Attempt 1/3 Retrying in 2s)

Symptom 3 - operation not permitted fork/exec bash:

2024-05-20 10:37:40 PDT 🚨 Error: error running "/bin/bash -e -c \"trap 'kill -- $$' INT TERM QUIT; if [ -n \\\"$(git diff HEAD~1 --exit-code 'cruise/ai_platform/search')\\\" ]; then\\n  cat <<- ...\"": error starting pty: fork/exec /bin/bash: operation not permitted
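
All three symptoms end in the same message and differ only in the binary being executed. Here is a minimal sketch of how such lines could be counted in an agent log. The sample lines are abbreviated from the logs above, and the name `count_pty_eperm` is hypothetical, not part of any Buildkite tooling.

```python
import re
from collections import Counter

# The suffix shared by all three symptoms; the binary path is the only variable part.
PTY_EPERM = re.compile(
    r"error starting pty: fork/exec (?P<binary>\S+): operation not permitted"
)


def count_pty_eperm(lines):
    """Count PTY start failures per binary (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PTY_EPERM.search(line)
        if match:
            counts[match.group("binary")] += 1
    return counts


# Abbreviated versions of the log lines quoted above.
sample_log = [
    'Error: Failed to checkout plugin docker: error running "/usr/bin/git checkout -f e9efccb": '
    "error starting pty: fork/exec /usr/bin/git: operation not permitted",
    "Running global pre-exit hook",
    'Warning: Checkout failed! cleaning repository post-checkout: error running "/usr/bin/git clean -ffxdq": '
    "error starting pty: fork/exec /usr/bin/git: operation not permitted (Attempt 1/3 Retrying in 2s)",
    'Error: error running "/bin/bash -e -c ...": '
    "error starting pty: fork/exec /bin/bash: operation not permitted",
]

counts = count_pty_eperm(sample_log)
for binary, n in sorted(counts.items()):
    print(binary, n)
```

Grouping by binary makes it easy to see that the failure is not tied to git or bash specifically.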

The Investigation

After further investigation, I made the following observations:

  • This doesn't happen consistently for a single step. If a step fails once, it is likely to pass the next time it runs.
  • It happens at low frequency. At Cruise, we run over 100k build steps daily, and the error occurs fewer than 10 times a day. That puts the likelihood at roughly 0.01% per step.
  • It happens across different pipelines.
  • The git operation fails even after a previous git operation succeeded (e.g. git clone in an earlier hook within the same step worked without the permission issue).
  • It has nothing to do with the +x flag on the binary being executed; both git and bash have the proper permission flag set.
  • The machine state and installed packages are not mutated by the CI job in any malicious way. This is guaranteed by how we run CI infra at Cruise: each time a CI job is created, we spin up a brand new ephemeral VM to handle it, and the VM is terminated right after the job finishes. This guarantees that the machine is not messed up.
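
The 0.01% figure in the list above comes from dividing the daily failure count by the daily step count. The sketch below reproduces that arithmetic and also estimates how many steps one would have to run to see the bug at least once. The second part assumes that failures are independent and occur at a constant rate, which the article does not establish; both function names are hypothetical.

```python
import math


def failure_rate(failures_per_day, steps_per_day):
    """Fraction of build steps that hit the error."""
    return failures_per_day / steps_per_day


def steps_to_observe(rate, confidence=0.95):
    """Steps needed to see at least one failure with the given probability.

    Assumption: each step fails independently with the same probability.
    Solves 1 - (1 - rate) ** n >= confidence for n.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - rate))


rate = failure_rate(10, 100_000)
print(f"{rate:.2%}")           # 0.01%
print(steps_to_observe(rate))  # on the order of 30,000 steps
```

Under these assumptions, a rate this low means a short test run that shows no failures says very little, which is worth keeping in mind when judging whether a given version is free of the bug.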

(If you are interested in learning how we design ephemeral CI infrastructure, please comment below :D )

Given all the observations above, I came to believe that the bug is not specific to any build step or pipeline. Hence the bug is likely somewhere at the VM level. The next thing that came onto my radar was the Buildkite agent binary itself.

The Buildkite Agent

I upgrade the Buildkite agent from time to time, and I had recently upgraded it from v3.36.1 to v3.59.0. Our logs showed that the bug started appearing after the v3.59.0 upgrade. Hence, the bug had to lie somewhere in the difference between the two versions. I was getting close to the truth.

Buildkite Support is very helpful

After I reported the problem to Buildkite support and worked with their support engineers, we decided to compare the strace output of Buildkite agent v3.36.1 and v3.59.0. Buildkite support pointed out one line in particular as interesting:

ioctl(0, TIOCSCTTY, 1)            = -1 EPERM (Operation not permitted)

According to the Buildkite support:

By default the agent runs the command process inside a pseudo-terminal, which leads to this particular ioctl call. A possible workaround would be to disable PTY mode (buildkite-agent start --no-pty or BUILDKITE_NO_PTY=1).
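
When comparing two strace captures like the ones described above, one way to surface such a line is to filter for syscalls that returned EPERM. The sketch below does that with a regular expression. Only the `ioctl` line with EPERM is taken from the article; every other line, including the successful `ioctl` in the "old" capture, is a hypothetical placeholder, and `eperm_calls` is a made-up helper name.

```python
import re

# Matches strace lines of the form: syscall(args) = -1 ERRNO (message)
FAILED_CALL = re.compile(
    r"(?P<syscall>\w+)\((?P<args>.*)\)\s*=\s*-1\s+(?P<errno>E[A-Z]+)\s+\((?P<message>.*)\)"
)


def eperm_calls(strace_lines):
    """Return (syscall, args) for every call that failed with EPERM."""
    found = []
    for line in strace_lines:
        match = FAILED_CALL.search(line)
        if match and match.group("errno") == "EPERM":
            found.append((match.group("syscall"), match.group("args")))
    return found


# Hypothetical capture for the older agent: the outcome shown here is assumed.
old_capture = [
    "setsid()                          = 4242",
    "ioctl(0, TIOCSCTTY, 1)            = 0",
]

# The ioctl line is the one quoted in the article; the setsid line is hypothetical.
new_capture = [
    "setsid()                          = 4243",
    "ioctl(0, TIOCSCTTY, 1)            = -1 EPERM (Operation not permitted)",
]

only_in_new = set(eperm_calls(new_capture)) - set(eperm_calls(old_capture))
print(only_in_new)
```

A set difference like this only narrows down where to look; real captures differ in PIDs and addresses, so the remaining lines would still need manual review.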

Then I tested turning off the PTY. Unfortunately, this is not a backward-compatible change that we can easily adopt at the infrastructure level, because some workloads that depend on the PTY will break, e.g. docker run -ti.
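
To illustrate why disabling the PTY can break workloads, the sketch below runs the same child process twice, once with its output connected to a pipe and once connected to a pseudo-terminal, and reports whether the child believes it is talking to a terminal. This is a generic illustration, assuming Linux; it does not reproduce how the Buildkite agent or docker set up their terminals, and `child_sees_tty` is a hypothetical name.

```python
import os
import pty
import subprocess
import sys

# The child prints whether its stdout is a terminal.
CHILD_CODE = "import sys; print(sys.stdout.isatty())"


def child_sees_tty(use_pty):
    """Run the child with stdout on a pipe or on a pseudo-terminal."""
    cmd = [sys.executable, "-c", CHILD_CODE]
    if not use_pty:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
        return result.stdout.decode().strip() == "True"

    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(cmd, stdout=slave_fd)
    finally:
        # The parent no longer needs the slave end once the child has it.
        os.close(slave_fd)
    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 1024)
            except OSError:
                # On Linux, reading the master raises EIO once the child has exited.
                break
            if not data:
                break
            chunks.append(data)
        proc.wait()
    finally:
        os.close(master_fd)
    return b"".join(chunks).decode().strip() == "True"


if __name__ == "__main__":
    print("pipe:", child_sees_tty(False))
    print("pty: ", child_sees_tty(True))
```

Programs that require a terminal typically make a check like this at startup, which is why turning the PTY off changes their behavior.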

So the remaining option was to fix the PTY handling itself. More digging showed that the github.com/creack/pty package, which the Buildkite agent depends on, is what sets up the PTY for the agent. We found that Agent v3.36.1 used v1.1.12 of the pty package and Agent v3.59.0 used v1.1.20. The latest is v1.1.21, which is used in Agent v3.60.0 and later, and the v1.1.21 release notes mention a recent fix for a race condition on Linux.

Testing Buildkite Agent v3.60.0

The next immediate step was to validate the fix in the PTY package by upgrading the Buildkite agent to v3.60.0. Although the upgrade helped, it didn't fully resolve the issue. As shown in the following graph, the frequency dropped from ~10 times per day to <1 time per day, but the error still happened.

[Graph: daily frequency of the error before and after upgrading the Buildkite agent to v3.60.0]

Finding the breaking change

The question then became which version of the PTY package introduced the bug. To find the breaking change, I did a binary search over the PTY package versions. We knew v1.1.12 worked without the issue and v1.1.21 didn't, so the bug had to lie within that range. There are about 10 releases between v1.1.12 and v1.1.21. I tested v1.1.17 and v1.1.19 and couldn't reproduce the problem in either version. So we concluded that the breaking change is in v1.1.20 or later.
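
The version search described above can be sketched as a standard bisection between a known-good and a known-bad release. The version list below assumes that every patch number from v1.1.12 to v1.1.21 was released, which is an assumption rather than something checked against the package's tags. The `is_bad` predicate is a hypothetical stand-in for deploying an agent built with that version and watching for the error.

```python
def first_bad(versions, is_bad):
    """Return the first bad version, given versions[0] is good and versions[-1] is bad."""
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Assumed release list: every patch number between the two known endpoints.
versions = [f"v1.1.{patch}" for patch in range(12, 22)]

probed = []


def is_bad(version):
    """Hypothetical test: encodes the article's conclusion that v1.1.20+ is affected."""
    probed.append(version)
    return int(version.rsplit(".", 1)[1]) >= 20


culprit = first_bad(versions, is_bad)
print(culprit, "after probing", probed)
```

In practice each probe is expensive here, because a version can only be called good after enough build steps have run without the error appearing.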

What we did was simply patch the OSS Buildkite agent's go.mod to downgrade the PTY package to v1.1.19. We also noticed that every Buildkite agent from v3.58.0 onward uses PTY v1.1.20+. So if you are running into the same issue, downgrading the PTY package is likely what you need.
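
A rough sketch of the go.mod patch described above is shown here as a small text rewrite. The go.mod content is a trimmed, hypothetical stub rather than the agent's real file, and `pin_pty` is a made-up helper; the article does not say how the patch was actually applied.

```python
import re

# Matches the creack/pty requirement and captures everything before the version.
PTY_REQUIRE = re.compile(r"(github\.com/creack/pty\s+)v\d+\.\d+\.\d+")


def pin_pty(go_mod_text, version="v1.1.19"):
    """Rewrite the creack/pty version in go.mod text (hypothetical helper)."""
    new_text, count = PTY_REQUIRE.subn(lambda m: m.group(1) + version, go_mod_text)
    if count == 0:
        raise ValueError("github.com/creack/pty not found in go.mod")
    return new_text


# Hypothetical, heavily trimmed go.mod used only for illustration.
sample_go_mod = """module github.com/buildkite/agent/v3

go 1.21

require (
\tgithub.com/creack/pty v1.1.20
)
"""

patched = pin_pty(sample_go_mod)
print(patched)
```

After a change like this, go.sum also has to be refreshed (for example with go mod tidy) before the agent will build against the downgraded package.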

Cheers
