JFrog Prevents Supply Chain Attack Through Binary Secret Scanning
Posted: Tuesday, Jul 23


The JFrog Security Research team recently discovered and reported a leaked access token with administrator access to the GitHub repositories of Python, PyPI and the Python Software Foundation. The token was leaked in a public Docker container hosted on Docker Hub.

As a community service, the JFrog Security Research team continuously scans public repositories such as Docker Hub, npm, and PyPI to identify malicious packages and leaked secrets. The team reports any findings to the relevant maintainers before attackers can take advantage of them. Although the team encounters many secrets leaked in the same manner, this case was exceptional because it is difficult to overestimate the potential consequences had the token fallen into the wrong hands: an attacker could conceivably have injected malicious code into PyPI packages (imagine replacing all Python packages with malicious ones), and even into the Python language itself.

The JFrog Security Research team identified the leaked secret and immediately reported it to PyPI’s security team, who revoked the token within a mere 17 minutes!

What Was Found

JFrog’s secrets scanning engine detected a “classic” GitHub token in one of the public Docker Hub repositories. The risk with “classic” GitHub tokens is that, unlike the newer fine-grained tokens, they grant the same permissions across every repository the user has access to.

In this case, the token’s owner had admin access to the core repositories of Python’s infrastructure, including those of the Python Software Foundation (PSF), PyPI, the Python language and CPython.

GitHub Organization    # of Repositories with admin access
python                 91
pypa                   55
psf                    42
pypi                   21

What Could Have Happened?

The implications of someone finding this leaked token could have been extremely severe. The holder of such a token would have had administrator access to all of the repositories of Python, PyPI and the Python Software Foundation, potentially making it possible to carry out an extremely large-scale supply chain attack.

Various forms of supply chain attack were possible in this scenario. One such attack would be hiding malicious code in CPython, the reference implementation of the Python language, whose repository also holds some of the basic libraries that stand at the core of Python and are compiled from C code. Due to the popularity of Python, inserting malicious code that would eventually end up in Python’s distributables could mean spreading a backdoor to tens of millions of machines worldwide.

Figure: Python Language Supply Chain Attack Vector

Another possible scenario would be inserting malicious code into PyPI’s Warehouse code, which powers the PyPI package index. Imagine an attacker inserting code that grants them a backdoor to PyPI’s storage, allowing them to manipulate very popular PyPI packages, hide malicious code inside them, or replace them altogether. Although this is not the most sophisticated way to carry out an attack that would remain undetected for a long time, it’s certainly a scary scenario.

Figure: PyPI Supply Chain Attack Vector

Why Was the Token Found Only In The Binary?

The authentication token was found inside a Docker container, in a compiled Python file: __pycache__/build.cpython-311.pyc.

However, the same function in the matching source code file didn’t contain the token.

It seems that the original author:

  1. Briefly added the authorization token to their source code
  2. Ran the source code (Python script), which got compiled into a .pyc binary containing the auth token
  3. Removed the authorization token from the source code, but didn’t clean up the .pyc
  4. Pushed both the clean source code and the stale .pyc binary into the Docker image
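The first three steps can be reproduced with a minimal sketch. It makes some assumptions: the file names and the placeholder token are made up for illustration, and py_compile stands in for whatever caused Python to write the bytecode file in the real incident.

```python
import py_compile
import tempfile
from pathlib import Path

# Clearly fake placeholder in the old 40-hex-character token format.
FAKE_TOKEN = "0123456789abcdef0123456789abcdef01234567"


def stale_pyc_keeps_secret() -> bool:
    """Compile with the secret, clean the source, and keep the old .pyc."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "build.py"
        cached = Path(tmp) / "build.pyc"

        # Step 1: the secret is briefly hardcoded in the source.
        source.write_text(
            f"HEADERS = {{'Authorization': 'Bearer {FAKE_TOKEN}'}}\n"
        )

        # Step 2: Python writes bytecode for the module; py_compile performs
        # that compilation step explicitly.
        py_compile.compile(str(source), cfile=str(cached), doraise=True)

        # Step 3: the secret is removed from the source, but the .pyc is
        # never regenerated or deleted.
        source.write_text("HEADERS = {}\n")

        in_source = FAKE_TOKEN in source.read_text()
        in_bytecode = FAKE_TOKEN.encode() in cached.read_bytes()
        return (not in_source) and in_bytecode


if __name__ == "__main__":
    print("secret survives only in the .pyc:", stale_pyc_keeps_secret())
```

String constants are stored in the bytecode file as plain bytes, which is why a simple byte search finds the token even though the source file is clean.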

Here is a comparison of the decompiled build.cpython-311.pyc file and the source code that was actually in the Docker container:

Figure: Reconstructed source code from the binary “build.cpython-311.pyc”

Figure: Actual source code of the matching file in the Docker container

The decompiled code from the .pyc cache file was similar to the original, but included an authorization header with a valid GitHub token.
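Full decompilation is not needed to spot a hardcoded string in a .pyc file. The sketch below is not the tooling JFrog used. It assumes the 16-byte .pyc header used since Python 3.7 and an interpreter of the same version that produced the file, because marshal data is not portable across Python versions. The helper names and the placeholder token are hypothetical.

```python
import marshal
import py_compile
import tempfile
from pathlib import Path
from types import CodeType

PYC_HEADER_SIZE = 16  # magic number, flags, and source metadata (Python 3.7+)


def string_constants(code: CodeType) -> list[str]:
    """Recursively collect every str constant stored in a code object."""
    found = []
    for const in code.co_consts:
        if isinstance(const, str):
            found.append(const)
        elif isinstance(const, CodeType):
            # Nested functions and classes carry their own constants.
            found.extend(string_constants(const))
        elif isinstance(const, (tuple, frozenset)):
            found.extend(item for item in const if isinstance(item, str))
    return found


def strings_in_pyc(pyc_path: Path) -> list[str]:
    """Load the marshalled code object that follows the .pyc header."""
    data = pyc_path.read_bytes()
    code = marshal.loads(data[PYC_HEADER_SIZE:])
    return string_constants(code)


def demo_extract() -> list[str]:
    """Compile a tiny module with a fake token and list its string constants."""
    src = (
        "def fetch():\n"
        "    headers = {'Authorization': "
        "'Bearer 0123456789abcdef0123456789abcdef01234567'}\n"
        "    return headers\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "build.py"
        cached = Path(tmp) / "build.pyc"
        source.write_text(src)
        py_compile.compile(str(source), cfile=str(cached), doraise=True)
        return strings_in_pyc(cached)


if __name__ == "__main__":
    for value in demo_extract():
        print(repr(value))
```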

Scanning for Secrets In Source Code Is Not Enough

From what JFrog has seen, it’s clear that the solution in this case would have been to audit both the source code and the binary data inside the published Docker image. Searching for leaked secrets in binary files is more difficult than in text-based files, but sometimes the critical data resides only in the binary data.

Figure: Source code and the binary data
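As a rough illustration of auditing binary data alongside source code, the sketch below reads every file in a directory tree as raw bytes and applies byte-level patterns. This is not how JFrog’s engine works. The two patterns, the rule names and the simulated .pyc contents are assumptions made for the example, and a real scanner would use many more rules.

```python
import re
import tempfile
from pathlib import Path

# Illustrative rules only; both patterns are assumptions for this sketch.
TOKEN_PATTERNS = {
    "github-prefixed": re.compile(rb"ghp_[A-Za-z0-9]{36}"),
    "bearer-40-hex": re.compile(rb"Bearer [0-9a-f]{40}"),
}


def scan_tree(root: Path) -> list[tuple[str, str]]:
    """Return (relative path, rule name) for every file that matches a rule."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Read raw bytes so .pyc files and other binaries are covered too.
        data = path.read_bytes()
        for rule, pattern in TOKEN_PATTERNS.items():
            if pattern.search(data):
                hits.append((path.relative_to(root).as_posix(), rule))
    return hits


def demo_scan() -> list[tuple[str, str]]:
    """Build a tree with a clean source file and a binary holding a fake token."""
    fake = b"0123456789abcdef0123456789abcdef01234567"
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "build.py").write_text("HEADERS = {}\n")
        cache = root / "__pycache__"
        cache.mkdir()
        # Simulated binary content, not a real .pyc file.
        blob = b"\x00\xe3\x01" + b"Bearer " + fake + b"\x00"
        (cache / "build.cpython-311.pyc").write_bytes(blob)
        return scan_tree(root)


if __name__ == "__main__":
    for hit in demo_scan():
        print(hit)
```

In this example a scan limited to build.py would report nothing, while the byte-level scan flags the file under __pycache__.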

PyPI’s Quick Response

PyPI’s security team handled this issue with the utmost urgency. Leaks are inevitable, so no organisation can be expected to be 100% leak-proof. What can be expected is to act quickly when a leak is discovered and to assess whether it caused any damage.

In this case, after discovering the token, JFrog immediately informed the PyPI security team and the token’s owner about the incident. PyPI’s security team revoked the token and replied just 17 minutes after JFrog’s team reached out. Fortunately, PyPI conducted a thorough check and concluded that there was no suspicious activity involving the token.

PyPI also posted more details about the leak and their incident response in their blog.

What Can We Learn About Secret Detection?

While this case was alarming, valuable lessons can be learnt about working with access tokens.

  1. Scanning for secrets in source code, and even in text-based files, is simply not enough. Modern IDEs and development tools effectively detect secrets in source code and prevent their leakage. However, their scope is limited to code and doesn’t include binary artifacts created by build and packaging tools. Most secrets we encountered in open-source registries were located in environment, configuration, and binary files.
  2. Replace old-style GitHub tokens with new ones, for better visibility. Initially, GitHub used a hex-encoded 40-character token string that was indistinguishable from a SHA-1 hash string and wasn’t caught by most secret scanning tools:
    'Authorization': 'Bearer 0d6a9bb5af126f73350a2afc058492765446aaad'
    In 2021, GitHub switched to a new token format but understandably didn’t require all users to regenerate their tokens. Among other features, the new format contains the recognisable prefix ghp_ and even embeds a checksum, allowing secret detection tools to detect these tokens more easily and with far fewer false positives.
  3. Your token should provide access only to the resources required by the application using it. Creating the “one ring to rule them all” is always a bad idea. In 2022, GitHub introduced new, fine-grained tokens. Unlike the “classic” ones, they allow users to choose the privileges and repositories available to the personal access token and limit its scope to the minimum required for the given task. We highly recommend using this feature, as we frequently encounter situations where a token providing ultimate access to the entire infrastructure gets leaked within a side project or temporary “hello-world” application.
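The ambiguity described in the second lesson can be shown with a short sketch. The two patterns are assumptions made for the example: 40 lowercase hex characters for the old format, and the ghp_ prefix followed by 36 alphanumeric characters for the new one. The classify helper is hypothetical, and the embedded checksum is not verified here.

```python
import hashlib
import re

# Assumed shapes of the two formats; not an official specification.
OLD_FORMAT = re.compile(r"[0-9a-f]{40}")
NEW_FORMAT = re.compile(r"ghp_[A-Za-z0-9]{36}")


def classify(candidate: str) -> str:
    """Label a string by which token format, if any, it resembles."""
    if NEW_FORMAT.fullmatch(candidate):
        return "github-token"
    if OLD_FORMAT.fullmatch(candidate):
        # A SHA-1 hex digest has exactly the same shape.
        return "ambiguous"
    return "unknown"


if __name__ == "__main__":
    sha1_digest = hashlib.sha1(b"not a secret").hexdigest()
    samples = [
        "0d6a9bb5af126f73350a2afc058492765446aaad",  # example from the text
        sha1_digest,                                  # harmless hash
        "ghp_" + "A" * 36,                            # fake new-style token
    ]
    for sample in samples:
        print(sample, "->", classify(sample))
```

A scanner that flagged every 40-character hex string would drown in hashes such as Git commit IDs, which is why the old format tends to be matched only together with surrounding context such as an Authorization header.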

JFrog Secrets Detection: The Binary Advantage

JFrog’s secret detection engine was able to find this critical token even though it was leaked in a compiled Python binary file (.pyc). The leaked token could be detected for two important reasons:

  1. JFrog Secrets Detection runs both shift left, such as within a developer’s IDE, and shift right, such as within a deployed Docker container.
  2. JFrog Secrets Detection looks for leaked secrets in both text files and binary files, leaving you covered on all fronts.

Detection is based on JFrog Xray’s scanning of configuration files, text files and binary files for plain text credentials, private keys, tokens, and similar secrets. It leverages both a constantly updated list of more than 150 specific types of credentials and a proprietary generic secrets matcher for the best coverage possible.

Stay Up-to-date With JFrog Security Research

The security research team’s findings and research play an important role in improving the JFrog Software Supply Chain Platform’s application software security capabilities.

Follow the latest discoveries and technical updates from the JFrog Security Research team on our research website, and on X @JFrogSecurity.

Brian Moussalli
Brian is a Malware Research Team Leader at JFrog Security, specialising in supply chain attacks and malicious packages, vulnerability analysis, threat intelligence research and automated threat detection. In addition to his current role, he has over 13 years of experience in cyber security, security research, reverse engineering and malware analysis.