GitHub private repo access control can be circumvented in certain cases

2020-05-05

I found a little security vulnerability in GitHub.com. I reported it to GitHub on 2020-04-30 and they told me “this is both a known issue and an accepted low risk”… which was a bit disappointing. Even if it's a known issue, I think it is interesting, so what follows is an extended version of the report I sent them.

Abstract

For some private repo A owned by Alice, which has a private fork B owned by Bob, commits in B that were never pushed upstream to A are nonetheless accessible by Alice from repo A via their SHA1, even though Bob has revoked Alice's access to his fork. Since a SHA1 can be truncated to as few as four hex digits and still be used to access a commit on GitHub, Alice can brute-force enumerate all commits in Bob's repository in as few as 65,536 requests.

Background

In the general case, when you fork a repository on GitHub, GitHub does not actually make a copy of all of its commits. Instead, GitHub duplicates the refs (names for certain commits, i.e. branches and tags) for the fork, but uses a shared store for commits belonging to both the original and the forked repository. As a consequence, any commit SHA1 accessible through one repository is also accessible through any other repository in the same “network”, as GitHub calls it. For example, an imaginary commit with the SHA1 hash FOOBAR accessible at https://github.com/hikari-no-yume/php-src/commit/FOOBAR is also accessible at https://github.com/php/php-src/commit/FOOBAR, or indeed any other public fork of the php-src repository on GitHub, regardless of whether the commit was ever pushed upstream/pulled downstream. To be clear, this isn't a new discovery; it is something many other people are aware of, and it has been true of GitHub for a long time.
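To make the shared-store idea concrete, here is a toy model in Python. It is only an illustration of the behaviour described above, not GitHub's actual implementation: the `Network` and `Repo` classes, the branch names and the short made-up hashes are all assumptions invented for the sketch.

```python
# Toy model of a fork "network": refs are per-repository, commits are shared.
# Illustration only; class names and hashes are made up for this sketch.

class Network:
    def __init__(self):
        self.objects = {}  # hash -> commit content, shared by every member


class Repo:
    def __init__(self, name, network, refs=None):
        self.name = name
        self.network = network
        self.refs = dict(refs or {})  # branch/tag name -> hash, private to this repo

    def fork(self, name):
        # Forking copies only the refs; the fork points at the same Network.
        return Repo(name, self.network, self.refs)

    def commit(self, branch, sha, content):
        self.network.objects[sha] = content
        self.refs[branch] = sha

    def show(self, sha):
        # Lookup by hash goes to the shared store and ignores this repo's refs.
        return self.network.objects.get(sha)


upstream = Repo("php/php-src", Network())
upstream.commit("master", "aaaa1111", "initial commit")

fork = upstream.fork("hikari-no-yume/php-src")
fork.commit("my-branch", "bbbb2222", "work never pushed upstream")

print(upstream.refs)              # {'master': 'aaaa1111'}
print(upstream.show("bbbb2222"))  # work never pushed upstream
```

The upstream repository never gains a ref for the fork's commit, yet the commit can still be fetched through it by hash.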

For public repositories this is merely an interesting curiosity, with no security implications: the commits are public anyway, so it doesn't really matter if you can access them from an unexpected URL. Someone with a keen eye for security issues might wonder, however, what would happen if a private repository has a public fork, or vice-versa: could you expose private commits via direct SHA1 access? Well, it seems like GitHub must have thought of that, because they go out of their way to prevent private forks of public repositories, and vice-versa, from being in the same “network”. If you try to create that situation, GitHub will either refuse and force you to manually mirror the repository, or mirror it automatically for you. Either way, the result is a truly independent fork which is not part of the same “network”, so there can't be any access control problems.

However, what about private forks of private repositories — do those use “networks”? Turns out yes. Can the original repository and fork be owned by different users and have different access controls, yet still be part of a network? Also yes. So, can a user with access to one repository but not the other therefore use direct SHA1 access on the former to circumvent access controls on the latter? Apparently, also yes! And thus I found this vulnerability.

Steps to reproduce

1. Alice creates a private repository on her account. She chooses the option to have the repository pre-initialised with a README. Let's call this repo A.
2. Alice goes to Settings > Manage access > Invite Collaborator on A and invites Bob.
3. Bob accepts Alice's invite.
4. Bob goes to repo A and forks it to his account. Let's call this fork repo B.
5. Bob goes to Settings > Manage access on B and revokes Alice's access to B. Note however that Alice has not removed Bob as a collaborator on A.
6. Bob goes to repo B and uses the edit button to create a new commit in B. Note that Bob does not push this commit upstream to A.
7. Alice by some means finds out the SHA1 hash, or an abbreviation of the SHA1 hash, of the new commit Bob has created. (I discuss later some ways this could happen.)
8. Alice browses to https://github.com/Alice/A/commit/<hash> (where “Alice” is Alice's username, “A” is her repo, “<hash>” is the hash of Bob's commit). To her surprise, she sees Bob's commit from B, despite not having access to B, and despite Bob never having pushed that commit to A.
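The steps above can be replayed against a small Python model. This is a sketch of the observed behaviour under stated assumptions, not a description of GitHub's internals: the `Repo` class, the `view_commit` method and the short placeholder hashes are invented for illustration, and the model assumes that the permission check looks only at the repository named in the URL while the commit lookup uses the network-wide store.

```python
# Toy replay of the reproduction steps. Names and hashes are placeholders;
# the access-check logic is an assumption that matches the observed behaviour.

class Repo:
    def __init__(self, owner, name, objects, collaborators=()):
        self.owner = owner
        self.name = name
        self.objects = objects  # the same dict for every repo in the network
        self.collaborators = set(collaborators)

    def can_read(self, user):
        return user == self.owner or user in self.collaborators

    def view_commit(self, user, sha):
        # Permission is checked against the repo named in the URL only...
        if not self.can_read(user):
            raise PermissionError(f"{user} cannot read {self.owner}/{self.name}")
        # ...while the commit itself comes from the shared store.
        return self.objects[sha]


objects = {"1111aaaa": "Initial commit with README"}
repo_a = Repo("Alice", "A", objects)                         # step 1
repo_a.collaborators.add("Bob")                              # steps 2 and 3
repo_b = Repo("Bob", "B", objects, collaborators={"Alice"})  # step 4
repo_b.collaborators.discard("Alice")                        # step 5
objects["2222bbbb"] = "Bob's private change"                 # step 6, never pushed to A

leaked = repo_a.view_commit("Alice", "2222bbbb")             # steps 7 and 8
try:
    repo_b.view_commit("Alice", "2222bbbb")
    direct_denied = False
except PermissionError:
    direct_denied = True

print(leaked)         # Bob's private change
print(direct_denied)  # True
```

In the model, Alice is refused when she asks B directly, but the same commit is served when she asks A.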

How we actually tested it

My friend wareya helped me test this, since I wanted to make sure it could happen across accounts, and GitHub's ToS prohibit one person from having two Free accounts. We followed the steps described above, where I was Alice and they were Bob. The only deviation is that prior to step 5 (revoking access) we had already attempted steps 6 through 8 (creating a new commit, obtaining the SHA1, viewing it in the wrong repo) and succeeded. This was neither surprising nor an apparent vulnerability: until step 5 has been done, Alice and Bob are expected to be able to see each other's commits. We then did step 5, repeated steps 6 through 8, and saw the vulnerability.

[redacted link] is the commit I as “Alice” could see in “A” despite “Bob” having committed it in “B” after revoking “Alice”'s access.

I also tried an additional step after all the above: revoking “Bob”'s access to “A”. This resulted in “Bob” losing access to “B” (GitHub deletes collaborators' forks if you remove them as a collaborator), but I could still see “Bob”'s commits to “B” via “A”, despite “B” being deleted.

Obtaining a SHA1 hash

My initial assumption was that it might be acceptable to be able to access others' commits without permission if their hashes are known to an attacker, because hashing is a one-way function producing a very long unguessable string, so a hacker would presumably only know the SHA1 if they also knew the commit content. However, I quickly realised this is not a reasonable assumption for at least two reasons:

Git commit hashes are often used as part of versioning in various ways. For example, git describe can be used to programmatically generate a version number to embed into an executable or use in its filename. I suspect the use of hashes is particularly likely in CI systems, and an application developer might not even be aware that a binary contains a commit hash if it is hidden in automatically-generated metadata. None of these would normally be considered negligent on the part of the producer of the executable, since commit hashes should be harmless information on their own. It is therefore conceivable that Alice might be able to discover the SHA1 of one of Bob's commits by wholly legitimate means, without fault on Bob's part.
The https://github.com/<user>/<repo>/commit/<hash> syntax works not only for full-length hashes, but also for abbreviated hashes. In my testing the shortest abbreviated hash allowed seemed to be 4 characters. Given the hash is hexadecimal, four characters means 16⁴, or 65536 possible combinations. That is a small enough number that it would be reasonable for an attacker to try all of them!
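Both points can be sketched in a few lines of Python. The first function pulls the abbreviated hash out of a version string in the `<tag>-<count>-g<hash>` shape that git describe produces; the second lists every four-digit hexadecimal prefix as a commit URL. The function names, the sample version string `v1.2.3-14-g2414721` and the `Alice`/`A` placeholders are assumptions made up for this illustration, and the code only builds strings: it sends no requests.

```python
import itertools
import re

HEX_DIGITS = "0123456789abcdef"


def extract_abbrev_hash(version):
    """Return the abbreviated hash from a `git describe`-style string, or None."""
    # git describe output has the shape <tag>-<commits since tag>-g<abbrev hash>.
    match = re.search(r"-\d+-g([0-9a-f]{4,40})\b", version)
    return match.group(1) if match else None


def candidate_urls(user, repo, length=4):
    """Yield one commit URL per possible hex prefix of the given length."""
    for digits in itertools.product(HEX_DIGITS, repeat=length):
        yield f"https://github.com/{user}/{repo}/commit/{''.join(digits)}"


# Hypothetical version string, as might be embedded in a build.
print(extract_abbrev_hash("v1.2.3-14-g2414721"))  # 2414721

urls = list(candidate_urls("Alice", "A"))
print(len(urls))  # 65536
print(urls[0])    # https://github.com/Alice/A/commit/0000
print(urls[-1])   # https://github.com/Alice/A/commit/ffff
```

The count is simply 16 choices for each of 4 positions, which is why the search space is so small.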

A possible attack scenario: Bob's fork of Alice's code is proprietary software, where Alice is allowing Bob to access and extend her code, but not vice-versa. Because of this vulnerability, Alice can use one of these two methods to find out SHA1s and potentially access the proprietary source code of Bob's extensions, despite Bob not having granted her this access.

Known issue?

When submitting the report to GitHub, I got an automated “are-you-sure” message:

GitHub stores the parent repository along with forks in a "repository network". It is a known behavior that objects from one network member are readable via other network members. Blobs and commits are stored together, while refs are stored separately for each fork. This shared storage model is what allows for pull requests between members of the same network.

I addressed it with the following two paragraphs:

I was aware of this, and indeed it can't be considered a vulnerability where both repositories are public, nor both repositories are private and their respective owners have mutual access. But that made me curious what happens in variations of these. It seems that for forks where one repository is public and the other private or vice-versa, and also changes between public and private, GitHub tries its best to split the “repository network” or prevent one being formed in these cases, so I found no vulnerability.

But the further case I tried, which is the one I am reporting here, does seem like a vulnerability to me, unless GitHub explicitly warns users that revoking collaborator access in one direction only does not provide security on private repositories if it is a fork? It is a clear violation of access controls AFAICT.

hikari's blog