Wed. Dec 18th, 2024

After Chainalysis Head of Investigations Elizabeth Bisbee had to concede the lack of scientific evidence for the accuracy of Chainalysis’ Reactor software, experts from blockchain surveillance firm CipherTrace have now laid bare flaws in Chainalysis’ analysis.

An expert report filed on August 8th in the case United States vs. Sterlingov reveals a range of mistakes in Bisbee’s expert report as well as inaccuracies in the heuristics applied by Chainalysis’ Reactor software.

Chainalysis Reactor is a blockchain surveillance tool used to trace funds on the blockchain for law enforcement purposes. Given how widely Reactor is used, the software could pose a serious threat to democratic justice proceedings if its findings prove to be unsubstantiated.

Roman Sterlingov, an early Bitcoin adopter accused of operating the custodial Bitcoin mixer Bitcoin Fog, has been awaiting trial in a Virginia jail since 2021. He is defended by Tor Ekeland, who is currently challenging the findings of Chainalysis Reactor in court. In Ekeland’s opinion, Chainalysis is “the Theranos of blockchain forensics.” As multiple expert evaluations of Chainalysis’ findings in the case suggest, he may not be wrong.

In an expert report evaluating whether the fund-tracing evidence supports the accusations against Sterlingov, Jonelle Still, director of investigations and intelligence at CipherTrace, describes Chainalysis’ use of its behavioral clustering heuristic as “reckless”.

Chainalysis’ behavioral clustering heuristic aims to detect patterns in the structure or timing of transactions that identify the specific wallet software used to create them. Based on a wallet service’s observed transaction patterns, Chainalysis then applies clustering algorithms to map the addresses it attributes to that service.

In the case of Bitcoin Fog, CipherTrace calculated an accuracy discrepancy of roughly 64% for the behavioral clustering heuristic, which Still describes as overly inclusive. According to Still, this inaccuracy is then compounded by successive runs of co-spend and behavioral heuristics, making the results even less reliable.
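To make the compounding concrete, the sketch below shows the textbook form of the co-spend heuristic, usually called the common-input-ownership heuristic: addresses spent together as inputs of one transaction are assumed to share an owner, and clusters are merged transitively with a union-find structure. It is a minimal illustration on made-up toy data with hypothetical address labels and a hypothetical function name (`co_spend_clusters`); it is not Chainalysis’ implementation, which has not been published. The `extra_links` parameter stands in, as an assumption, for links added by some other heuristic such as a behavioral match, and shows how one wrong link merges two entire clusters at once.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge addresses into clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen addresses as their own cluster root.
        self.parent.setdefault(x, x)
        # Path halving keeps repeated lookups cheap.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def co_spend_clusters(transactions, extra_links=()):
    """Cluster addresses with the common-input-ownership heuristic.

    transactions: list of input-address lists (hypothetical toy data).
    extra_links: address pairs joined by some other heuristic (assumed,
    e.g. a behavioral match); each wrong pair merges two whole clusters.
    Returns clusters as sorted lists, in sorted order.
    """
    uf = UnionFind()
    for inputs in transactions:
        for addr in inputs:
            uf.find(addr)  # register single-input addresses too
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)  # co-spent inputs: same owner
    for a, b in extra_links:
        uf.union(a, b)
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # Two unrelated users, each co-spending only their own addresses.
    txs = [["A1", "A2"], ["A2", "A3"], ["B1", "B2"]]
    print(co_spend_clusters(txs))
    # → [['A1', 'A2', 'A3'], ['B1', 'B2']]
    # A single mistaken link between A3 and B1 fuses both users.
    print(co_spend_clusters(txs, extra_links=[("A3", "B1")]))
    # → [['A1', 'A2', 'A3', 'B1', 'B2']]
```

Because merges are transitive, every later run of the heuristic treats the fused cluster as one entity, so an early false link propagates to every address subsequently attached to either side.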

“Notably,” Still continues in her report, “the heuristics with the highest claimed accuracy rates, FindNext and FindNext2, failed to find a link between Mt Gox [Sterlingov’s] transactions and Bitcoin Fog.” Unlike behavioral clustering, FindNext and FindNext2 have claimed false discovery rates of only 0.62% and 0.02%, respectively.

CipherTrace, whose partners include Israeli digital forensics firm Cellebrite and the South African open source intelligence firm Maltego, does not use behavioral clustering as Chainalysis applies it, because it is “not a true representation of the flow of funds on chain” and is therefore inaccurate and error-prone.

Still further criticizes Chainalysis’ use of single entity clustering, in which a root address is assigned to an entity “which may or may not be the correct address that transacted.” The report describes such “lumping together” of data as unverifiable and warns that it can lead to many tracing errors, including a higher probability of false positives and false negatives.

According to the report, “Law enforcement and other customers of Chainalysis have approached CipherTrace on this topic and have expressed frustration related to the errors they experience using Chainalysis Reactor.”

To add insult to injury, Still also highlights a non-exhaustive list of errors in Bisbee’s expert report, such as the use of bits instead of bytes, which leads to incorrect mathematical assumptions, and multiple apparently incorrect identifications of change addresses. The report further points out that several script types, such as P2PK, P2MS, P2WSH and P2TR, are missing, and flags the incorrect statement that “a SegWit address begins with 3”, a prefix that denotes P2SH addresses, SegWit or not.
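The address-prefix point is easy to check. The sketch below, a minimal illustration not taken from either expert report, maps common Bitcoin mainnet address prefixes to the output types they encode: a leading “3” marks P2SH, which may wrap a SegWit script or a non-SegWit one such as multisig, while native SegWit addresses begin with “bc1”. The function name `classify_mainnet_address` is hypothetical, and it only inspects prefixes and lengths; it does not validate Base58Check or bech32/bech32m checksums, so it is not suitable for verifying real addresses.

```python
def classify_mainnet_address(addr: str) -> str:
    """Guess the output type of a Bitcoin mainnet address from its prefix
    and length. Illustrative only: checksums are NOT validated.

    P2PK and bare P2MS outputs have no standard address encoding, so they
    can never be identified from an address string at all.
    """
    lower = addr.lower()  # bech32 addresses may be written in upper case
    if lower.startswith("bc1q"):
        # SegWit v0 (bech32): the program length separates the two types.
        if len(lower) == 42:
            return "P2WPKH (native SegWit v0, 20-byte program)"
        if len(lower) == 62:
            return "P2WSH (native SegWit v0, 32-byte program)"
        return "unknown SegWit v0"
    if lower.startswith("bc1p"):
        # SegWit v1 (bech32m) with a 32-byte program is Taproot.
        if len(lower) == 62:
            return "P2TR (Taproot, SegWit v1)"
        return "unknown SegWit v1"
    if addr.startswith("1"):
        return "P2PKH (legacy)"
    if addr.startswith("3"):
        # P2SH commits to an arbitrary script hash: multisig, nested
        # SegWit, or anything else. The address alone cannot say which.
        return "P2SH (wrapped script; SegWit or not cannot be told)"
    return "unrecognized"
```

P2PK and bare P2MS outputs exist only as raw scripts on chain, so an analysis that reasons purely about address strings can easily overlook them.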

Citing a lack of data integrity, Still estimates that there are “hundreds of millions of data points that are unverified,” and suggests that these revelations “may warrant re-examination” of other cases.

To protect the integrity of data in criminal justice proceedings, Still recommends that “Chainalysis attribution data should not be used in court for this case nor any other case: it has not been audited, the model has not been validated, nor has the collection trail been identified.”

The report highlights the importance of model validation, which can verify the accuracy of data enrichment and provide checks on a model’s performance. Providers should have “well documented, auditable processes for attribution and clustering” rather than “black-box models, which use potentially unauthorized customer data” and “unverified user feedback”.

Still concludes that “Blockchain forensics should only be used to generate investigatory leads. Standing alone, they are insufficient as a primary source of evidence. What is striking about this case is the conclusions reached without any corroborating evidence for the blockchain forensics.”

Still further states that “The blockchain forensics and tracing tools used in this case were misused to erroneously conclude that Mr. Sterlingov was the operator of Bitcoin Fog when no such evidence exists on-chain.”

Still describes the failures of the blockchain forensics in this case as “structural issues” in the space and calls for an independent audit of Chainalysis and its methodologies to “prevent wrongful arrests like this one, and failures in compliance, like with FTX.”

This is a guest post by L0la L33tz. Opinions expressed are entirely their own and do not necessarily reflect those of BTC Inc or Bitcoin Magazine.