LinkingLion: An entity linking Bitcoin transactions to IPs?

Do dandelions help against lions?

Tuesday, March 28, 2023

This post describes and discusses the behavior of an entity I call LinkingLion. The entity opens connections to many Bitcoin nodes using four IP address ranges and listens to transaction announcements. This might allow the entity to link newly broadcast transactions to node IP addresses. The entity has been active in some capacity since 2018 and is also active on the Monero network using the same IP address ranges. The entity might be a blockchain analysis company collecting data to improve its products.

I previously observed an entity making multiple short-lived connections per second to many nodes on the Bitcoin P2P network. I called this entity an “Inbound Connection Flooder” and wrote about my initial observation in this post. However, after closer inspection of the entity’s behavior, I think these short-lived connections are only a symptom of the primary goal. I suspect the entity is tracking transaction propagation to determine which node first broadcasts which transaction, and thereby to link transactions to IP addresses.

The entity uses IP addresses from three IPv4 /24 ranges and one IPv6 /32 range to connect to listening nodes on the Bitcoin network. These IP address ranges are all announced by AS54098, LionLink Networks. However, the ranges belong to different companies based on ARIN and RIPE registry information.

162.218.65.0/24: Fork Networking, LLC (forked.net) - ARIN Whois
209.222.252.0/24: Castle VPN, LLC (castlevpn.com) - ARIN Whois
91.198.115.0/24: Linama UAB (?) - RIPE Whois
2604:d500::/32: Data Canopy (datacanopy.com) - ARIN Whois

Fork Networking and Castle VPN are US-based companies owned by the same person. Fork Networking offers hosting and colocation services, while Castle VPN is a VPN provider. Linama UAB is a Lithuanian company with no web presence. Data Canopy is a US-based company offering cloud and colocation data centers. Since the connections from these IP ranges share very similar behavior, I assume they are controlled or rented by the same entity. I’m calling the entity “LinkingLion” as the AS LionLink Networks is the common factor for these IPs, and I assume the entity is trying to link transactions to IP addresses.

Behavior

To analyze the behavior of LinkingLion, I recorded the network traffic between my node and the entity’s IP ranges for about five days in the first half of March 2023. In this timeframe, the entity opened about 200,000 connections to my node. In the following section, I’ll walk through the observed behavior.

Connection establishment and handshake

Out of the four IP ranges, the entity uses the following 812 addresses (list) to open TCP connections to many listening Bitcoin nodes on the network:

162.218.65.11 - 162.218.65.254 (244 addresses)
209.222.252.2 - 209.222.252.254 (253 addresses)
91.198.115.3 - 91.198.115.62 (60 addresses) + 91.198.115.114
2604:d500:4:1::2
2604:d500:4:1::3:2 - 2604:d500:4:1::3:fe (253 addresses)
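The address count above can be double-checked with a short script. This is only a sketch using Python's standard `ipaddress` module; the helper name `inclusive_range` and the `SPANS` list are my own illustration, built from the spans listed above.

```python
import ipaddress

def inclusive_range(first: str, last: str) -> list:
    """Return every address from first to last, both ends included."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    # type(start) keeps the address family (IPv4Address or IPv6Address).
    return [type(start)(i) for i in range(int(start), int(end) + 1)]

# Address spans as listed above (single addresses are spans of length one).
SPANS = [
    ("162.218.65.11", "162.218.65.254"),
    ("209.222.252.2", "209.222.252.254"),
    ("91.198.115.3", "91.198.115.62"),
    ("91.198.115.114", "91.198.115.114"),
    ("2604:d500:4:1::2", "2604:d500:4:1::2"),
    ("2604:d500:4:1::3:2", "2604:d500:4:1::3:fe"),
]

ADDRESSES = [addr for first, last in SPANS for addr in inclusive_range(first, last)]

for first, last in SPANS:
    print(f"{first} - {last}: {len(inclusive_range(first, last))} addresses")
print("total:", len(ADDRESSES))
```

The per-span counts add up to the 812 addresses mentioned above.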

The entity uses the full range of ephemeral ports (1024-65535) as source ports, which deviates from the default behavior of many operating systems (most use a smaller subset). The same IP address repeatedly connects, in some cases more than 50 times, before the entity switches to another IP address in the same address range.

The entity establishes a TCP connection to our Bitcoin node and starts the version handshake by sending a version message. The version messages have obscure user agents such as /bitcoinj:0.14.3/Bitcoin Wallet:4.72/, /Classic:1.3.4(EB8)/, or /Satoshi:0.13.2/. In total, 118 different user agents are used. Nearly all of these appear in version messages with the same frequency, which indicates that the user agents are picked from a list and are likely fake. The entity uses 0 as the nonce for all connections and sets the transaction relay flag to receive announcements of new transactions we know about.
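To make the fields mentioned above concrete, here is a minimal sketch of how the payload of a version message could be decoded to read the nonce, user agent, start height, and relay flag. It assumes the standard version payload layout of the Bitcoin P2P protocol. The function names, the `has_linkinglion_traits` check, and the `build_version` test helper are hypothetical and not taken from my analysis tooling. A zero nonce with the relay flag set is only a hint, not proof.

```python
import struct

def read_compact_size(buf: bytes, pos: int):
    """Decode a CompactSize integer at pos; return (value, next position)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    value = int.from_bytes(buf[pos + 1:pos + 1 + size], "little")
    return value, pos + 1 + size

def parse_version(payload: bytes) -> dict:
    """Extract selected fields from a version message payload."""
    # version (int32), services (uint64), timestamp (int64) = 20 bytes
    version, services, timestamp = struct.unpack_from("<iQq", payload, 0)
    pos = 20
    pos += 26  # addr_recv: services (8) + address (16) + port (2)
    pos += 26  # addr_from: same layout
    (nonce,) = struct.unpack_from("<Q", payload, pos)
    pos += 8
    ua_len, pos = read_compact_size(payload, pos)
    user_agent = payload[pos:pos + ua_len].decode("ascii", errors="replace")
    pos += ua_len
    (start_height,) = struct.unpack_from("<i", payload, pos)
    pos += 4
    # The relay byte is optional; if it is missing, relay defaults to true.
    relay = payload[pos] != 0 if pos < len(payload) else True
    return {
        "version": version,
        "services": services,
        "nonce": nonce,
        "user_agent": user_agent,
        "start_height": start_height,
        "relay": relay,
    }

def has_linkinglion_traits(fields: dict) -> bool:
    """Hypothetical hint: zero nonce and transaction relay requested."""
    return fields["nonce"] == 0 and fields["relay"]

def build_version(user_agent: str, start_height: int, nonce: int, relay: bool) -> bytes:
    """Hypothetical helper that builds a minimal payload to exercise the parser."""
    ua = user_agent.encode("ascii")
    assert len(ua) < 0xFD  # keep the length in a single CompactSize byte
    return (
        struct.pack("<iQq", 70015, 0, 0)
        + bytes(26) + bytes(26)
        + struct.pack("<Q", nonce)
        + bytes([len(ua)]) + ua
        + struct.pack("<i", start_height)
        + bytes([1 if relay else 0])
    )

fields = parse_version(build_version("/Satoshi:0.13.2/", 658501, 0, True))
print(fields["user_agent"], fields["start_height"], has_linkinglion_traits(fields))
```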

The block height sent in the version message does not match the block height known to the Bitcoin network. About 98% of the connections increment the block height precisely every 10 minutes. Since the average time between blocks has been less than 10 minutes over the last few months, the entity’s height lags behind the network’s best height. In the observed connections, two different height configurations can be identified as lagging by about 700 and 2100 blocks. I estimated that the entity’s and the network’s height for the connections lagging by about 700 blocks matched in late Q4 2022 or early Q1 2023, and the height for connections lagging by 2100 matched in Q3 2022. I assume this is the time the height was last configured. For about 2% of the connections, the height is always set to block 658501. These connections all originate from 2604:d500:4:1::2, 91.198.115.114, or 162.218.65.219.
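The lag described above follows from simple arithmetic: a counter that adds one block every 600 seconds falls behind whenever the real average block interval is shorter. The sketch below illustrates this. The function names are mine, and the 580-second average interval in the example call is an assumed value for illustration, not a measurement.

```python
def simulated_height(configured_height: int, seconds_since_config: float) -> int:
    """Height announced by a client that adds one block every 600 seconds."""
    return configured_height + int(seconds_since_config // 600)

def expected_lag(days: float, avg_block_interval: float) -> float:
    """Blocks by which the 10-minute counter trails the network after `days`.

    avg_block_interval is the real average time between blocks in seconds.
    """
    seconds = days * 86400
    real_blocks = seconds / avg_block_interval
    counted_blocks = seconds / 600
    return real_blocks - counted_blocks

# Assumed example: 90 days with blocks arriving every 580 seconds on average.
print(round(expected_lag(90, 580)))
```

Running the same calculation backwards, from an observed lag to the elapsed time, is how a "last configured" date can be roughly estimated, provided the average block interval over that period is known.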

My node responds with a version and a verack message acknowledging that it understood the entity’s version message. At this point, the entity is expected to respond with a verack to complete the handshake. However, the entity closes about 82% of connections without sending a verack message. These connections are short-lived, with a connection duration of only a few seconds. All IPv4 addresses besides 209.222.252.2 open these connections. However, only the IPv6 address 2604:d500:4:1::2 opens short-lived connections, while the other IPv6 addresses don’t.

Opening a short-lived connection and closing it right after receiving the version message is typical when checking if a node is reachable on a given address. The entity also learns metadata like which network services the node offers, what version the node has, and what height it considers the blockchain to be.

Communication

In the remaining 18% of the opened connections, the entity sends a verack, and the connection stays open longer. After the handshake, a Bitcoin Core node sends a sendcmpct message indicating support for Compact Block Relay, a ping message, a feefilter message with the minimum feerate we’re interested in, and a getheaders message requesting new headers the peer might know. The entity responds with a pong message and continues to respond to pings for the duration of the connection. It never initiates a ping itself.

From here on, two different behaviors can be observed. Either the entity listens for inv messages from us for up to 150 seconds (2 minutes and 30 seconds), or it sends us a getaddr and listens for inv and addr messages from us for up to 600 seconds (10 minutes) before closing the connection. We send 15 inv messages on average to the entity during the shorter, inv-only connections. During the longer inv-and-addr connections, we send an average of six addr-messages and 104 inv-messages. In the Bitcoin protocol, inv (inventory) messages are announcements that new blocks or transactions are available. Upon receiving an inv, a node might request the block or transaction if it doesn’t know about it yet. The entity never requests blocks or transactions.
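The three connection types described so far (short-lived, inv-only, and inv-and-addr) can be told apart with two observations per connection. The following is a sketch of such a classification. The `Connection` record and its field names are hypothetical and would need to be filled from a packet capture or node logs.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Connection:
    """Hypothetical per-connection summary extracted from recorded traffic."""
    sent_verack: bool        # did the peer complete the version handshake?
    sent_getaddr: bool       # did the peer request addresses?
    duration_seconds: float  # time between TCP open and close

# Upper bounds on connection duration as observed in the text above.
MAX_DURATION = {"inv-only": 150, "inv-and-addr": 600}

def classify(conn: Connection) -> str:
    """Map a connection to one of the three observed behaviors."""
    if not conn.sent_verack:
        return "short-lived"
    if conn.sent_getaddr:
        return "inv-and-addr"
    return "inv-only"

def fits_observed_duration(conn: Connection) -> bool:
    """Check the duration against the observed upper bound for its type."""
    limit = MAX_DURATION.get(classify(conn))
    return limit is None or conn.duration_seconds <= limit

sample = [
    Connection(False, False, 2.0),
    Connection(True, False, 150.0),
    Connection(True, True, 600.0),
]
print(Counter(classify(c) for c in sample))
```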

The inv-only connection duration is similar for the three IPv4 address ranges. Many connections are closed after either 90 seconds or 150 seconds. The connections from the 253 addresses in the 2604:d500:4:1::3 IPv6 range are primarily closed after 150 seconds, while some are closed earlier, between 90 and 150 seconds. The connections from 2604:d500:4:1::2 are closed nearly uniformly between 0 and 90 seconds. Generally, there are no special IP addresses used only for longer or shorter connections. The only outlier is 209.222.252.2, which only makes the longer 150-second connections. The IP 162.218.65.219 is notable for making twice as many connections as the other IPs in the same IP range. The inv-and-addr connections request addresses with a getaddr message and only stem from 2604:d500:4:1::2 and 91.198.115.114. These are closed just after being open for 600 seconds.

Connection duration per IP range, stacked

Mass inbound-eviction

A Bitcoin Core node has a limited number of inbound connection slots. A new inbound connection might evict an existing one when all slots are full. Some peers are protected from being evicted, for example, peers that send us blocks or transactions we didn’t know about. Bitcoin Core’s eviction logic might choose to evict a peer from the network group with the most connections out of the unprotected peers. Bitcoin Core calculates network groups based on the /16 subnet for IPv4 and /32 subnet for IPv6.
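The network group idea can be sketched as follows. This is a strongly simplified illustration of the grouping described above, not Bitcoin Core's actual eviction code: it only computes the /16 or /32 group of each peer and picks the group with the most unprotected peers. The function names and the `(address, protected)` tuple format are my own. The example peers use LinkingLion addresses and reserved documentation addresses.

```python
import ipaddress
from collections import Counter

def network_group(addr: str) -> str:
    """Return the /16 (IPv4) or /32 (IPv6) network an address falls into."""
    ip = ipaddress.ip_address(addr)
    prefix = 16 if ip.version == 4 else 32
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))

def largest_unprotected_group(peers):
    """peers: iterable of (address, protected) tuples.

    Returns the network group with the most unprotected peers, or None.
    """
    counts = Counter(network_group(a) for a, protected in peers if not protected)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

peers = [
    ("162.218.65.11", False),
    ("162.218.65.12", False),
    ("162.218.65.200", False),
    ("203.0.113.5", False),
    ("198.51.100.7", True),   # protected, e.g. it sent us new transactions
]
print(largest_unprotected_group(peers))
```

Under this grouping, the three IPv4 ranges used by the entity fall into three different /16 groups, while all of its IPv6 addresses share a single /32 group.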

LinkingLion often has multiple open connections to a node simultaneously. Once the inbound connection slots are full, a new inbound connection might cause one of the entity’s connections to be evicted. The entity reacts by opening another connection to the node, causing yet another connection to be evicted. I previously described this as an “Inbound Connection Flooder” due to the high frequency of multiple hundred connections per minute. This flooding only happens when a node’s inbound connection slots are full.

Other behavior

As reported in this Monero issue, the same IP ranges also open connections to nodes on the Monero network. One user reports that the entity also uses IP addresses like 91.198.115.74, while I only observed connections using the IP addresses 91.198.115.3 to 91.198.115.62 and 91.198.115.114. The IP address ranges the entity uses have all been added to an IP block list for Monero nodes.

I’ve only seen connections from the 162.218.65.0/24 IP range starting at 162.218.65.11. However, there are publicly accessible logs, for example, [1], [2], and [3] from summer 2021, showing requests to web servers from 162.218.65.10 with a Java/1.8.0_292 user agent. It’s unclear if these requests are related to the entity.

Discussion

In the following section I discuss questions that came up after making the above observations.

Are the connections from the same legal entity?

It’s unclear if the described LinkingLion entity is a single entity or a group of legal entities. The connections share patterns across the different IP address ranges. For example, the connection durations for the inv-only and inv-and-addr connection types are similar across the IP address ranges. Additionally, the IP address ranges all use the same fake user agents. While this indicates that the same or similar software is used to open connections through the same IP address ranges, it does not confirm that only one legal entity is behind these connections. Furthermore, the three different height configurations sent via the version message (static at height 658501, lagging by about 2100 and 700 blocks) could indicate three different configurations or versions of the software run by either one or multiple entities.

Are the connections opened through a VPN service?

Based on ARIN registry information, the 209.222.252.0/24 IP range belongs to a company called Castle VPN. This could indicate that the connections are opened through a VPN service. The other IP ranges could also be used as VPN endpoints, which would explain why multiple software configurations share the same IP addresses. However, this theory remains unconfirmed for now.

What information does the entity learn about a node it targets?

The information the entity learns from a node can be categorized into metadata, inventory, and addresses. All connections learn about node metadata, which includes if and when a node is reachable or unreachable, which software version runs on this specific node and when it upgrades, which block height it considers the best and when it changes, and which services the node offers, for example, whether the node is pruned or serves bloom or compact block filters.

Connections that complete the version handshake and stay connected learn about our node’s inventory, like transactions and blocks. The timing information, i.e., when a node announces its new inventory, is especially relevant. The entity is likely to first learn about our new wallet transaction from us. As the entity is connected to many listening nodes, it can use that information to link broadcast transactions to IP addresses.
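A naive version of such a timing analysis is a "first announcer" heuristic: for each transaction, guess that the node that announced it earliest is the origin. The sketch below shows the idea. I don't know which method the entity actually uses. The function name and the `(txid, node_ip, timestamp)` observation format are hypothetical, and the example addresses are reserved documentation addresses.

```python
def first_announcers(observations):
    """Guess the origin of each transaction as its earliest announcer.

    observations: iterable of (txid, node_ip, timestamp) tuples, one per
    inv announcement received by a listener connected to many nodes.
    Returns a dict mapping txid -> node_ip of the earliest announcement.
    """
    earliest = {}
    for txid, node_ip, timestamp in observations:
        if txid not in earliest or timestamp < earliest[txid][1]:
            earliest[txid] = (node_ip, timestamp)
    return {txid: node_ip for txid, (node_ip, _) in earliest.items()}

observations = [
    ("tx-a", "192.0.2.10", 100.4),
    ("tx-a", "192.0.2.20", 100.1),   # earliest announcement of tx-a
    ("tx-a", "192.0.2.30", 101.7),
    ("tx-b", "192.0.2.30", 205.0),   # earliest announcement of tx-b
    ("tx-b", "192.0.2.10", 205.9),
]
print(first_announcers(observations))
```

The guess can be wrong when a relaying node happens to announce before the origin does, which is why the randomized announcement delays discussed later matter.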

About 2% of the LinkingLion connections also ask our node to send it the network addresses of other nodes on the network. The entity likely uses these to find new targets to connect to and to keep connections to all possible nodes open. There are known ways of trying to infer the network topology, for example, how many connections a node has or who its peers are, based on address propagation. Given the small number of connections that request and learn about other network addresses, this doesn’t seem to be the goal of the addr messages here. However, it is also possible to learn about the network topology by tracking transaction relay.

However, why does the entity open multiple short-lived connections from multiple IP ranges to a single node? Similar information could be extracted with less effort and without opening and closing connections frequently. This would have avoided much of the noise that caused me to look at this in detail.

How long has the entity been active for?

I personally first observed the entity in the summer of 2022. However, the entity has been active for longer. In August 2020, Bitcoin Core developer @jonatack posted a review comment on GitHub, which included the peers currently connected to his node. Four inbound connections from 2604:d500:4:1::2 with fake user agents are visible. Similarly, a screenshot in Bitcoin Core PR #18402: gui: display mapped AS in peers info window from March 2020 shows a connection from the same IP address as peer 43.

On an IP address banlist previously maintained by Greg Maxwell, now only accessible via the Wayback Machine, the IP ranges 162.218.65.0/24 and 2604:d500:4:1::2/128 can be found. They first appeared on the archived list in March 2019 and weren’t present in September 2018. The other two IP ranges are not on the list. However, the IP range 23.92.36.0/24, also announced by AS54098 LionLink Networks, can be found there since September 2018. There are #bitcoin-core-dev IRC logs from February 18, 2018, discussing this IP range, with multiple users mentioning that they have multiple connections from that IP range. A screenshot shows two inbound connections (id 214 and 246) from this IP range with the user agents /Satoshi:0.10.2/ and /bitcoinj:0.14.3/. It seems the entity has been active since early 2018 in some capacity. However, it’s unknown whether the entity was active the whole time. The lagging block heights, presumably set in Q3 2022 and Q1 2023, indicate that the entity is still, at least to some degree, maintaining the data collection.

Who is the entity?

Most Bitcoin P2P anomalies originate from individuals playing around with the open network, from companies with profit motives (for example, selling data to other companies and law enforcement), or from (academic) researchers. In this case, it seems unlikely that an individual would sustain this over multiple years. The IP address ranges and servers cost money. An academic experiment is usually shorter, too, as papers eventually need to be published. Academic researchers might not use fake user agents. It makes sense for a company to pay for IP address ranges and servers if they can sell the collected data or enhance an existing product. This could be a company doing blockchain analysis.

What are possible preventions and solutions?

A short-term prevention might be to manually ban the IP address ranges used by the entity from making inbound connections to nodes. I’ve published a transparent and open-source banlist with the first entry being this entity. Node operators that want to protect against the entity making connections to their node can use this banlist. However, it’s important to note that this banlist is entirely optional and centralized. Another possibility is to contact the abuse contacts of the IP range owners or AS54098 LionLink Networks.
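For a manual ban, Bitcoin Core offers the setban RPC. The sketch below only generates the corresponding bitcoin-cli command lines for the four ranges listed earlier. It is not the banlist mentioned above, and the helper name and the one-year ban time are assumptions for illustration. Banning the whole IPv6 /32 may be broader than needed, so a narrower prefix might be preferable.

```python
import ipaddress

# IP ranges used by the entity, as listed earlier in this post.
RANGES = [
    "162.218.65.0/24",
    "209.222.252.0/24",
    "91.198.115.0/24",
    "2604:d500::/32",
]

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumed ban duration

def setban_commands(ranges, bantime_seconds):
    """Return one `bitcoin-cli setban` command line per subnet."""
    commands = []
    for subnet in ranges:
        ipaddress.ip_network(subnet)  # raises ValueError on a malformed subnet
        commands.append(f'bitcoin-cli setban "{subnet}" add {bantime_seconds}')
    return commands

for command in setban_commands(RANGES, ONE_YEAR_SECONDS):
    print(command)
```

The active bans can afterwards be inspected with `bitcoin-cli listbanned`.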

Both of these methods, however, don’t solve the root problem. The entity can easily switch to new IP ranges or route traffic through a different AS. The root problem is that transactions can be linked to IP addresses. Fixing this requires changes in the initial transaction broadcast and transaction rebroadcast logic on the Bitcoin network and in Bitcoin Core. Transactions are transmitted to peers with independent, exponential delays. An entity opening multiple concurrent inv-listening connections to a node can link transactions to the node’s IP address with a high success rate. A solution might be Dandelion (in particular, Dandelion++ or some modification of it), where a transaction is first transmitted to another node, which then broadcasts it. Dandelion++ has been used in Monero since 2020. An implementation attempt in Bitcoin Core did not succeed, primarily due to DoS and complexity concerns.
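Why multiple concurrent connections help the entity can be illustrated with an idealized model. If each connection receives the announcement after an independent, exponentially distributed delay with the same mean, the first of k connections receives it after mean/k on average, because the minimum of k such delays is again exponential. The sketch below checks this with a small simulation. The 5-second mean and the function names are assumptions for illustration, and the actual announcement timers in Bitcoin Core may behave differently.

```python
import random

def expected_first_delay(mean_delay: float, connections: int) -> float:
    """Mean of the minimum of `connections` i.i.d. exponential delays."""
    return mean_delay / connections

def simulate_first_delay(mean_delay: float, connections: int,
                         trials: int = 20000, seed: int = 1) -> float:
    """Estimate the same quantity by sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # expovariate takes the rate, i.e. 1 / mean.
        total += min(rng.expovariate(1.0 / mean_delay)
                     for _ in range(connections))
    return total / trials

for k in (1, 2, 4, 8):
    print(k, expected_first_delay(5.0, k), round(simulate_first_delay(5.0, k), 2))
```

In this model, doubling the number of listening connections halves the average time until the listener first hears about a transaction from the node, which makes it more likely that the origin is heard before any relaying node.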

Transaction broadcast over privacy networks like Tor is not affected if done correctly. A strategy is to broadcast a transaction to a node on the Tor network using a fresh connection and then close the connection right after. Some Bitcoin wallets with a strong focus on privacy implement similar features. It is currently not implemented in the Bitcoin Core wallet. Tools like bitcoin-submittx might be helpful.

To summarize, an entity frequently opens connections from multiple IP ranges to many nodes on the Bitcoin network. Some characteristics, like the fake user agents and the block heights that increase precisely every 10 minutes, confirm that the connections do not originate from some misconfigured Bitcoin node but are custom clients. About 18% of the connections are used to listen to transaction announcements, allowing the entity to link newly broadcast transactions to IP addresses. The same IP addresses connect to nodes on the Monero network too.

Only a few details about the entity are known. The same IP ranges have been making connections since 2018 in some capacity. It’s unclear whether the IP ranges are endpoints of a VPN service. Similarly, it’s unknown whether the entity is a single entity or a group of legal entities. The behavior could indicate financial motives. A possibility is a blockchain analysis company that wants to enrich its product with additional data. A short-term solution might be a banlist or reporting the entity’s behavior. Solving the root problem requires deeper changes to the P2P logic in Bitcoin.

You can check if LinkingLion is connecting to your listening clearnet Bitcoin Core node by grepping for the addresses in the getpeerinfo output:

bitcoin-cli getpeerinfo | grep -E '162\.218\.65|209\.222\.252|91\.198\.115|2604:d500:4:1'
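A slightly more precise alternative to the grep is to parse the JSON output and test each peer address against the ranges. The following is a sketch that assumes the `addr` field of getpeerinfo has the form `ip:port` for IPv4 and `[ip]:port` for IPv6. The function names are hypothetical. For IPv6, it uses the /64 that corresponds to the grep pattern above rather than the whole /32.

```python
import ipaddress
import json

NETWORKS = [ipaddress.ip_network(n) for n in (
    "162.218.65.0/24",
    "209.222.252.0/24",
    "91.198.115.0/24",
    "2604:d500:4:1::/64",
)]

def host_of(addr: str) -> str:
    """Strip the port from a getpeerinfo `addr` value."""
    if addr.startswith("["):
        return addr[1:addr.index("]")]
    return addr.rsplit(":", 1)[0]

def matching_peers(peers):
    """Return the peers whose address lies in one of NETWORKS."""
    hits = []
    for peer in peers:
        try:
            ip = ipaddress.ip_address(host_of(peer["addr"]))
        except ValueError:
            continue  # not an IP address, e.g. an onion or I2P peer
        if any(ip.version == net.version and ip in net for net in NETWORKS):
            hits.append(peer)
    return hits

def matching_peers_from_json(text: str):
    """Convenience wrapper for the raw output of `bitcoin-cli getpeerinfo`."""
    return matching_peers(json.loads(text))

example = [
    {"addr": "162.218.65.20:51234", "subver": "/Satoshi:0.13.2/"},
    {"addr": "[2604:d500:4:1::3:2]:40000", "subver": "/bitcoinj:0.14.3/"},
    {"addr": "203.0.113.9:8333", "subver": "/Satoshi:24.0.1/"},
]
for peer in matching_peers(example):
    print(peer["addr"], peer["subver"])
```

To use it with a running node, pass the output of `bitcoin-cli getpeerinfo` as a string to `matching_peers_from_json`.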

Update 2024-03-28: One year after publishing this blog post, the LionLink Networks AS issued a statement that they aren’t affiliated with LinkingLion besides announcing their IP addresses. On the same day, the LinkingLion activity dropped significantly. I’ve written an update about it here.

My open-source work is currently funded by an OpenSats LTS grant. You can learn more about my funding and how to support my work on my funding page.

Text and images on this page are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.