The Problem
ISPs can, do, and will collect information on the web browsing habits (among other things) of their users. They may package and sell this data, at best in an anonymized state, and ostensibly for the “benign” purpose of marketing and advertising to targeted demographics.
Such data may, of course, be used for nefarious purposes.
The law almost always fails to keep pace with technological advancement, and it even more rarely serves the public interest when dealing with issues surrounding the internet and telecommunications. Politicians, regardless of nationality, are beholden to the interests of the state, to secret corruption, or to public corruption in the form of lobbies and campaign contributions.
As such, we, the internet community, cannot rely on political protection for our private data, nor for the data generated by our personal usage habits. We also cannot rely on the laughable concept of “good corporate citizenship” when it comes to these matters.
While VPNs, secure proxies, and other methods can be used to circumvent the legally sanctioned spying on users by their ISPs, related corporations, and governments, the bottom line for all of those actors is monetary gain. This article proposes to deprive them of that gain, at least when it comes to selling [out] the browsing habits of users.
The Goal
To devalue the product that ISPs are selling. To bury users’ actual habits and preferences in a morass of irrelevant and misleading data. To obfuscate and therefore render unreadable the true browsing habits of end users.
The Method
Botnets have a bad name in the IT community. They are traditionally used to overload servers (DDoS attacks), probe services for vulnerabilities, send spam, and brute-force passwords and cryptographic keys. It is time to make botnets work for the internet community.
The botnet proposed below in no way involves malware.
To be clear: in this context, the installation of a “bot” on an end user’s computer would be completely voluntary on the part of, and transparent to, the user. The bot software should be open source and therefore peer reviewed. It should include a user interface for control of activities and granular alteration of settings. It should include a clear and readable presentation of status, as well as of past, current, and future (as far as known) activities. The end user should have full control over, and understanding of, the bot that resides on their host.
These bot nodes would have the primary function of generating interactions with arbitrary servers on the internet (primarily with web servers via HTTP[S], thereby also generating DNS queries), with those interactions simulating the habits and workflow of a “normal” user.
All such interactions should be benign in nature, in that they should not post or otherwise alter content on the remote hosts involved. They should also — by default and at the end user’s option — avoid adult content and other contentious material.
Further, requests made by these bots should be distributed in the exact opposite sense of the first D in DDoS: their actions should be coordinated so as to purposefully avoid overloading any particular service or server.
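As a rough illustration, here is a minimal sketch of such a node in Python. Everything in it is hypothetical: a real node would receive its target list and pacing from the C&C layer described below rather than hard-coding them, and the third-party requests package is assumed to be available.

```python
import random
import time

import requests  # third-party HTTP client, assumed available

# Hypothetical seed list. A real node would receive its targets from
# the C&C layer described below, not hard-code them.
SEED_URLS = [
    "https://example.com/",
    "https://example.org/news",
    "https://example.net/products",
]

# A small pool of plausible browser identities to rotate through.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

def browse_once(url: str) -> None:
    """Issue one benign, read-only request: GET only, never POST,
    so nothing on the remote host is altered."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    try:
        requests.get(url, headers=headers, timeout=10)
    except requests.RequestException:
        pass  # best effort; occasional failures are fine for noise traffic

def simulate_session() -> None:
    """Visit the seed pages in a random order, with human-scale pauses."""
    for url in random.sample(SEED_URLS, k=len(SEED_URLS)):
        browse_once(url)
        # Dwell 5 to 90 seconds, roughly like a human reader. This also
        # keeps the load this node places on any one server negligible.
        time.sleep(random.uniform(5, 90))

if __name__ == "__main__":
    simulate_session()
```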
Command and Control
This proposition is for a botnet with a unified command and control scheme — as opposed to software agents acting independently — for the following reasons:
- Agility. The bots should be able to change tactics rapidly in order to combat detection by the ISPs that they are attempting to fool with false data.
- Good “netizens”. To dredge up an old term, the botnet should be a good netizen in order to avoid being a nuisance (as previously discussed). This requires coordination of all nodes.
- Blacklisting. Website operators (and other service operators) should be able to opt out of inclusion in botnet traffic. Additionally, services dealing with unacceptable or illegal subjects, such as white supremacy, the organization of terrorism, and child pornography, should be globally avoided. (Or perhaps DDoSed into oblivion. :> )
- Request leveling. The botnet should avoid, whenever possible, skewing the statistics and related data of websites. By way of example, the botnet should be coordinated so as to avoid increasing the apparent popularity of a particular product on Amazon or a particular article on Reddit. Such behavior would be a disservice to the users of those sites. (A sketch of both this check and the blacklist check follows this list.)
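Here is a rough sketch of how a node might enforce the last two points, again assuming a Python implementation. The opt-out list and per-host budget are invented for illustration; in practice both would be distributed, signed, and refreshed by the C&C layer.

```python
import time
from collections import defaultdict

# Hypothetical opt-out list, distributed and signed by the C&C layer.
OPTED_OUT = {"example-shop.com", "example-forum.org"}

# Hypothetical per-host budget: cap daily requests so this node never
# inflates any one site's apparent popularity.
MAX_REQUESTS_PER_HOST_PER_DAY = 20

_request_log: defaultdict[str, list[float]] = defaultdict(list)

def may_contact(host: str) -> bool:
    """Return True only if the host has not opted out and this node's
    daily request budget for it is not yet exhausted."""
    if host in OPTED_OUT:
        return False
    cutoff = time.time() - 86_400  # keep only the last 24 hours
    _request_log[host] = [t for t in _request_log[host] if t > cutoff]
    if len(_request_log[host]) >= MAX_REQUESTS_PER_HOST_PER_DAY:
        return False
    _request_log[host].append(time.time())
    return True
```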
The control structure of the botnet should itself be distributed, utilizing multiple servers at multiple service providers. IP addresses of command and control (C&C) nodes should change regularly to avoid detection.
Note that this proposal purposefully avoids the use of a peer-to-peer architecture for C&C. It’s my personal opinion that this would leave it open to corruption by unscrupulous operators. It may be possible to design a scheme that would ensure the integrity of the system as a whole in a P2P architecture, but that’s beyond the scope of this article. I am open to the idea, though!
Of course, all traffic between nodes in the botnet would be encrypted. The method of encapsulation should be implemented in various ways and then rotated to avoid detection by users’ ISPs. For example:
- HTTPS, encapsulating a further-encrypted and signed payload. This secondary level of encryption would avoid the need for the C&C infrastructure to use consistent and/or authoritative web server certificates. It would implement its own keysets and internal “certification authorities” for the secondary encryption (or something similar, at any rate; see the sketch after this list).
- Plain ol’ HTTP, with a steganographically embedded encrypted payload. In other words, C&C messaging could be hidden in apparently banal image or video transfers: downloads (GET responses) or uploads (POST requests).
- Apparent SMTP sessions, ostensibly sending and receiving mail which would actually contain encrypted C&C messaging. (This would require a custom implementation; I am not suggesting using Postfix or somesuch for this.)
- Apparent FTP sessions, et cetera, as above.
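To make the first scheme concrete, here is a minimal sketch assuming the Python cryptography package is available. Directives are signed with a key whose public half ships with the bot software, then symmetrically encrypted, so the outer TLS certificate can be arbitrary and disposable. The key handling and sign-then-encrypt layout are illustrative only, not a vetted design.

```python
# Requires the third-party "cryptography" package (assumed available).
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical long-lived key material. The verification key and the
# symmetric key would ship with the bot software or be rotated via
# directives; the outer TLS certificate can then be throwaway.
signing_key = Ed25519PrivateKey.generate()
verify_key = signing_key.public_key()
payload_key = Fernet.generate_key()

def seal(directive: bytes) -> bytes:
    """Sign, then encrypt, a C&C directive for transport inside an
    otherwise ordinary-looking HTTPS body."""
    signature = signing_key.sign(directive)  # Ed25519 signatures are 64 bytes
    return Fernet(payload_key).encrypt(signature + directive)

def open_sealed(blob: bytes) -> bytes:
    """Decrypt and verify a directive; raises if it was tampered with
    or was not produced by the real C&C infrastructure."""
    plaintext = Fernet(payload_key).decrypt(blob)
    signature, directive = plaintext[:64], plaintext[64:]
    verify_key.verify(signature, directive)  # raises InvalidSignature on forgery
    return directive
```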
The structure of the C&C request/response data should avoid consistency whenever possible. For example:
- Payload size should vary in requests and responses. This could be accomplished by simply padding payloads with a “random” amount of data.
- Frequency of requests should be varied. C&C nodes should give directives to bot nodes as to when and how to make their next request. As an added benefit, this would allow specific C&C nodes to avoid congestion.
- C&C nodes should be contacted by bot nodes in a randomized fashion. As discussed above, the next-request directive would also specify the hostname or IP address of the C&C node to contact next. This would provide the added benefit of balancing traffic across C&C nodes.
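Putting those three points together, a next-request directive might look something like the following sketch; every field name and numeric range here is invented for illustration.

```python
import json
import random
import secrets

# Hypothetical C&C node pool; in practice the pool itself would be
# delivered and refreshed via earlier directives.
CC_HOSTS = ["198.51.100.7", "cc-alpha.example", "203.0.113.42"]

def make_directive(tasks: list[str]) -> bytes:
    """Build a next-request directive: what to browse, when to check in
    again, where to check in, plus random padding so no two directives
    look alike on the wire."""
    directive = {
        "tasks": tasks,  # URLs for the bot to visit
        "next_contact_after_s": random.randint(600, 7_200),  # vary cadence
        "next_host": random.choice(CC_HOSTS),  # spread load across C&C nodes
        "padding": secrets.token_hex(random.randint(16, 2_048)),  # vary size
    }
    return json.dumps(directive).encode()  # then seal() it and send
```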
Obviously there are many more potential strategies to obscure the operation of the botnet. Critically, and regardless of final design, the software for the nodes should be extensible such that new approaches could be implemented and disseminated rapidly and with minimal end-user impact.
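One way to achieve that extensibility, sketched here on the assumption of a Python implementation: treat each encapsulation method as a named, pluggable transport, so that a directive can switch the whole fleet to a freshly shipped strategy without touching the core bot.

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """One encapsulation strategy: HTTPS, steganographic HTTP, apparent
    SMTP, and so on. New strategies are added by subclassing."""

    @abstractmethod
    def send(self, payload: bytes) -> bytes:
        """Deliver a sealed payload to a C&C node and return its reply."""

_TRANSPORTS: dict[str, type[Transport]] = {}

def register(name: str):
    """Class decorator making a transport selectable by name, so a
    directive can switch the fleet to a newly shipped strategy."""
    def wrap(cls: type[Transport]) -> type[Transport]:
        _TRANSPORTS[name] = cls
        return cls
    return wrap

@register("https")
class HttpsTransport(Transport):
    def send(self, payload: bytes) -> bytes:
        raise NotImplementedError("sketch only; see the HTTPS scheme above")

def get_transport(name: str) -> Transport:
    """Instantiate whichever transport the current directive names."""
    return _TRANSPORTS[name]()
```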
Organizational Structure
Software, development, and C&C facilities of this “guerrilla botnet” should be managed by a central organization, in order to meet its goals in a unified and holistic fashion.
However, the managing organization should, with a clear mandate, be a non-profit entity beholden primarily to the needs of the community of internet users whom the botnet would benefit. The structure of the organization need not be formalized by law (in other words, it would not have to be a duly incorporated company), but it should be formalized by a community-policed set of rules.
Won’t It Be Obvious?
Because this information would be published and the software developed in the open, ISPs would of course be aware of the operation and methods of this botnet. They would be likely to ban its use.
Even if that were to come to pass, it would be a victory (of sorts).
ISPs such as Comcast, AT&T, Time Warner, and Verizon move slowly and with great inertia. It’s likely that the guerrilla botnet would have to gain quite a bit of traction before garnering their attention. Even after their interest is piqued, it would take significant resources and time to track and disable bot nodes, if that is even possible. The bots, after all, would just appear to be “normal” internet users.
And were the botnet sufficiently large, they would be loath to outright rescind subscriptions. It would force a choice, either side of which hurts their bottom line: kill the botnet by kicking paying customers off their network, or cease selling users’ data.
All the while they are playing catch-up, the botnet will be working to corrupt the usability of the data. The consumers of this data (marketing and advertising entities, for example) will ultimately come to the conclusion that the product is worthless, or at the very least worth less. The hope is that this will have the effect of either hurting the ISPs’ bottom lines or of causing them to scrap the idea of user-data collection entirely.
Even if the guerrilla botnet were not to succeed outright in those goals, its very existence may be enough to taint the product that the ISPs are pushing. It will call into question the validity of the data.
That perception, in and of itself, may render the data worthless.