I have a problem…
I administrate roughly fifteen domains that send email on a regular basis. Outbound email is handled by two corporate (and one personal) email servers running Zimbra and Exchange, as well as a couple of mail exchangers that handle automated email from web servers.
I also don’t send spam. All automated emails include a clear unsubscribe link, which is a single-click mechanism resulting in an immediate blacklisting of the user’s email address. Automated emails also include the name and mailing address of the company from which they were sent, as per US federal law. Corporate and personal emails are used responsibly; in other words, they are not used for blind solicitation nor for any other purposes that would be considered spam.
All email, automated or not, is sent with valid DKIM and SPF headers from static IP addresses registered with ARIN under an appropriate corporate entity. Reverse DNS matches the sender domain in some cases, but not in others as we have very small IP ranges.
Despite this correct and responsible use of email, customers, vendors and personal contacts sometimes still need to be told to “check your spam folder”.
Of course the big providers like Google, Microsoft and Yahoo! also do things in a technically correct fashion, but because of their large subscriber bases they receive special treatment by anti-spam filters. They also have technical teams that can chase down and correct IP and other blacklist issues.
Medium- and large-sized organizations can simply pay another company like Return Path to take care of these issues and improve their reputation with the makers of spam filters.
But what’s the little guy with a limited budget to do, and what do Bitcoins have to do with it?
A while back I was very interested in Bitcoins. I never really got into mining, and I haven’t done any significant transactions using the digital currency. However, the algorithm for mining Bitcoins and its inherent security are simply elegant, despite its necessarily brute-force nature.
I imagine that such an approach could be used towards email authentication.
I’m not a cryptography expert (nor even nearly so), but I propose that there be a new score for email reputation that’s directly related to the computational power that went into sending the message. Perhaps, to borrow from Bitcoin mining, the score would be inversely related to the numeric value of the SHA-256 hash that signed the message. In other words, the more computational work that went into finding the lowest possible hash value, the better the message would score as ham.
Needless to say, it would be prohibitively expensive (computationally and therefore fiscally) for spammers to send out millions of emails using such a mechanism, and that’s rather the point.
This approach would likewise be impractical for large organizations and legitimate bulk senders, but they can already afford their good reputations. So I’m not suggesting that we supplant all spam scoring with this approach, merely that it become one more tool in the box, one that specifically benefits smaller organizations and individuals with personal email servers.
The implementation should be straightforward, and should not be computationally taxing on the receiver of the message.
My first thought is to leverage DKIM. Imagine that an email contains the headers shown below, the first being a standard DKIM header (from my personal email), with the second a new header. I’ve put the initialism MSMR in the header name as a placeholder. It is short for: Message Signature Mining Reputation. Perhaps not the best name in the world, but it will do for now.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=rosenber.gs; s=default;
    t=1348858150; bh=TUWh0I0EJ9E4V2T3OjQaYKWWoMdOzHN8R8m+SrSZG6M=;
    h=Date:From:To:Message-ID:Subject:MIME-Version:Content-Type;
    b=PCy4cbsWMeTWfcHO2Xk9wkXpvKp6vgXCfbm1dVoopDBOTnybRZL88PqrQ6LISkMDT
    8tmS24ETegBqav81dGAF4YdkzE9DNj623JPB0/lsJsDCPXY8yWTSznG9B0txzs9wJN
    msqKfx5ImU/w5npzv5M28g88H6mYKRzloJlcHpFo=
MSMR-Key: v=1; a=rsa-sha256; n=ba4f33;
    h=00000000000f15dbc2960b082dbf914e97198c8b12abc542a15f74a2a0fab1d4
In the MSMR-Key header, key/value pairs are used in the same way as in the DKIM-Signature header. v and a are likewise similar to those in DKIM to respectively indicate the version of MSMR and the hashing algorithm used.
The tag n holds the nonce value, an integer that is incremented for each attempt to generate a low-valued h, the hash of the DKIM-Signature’s b (base64-encoded signature) field appended to the nonce value.
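As a rough illustration of the sender's side, the nonce search might look like the Python sketch below. Several details are assumptions rather than part of the proposal: the function name `mine_nonce`, the `target_zero_bits` stopping rule (the text only asks for a low-valued hash), plain SHA-256 as the hash, the nonce written as lowercase hexadecimal, and simple concatenation of the nonce followed by the b value with any folding whitespace already removed.

```python
import hashlib


def mine_nonce(dkim_b, target_zero_bits, max_attempts=10_000_000):
    """Try successive nonces until the hash has enough leading zero bits.

    dkim_b           -- base64 text of the DKIM b= tag (assumed whitespace-free)
    target_zero_bits -- hypothetical stopping rule, not defined by the proposal
    Returns (nonce_hex, hash_hex), or None if max_attempts is exhausted.
    """
    # A 256-bit value with at least k leading zero bits is below 2**(256 - k).
    threshold = 1 << (256 - target_zero_bits)
    for nonce in range(max_attempts):
        nonce_hex = format(nonce, "x")
        # Assumed input layout: nonce first, then the DKIM signature text.
        digest = hashlib.sha256((nonce_hex + dkim_b).encode("ascii")).hexdigest()
        if int(digest, 16) < threshold:
            return nonce_hex, digest
    return None


if __name__ == "__main__":
    # Shortened placeholder signature text, not a real DKIM value.
    print(mine_nonce("PCy4cbsWMeTWfcHO2Xk9wkXpvKp6vgXC", 16))
```

Each additional zero bit in the target doubles the expected number of attempts, which is what would let a sender tune how much work to spend per message.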
As such, the receiving server could easily validate the MSMR header by hashing the nonce value + the DKIM signature (assuming the DKIM signature has included the To: header), and comparing that hash to the value of the h tag. If the hashes match, the MSMR header is valid and can be used for scoring the message.
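The receiver's check could be as small as the following Python sketch. It assumes the same conventions as the description above (plain SHA-256 over the nonce text followed by the DKIM b value, both ASCII, folding whitespace removed); the names `verify_msmr` and `leading_zero_bits` are hypothetical, and verifying the DKIM signature itself is left to the existing DKIM machinery.

```python
import hashlib


def verify_msmr(dkim_b, nonce_hex, claimed_hash):
    """Return True if claimed_hash matches SHA-256(nonce + DKIM b value)."""
    recomputed = hashlib.sha256((nonce_hex + dkim_b).encode("ascii")).hexdigest()
    return recomputed == claimed_hash.lower()


def leading_zero_bits(hex_digest):
    """Count leading zero bits of a 256-bit hash given as 64 hex digits."""
    return 256 - int(hex_digest, 16).bit_length()


if __name__ == "__main__":
    sample_h = "00000000000f15dbc2960b082dbf914e97198c8b12abc542a15f74a2a0fab1d4"
    print(leading_zero_bits(sample_h))
```

Validation costs the receiver a single hash of a short string regardless of message size, and the count of leading zero bits is a convenient measure of the work claimed by the sender.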
(Of course, the entirety of the message body and the To: field could be hashed instead of the b tag in the DKIM signature. This could be an optional approach in the implementation. I chose to leverage the DKIM header because it’s computationally easier for the recipient to deal with than the body of a very large message, and because it enforces that DKIM be used to authenticate the source of the message.)
Determining the spam/ham score from the hash value must be a relative process.
Computational power will continue to become less costly over time (from the perspective of hashes/sec/joule and therefore hashes/sec/monetary unit), and so it will become increasingly easy to generate ever-lower-valued hashes on outbound messages.
Therefore, the spam/ham score cannot be derived in direct proportion to the numeric value of the hash; a scaling factor must be used. That scaling factor must be updated to reflect the state of the art of computing at any given time.
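One way such relative scoring might work is sketched below in Python. This is only an illustration under stated assumptions: the work in a hash is measured in leading zero bits, the scaling factor is expressed as a `baseline_bits` value that a filter raises as hardware improves, and the bonus is capped at `max_score`. None of these names or choices come from the proposal itself.

```python
def msmr_score(hex_digest, baseline_bits, max_score=5.0):
    """Map a 256-bit hash (64 hex digits) to a ham bonus.

    baseline_bits -- assumed form of the scaling factor: the number of
                     leading zero bits currently considered unremarkable
    max_score     -- hypothetical cap on the bonus a message can earn
    """
    work_bits = 256 - int(hex_digest, 16).bit_length()
    # Each extra zero bit doubles the expected work, so measuring in bits
    # turns a multiplicative scaling factor into a simple subtraction.
    surplus = float(work_bits - baseline_bits)
    return max(0.0, min(max_score, surplus))


if __name__ == "__main__":
    sample = "0" * 11 + "f" * 53  # placeholder hash with 44 leading zero bits
    print(msmr_score(sample, baseline_bits=40))
```

Raising `baseline_bits` by one would demand twice the work for the same score, so periodic updates keep pace with cheaper computation without changing the header format.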
There are three ways to do this, and I propose that all three be put into place. It should be up to the recipient email server’s spam filter to decide which source of data to use:
- A central authority decides the current scaling factor and makes it available publicly via API and on their website.
- Large ISPs and/or spam filter vendors determine an appropriate scaling factor using real-world metrics (users reporting spam, and ham that was erroneously categorized as spam) and make it available publicly via API and/or on their websites.
- Individual mail servers at small- and medium-sized organizations determine an appropriate scaling factor using their own data, in the same fashion as the large ISPs.
Ideally, no fee would be charged for either of the first two approaches, but the third is guaranteed to be usable without a fee (using open-source software).
Both senders and recipients would need to obtain the scaling factor so that the senders can meet recipients’ expectations of hash values.
To summarize:
- Small organizations and individuals can gain control over their email reputation with insignificant cost and without the intervention of for-profit businesses.
- A brute-force hashing implementation to establish reputation would be easy to implement. It would put the computational onus on the sender, but the recipient would require very little CPU time to validate the hash.
- Spammers would have a high, real-world monetary cost for sending bulk email, but low-volume senders would suffer a negligible increase in the cost of running their mail servers.
- As average computing power increases, so too can the difficulty of generating low-value hashes.
- The organization or individual sending mail would not need to affiliate themselves with any central authority, nor pay fees to any third party.
In my defense…
I believe, and have always believed, that email should be free (both of cost and censorship, but this is about cost). I vehemently oppose any government taxes or ISP fees levied on a per-message basis. I also oppose the cartels of “legitimate” email senders that bombard my inbox with promotional material that they insist I desire merely because I’ve used their services once or twice in the past.
Though my suggested system of scoring email reputation does, in effect, put into place a fairly definitive per-message cost, I do not believe that it amounts to taxation. This is because 1) it is not a requirement of sending email, 2) the per-message cost will be controllable by the administrator (to some extent), and 3) the messaging cost is not a direct function of arbitrary government or corporate decisions.
Based upon a paper napkin estimate using figures from Bitcoin mining efficiencies, my own infrastructure’s email volume, and power costs in my area, this system would cost me something on the order of US$2/month on the corporate side, and a matter of pennies/month for my personal email. This is far cheaper and easier than other solutions and/or fighting each and every recipient ISP personally.
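The general shape of such an estimate can be sketched in a few lines of Python. Every input here is a placeholder to be replaced with real measurements; the figures behind the estimate above are not given, so none are reproduced. The sketch assumes a sender aims for a fixed number of leading zero bits, for which an ideal hash needs about 2^k attempts on average, and that energy is the only cost counted.

```python
def monthly_energy_cost(messages_per_month, target_zero_bits,
                        hashes_per_joule, price_per_kwh):
    """Estimate the monthly electricity cost of hashing outbound mail.

    All four parameters are hypothetical inputs supplied by the caller.
    """
    # Expected attempts to find k leading zero bits is about 2**k per message.
    expected_hashes = messages_per_month * 2 ** target_zero_bits
    joules = expected_hashes / hashes_per_joule
    kwh = joules / 3.6e6  # 1 kWh = 3.6 million joules
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured figures.
    print(monthly_energy_cost(messages_per_month=1000, target_zero_bits=20,
                              hashes_per_joule=1e6, price_per_kwh=0.10))
```

Because the cost scales linearly with message count and doubles with each extra bit of difficulty, the same formula shows why the scheme would weigh far more heavily on bulk senders than on low-volume ones.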
I encourage comments, because this is only a draft of an idea I thought up this afternoon.