Primitive language analysis

Primitive language analysis is the process of matching specific phrases of text against the contents of an email. The very first spam filters were quite primitive and were not really filters at all. Before real spam filters emerged, these tools took a very simple approach to language analysis: they scanned email for known senders in the message headers, for multiple Usenet cross-postings (messages sent to a large number of different forums), or for phrases indicative of spam, such as “Call Now!” and “Free Trial!” Between 1994 and 1997, there wasn’t much else the world could do about spam. The technology to fight it was too limited.
In the early days of spam, before spammers learned many of the nasty tricks they use today, filters that cross-checked messages against junk mail sender lists were somewhat effective, but only because it was possible to filter on a known list of words and phrases that most of the world believed existed only in spam. Filtering on a single word alone had a potential success rate of around 80 percent, with very little chance of catching legitimate messages.
Most of these early spam-catching tools were home-brewed solutions. Some tools, such as procmail (a utility for mail processing), were used to create recipes that filtered incoming email for certain basic words and phrases. Because new spam was being distributed daily, the word and phrase lists required continual maintenance. Many individuals found themselves with better things to do with their time, and so a few commercial solutions became popular during this period. These commercial solutions were sold together with a nightly-update subscription service that automatically downloaded and applied new filtering phrases.
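The kind of word-and-phrase matching such recipes performed can be sketched in a few lines of Python. This is only an illustration, not a reconstruction of any particular tool: the phrase list, `matched_phrases`, and `is_spam` are hypothetical names, and a real procmail recipe would express the same idea in procmail's own syntax.

```python
# Hypothetical phrase list; real lists were far longer and needed constant updates.
GUILTY_PHRASES = ["call now!", "free trial!", "toll free"]


def matched_phrases(body, phrases=GUILTY_PHRASES):
    """Return every listed phrase that appears in the message body."""
    lowered = body.lower()  # compare case-insensitively
    return [phrase for phrase in phrases if phrase in lowered]


def is_spam(body, phrases=GUILTY_PHRASES):
    """A single matching phrase is enough to condemn the whole message."""
    return bool(matched_phrases(body, phrases))


if __name__ == "__main__":
    for text in ("Limited offer. Call now!", "Lunch at noon tomorrow?"):
        print(text, "->", "junk" if is_spam(text) else "inbox")
```

Keeping the phrases in a plain list mirrors the maintenance burden described above: catching a new spam run means someone has to add its wording by hand or download an updated list.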
This primitive approach to spam filtering became mainstream for a while. Early versions of some email clients such as Microsoft Outlook implemented simple junk mail sender lists that would check for known sender addresses in message headers. Although this approach had some weaknesses, it worked well enough up until it didn’t.
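A sender-list check of this kind might look roughly like the following Python sketch. It assumes the list holds plain email addresses and that only the From header is examined; the addresses, `JUNK_SENDERS`, and `from_junk_sender` are invented for illustration and do not describe how any specific mail client implemented the feature.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical junk sender list (addresses use reserved example domains).
JUNK_SENDERS = {"offers@bulk.example", "promo@deals.example"}


def from_junk_sender(raw_message, junk_senders=JUNK_SENDERS):
    """Return True if the From header names an address on the junk list."""
    message = message_from_string(raw_message)
    # parseaddr splits 'Display Name <user@host>' into (name, address).
    _name, address = parseaddr(message.get("From", ""))
    return address.lower() in junk_senders


if __name__ == "__main__":
    raw = "From: Big Deals <Offers@bulk.example>\nSubject: Hello\n\nBody text"
    print(from_junk_sender(raw))
```

Because the check relies entirely on what the message claims about its own sender, it only works for as long as spammers keep using the same addresses.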
One major weakness of primitive language filtering is that a single guilty phrase, such as the words “toll free,” could condemn an entire legitimate message to a user’s junk mailbox. The simplicity of these spam filters eventually led to a high false-positive rate. Although some of these guilty phrases applied to a majority of spam, they also began to apply to a small collection of legitimate mail. As a result, a user had to check their bit-bucket frequently to make sure that no legitimate messages were caught. It also became clear that there were some gray areas to spam filtering: As the phrase “one man’s spam is another man’s ham” (which grew out of this period) suggests, some messages (like a bulk invitation to a conference) might be considered wanted mail by some users and spam by others.
One of the advantages of primitive filtering was that it was so simple to implement that users generally custom-tailored these filters to match their own email behavior. But this was also one of the biggest problems; it required a lot of maintenance in order to work properly. As spam began to grow, the lists of guilty words and phrases became longer and less manageable. The complexity of filtering spiked around 1997, when struggling spammers began to look for new ways to trick users into reading their emails. The new ways they found to change and obfuscate messages left most filters ineffective. Updating them took too much work, and primitive filters were effectively extinguished.
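To see why obfuscation was so damaging, consider the small Python sketch below. The source does not say which tricks spammers used, so the punctuation inserted between letters here is just one assumed example; the phrase list and `is_spam` are likewise hypothetical.

```python
# Hypothetical phrase list for a primitive substring filter.
GUILTY_PHRASES = ["free trial", "call now"]


def is_spam(body, phrases=GUILTY_PHRASES):
    """Flag the message if any listed phrase appears verbatim."""
    lowered = body.lower()
    return any(phrase in lowered for phrase in phrases)


plain = "Start your free trial today"
# Assumed obfuscation: punctuation between letters breaks the exact match.
obfuscated = "Start your f.r.e.e t-r-i-a-l today"

if __name__ == "__main__":
    print(is_spam(plain))       # caught by the original list
    print(is_spam(obfuscated))  # slips past the original list
    # Catching the variant means adding yet another entry by hand.
    print(is_spam(obfuscated, GUILTY_PHRASES + ["f.r.e.e t-r-i-a-l"]))
```

Each new spelling needs its own list entry, which is the maintenance problem in miniature: the spammer changes a few characters, while the user has to notice the change and update the filter.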
A blacklisted network may appeal its blacklisting after taking care of the problem (namely tossing the spammer off their network), at which point it may be removed from the list at the maintainer’s discretion (and sometimes the maintainer’s mood). Blacklists allowed people to filter spam based on its origin rather than its contents. Subscribers no longer needed to maintain their own content-based filtering lists (although many did) and could instead rely on the blackhole to tell them what mail to filter.
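Filtering on origin rather than content can be sketched as a simple network membership test. The Python below is a minimal illustration under stated assumptions: the list is a local, hard-coded set of networks drawn from reserved documentation address ranges, and `BLACKLISTED_NETWORKS` and `is_blacklisted` are hypothetical names. It does not show how subscribers actually retrieved a list from its maintainer.

```python
import ipaddress

# Hypothetical blacklist, using address ranges reserved for documentation.
BLACKLISTED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blacklisted(sender_ip, networks=BLACKLISTED_NETWORKS):
    """Return True if the sending host falls inside any listed network."""
    address = ipaddress.ip_address(sender_ip)
    return any(address in network for network in networks)


if __name__ == "__main__":
    print(is_blacklisted("192.0.2.55"))   # inside a listed network
    print(is_blacklisted("203.0.113.9"))  # not listed
```

Note that the message body is never inspected. A network that has successfully appealed simply disappears from the list, and its mail is accepted again without any change on the subscriber's side.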