The growing problem of unsolicited bulk e-mail, also known as "spam", has generated a need for reliable anti-spam filters. This paper presents the idea and implementation details of a highly effective and reliable e-mail filtering technique. A three-component architecture for the classification and filtering of unsolicited bulk and commercial e-mail is introduced. At the core of the architecture is a novel combination of an enhanced, self-learning variant of greylisting with a reputation-based trust mechanism. The first component, greylisting, sets the stage for the subsequent feature extraction and classification components. By temporarily rejecting selected messages, the greylisting component makes time available for an "offline" in-depth examination of the e-mail content before the message is accepted and delivered to the final recipient. The feature extraction component determines a set of features for each newly arriving e-mail message. These features are then used by the classification engine to categorize the message. The approach achieves a very high spam-blocking rate and also minimizes the workload on the client side. The reputation-based trust mechanism reduces the delivery delay for e-mail messages from reliable senders and also reduces the number of erroneously blocked legitimate messages.
Ronald Bhuleskar, Anoop Sherlekar, Anala Pandit
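
The interplay between greylisting and a reputation-based bypass, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `Greylist`, the triplet key (client IP, sender, recipient), the retry delay, the trust threshold, and the plus/minus one reputation update are all assumptions chosen for illustration, and the feature extraction and classification components are reduced to a single `record_verdict` call.

```python
from dataclasses import dataclass, field

# Assumed parameters; the abstract does not specify values.
MIN_RETRY_DELAY = 300   # seconds a sender must wait before a retry is accepted
TRUST_THRESHOLD = 3     # reputation score at which greylisting is skipped


@dataclass
class Greylist:
    # Time at which each (client_ip, sender, recipient) triplet was first seen.
    first_seen: dict = field(default_factory=dict)
    # Hypothetical per-client reputation score, learned from past verdicts.
    reputation: dict = field(default_factory=dict)

    def check(self, client_ip, sender, recipient, now):
        """Return "accept" or "tempfail" for one delivery attempt."""
        # Trusted senders bypass greylisting, so their mail is not delayed.
        if self.reputation.get(client_ip, 0) >= TRUST_THRESHOLD:
            return "accept"

        triplet = (client_ip, sender, recipient)
        seen = self.first_seen.get(triplet)
        if seen is None:
            # First attempt: remember it and ask the sender to retry later.
            self.first_seen[triplet] = now
            return "tempfail"
        if now - seen < MIN_RETRY_DELAY:
            # Retry arrived too early; keep deferring.
            return "tempfail"
        return "accept"

    def record_verdict(self, client_ip, is_spam):
        """Update reputation from a classifier verdict (stand-in for the
        feature extraction and classification components)."""
        delta = -1 if is_spam else 1
        self.reputation[client_ip] = self.reputation.get(client_ip, 0) + delta
```

In this sketch, an unknown sender is deferred on the first attempt and accepted once it retries after the minimum delay, while a client whose messages were repeatedly classified as legitimate is accepted immediately. A real deployment would signal "tempfail" with a transient SMTP reply code and would use the deferral window to run the content analysis.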