The Bayesian auto-filter build uses four email folders for building. They are the two primary folders, labeled "Good Emails" and "Bad Emails", for good and bad emails respectively, and the two secondary email folders, "Store Good" and "Store Bad", also for good and bad emails respectively.
You pick and choose new emails from the "New Emails" folder after they are downloaded from your server, and place the ones you consider bad in the "Store Bad" folder, and the ones you consider good in the "Store Good" folder. In other words, you move selected emails into the secondary folders. The Bayesian filter "trains" based on your selections. In effect, it "learns" how to distinguish good emails from bad this way.
The primary folders are "primary", because these are the emails used to directly build the Bayesian filter. The "secondary" folders are used by the auto-filter builder as a repository, so that it can add emails to the primary folders as needed, in order to improve the accuracy of the filter. You can't directly place emails in the primary folders, although you can move emails out of those folders.
Initially, you will place about 20 emails in each of the secondary folders. The auto-builder will move any it needs into the primary folders during builds.
Your only task aside from pressing the "build now" button, or scheduling builds, is to supply good and bad emails to the folders.
You run the Auto-Filter Builder panel from the menu selection "Filtering/Schedule Auto-Filter Build", or you can press the toolbar button labeled "S". There you can schedule a build for each day of the week, at any time you specify. You can also press the "Build Now" button, and upon exiting the panel by pressing "OK", the auto-filter builder will run.
The more times the filter builder builds, the more accurate the filter will become, up to a certain limit (over 99%). For instance, if you have about 900 total emails in the primary and secondary folders, the filter will be able to use the same data to make a more and more accurate filter the more it builds. In other words, the first build will not be as accurate as the 10th build. The data will eventually become exhausted, and the filter will reach a peak accuracy level.
Theoretically, the more emails you supply, the more accurate the filter will be. In practice, however, each folder is limited to 600 emails, and you should keep no more than 300 to 400 emails per folder.
The Bayesian filter works by pattern matching. The emails you have received to date follow certain patterns. The Bayesian filter finds an optimal way to match the patterns of good emails and bad emails according to a set of criteria. It assumes that future emails will match those same criteria, so that it can filter them accordingly. This method is extremely effective. However, the patterns will change over time, so the filter needs to be trained regularly. For the first two weeks you'll want to build daily and supply new emails daily. After that you can build once a week, and supply emails as they come. This is a rule of thumb, because it all depends upon the emails you receive.
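To make the pattern-matching idea concrete, here is a minimal naive Bayes sketch. It is not SpamSquash's actual algorithm, which is not described here: the tokenizer, the add-one smoothing, and the names `tokenize`, `train` and `spam_rating` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z0-9$']+", text.lower())

def train(good_emails, bad_emails):
    """Count, per class, how many emails contain each token."""
    good = Counter(t for e in good_emails for t in set(tokenize(e)))
    bad = Counter(t for e in bad_emails for t in set(tokenize(e)))
    return {"good": good, "bad": bad,
            "n_good": len(good_emails), "n_bad": len(bad_emails)}

def spam_rating(model, text):
    """Return a rating between 0.0 (good) and 1.0 (spam)."""
    # Start from the prior odds of spam, with add-one smoothing.
    log_odds = math.log((model["n_bad"] + 1) / (model["n_good"] + 1))
    for t in set(tokenize(text)):
        # Smoothed probability that a bad / good email contains this token.
        p_bad = (model["bad"][t] + 1) / (model["n_bad"] + 2)
        p_good = (model["good"][t] + 1) / (model["n_good"] + 2)
        log_odds += math.log(p_bad / p_good)
    # Clamp to avoid overflow, then convert log-odds to a probability.
    log_odds = max(-50.0, min(50.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Training on a handful of emails and rating a new one shows why regular training matters: tokens the model has never seen contribute nothing, so new kinds of email are rated close to neutral until examples of them are supplied.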
To train the filter to recognize good emails from bad emails (spam) by your standards, you do the following:
Any email identified as spam by the filter will have its subject line prefixed with the Spam Flag Word "*SPAMSQUASHED!*" (you can change this word in the registered version; see "Set Up"). In your email application you can filter for this word and place any such emails in the appropriate folder, e.g. a folder titled "Spam". It's advisable not to delete these emails right away, because there will be false positives and false negatives from time to time.
A false negative is a spam email that is designated as a non-spam email.
A false positive is a non-spam email that is designated as a spam email.
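The flag word and the two kinds of mistakes can be sketched in a few lines. This is only an illustration: the function names `is_flagged` and `outcome` are hypothetical, and the check assumes the default Spam Flag Word has not been changed.

```python
SPAM_FLAG = "*SPAMSQUASHED!*"  # default Spam Flag Word; changeable in the registered version

def is_flagged(subject):
    """True if the subject line starts with the Spam Flag Word."""
    return subject.lstrip().startswith(SPAM_FLAG)

def outcome(subject, actually_spam):
    """Compare the filter's flag with your own judgment of the email."""
    flagged = is_flagged(subject)
    if flagged and not actually_spam:
        return "false positive"   # good email designated as spam
    if not flagged and actually_spam:
        return "false negative"   # spam email designated as non-spam
    return "correct"
```

A mail-client rule that matches on the flag word does the same thing as `is_flagged`; the emails for which `outcome` is not "correct" are the ones worth feeding back to the filter.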
The primary folders are the two folders labeled "Good Emails" and "Bad Emails/Spam". These are the emails that SpamSquash directly uses to build the Bayesian filter.
The "Bayes Auto-Filter Builder" automatically moves emails into these folders from the secondary folders. You can't manually move emails into them, but you can move emails out.
The secondary folders are the two folders labeled "Store Good" and "Store Bad". To train the Bayesian filter, you move emails into these folders. Good emails are moved into the "Store Good" folder and bad emails into the "Store Bad" folder.
Once the emails are in the "Store Good" and "Store Bad" folders (see "How To Train" above), you can build the filter. Make sure there are at least 10 emails in the "Store Good" and "Store Bad" folders, although 20 or more is best at the outset. To build, from the toolbar select the "S" button, or from the menu select the item "Filtering/Bayes Auto-Filter Build Panel". This will launch the "Bayes Auto-Filter Build Panel". You can schedule a build for each day of the week at a specified time, or you can build now, by pressing the "Build Now" button.
To build now, press the "Build Now" button, then exit the "Bayes Auto-Filter Build Panel" by pressing "OK". The build will start. It may take several minutes, depending upon how many emails it has to process.
To schedule a build, select the days on which you want the build to occur, and select a time for the builds. Press "OK" so that the changes are accepted. The build will then occur at the scheduled times, so long as SpamSquash is running.
The more times the filter is built, and the more emails it has to work with, the more accurate the filter will become. Each folder can hold up to 600 emails, but you should keep the number at no more than 400 or so, for faster filter builds. Note that this refers to the "Store Good" and "Store Bad" folders. It is vital that you not have too few emails in those folders. If you keep it at around 300, then you should end up with a more effective filter. Also, the "Bayes Auto-Filter Builder" will move emails from the secondary into the primary folders.
The accuracy of the Bayesian filter will be displayed in the "Bayes Auto-Filter Build Panel". It will give you the accuracy in percentage, along with the number of emails used to generate this accuracy. The number of emails is simply the sum of the emails in the primary and secondary folders. The accuracy is what percentage of those emails the filter correctly evaluated as spam or non-spam. The more emails used for training, the more predictive this percentage will be of any future emails the Bayesian filter evaluates.
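The accuracy figure described above is a simple percentage, which can be sketched as follows. The function name `accuracy_percent` and the use of true/false lists are assumptions for illustration; SpamSquash computes this figure internally.

```python
def accuracy_percent(predicted_spam, actually_spam):
    """Percentage of emails whose spam/non-spam evaluation matches its folder.

    predicted_spam: list of bool, the filter's evaluation of each email.
    actually_spam:  list of bool, True if the email sits in a bad folder.
    """
    if len(predicted_spam) != len(actually_spam) or not predicted_spam:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted_spam, actually_spam))
    return 100.0 * correct / len(predicted_spam)
```

For example, if the filter evaluates 3 of 4 emails correctly, the result is 75.0. With only a few emails the percentage says little about future emails, which is why the number of emails is shown alongside it.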
Note, this accuracy level applies to the Bayesian filter only. The DNSBL's accuracy is separate from this.
You want to guard against accidentally placing a good email in a bad folder, and a bad email in a good folder. If there are any such emails in any of the primary or secondary folders, they should be manually removed to improve the filter's performance.
After you have your filter humming along (usually when the sum of the emails in the "Good Emails" and "Bad Emails/Spam" folders is about 120, and the "Store Good" and "Store Bad" folders have about 200 or more each), you will need to do very little training. You can feed the filter the occasional false positive and false negative as they are downloaded from your POP3 server, and rely on scheduled auto-filter builds to train your filter as needed.
Make sure that you train with a good cross section of the emails you receive. The filter needs to be "aware" of the different kinds of emails it will filter.
Early on in training, it will make lots of mistakes, i.e. falsely identify good emails as spam, and bad emails as non-spam. See these mistakes as feedback to you and train accordingly. Feed the filter the missed emails so that it learns from the mistakes. In other words, if it falsely tags a good email as spam, place that email into the good folder and rebuild the filter. If it falsely tags a spam email as good, place that email into the bad folder and rebuild the filter.
The filter will make fewer mistakes as you train it. The performance should be good when there are around 60 emails in each of the primary folders, and around 100 in each of the secondary folders. This is a rough estimate. It will get better as you continue to train it incrementally. It should get well over 99% of spam with sufficient training.
Make sure that both the good and bad email folders are supplied with emails. The filter needs to know what is good and what is bad according to your standards.
Once you have your filter humming along smoothly, there are still likely to be new kinds of emails sent your way, both good and bad, which you will have to train the filter to recognize. So you will need to feed emails to the filter on occasions when SpamSquash misses. This makes the filter dynamic, so that it can adjust to new input.
You can identify how the Bayesian filter evaluates a particular email, by clicking on it, or by running "Auto-Test Folder" on a folder. See "Testing and Rating Emails" and "Auto-Test Folder". This can also help you determine what emails should be used to train the filters, i.e. the ones that are rated wrong by the filter.
To test the current rating of an email, make sure that the filter is turned on (see "Turn Filter On/Off"), then simply select the email you want to test. In the application, where the list of email headers is displayed, for the currently loaded folder, under the heading "Spam" will be a "Yes" or a "No". "Yes" means the email is currently being identified by the filter as spam. If it reads "Off" this means the filter is off. Under the heading "Rating" you will see the current spam rating of the email. This will range from 0.0 to 1.0. The value 0.0 is the highest "good" rating an email can have. The value 1.0 is the highest "Spam" rating an email can have. Any value over 0.8 is considered spam (the filter is skewed toward good emails to reduce false positives). 0.5 is considered to be a neutral value.
You can use this tool to quickly see how an email is rated. By doing this you can also see how far from spam an email is rated, and train the filter accordingly. Or, if you see that an email is just barely rated as spam (close to a 0.8 rating), then you can reinforce its spam value by feeding it to the filter to give it a stronger spam rating. The same applies to good emails that you want to give a stronger good rating to.
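The rating rules above can be summarized in a short sketch. The 0.8 cutoff and the "Yes", "No" and "Off" values come from the text; the function names and the 0.05 margin used to decide what counts as "just barely" are assumptions for illustration.

```python
SPAM_THRESHOLD = 0.8  # any rating over this value is considered spam

def spam_column(rating, filter_on=True):
    """Value shown under the "Spam" heading for a given rating."""
    if not filter_on:
        return "Off"
    if not 0.0 <= rating <= 1.0:
        raise ValueError("rating must be between 0.0 and 1.0")
    return "Yes" if rating > SPAM_THRESHOLD else "No"

def is_borderline(rating, margin=0.05):
    """True if the rating is close to the cutoff (margin is an assumed value)."""
    return abs(rating - SPAM_THRESHOLD) <= margin
```

Emails for which `is_borderline` is true are good candidates for reinforcement: move them into the matching secondary folder and rebuild.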
You can also auto test an entire folder by selecting the menu item "Filtering/Auto Test Folder". This will test the entire folder automatically. Or, if you have selected two or more emails, it will automatically test those emails. Note: If you run "Auto Test Folder" on a folder with lots of emails, it might take a few minutes to test them all. After auto testing, you can click the "Spam" header column, or "Rating" header column to order the email header entries by value. This will give you a sense of which emails are currently rated as spam and which are not and by how much.
The emails in the primary folders are used to directly build the filter. You can retrain the filter by moving them out. You can move all emails out of both the primary folders and secondary folders and start from scratch. See "How To Train" for training the filter. Or, if you want, move out the older emails, from both the primary and secondary folders, since they are the ones more likely to no longer be pertinent to the filter over the long haul. This is a judgment call.
Notice that the emails are in "raw" form, which is to say you can see the full header info before the actual email body, and anything in the body which may be, say, a binary attachment (jpeg, gif, wav, etc.) is seen in its non-decoded format. In many emails you may see MIME formatting, which is present if the header contains the "Mime-Version" entry, and in the body you'll see headings such as "Content-Type", most of which is not visible in your usual email program. So, you need to read the header and the readable text in the body to get an idea of what the email is about. In most cases it should be straightforward to decide whether an email is spam or not.
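If raw emails are hard to read, it may help to see how a raw message breaks down into header fields and body parts. The sketch below uses Python's standard `email` package on a made-up message; the sample message and the function name `readable_summary` are illustrative only and have nothing to do with how SpamSquash itself reads emails.

```python
from email import message_from_string

# A made-up raw email with one readable part and one encoded image part.
RAW = """\
From: sender@example.com
Subject: Hello
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

This is the readable part.
--XYZ
Content-Type: image/gif
Content-Transfer-Encoding: base64

R0lGODlhAQABAAAAACw=
--XYZ--
"""

def readable_summary(raw):
    """Pull out the subject, the MIME marker and the plain-text parts."""
    msg = message_from_string(raw)
    texts = []
    for part in msg.walk():
        # Only plain-text parts are readable; encoded attachments are skipped.
        if part.get_content_type() == "text/plain":
            texts.append(part.get_payload())
    return {
        "subject": msg["Subject"],
        "is_mime": msg["Mime-Version"] is not None,
        "text": "\n".join(texts).strip(),
    }
```

The header lines and the "text/plain" part are what you would read by eye in the raw view; the base64 block is the non-decoded image data.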
Only the first 5K bytes of the email are used for filtering, which is all that is stored in SpamSquash. When viewing an e-mail's body in SpamSquash, you are looking at its first 5K bytes.
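The effect of the size limit is easy to picture with a short sketch. It assumes that "5K" means 5 x 1024 = 5,120 bytes, which the text does not state, and the function name `stored_portion` is hypothetical.

```python
MAX_STORED_BYTES = 5 * 1024  # assumption: "5K" is taken to mean 5,120 bytes

def stored_portion(raw_email: bytes) -> bytes:
    """Return the part of a raw email that is kept and used for filtering."""
    return raw_email[:MAX_STORED_BYTES]
```

Since the raw email starts with the header, the header counts toward the limit, and anything that appears later in a long email is neither stored nor seen by the filter.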
Think of the emails you placed in the primary and secondary folders as the foundation of your filter. This means that you should not delete any that you want to have as part of your filter.
A note about the email headers. You can order, or sort, the email headers by their headings "From", "Subject", "Date/Time", "Spam" and "Rating". Simply click on the title bar of each column. Also, when you first click on a folder to load email headers in, the emails are organized in the order in which they were downloaded, with the most recent at the top of the list. To refresh a list in this order, click on another folder, then the current one again.
You can select multiple emails in a folder in two ways.
One way is block select. Select a contiguous block of emails by left clicking on the first email you want to select, and then hold down the "Shift" key and left mouse click on the last email you want to select. This will select a block of emails.
The other way is random select. To select multiple emails throughout the email list, use the "Ctrl" key with the left mouse button each time you select an email.
Note, you can only move or delete up to 50 emails at a time.
To move selected emails, press the right mouse button (while the mouse cursor is over the email list), and a context menu appears with selections indicating to which folder you want to move the emails. Accelerator keys also allow you to do it without using the menu:
"F2" moves them to the "Store Good" folder. "F3" moves them to the "Store Bad" folder. "F4" moves them to the "New Emails" folder. "Del" moves them to the "Trash Bin", or, if you press "Del" from the "Trash Bin", the email is deleted completely from the database.
You can also move selected emails by using drag and drop. Select the emails you want to move, press and hold the left mouse button, drag the emails over the appropriate folder, and release the left mouse button.
Note, you can't move emails directly into the primary folders. You can move emails out of these folders.
To delete emails, you first must move them to the "Trash Bin" folder. When in the "Trash Bin" folder, upon deleting them they are removed from the database completely.
You delete selected emails by pressing the right mouse button and selecting the "Move To Trash/Del" selection, by pressing the "Del" key, or by using the drag & drop feature. To drag & drop, select the email(s) you want to move, press and hold the left mouse button, drag the emails over the "Trash Bin", and drop them.
To select a folder click on the folder name. In the status bar at the bottom of the screen you will see the currently selected folder, e.g. "NEW FOLDER", "GOOD FOLDER", "SPAM FOLDER", "STORE GOOD", "STORE BAD", or "TRASH FOLDER".
To turn the filter on, go to the menu selection "Filtering/Bayes Auto-Filter Build Panel", and press the "Filter Off/On" Button. Make sure it's in the "On" state. You can also see this in the status bar, at the lower right corner of the application. You can only turn the filter on if it has been created.
The maximum size of each folder is set at 600 emails.
If the new folder is full, and new emails are downloaded into the new folder, the oldest emails are moved to the trash folder. If the trash folder is full, the oldest ones in that folder are deleted from the database, i.e. completely erased. The age of an email is measured by the time it was downloaded into SpamSquash, not by the date stamp on the email itself. See "Ordering Email Headers" for order of download.
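The overflow rule can be sketched as follows. This is an illustration of the behavior described above, not SpamSquash's code: emails are represented as hypothetical (download_time, name) pairs, and the function name `add_to_new` is an assumption.

```python
MAX_FOLDER_SIZE = 600  # maximum size of each folder, from the text

def add_to_new(new_folder, trash_folder, email, max_size=MAX_FOLDER_SIZE):
    """Add a downloaded email and apply the overflow rule.

    Folders are lists of (download_time, name) tuples. Returns the list of
    emails that were erased completely.
    """
    erased = []
    new_folder.append(email)
    # Oldest emails (by download time) spill from the new folder into trash.
    while len(new_folder) > max_size:
        oldest = min(new_folder, key=lambda e: e[0])
        new_folder.remove(oldest)
        trash_folder.append(oldest)
    # Oldest emails in a full trash folder are erased from the database.
    while len(trash_folder) > max_size:
        oldest = min(trash_folder, key=lambda e: e[0])
        trash_folder.remove(oldest)
        erased.append(oldest)
    return erased
```

Because erased emails cannot be recovered, move any email you want to keep out of the new and trash folders before they fill up.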