SpamSieve

Bayesian spam filter for most email clients.

Version:  2.7.7

How long to train?

Feedback Type:  Commentary

Contributed by: Felix01 Wednesday, November 01 2006 @ 01:01 PM PST

Product Platform: MacOSX

Used Product For: Less than a month

Recommend Product: YES

How long does it take to get SpamSieve "trained" so it will approach the 99+% spam ID rates you users are citing? I downloaded v2.5 the day it came out so I'm still in the trial period and don't (yet) have much experience with it. But it's only correctly identifying ~50% of my daily spam.

Unfortunately (?), I didn't have a bunch of spam messages to train it on since I typically deleted them every day in batches of 10 - 15.

But since starting to use SpamSieve, I've trained it on every spam received. Even so, a great many still end up in my inbox instead of being correctly identified and sent to the Spam folder.

I would have thought the preloaded words in the Corpus would have allowed SpamSieve to do better than the ~50% I'm seeing. For example, a search in the initial Corpus didn't show "viagra" or the many predictive spelling variations. Having those words on the list would seem to have been a no-brainer, but maybe I don't understand exactly how the SpamSieve Bayesian classifier works.

Quite frankly, in my limited experience with SpamSieve, the free JunkMatcher does a much better job initially.

I guess only time will tell whether SpamSieve turns out to be better with aggressive training.   

3 of 3 users found this helpful.

Comments

4 comments

How long to train? - michaeltsai

It generally takes about 500-1000 messages in the corpus to reach 99+% accuracy, but it should get to 90-95% with only about 200 messages. Since you're only getting 50% accuracy, I'm guessing that something might not be set up properly. Please contact spamsieve@c-command.com if you're still having problems, and I'll help you check the setup.

I used to include some seed spam in the corpus, but it turned out that this decreased the accuracy, so I removed it. Plus, with some changes to make it learn more quickly, most everyone should have enough spam to start out with decent accuracy.

Wednesday, November 01 2006 @ 07:29 PM PST


How long to train? - Felix01

Thanks for the reply, Michael. I went through your help files (moved a spam message back to the inbox and then clicked on "Apply Rules") and determined that it was set up properly.

But I think I figured out the problem. I initially only had about a dozen spam messages for training since I delete them several times/day. But I had several hundred "good" messages which I had used for training.

Apparently, all those "good" messages overwhelmed the corpus and it kept sending spam to my inbox.

So I "Reset Corpus" and started over...this time with the correct percentage of "good" messages and spam specified in your help files.
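To see why a lopsided corpus can swamp a Bayesian filter, here is a minimal sketch of textbook Bayes' rule in Python. It is not SpamSieve's actual algorithm. The function name `posterior_spam`, the likelihood ratio of 10, and the 195/105 split (picked arbitrarily to stand for a corpus with more spam than good mail) are all assumptions for illustration.

```python
def posterior_spam(n_spam, n_good, likelihood_ratio):
    """Probability that a message is spam under plain Bayes' rule.

    n_spam, n_good: message counts in the training corpus, used as the prior.
    likelihood_ratio: P(words | spam) / P(words | good) for the message,
        a hypothetical number standing in for the word evidence.
    """
    prior_odds = n_spam / n_good              # what the corpus alone suggests
    odds = prior_odds * likelihood_ratio      # prior combined with the evidence
    return odds / (1 + odds)                  # convert odds to a probability


# A message whose words are 10x more typical of spam than of good mail.
lopsided = posterior_spam(n_spam=12, n_good=300, likelihood_ratio=10)
rebalanced = posterior_spam(n_spam=195, n_good=105, likelihood_ratio=10)

print(f"12 spam / 300 good:  P(spam) = {lopsided:.2f}")    # about 0.29
print(f"195 spam / 105 good: P(spam) = {rebalanced:.2f}")  # about 0.95
```

With a dozen spam messages against several hundred good ones, the same fairly spammy message scores below 0.5 and would land in the inbox; after rebalancing it scores well above 0.9.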

And today I haven't had a single spam slip into my inbox.

Looks like this could be a winner and I'll soon be a paying customer.

As 'they' say, RTFM...Read The F***ing Manual!!

Thursday, November 02 2006 @ 06:12 AM PST


Follow-up report - Felix01

I just wanted to post a follow-up. After experiencing the problems detailed in my first post (failure to correctly ID ~50% of my spam), I used the "Reset Corpus" feature and retrained SpamSieve with the correct percentage (35%-65%) of current 'good' and 'spam' messages recommended in the developer's Help files.

I'm now pleased to report that after 10 days of SpamSieve usage, according to the SpamSieve statistics window, I've had a grand total of one false positive message ('good' message incorrectly identified as spam) and no false negatives.

It just doesn't get much better than that, folks!!

And that's on multiple e-mail accounts using Apple's Mail client.

My daily message tally is usually in the 335 range, of which SpamSieve reports 54% are spam, so SpamSieve is certainly getting a good trial use by an active user.
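For anyone who wants to turn those figures into rates, here is a rough back-of-the-envelope calculation in Python. It assumes the tally stayed at about 335 messages a day for all 10 days, which is an approximation, and the function name `trial_summary` is made up for this sketch.

```python
def trial_summary(msgs_per_day, spam_fraction, days, false_pos, false_neg):
    """Estimate totals and error rates from the figures quoted above."""
    total = msgs_per_day * days
    spam = total * spam_fraction
    good = total - spam
    return {
        "total": total,
        "spam": spam,
        "good": good,
        # Share of all messages that were filed correctly.
        "accuracy": 1 - (false_pos + false_neg) / total,
        # Share of good messages wrongly flagged as spam.
        "false_positive_rate": false_pos / good,
    }


summary = trial_summary(msgs_per_day=335, spam_fraction=0.54, days=10,
                        false_pos=1, false_neg=0)
print(f"messages seen:       {summary['total']}")           # 3350
print(f"estimated spam:      {summary['spam']:.0f}")        # about 1809
print(f"overall accuracy:    {summary['accuracy']:.2%}")    # roughly 99.97%
print(f"false positive rate: {summary['false_positive_rate']:.2%}")  # about 0.06%
```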

I can now highly recommend this software and can assure potential users that the dozens and dozens of five-star ratings are certainly warranted.

BTW, that one false positive was from Apple...an iTunes message filled with graphics. As you Apple Mail users know, Mail has a Rule which is installed by default; it color codes all Apple 'propaganda' and then stops evaluating any additional Rules. This particular Apple mailing address (discover@newmusic.itunes.com) is apparently new and wasn't on their pre-set Rule list; thus, the message continued on to be evaluated by SpamSieve and was tagged as spam.

I've now added that Apple mailing address to the "News From Apple" Rule list. Alternatively, I could have trained SpamSieve on it as a 'good' message, but I figured it's better to correctly identify Apple mailings as 'good' higher up in the rule-checking hierarchy for expediency's sake.
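The rule ordering described above can be sketched in Python as a first-match chain where a rule may stop further evaluation. This is only a toy model, not how Mail or SpamSieve is implemented: the `Rule` class, `apply_rules`, the `image_heavy` flag standing in for the spam classifier, and the placeholder address `news@example.com` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]
    action: str
    stop: bool = True  # stop evaluating later rules once this one fires


def apply_rules(message, rules):
    """Run rules in order and return the actions that fired."""
    applied = []
    for rule in rules:
        if rule.matches(message):
            applied.append(rule.action)
            if rule.stop:
                break
    return applied


# Placeholder sender list for the "News From Apple" rule.
apple_senders = {"news@example.com"}

rules = [
    Rule("News From Apple", lambda m: m["sender"] in apple_senders,
         "color as Apple news"),
    # Stand-in for the spam filter: flags graphics-heavy mail as spam.
    Rule("SpamSieve", lambda m: m.get("image_heavy", False), "move to Spam"),
]

msg = {"sender": "discover@newmusic.itunes.com", "image_heavy": True}

print(apply_rules(msg, rules))   # ['move to Spam'], sender not yet on the list

apple_senders.add("discover@newmusic.itunes.com")
print(apply_rules(msg, rules))   # ['color as Apple news'], spam rule never runs
```

Putting the sender on the earlier rule means the message is handled before the spam check is ever reached, which is the "higher up in the hierarchy" choice described above.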

Friday, November 10 2006 @ 01:42 AM PST


How long to train? - Tortoise

Try going to the developer's site and posting your experience on his forum. Michael monitors the site and replies promptly with help for people experiencing trouble with the application. He will assist you in sorting out why it is not working as designed. I have used it for several years now, and it was gradually losing accuracy. I posted on his forum, and he replied the same day and spent two days helping me get things sorted out. It is once again nearly 100% accurate and working like a charm.

Wednesday, November 29 2006 @ 03:49 PM PST