| Mail::Classifier::Trivial - a trivial subclass example |
Mail::Classifier::Trivial - a trivial subclass example
use Mail::Classifier::Trivial;
$bb = Mail::Classifier::Trivial->new();
$bb->train( { SPAM => 'spam.mbox', NOTSPAM => 'notspam.mbox'} );
%xval = $bb->crossval(2, .8, {SPAM => 'spam.mbox', NOTSPAM => 'notspam.mbox'} );
Mail::Classifier::Trivial is a trivial example implementation of a Mail::Classifier subclass.
This class demonstrates how to subclass Mail::Classifier to actually classify mail. It provides crude random categorization based on the category frequencies of the training set.
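In formula form, the random categorization works as follows. The notation is introduced here for illustration: $n_c$ is the number of training messages learned as category $c$, and $r$ is the random draw made by score.

```latex
P(c) = \frac{n_c}{N}, \qquad N = \sum_{k} n_k, \qquad r \sim \mathrm{Uniform}\{1, \dots, N\}
```

Returning the first category whose running total of counts reaches $r$ selects $c$ exactly when $r$ lands in a block of $n_c$ consecutive integers, which gives probability $n_c / N$. The probability that score reports alongside the chosen category is still 1.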
* new
* init
* forget
* isvalid
* parse
* learn / unlearn
* score
Creates a new classifier object. Class options are set by passing a hash reference of key/value pairs. Alternatively, new can be called with the filename of a previously saved classifier, or with another classifier object, in which case the classifier is cloned, duplicating all of its data and data files.
$bb = Mail::Classifier->new();
$bb = Mail::Classifier->new( { OPTION1 => 'foo', OPTION2 => 'bar' } );
$bb = Mail::Classifier->new( "/tmp/saved-classifier" );
$cc = Mail::Classifier->new( $bb );
This subclass's constructor takes no additional options; it only adds a
data table for frequency counting. Although it doesn't strictly need to,
this subclass stores that table in an MLDBM::Sync file.
=cut
sub new {
    my $class = shift;
    $class = ref($class) || $class;
    my $self = $class->SUPER::new( @_ );
    # create the data structure for this subclass to use
    return $self;
}
Called by new to initialize the object with its data tables.
$self->init( {%options} );
Blanks out the frequency data.
$bb->forget;
=cut
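sub forget {
    my $self = shift;
    # Empty the existing (possibly tied) hash in place instead of replacing
    # the reference, so the MLDBM::Sync-backed table itself is cleared.
    %{ $self->{categories} } = ();
}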
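Whether forget empties the hash in place or swaps in a fresh anonymous hash matters whenever something else still holds the original reference, as a tie to the MLDBM::Sync file presumably would (an assumption; the parent class's storage details are not shown here). A minimal plain-Perl sketch of the difference, with hypothetical variable names:

```perl
use strict;
use warnings;

# Two references to the same hash, standing in for $self->{categories}
# and any other holder of that reference (for example, a tie).
my $table = { SPAM => 3, NOTSPAM => 5 };
my $alias = $table;

# Replacing the reference: $table now points at a brand-new empty hash,
# but $alias still sees the old counts.
$table = {};
print scalar(keys %$alias), "\n";    # prints 2

# Clearing in place: every reference to the same hash sees it emptied.
$table = $alias;
%$table = ();
print scalar(keys %$alias), "\n";    # prints 0
```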
Confirms that a message can be handled (e.g., text vs. attachment). MESSAGE is a Mail::Message object. In this subclass, all messages are still considered valid.
$bb->isvalid($msg);
Breaks a message up into tokens. This is only a stub marking where and how subclasses should place their parsing; this subclass does no parsing, so the method remains a stub.
$bb->parse($msg);
learn processes a message as an example of a category, according to whatever algorithm the subclass implements. MESSAGE is a Mail::Message object.
unlearn reverses the process, for example to "unlearn" a message that was misclassified.
In this subclass, these methods only update the count of messages in each category.
$bb->learn('SPAM', $msg);
$bb->unlearn('SPAM', $msg);
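The counting described above can be sketched with an ordinary hash. learn_count and unlearn_count are hypothetical helpers, not part of Mail::Classifier, and the sketch leaves out both the message argument and the MLDBM::Sync storage:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the per-category bookkeeping;
# a plain hash replaces the MLDBM::Sync-backed table.
my %categories;

sub learn_count {
    my ($cat) = @_;
    $categories{$cat}++;                  # one more example of this category
}

sub unlearn_count {
    my ($cat) = @_;
    return unless $categories{$cat};      # nothing to take back
    delete $categories{$cat} if --$categories{$cat} == 0;
}

learn_count('SPAM') for 1 .. 3;
learn_count('NOTSPAM');
unlearn_count('SPAM');                    # e.g. a message filed under the wrong category
```

Deleting a category once its count reaches zero is a choice made here to keep the table tidy; a zero count would never be drawn by the cumulative walk in score anyway.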
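Takes a message and returns a list of categories and their probabilities, in descending order of probability. MESSAGE is a Mail::Message object.
In this subclass, it returns a single category chosen at random, weighted by how many training messages each category has, paired with a probability of 1.
($best_cat, $best_cat_prob, @rest) = $bb->score($msg);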
%probs = $bb->score($msg);
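Because score returns a flat list of category/probability pairs, the caller decides how to read it. A sketch with a hypothetical fake_score standing in for a real classifier:

```perl
use strict;
use warnings;

# Hypothetical stand-in for score(): a flat list of category/probability
# pairs, best category first.
sub fake_score {
    return ('SPAM', 0.7, 'NOTSPAM', 0.3);
}

# List context: peel off the best category and its probability.
my ($best_cat, $best_cat_prob, @rest) = fake_score();

# Hash assignment: look up any category's probability (the ordering is lost).
my %probs = fake_score();

printf "%s %.1f\n", $best_cat, $best_cat_prob;    # prints "SPAM 0.7"
printf "%.1f\n", $probs{NOTSPAM};                  # prints "0.3"
```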
=cut
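sub score {
    my ($self, $msg) = @_;
    # Copy the counts out under the lock, so every return path below is
    # already unlocked and no half-finished each() iterator is left behind.
    $self->ReadLock('categories');
    my %count = %{ $self->{categories} };
    $self->UnLock('categories');

    # total number of training messages across all categories
    my $n = 0;
    $n += $_ for values %count;
    return ('UNK', 1) if $n == 0;    # if there are no examples, give up

    # pick a category with probability proportional to its count
    my $random = int(rand($n)) + 1;
    my $i = 0;
    for my $key (keys %count) {
        $i += $count{$key};
        return ($key, 1) if $random <= $i;
    }
}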
1; __END__
######## END OF CODE #####################
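The cumulative-count walk in score can be exercised on its own. weighted_pick is a hypothetical helper, not part of Mail::Classifier; it accepts an optional draw so a given choice can be reproduced, and it walks the keys in sorted order, whereas score walks them in hash order.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a key of %$counts with probability
# proportional to its value, using the same cumulative walk as score().
# $draw is an optional integer in 1..N (N = total count); by default
# it comes from rand().
sub weighted_pick {
    my ($counts, $draw) = @_;
    my $n = 0;
    $n += $_ for values %$counts;
    return 'UNK' if $n == 0;                    # no training data yet
    $draw = int(rand($n)) + 1 unless defined $draw;
    my $i = 0;
    for my $key (sort keys %$counts) {          # sorted so a given draw is repeatable
        $i += $counts->{$key};
        return $key if $draw <= $i;
    }
    return 'UNK';    # not reached when 1 <= $draw <= N
}

my %counts = (NOTSPAM => 5, SPAM => 3);
# Sorted order is NOTSPAM (draws 1..5), then SPAM (draws 6..8).
print weighted_pick(\%counts, 5), "\n";    # prints NOTSPAM
print weighted_pick(\%counts, 6), "\n";    # prints SPAM
```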
MLDBM
MLDBM::Sync
Mail::Box::Manager
Mail::Address
There are always bugs...
Mail::Classifier
David Golden, <david@hyperbolic.net>
Copyright 2002 by David Golden
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.