Mail::Classifier::Trivial - a trivial subclass example


NAME

Mail::Classifier::Trivial - a trivial subclass example


SYNOPSIS

    use Mail::Classifier::Trivial;
    $bb = Mail::Classifier::Trivial->new();
    $bb->train( { SPAM => 'spam.mbox', NOTSPAM => 'notspam.mbox'} );
    %xval = $bb->crossval(2, .8, {SPAM => 'spam.mbox', NOTSPAM => 'notspam.mbox'} );


ABSTRACT

Mail::Classifier::Trivial is a trivial subclass implementation.


DESCRIPTION

This class is an example of subclassing Mail::Classifier to actually classify mail. It provides crude random categorization: a category is drawn at random, weighted by how often each category appeared in the training set.


METHODS THAT ARE EXTENDED IN THIS SUBCLASS

    * new 
    * init
    * forget
    * isvalid
    * parse
    * learn
    * score

new [OPTIONS|FILENAME|CLASSIFIER]

Creates a new classifier object, setting any class options from a hash reference of key/value pairs. Alternatively, it can be called with the filename of a previously saved classifier, or with another classifier object, in which case that classifier is cloned, duplicating all data and datafiles.

    $bb = Mail::Classifier->new();
    $bb = Mail::Classifier->new( { OPTION1 => 'foo', OPTION2 => 'bar' } );
    $bb = Mail::Classifier->new( "/tmp/saved-classifier" );
    $cc = Mail::Classifier->new( $bb );
    
This subclass method has no additional options and only adds a data table to use for frequency counting. Though it doesn't really need to, this subclass uses an MLDBM::Sync file.

    sub new {
        my $class = shift;
        $class = ref($class) || $class;
        my $self = $class->SUPER::new( @_ );
        # create the data structure for this subclass to use
        return $self;
    }

init

Called during new to initialize the class with data tables.

    $self->init( {%options} );

forget

Blanks out the frequency data.

    $bb->forget;

    sub forget {
        my $self = shift;
        $self->{categories} = {};
    }

isvalid MESSAGE

Confirms that a message can be handled (for example, text versus attachment). MESSAGE is a Mail::Message object. In this subclass, all messages are still considered valid.

    $bb->isvalid($msg);

parse MESSAGE

Breaks up a message into tokens. This is just a stub marking where and how class extensions should place parsing. In this subclass, no parsing takes place and the function is still a stub.

    $bb->parse($msg);

learn CATEGORY, MESSAGE
unlearn CATEGORY, MESSAGE

learn processes a message as an example of a category according to some algorithm. MESSAGE is a Mail::Message.

unlearn reverses the process, for example to "unlearn" a message that has been falsely classified.

In this subclass, these functions only update a frequency count of messages by category.
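The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the module's own code: the `FreqTable` package is a hypothetical stand-in that keeps counts in a plain hash, whereas the subclass stores them in its classifier object. Dropping a category once its count reaches zero is also an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-category frequency table; it is not
# part of Mail::Classifier and uses a plain in-memory hash.
package FreqTable;

sub new { return bless { categories => {} }, shift }

# learn: count one more message for the category.
sub learn {
    my ($self, $cat, $msg) = @_;   # $msg is unused: only the category matters
    $self->{categories}{$cat}++;
    return;
}

# unlearn: take one message back out, never dropping below zero.
sub unlearn {
    my ($self, $cat, $msg) = @_;
    my $counts = $self->{categories};
    return unless $counts->{$cat};                       # nothing to undo
    delete $counts->{$cat} if --$counts->{$cat} == 0;    # drop empty categories
    return;
}

package main;

my $table = FreqTable->new;
$table->learn('SPAM') for 1 .. 3;   # three spam examples
$table->learn('NOTSPAM');           # one non-spam example
$table->unlearn('SPAM');            # one spam example was misfiled
```

After these calls the table holds two SPAM messages and one NOTSPAM message.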

    $bb->learn('SPAM', $msg);
    $bb->unlearn('SPAM', $msg);

score MESSAGE

Takes a message and returns a list of categories and probabilities in descending order. MESSAGE is a Mail::Message object.

This subclass returns a single category, chosen at random, with a probability of 1.

    ($best_cat, $best_cat_prob, @rest) = $bb->score($msg);
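    %probs = $bb->score($msg);

    sub score {
        my ($self, $msg) = @_;
        my $n = 0;

        $self->ReadLock('categories');
        # total number of training messages across all categories
        $n += $_ for values %{$self->{categories}};
        if ( $n == 0 ) {
            # if there are no examples, give up (releasing the lock first)
            $self->UnLock('categories');
            return ('UNK',1);
        }
        # draw a number in 1..$n and walk the cumulative counts until it is covered
        my $random = int(rand($n)) + 1;
        my $i = 0;
        my $pick;
        # iterate with keys rather than each, so leaving the loop early
        # cannot leave a half-finished hash iterator behind for the next call
        for my $key ( keys %{$self->{categories}} ) {
            $i += $self->{categories}{$key};
            $pick = $key;
            last if $random <= $i;
        }
        $self->UnLock('categories');
        return ($pick,1);
    }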
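
The frequency-weighted draw used by score can be tried in isolation. The sketch below is an illustration under stated assumptions, not part of the module: `pick_category` is a hypothetical helper, the counts live in a plain hash, and the draw is passed in as a parameter so that the cumulative walk can be checked without randomness.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a category with probability proportional to its
# count. $draw is an integer in 1..total.
sub pick_category {
    my ($counts, $draw) = @_;
    my $seen = 0;
    for my $cat ( sort keys %$counts ) {   # sorted only to make the walk repeatable
        $seen += $counts->{$cat};
        return $cat if $draw <= $seen;
    }
    return 'UNK';   # empty table, or draw out of range
}

my %counts = ( SPAM => 3, NOTSPAM => 1 );
my $total  = 0;
$total += $_ for values %counts;

# Every possible draw, in order: draw 1 lands on NOTSPAM, draws 2..4 on SPAM.
my @walk = map { pick_category(\%counts, $_) } 1 .. $total;

# A random draw, made the same way score makes it.
my $random_pick = pick_category(\%counts, int(rand($total)) + 1);

print "walk: @walk\n";
print "random: $random_pick\n";
```

With three SPAM examples and one NOTSPAM example, SPAM is returned for three of the four possible draws, so a classifier built this way guesses SPAM about 75% of the time regardless of the message content.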


PREREQUISITES

    MLDBM
    MLDBM::Sync
    Mail::Box::Manager
    Mail::Address


BUGS

There are always bugs...


SEE ALSO

Mail::Classifier


AUTHOR

David Golden, <david@hyperbolic.net>


COPYRIGHT AND LICENSE

Copyright 2002 by David Golden

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
