| Bio::Grep::Backend::BackendI - Superclass for all back-ends |
Bio::Grep::Backend::BackendI - Superclass for all back-ends
Bio::Grep::Backend::BackendI is the superclass for all back-ends. Don't use this class directly.
See the Bio::Grep::Root manpage for inherited methods.
new()
This method constructs a the Bio::Grep::Backend::BackendI manpage object and is never used directly. Rather, all other back-ends in this package inherit the methods of this interface and call its constructor internally.
$sbe->next_res
Returns next result as a the Bio::Grep::SearchResult manpage object after search() was
called.
$sbe->search();
while ( my $res = $sbe->next_res ) {
# output result
}
=item C<$sbe-E<gt>settings()>
Get the settings. This is a the Bio::Grep::SearchSettings manpage object
# search for the reverse complement and allow 4 mismatches $sbe->settings->database('ATH1.cdna'); $sbe->settings->query('UGAACAGAAAGCUCAUGAGCC'); $sbe->settings->reverse_complement(1); $sbe->settings->mismatches(4);
$sbe->features()
Get available features. This is a hash. Valid features are MISMATCHES, GUMISMATCHES, EDITDISTANCE, INSERTIONS, DELETIONS, FILTERS, NATIVE_ALIGNMENTS, PROTEINS, UPSTREAM, DOWNSTREAM, MAXHITS, COMPLETE, QUERY, QUERY_FILE, QUERY_LENGTH, DIRECT_AND_REV_COM, SHOWDESC, QSPEEDUP, HXDROP, EXDROP, EVALUE and PERCENT_IDENTITY.
if (defined($sbe->features->{GUMISMATCHES})) { # $sbe->settings->gumismatches(0); $sbe->settings->gumismatches(0.5); } else { print "\nBack-end does not support wobble pairs\n"; }
$sbe->get_alphabet_of_database($db)
Returns 'dna' if the specified database is a DNA database, 'protein' otherwise.
Every back-end must implement these methods.
$sbe->search
This method starts the back-end with the settings specified in the
the Bio::Grep::SearchSettings manpage object $sbe->settings.
$sbe->search();
This method also accepts an hash reference with settings. In this case, all previous defined options except all paths and the database are set to their default values.
$sbe->search({ mismatches => 2, reverse_complement => 0, query => $query });
$sbe->generate_database({ file => $file })
Creates a symlink to the specified file in the datapath directory
($sbe->settings->datapath) and generates a database. You have to
do this only once for every file. Returns 1 if database generation was
successful.
Optional arguments in the hash reference are:
format
The format of file. Default is Fasta. See the documentation of your
back-end and the Bio::SeqIO manpage for supported formats. Only Fasta is thoroughly
tested.
description
The description of the database. Later, you can access these descriptions with
get_databases(). Default is the filename of file.
copy
Instead of adding a symlink to file in the data directory, copy the
file. Useful for platforms that don't support symbolic links. Default 0
(create a symlink).
datapath
$sbe->settings->datapath is called with this value.
prefix_length
Vmatch option: prefix length for bucket sort. If not defined, then
mkvtree automatically determines a reasonable prefix length.
verbose
Generate the database more verbosely. Default is 0.
Example:
$sbe->generate_database({ file => 'ATH1.cdna', format => 'Fasta', description => 'AGI Transcripts', copy => 1, datapath => 'data', verbose => 1, });
$sbe->get_databases
Returns a hash with all available databases. The keys are the filenames, the values are descriptions (or the filename if no description is available).
my %local_dbs_description = $sbe->get_databases(); my @local_dbs = sort keys %local_dbs_description; # take first available database $sbe->settings->database($local_dbs[0]);
$sbe->get_sequences
This method returns all sequences with the ids in the specified array reference as a the Bio::SeqIO manpage object.
my $seqio = $sbe->get_sequences([$id]); my $string; my $stringio = IO::String->new($string); my $out = Bio::SeqIO->new('-fh' => $stringio, '-format' => 'fasta');
while ( my $seq = $seqio->next_seq() ) { # write the sequences in a string $out->write_seq($seq); } print $string;
$sbe->available_sort_modes()
Returns a hash with the available result sort modes. Keys are the modes you
can set with $sbe->settings->sort($mode), values a short
description.
Only back-ends should call them directly. These internal methods are documented for authors of new back-ends.
_check_search_settings
Performs some basic error checking. Important security checks, because
we use system(). So we should check, if we get what we assume.
Because every back-end should call this method at the top of its search method, we clean things like old search results here up.
_prepare_query
Another important method that every back-end must call.
Prepares the query, for example calculating the reverse complement if
necessary, returns the prepared query. settings->query is unchanged!
_prepare_generate_database(@args)
The method generate_database() should call this internal method as first step.
It checks if the first argument is a valid hash reference (see
generate_database()), sets default values for undefined keys and returns this
modifed argument hash. Creates a symlink of the specified file (or copies this
file) in the data directory. Generates a .nfo file with the description
of the database.
An implementation of generate_database should look like this:
sub generate_database {
my ( $self, @args ) = @_;
my %args = $self->_prepare_generate_database(@args);
if (defined $args{skip}) {
return 0;
}
# create the back-end specific indices
...
$self->_create_index_and_alphabet_file( $args{filename} );
return $args{filename};
}
_get_alignment( $seq_query, $seq_subject )
Calculates and returns an alignment of two the Bio::Seq manpage objects. Requires EMBOSS and bioperl-run.
_get_databases($suffix)
This method searches the data directory for files ending with $suffix
and returns this list of files in an array.
Substitutes $suffix with .nfo from all found files and searches for an
info file with that name. The content of that file will be used as description.
When no file is found, the description will be the filename without the suffix:
%dbs = _get_databases('.al1'); # finds file ATH1.cdna.al1, searches # for ATH1.cdna.nfo print $dbs{'ATH1.cdna'}; # prints content of ATH1.cdna.nfo # or 'ATH1.cdna'
_get_sequences_from_bio_index($id)
GUUGle, RE and Agrep back-ends use the Bio::Index manpage for sequence id queries
(implemented in this this method). Returns a the Bio::SeqIO manpage object.
_create_tmp_query_file()
Examines query, query_file and reverse_complement and generates a
temporary Fasta file that is passed in the system() call to the back-end. If the
environment variable BIOGREPDEBUG is not set, then this file will be deleted
when the script exits.
_create_index_and_alphabet_file($fastafile)
Creates an index of the specified Fasta file with the Bio::Index::Fasta manpage.
Creates an Vmatch alphabet file.
_parse_regions($hash_ref)
Takes as argument a hash reference with the gene sequence and the hit
coordinates (keys complete_seq, subject_begin and subject_end,
respectively). Calculates the upstream, subject and downstream sequence.
my ( $upstream_seq, $subject_seq, $downstream_seq ) =
$self->_parse_regions({
complete_seq => $complete_seq,
subject_begin=> $subject_begin,
subject_end => $subject_end,
});
$sbe->results()
DEPRECATED. Get the results after search() was called. This is an array of
the Bio::Grep::SearchResult manpage objects.
$sbe->search();
foreach my $res (@{$sbe->results}) { # output result }
Use next_res() instead.
$sbe->generate_database_out_of_fastafile($fastafile)
DEPRECATED. Use generate_database() instead.
See the Bio::Grep::Root manpage for other diagnostics.
Alphabet of query and database not equal
You tried to search with DNA/RNA query in protein database or vice versa.
Bio::Root::BadParameter.
Back-end does not support protein data
You tried to generate a protein database with a back-end that does not support
protein data. Bio::Root::BadParameter.
Can't combine editdistance and mismatches.
Set either editdistance or mismatches, not both. Bio::Root::BadParameter
Can't copy ... to ...
It was not possible to copy the Fasta file in generate_database() in the
data directory. Check path and permissions. Bio::Root::IOException.
Database not defined.
You forgot to define a database. You have to build a database with
$sbe->generate_database (once) and set it with
$sbe->settings->database. Example:
$sbe->generate_database('ATH1.cdna"); $sbe->settings->database('ATH1.cdna');
Bio::Root::BadParameter.
Database not found.
The specified database was not found. Check name and
$sbe->settings->datapath. Bio::Root::BadParameter.
=item C<Database not valid (insecure characters).>
The database name is not valid. Allowed characters are 'a-z', 'A-z','0-9', '.'
, '-' and '_'. Bio::Root::BadParameter.
Invalid arguments. Hashref assumed.
The method generate_database() assumes as first argument a hash reference.
Bio::Root::BadParameter.
No such file.
The file you have specified in generate_database() does not exist.
Bio::Root::BadParameter.
Query and query_file are set. I am confused...
You specified a query and a query file. Bio::Root::BadParameter.
Query not defined.
You forgot to define a query or a query file. Bio::Root::BadParameter.
Reverse complement only available for DNA databases.
Either reverse_complement or direct_and_rev_com is set and the specified
database is a protein database. These two feature are only available for DNA
databases.
Sort mode not valid.
The specified sort mode ($sbe->settings->sort) is not valid.
You can get all valid sort modes with $sbe->available_sort_modes()
See the Bio::Grep::Backend::Vmatch manpage, the Bio::Grep::Backend::GUUGle manpage,
the Bio::Grep::Backend::RE manpage and the Bio::Grep::Backend::Agrep manpage for details.
Bio::Root::BadParameter.
Requires EMBOSS and the Bio::Factory::EMBOSS manpage for the Needleman-Wunsch local
alignment implementation from EMBOSS. The internal method
_get_alignment($seq_a, $seq_b) can then calculate an alignment for
back-ends that do not generate an alignment (like the Bio::Grep::Backend::Agrep manpage).
the Bio::Grep::SearchSettings manpage the Bio::Grep::SearchResults manpage
Markus Riester, <mriester@gmx.de>
Copyright (C) 2007-2008 by M. Riester.
Based on Weigel::Search v0.13, Copyright (C) 2005-2006 by Max Planck Institute for Developmental Biology, Tuebingen.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
| Bio::Grep::Backend::BackendI - Superclass for all back-ends |