Bio::Phylo::Util::DOM - Drop-in XML DOM support for Bio::Phylo
    use Bio::Phylo::Util::DOM;
    use Bio::Phylo::IO qw( parse );

    Bio::Phylo::Util::DOM->new(-format => 'twig');
    my $project = parse( -file=>'my.nex', -format=>'nexus' );
    my $nex_twig = $project->doc();
This module adds to_dom methods to the Bio::Phylo::Util::XMLWritable
classes. These methods return NeXML-valid objects for document object
model manipulation. The DOM formats currently available are XML::Twig
and XML::LibXML. For any XMLWritable object, use to_dom in place
of to_xml to create DOM nodes.
The doc() method is also added to the Bio::Phylo::Project class. It returns a NeXML document as a DOM object populated by the current contents of the Bio::Phylo::Project object.
The NeXML parsing/writing capability of Bio::Phylo goes a long way
towards wider adoption of this useful standard.
However, while Bio::Phylo can write NeXML-valid XML, the way in
which it does this natively is somewhat hard-coded and therefore
restricted, and is essentially oriented toward text file output. As
such, there is a mismatch between the sophisticated Bio::Phylo data
structure and its own ability to manipulate and serialize that
structure in sophisticated but interoperable ways. Finer manipulations
of XML-represented data are possible through a variety of Perl
packages that can store and control XML according to a document
object model (DOM). Many of these packages allow extremely flexible
computation over large datasets stored in XML format, and admit the
use of XML-related facilities such as XPath and XSLT programmatically.
The purpose of Bio::Phylo::Util::DOM is to introduce integrated DOM
object creation and manipulation to Bio::Phylo, both to make DOM
computation in Bio::Phylo more convenient, and also to provide a
platform for potentially more sophisticated Bio::Phylo modules to
come.
Besides the notion that DOM capability should be optional for the user,
there are two main design ideas. First, for each Bio::Phylo object
that can be parsed/written as NeXML (i.e., for each
Bio::Phylo::Util::XMLWritable object), we provide an analogous method
for creating a representative DOM object, or element. These elements
are aggregatable in a DOM document object, whose native stringifying
method can be used to generate valid NeXML.
Second, we allow flexibility and extensibility in the choice of the
underlying DOM package, while maintaining a consistent DOM interface
that is similar in semantic and syntactic style to the accessors and
mutators that act on the Bio::Phylo objects themselves. This is
achieved through the DOM::DocumentI and DOM::ElementI interfaces,
which define a minimal subset of DOM accessors and mutators, their
inputs and outputs. Concrete instances of these interface classes
provide the bindings between the abstract methods and their
counterparts in the desired DOM implementation. Currently, there are
bindings for two popular packages, XML::Twig and XML::LibXML.
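The interface-and-binding idea can be illustrated with a minimal sketch. Everything in it is hypothetical: My::ElementI and My::Element::Hash are invented names, and the toy binding stores its data in a plain hash rather than delegating to XML::Twig or XML::LibXML. It is not the module's actual code; it only shows how an abstract class can fix the accessor names while a concrete class supplies the behavior.

```perl
use strict;
use warnings;

# Hypothetical abstract interface (a stand-in for an ElementI-style class).
package My::ElementI {
    sub new {
        my ( $class, %args ) = @_;
        my $self = bless {}, $class;
        $self->set_tagname( $args{-tag} ) if defined $args{-tag};
        $self->set_attributes( %{ $args{-attr} } ) if $args{-attr};
        return $self;
    }

    # Install a stub for each abstract method; it dies unless overridden.
    for my $method (qw( get_tagname set_tagname set_attributes to_xml_string )) {
        no strict 'refs';
        *{ __PACKAGE__ . "::$method" } = sub {
            my ($self) = @_;
            die "$method() is not implemented by " . ( ref($self) || $self ) . "\n";
        };
    }
}

# Hypothetical concrete binding: a toy hash-backed element.
# A real binding would forward each call to the underlying DOM package.
package My::Element::Hash {
    our @ISA = ('My::ElementI');

    sub set_tagname { my ( $self, $tag ) = @_; $self->{tag} = $tag; return $self }
    sub get_tagname { my ($self) = @_; return $self->{tag} }

    sub set_attributes {
        my ( $self, %attr ) = @_;
        $self->{attr}{$_} = $attr{$_} for keys %attr;
        return $self;
    }

    # Toy serializer for an empty element: attribute values are not escaped.
    sub to_xml_string {
        my ($self) = @_;
        my $attr = $self->{attr} || {};
        my $attr_text = join '',
            map { sprintf ' %s="%s"', $_, $attr->{$_} } sort keys %$attr;
        return "<$self->{tag}$attr_text/>";
    }
}

my $elt = My::Element::Hash->new( -tag => 'otu', -attr => { id => 'otu1' } );
print $elt->to_xml_string, "\n";
```

Calling code written against the interface methods does not need to know which binding produced the element, which is the property the design aims for.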
Another priority was simplicity of use; most of the details remain
under the hood in practice. The Bio/Phylo/Util/DOM.pm file defines the
to_dom() method for each XMLWritable package, as well as the
Bio::Phylo::Util::DOM package proper. The DOM object is a
factory that is used to create Element and Document objects; it is an
inside-out object that subclasses Bio::Phylo. To curb the
proliferation of method arguments, a DOM factory instance (set by the
latest invocation of Bio::Phylo::Util::DOM->new()) is maintained in
a package global. This is used by default for object creation with DOM
methods if a DOM factory object is not explicitly provided in the
argument list.
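The default-factory behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: My::DOMFactory, get_default() and to_dom_sketch() are hypothetical names, and the object is a plain blessed hash rather than an inside-out object.

```perl
use strict;
use warnings;

# Hypothetical factory that remembers the most recently created instance.
package My::DOMFactory {
    my $DEFAULT;    # shared by all callers; set by the latest new()

    sub new {
        my ( $class, %args ) = @_;
        my $self = bless { format => $args{-format} || 'twig' }, $class;
        $DEFAULT = $self;    # the latest invocation becomes the default
        return $self;
    }

    sub get_format  { my ($self) = @_; return $self->{format} }
    sub get_default { return $DEFAULT }
}

# Hypothetical to_dom-style routine: the factory argument is optional,
# and the remembered default is used when it is omitted.
sub to_dom_sketch {
    my ($factory) = @_;
    $factory ||= My::DOMFactory->get_default()
        or die "no DOM factory has been created yet\n";
    return 'element built with ' . $factory->get_format;
}

my $twig_dom   = My::DOMFactory->new();                       # format defaults to 'twig'
my $libxml_dom = My::DOMFactory->new( -format => 'libxml' );  # now the default factory

print to_dom_sketch(), "\n";             # falls back to the latest factory
print to_dom_sketch($twig_dom), "\n";    # an explicit factory takes precedence
```

The trade-off of this pattern is that the default is shared state: creating a second factory silently changes what later calls without an explicit factory will use.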
The underlying DOM implementation is set with the DOM factory
constructor's single argument, -format. Even this can be left out;
the default implementation is XML::Twig, which is already required
by Bio::Phylo. Thus, for example, one can use the DOM to convert
a Nexus file to a DOM representation as follows:
    use Bio::Phylo::Util::DOM;
    use Bio::Phylo::IO qw( parse );

    Bio::Phylo::Util::DOM->new();
    my $project = parse( -file=>'my.nex', -format=>'nexus' );
    my $nex_twig = $project->doc();
    # The end.
Underlying DOM packages are loaded at runtime as specified by the
-format argument. Packages for unused formats do not need to be
installed.
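Runtime loading of this kind is commonly done in Perl with require inside an eval. The sketch below shows the general technique only; the %BACKEND_FOR table and load_backend() are hypothetical, and core modules stand in for the DOM packages so that the example runs without XML::Twig or XML::LibXML installed.

```perl
use strict;
use warnings;

# Hypothetical format => package table. Core modules stand in for the
# real DOM backends; 'absent' names a package that does not exist.
my %BACKEND_FOR = (
    twig   => 'List::Util',
    libxml => 'Scalar::Util',
    absent => 'No::Such::DOM::Backend',
);

# Load the package for the requested format only when it is asked for.
sub load_backend {
    my ($format) = @_;
    my $package = $BACKEND_FOR{ lc $format }
        or die "unknown DOM format '$format'\n";

    # Convert Some::Package to Some/Package.pm, the form require expects
    # when given a string instead of a bareword.
    ( my $file = "$package.pm" ) =~ s{::}{/}g;

    eval { require $file; 1 }
        or die "format '$format' needs $package, which could not be loaded: $@";
    return $package;
}

print load_backend('twig'), " loaded\n";

# A format whose package is missing fails only when that format is requested.
eval { load_backend('absent'); 1 }
    or print "absent backend rejected\n";
```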
The minimal DOM interface specifies the following methods. Details can be obtained from the ElementI and DocumentI POD.
Element methods (Bio::Phylo::Util::DOM::ElementI):

    get_tagname() set_tagname()
    get_attributes() set_attributes() clear_attributes()
    get_text() set_text() clear_text()

    get_parent() get_children()
    get_first_child() get_last_child()
    get_next_sibling() get_prev_sibling()
    get_elements_by_tagname()

    set_child() prune_child()

    to_xml_string()

Document methods (Bio::Phylo::Util::DOM::DocumentI):

    get_encoding() set_encoding()

    get_root() set_root()

    get_element_by_id() get_elements_by_tagname()

    to_xml_string() to_xml_file()
new()
Type : Factory constructor
Title : new
Usage : $dom = Bio::Phylo::Util::DOM->new(-format=>$format)
Function: Create a new DOM factory
Returns : DOM object
Args : format - DOM format (defaults to 'twig')
create_element()
Type : Creator
Title : create_element
Usage : $elt = $dom->create_element(-tag=>$tag_name, -attr=>\%attr_hash)
Function: Create a new XML DOM element
Returns : DOM element
Args : Optional:
-tag => $tag_name
-attr => \%attr_hash
create_document()
Type : Creator
Title : create_document
Usage : $doc = $dom->create_document()
Function: Create a new XML DOM document
Returns : DOM document
Args : Package-specific args
set_format()
Type : Mutator
Title : set_format
Usage : $dom->set_format($format)
Function: Set the format (underlying DOM package bindings) for this object
Returns : format designator as string
Args : format designator as string
The DOM creator interfaces: Bio::Phylo::Util::DOM::ElementI and Bio::Phylo::Util::DOM::DocumentI.
Mark A. Jensen (maj -at- fortinbras -dot- us)
The Bio::Phylo::Annotation class is not yet DOMized.