Vol. 17 no. 4 2001
Pages 383–384
BIOINFORMATICS APPLICATIONS NOTE
ATV: display and manipulation of annotated phylogenetic trees
Christian M. Zmasek and Sean R. Eddy
Howard Hughes Medical Institute, Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA
Received on November 1, 2000; revised on December 19, 2000; accepted on December 21, 2000
ABSTRACT
Summary: A Tree Viewer (ATV) is a Java tool for the display and manipulation of annotated phylogenetic trees. It can be used both as a standalone application and as an applet in a web browser.
Availability: ATV is available via WWW at http://ftp.genetics.wustl.edu/pub/eddy/software/forester.tar.Z
INTRODUCTION
Many proteins belong to large families consisting of subfamilies with different biological functions. This complicates efforts to infer the function of new proteins by computational sequence analysis, because neither of the two main sequence analysis methods handles large protein families satisfactorily in high-throughput automated annotation. Pairwise sequence similarity searches, exemplified by BLAST (Altschul et al., 1990), lead to overly specific annotations. A new sequence in a protein family is always ‘most similar’ to something, so it is difficult to recognize when the new sequence is the pioneer member of a novel functional subfamily. Profile search methods, exemplified by HMMER (Eddy, 2000), lead to overly general annotations. They recognize that a new sequence fits the general profile of a family, but do not attempt to subclassify it at all.
Phylogenetic inference is a sensible approach to subclassifying sequences because it groups them hierarchically into evolutionary clades. The use of phylogenetic inference to improve genome sequence annotation has been termed ‘phylogenomics’ by Eisen (1998). A key idea of phylogenomics is to distinguish sequences that have diverged by speciation (orthologues) from sequences that have diverged by duplication (paralogues). Although orthology does not imply functional conservation (contrary to what is sometimes assumed), orthologues often do conserve more aspects of a protein’s function than paralogues do.
During phylogenomic analysis, gene trees are annotated with various data. Nodes are annotated as either a gene duplication or a speciation, and subtrees are annotated according to sequence function (as a description and/or an EC number). In addition, species information (as a name and/or taxonomy ID), sequence names, branch lengths, and bootstrap values are likely to be present.
We needed a tool for visualizing such heavily annotated phylogenetic trees. A variety of excellent tree browsers exist, including DRAWTREE from the PHYLIP package (Felsenstein, 1993), TREEVIEW (Page, 1996), NIFAS (http://www.cgr.ki.se/Pfam/nifas.html), NJPLOT (Perriere and Gouy, 1996), and Phylodendron (http://www.iubio.bio.indiana.edu/soft/molbio/java/apps/trees/). However, none of them exactly suited our annotation needs, so we developed our own tool.
FEATURES
ATV is mouse- and menu-driven. The user can choose which data elements to display on the tree, and all data fields associated with nodes can be edited. The tree can be rerooted on any branch. ATV can visualize very large trees (more than 500 sequences): the user can display any subtree, zoom in or out, or collapse any subtree into a single node. The applet hyperlinks to SwissProt entries for sequences with a SwissProt name. Branches can be colored according to the likelihood values associated with them. The Swing version (see below) of the application can print trees in color. Depending on the user’s environment, it can also export tree images as PostScript or PDF files. An example of ATV displaying an annotated tree is shown in Figure 1.
Trees can be read and saved in the standard ‘New Hampshire’ format (Felsenstein, 1993). However, this format is not suitable for storing annotated trees: beyond a node name and a branch length, it has no provision for attaching named data fields to nodes. Currently, we use a simple extension of the format that we call ‘New Hampshire eXtended’ (NHX), in which additional tag/value pairs associate annotations with nodes. In the long term, we envision replacing NHX with a structured markup language, such as the XML document type definition for taxonomic relationships described in Gilmour (2000).
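The text above says only that NHX uses tag/value pairs to associate annotations with nodes. As an illustration, the following minimal Python sketch assumes a comment syntax of the form `[&&NHX:TAG=value:TAG=value]`, written after a node's name and branch length. It extracts those pairs with a regular expression. The tag meanings in the example are assumptions made for illustration: S for species, E for EC number and D for duplication (Y/N). The sequence names are made up. This is not a full New Hampshire parser: it ignores quoted labels and does not rebuild the tree topology.

```python
import re

# One NHX-annotated node: optional name, optional ":branch_length",
# then a comment assumed to look like [&&NHX:TAG=value:TAG=value...].
NODE_WITH_NHX = re.compile(
    r"([^(),:;\[\]]*)"           # node name (empty for unnamed internal nodes)
    r"(?::([^(),:;\[\]]+))?"     # optional branch length after ':'
    r"(\[&&NHX[^\]]*\])"         # the NHX comment itself
)


def parse_nhx(comment):
    """Turn '[&&NHX:S=human:D=N]' into {'S': 'human', 'D': 'N'}."""
    body = comment[len("[&&NHX"):-1]      # drop the '[&&NHX' prefix and ']' suffix
    pairs = {}
    for field in body.split(":"):
        if field:                         # skip the empty string before the first ':'
            tag, _, value = field.partition("=")
            pairs[tag] = value
    return pairs


def nhx_nodes(tree):
    """Yield (name, branch_length, tags) for each node carrying an NHX comment."""
    for match in NODE_WITH_NHX.finditer(tree):
        name, length, comment = match.groups()
        yield name, (float(length) if length else None), parse_nhx(comment)


# Hypothetical example tree; the tags S, E and D are assumed to mean
# species, EC number and duplication (Y/N).
example = ("(ADH1_HUMAN:0.1[&&NHX:S=human:E=1.1.1.1],"
           "ADH1_MOUSE:0.2[&&NHX:S=mouse])[&&NHX:D=N];")
for node in nhx_nodes(example):
    print(node)
# ('ADH1_HUMAN', 0.1, {'S': 'human', 'E': '1.1.1.1'})
# ('ADH1_MOUSE', 0.2, {'S': 'mouse'})
# ('', None, {'D': 'N'})
```

In this example, the unnamed root is annotated `D=N`, which under the assumed tag meaning would mark it as a speciation rather than a duplication node.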