Google
More docs on the ARB website.
See also index of helppages.
Last update on 04. May 2016 .
Main topics:
Related topics:

    kitsch.doc

    DISCLAIMER

    This file has been automatically converted from the original documentation for easy use inside the ARB help system. Differences compared with the original documentation are unintentionally caused by the conversion process. In doubt please refer to the original documentation!

     

    DOCUMENTATION

    # generated from ../../GDE/PHYLIP/doc/kitsch.html

    version 3.6
    KITSCH -- Fitch-Margoliash and Least Squares Methods
                  with Evolutionary Clock
    (C)  Copyright  1986-2002  by  the  University of Washington. Written by
    Joseph  Felsenstein.  Permission  is  granted  to  copy  this document
    provided  that no fee is charged for it and that this copyright notice
    is not removed.
    This  program  carries  out  the  Fitch-Margoliash  and  Least Squares
    methods,  plus  a  variety  of  others  of  the  same family, with the
    assumption that all tip species are contemporaneous, and that there is
    an  evolutionary clock (in effect, a molecular clock). This means that
    branches   of  the  tree  cannot  be  of  arbitrary  length,  but  are
    constrained  so that the total length from the root of the tree to any
    species  is  the same. The quantity minimized is the same weighted sum
    of  squares  described  in  the  Distance Matrix Methods documentation
    file.
    The options are set using the menu:

    Fitch-Margoliash method with contemporary tips, version 3.6a3

    Settings for this run:
      D      Method (F-M, Minimum Evolution)?  Fitch-Margoliash
      U                 Search for best tree?  Yes
      P                                Power?  2.00000
      -      Negative branch lengths allowed?  No
      L         Lower-triangular data matrix?  No
      R         Upper-triangular data matrix?  No
      S                        Subreplicates?  No
      J     Randomize input order of species?  No. Use input order
      M           Analyze multiple data sets?  No
      0   Terminal type (IBM PC, ANSI, none)?  (none)
      1    Print out the data at start of run  No
      2  Print indications of progress of run  Yes
      3                        Print out tree  Yes
      4       Write out trees onto tree file?  Yes
    Y to accept these or type the letter for one to change
    Most  of  the  options  are  described in the Distance Matrix Programs
    documentation file.
    The  D  (methods)  option  allows  choice between the Fitch-Margoliash
    criterion and the Minimum Evolution method (Kidd and Sgaramella-Zonta,
    1971;  Rzhetsky  and Nei, 1993). Minimum Evolution (not to be confused
    with  parsimony)  uses  the  Fitch-Margoliash  criterion to fit branch
    lengths  to  each topology, but then chooses topologies based on their
    total  branch length (rather than the goodness of fit sum of squares).
    There  is  no  constraint  on  negative  branch lengths in the Minimum
    Evolution method; it sometimes gives rather strange results, as it can
    like  solutions  that  have  large  negative  branch lengths, as these
    reduce the total sum of branch lengths!
    Note that the User Trees (used by option U) must be rooted trees (with
    a  bifurcation  at their base). If you take a user tree from FITCH and
    try  to  evaluate  it  in KITSCH, it must first be rooted. This can be
    done  using RETREE. Of the options available in FITCH, the O option is
    not  available,  as  KITSCH  estimates  a  rooted tree which cannot be
    rerooted,  and  the G option is not available, as global rearrangement
    is  the  default  condition anyway. It is also not possible to specify
    that  specific  branch  lengths  of a user tree be retained when it is
    read  into  KITSCH,  unless  all of them are present. In that case the
    tree  should be properly clocklike. Readers who wonder why we have not
    provided  the  feature of holding some of the user tree branch lengths
    constant  while iterating others are invited to tell us how they would
    do  it. As you consider particular possible patterns of branch lengths
    you will find that the matter is not at all simple.
    If you use a User Tree (option U) with branch lengths with KITSCH, and
    the  tree  is  not clocklike, when two branch lengths give conflicting
    positions for a node, KITSCH will use the first of them and ignore the
    other. Thus the user tree:
    ((A:0.1,B:0.2):0.4,(C:0.06,D:0.01):43);
    is  nonclocklike,  so  it  will  be treated as if it were actually the
    tree:
    ((A:0.1,B:0.1):0.4,(C:0.06,D:0.06):44);
    The  input  is  exactly  the  same as described in the Distance Matrix
    Methods documentation file. The output is a rooted tree, together with
    the  sum  of  squares, the number of tree topologies searched, and, if
    the  power  P  is  at  its  default  value of 2.0, the Average Percent
    Standard  Deviation  is  also supplied. The lengths of the branches of
    the  tree  are  given  in a table, that also shows for each branch the
    time  at  the  upper  end  of  the  branch.  "Time"  here really means
    cumulative  branch length from the root, going upwards (on the printed
    diagram,  rightwards).  For  each  branch, the "time" given is for the
    node  at  the  right  (upper)  end  of  the branch. It is important to
    realize  that  the  branch lengths are not exactly proportional to the
    lengths  drawn  on  the  printed  tree  diagram!  In particular, short
    branches  are  exaggerated  in the length on that diagram so that they
    are more visible.
    The  method  may  be  considered  as  providing  an  estimate  of  the
    phylogeny.   Alternatively,   it  can  be  considered  as  a  phenetic
    clustering  of  the  tip  species.  This method minimizes an objective
    function,  the  sum  of  squares,  not  only setting the levels of the
    clusters  so as to do so, but rearranging the hierarchy of clusters to
    try  to  find alternative clusterings that give a lower overall sum of
    squares. When the power option P is set to a value of P = 0.0, so that
    we  are  minimizing a simple sum of squares of the differences between
    the  observed distance matrix and the expected one, the method is very
    close in spirit to Unweighted Pair Group Arithmetic Average Clustering
    (UPGMA),  also  called  Average-Linkage Clustering. If the topology of
    the  tree  is  fixed  and there turn out to be no branches of negative
    length, its result should be the same as UPGMA in that case. But since
    it  tries  alternative  topologies and (unless the N option is set) it
    combines nodes that otherwise could result in a reversal of levels, it
    is possible for it to give a different, and better, result than simple
    sequential  clustering.  Of  course  UPGMA  itself  is available as an
    option in program NEIGHBOR.
    The  U  (User  Tree) option requires a bifurcating tree, unlike FITCH,
    which  requires an unrooted tree with a trifurcation at its base. Thus
    the tree shown below would be written:
    ((D,E),(C,(A,B)));
    If a tree with a trifurcation at the base is by mistake fed into the U
    option  of KITSCH then some of its species (the entire rightmost furc,
    in  fact)  will  be  ignored and too small a tree read in. This should
    result  in  an  error  message  and  the  program  should  stop. It is
    important  to  understand the difference between the User Tree formats
    for  KITSCH  and  FITCH.  You may want to use RETREE to convert a user
    tree  that  is suitable for FITCH into one suitable for KITSCH or vice
    versa.
    An  important  use  of  this method will be to do a formal statistical
    test  of  the  evolutionary  clock  hypothesis.  This  can  be done by
    comparing  the  sums  of  squares achieved by FITCH and by KITSCH, BUT
    SOME CAVEATS ARE NECESSARY. First, the assumption is that the observed
    distances   are   truly   independent,  that  no  original  data  item
    contributes  to more than one of them (not counting the two reciprocal
    distances  from  i  to  j  and from j to i). THIS WILL NOT HOLD IF THE
    DISTANCES  ARE  OBTAINED  FROM  GENE  FREQUENCIES,  FROM MORPHOLOGICAL
    CHARACTERS,  OR  FROM  MOLECULAR SEQUENCES. It may be invalid even for
    immunological distances and levels of DNA hybridization, provided that
    the  use  of common standard for all members of a row or column allows
    an  error  in  the  measurement  of  the  standard to affect all these
    distances  simultaneously. It will also be invalid if the numbers have
    been  collected  in  experimental  groups,  each  measured  by  taking
    differences  from  a  common  standard  which  itself is measured with
    error.  Only  if  the  numbers  in  different  cells are measured from
    independent  standards  can  we  depend  on the statistical model. The
    details  of  the  test  and the assumptions are discussed in my review
    paper  on  distance  methods  (Felsenstein,  1984a).  For  further and
    sometimes  irrelevant  controversy  on these matters see the papers by
    Farris (1981, 1985, 1986) and myself (Felsenstein, 1986, 1988b).
    A  second  caveat  is  that  the  distances  must  be expected to rise
    linearly  with  time, not according to any other curve. Thus it may be
    necessary to transform the distances to achieve an expected linearity.
    If  the  distances have an upper limit beyond which they could not go,
    this  is  a  signal  that  linearity  may  not  hold.  It is also VERY
    important  to  choose  the  power  P  at  a  value that results in the
    standard  deviation of the variation of the observed from the expected
    distances being the P/2-th power of the expected distance.
    To  carry  out the test, fit the same data with both FITCH and KITSCH,
    and record the two sums of squares. If the topology has turned out the
    same,  we  have  N  = n(n-1)/2 distances which have been fit with 2n-3
    parameters  in  FITCH,  and  with  n-1  parameters in KITSCH. Then the
    difference between S(K) and S(F) has d[1] = n-2 degrees of freedom. It
    is  statistically  independent  of the value of S(F), which has d[2] =
    N-(2n-3) degrees of freedom. The ratio of mean squares
     [S(K)-S(F)]/d[1]
    ----------------
         S(F)/d[2]
    should,  under the evolutionary clock, have an F distribution with n-2
    and N-(2n-3) degrees of freedom respectively. The test desired is that
    the  F  ratio  is  in  the  upper  tail  (say  the  upper  5%)  of its
    distribution. If the S (subreplication) option is in effect, the above
    degrees  of  freedom must be modified by noting that N is not n(n-1)/2
    but  is  the  sum  of  the  numbers  of replicates of all cells in the
    distance  matrix  read in, which may be either square or triangular. A
    further explanation of the statistical test of the clock is given in a
    paper of mine (Felsenstein, 1986).
    The  program  uses  a  similar  tree  construction method to the other
    programs  in the package and, like them, is not guaranteed to give the
    best-fitting  tree.  The  assignment of the branch lengths for a given
    topology  is  a  least squares fit, subject to the constraints against
    negative  branch  lengths, and should not be able to be improved upon.
    KITSCH runs more quickly than FITCH.
    The  constant  available  for  modification  at  the  beginning of the
    program is "epsilon", which defines a small quantity needed in some of
    the  calculations.  There is no feature saving multiply trees tied for
    best,  because  exact  ties are not expected, except in cases where it
    should  be obvious from the tree printed out what is the nature of the
    tie (as when an interior branch is of length zero).
      _________________________________________________________________
    TEST DATA SET
        7
    Bovine      0.0000  1.6866  1.7198  1.6606  1.5243  1.6043  1.5905
    Mouse       1.6866  0.0000  1.5232  1.4841  1.4465  1.4389  1.4629
    Gibbon      1.7198  1.5232  0.0000  0.7115  0.5958  0.6179  0.5583
    Orang       1.6606  1.4841  0.7115  0.0000  0.4631  0.5061  0.4710
    Gorilla     1.5243  1.4465  0.5958  0.4631  0.0000  0.3484  0.3083
    Chimp       1.6043  1.4389  0.6179  0.5061  0.3484  0.0000  0.2692
    Human       1.5905  1.4629  0.5583  0.4710  0.3083  0.2692  0.0000
         _________________________________________________________________
    TEST SET OUTPUT FILE (with all numerical options on)
    7 Populations

    Fitch-Margoliash method with contemporary tips, version 3.6a3

                      __ __             2
                      \  \   (Obs - Exp)
    Sum of squares =  /_ /_  ------------
                                    2
                       i  j      Obs

    negative branch lengths not allowed

    Name                       Distances
    ----                       ---------
    Bovine        0.00000   1.68660   1.71980   1.66060   1.52430   1.60430
                  1.59050
    Mouse         1.68660   0.00000   1.52320   1.48410   1.44650   1.43890
                  1.46290
    Gibbon        1.71980   1.52320   0.00000   0.71150   0.59580   0.61790
                  0.55830
    Orang         1.66060   1.48410   0.71150   0.00000   0.46310   0.50610
                  0.47100
    Gorilla       1.52430   1.44650   0.59580   0.46310   0.00000   0.34840
                  0.30830
    Chimp         1.60430   1.43890   0.61790   0.50610   0.34840   0.00000
                  0.26920
    Human         1.59050   1.46290   0.55830   0.47100   0.30830   0.26920
                  0.00000
                                               +-------Human
                                             +-6
                                        +----5 +-------Chimp
                                        !    !
                                    +---4    +---------Gorilla
                                    !   !
           +------------------------3   +--------------Orang
           !                        !
      +----2                        +------------------Gibbon
      !    !
    --1    +-------------------------------------------Mouse
      !
      +------------------------------------------------Bovine
    Sum of squares =      0.107
    Average percent standard deviation =   5.16213
    From     To            Length          Height
    ----     --            ------          ------
    6   Human           0.13460         0.81285
    5      6            0.02836         0.67825
    6   Chimp           0.13460         0.81285
    4      5            0.07638         0.64990
    5   Gorilla         0.16296         0.81285
    3      4            0.06639         0.57352
    4   Orang           0.23933         0.81285
    2      3            0.42923         0.50713
    3   Gibbon          0.30572         0.81285
    1      2            0.07790         0.07790
    2   Mouse           0.73495         0.81285
    1   Bovine          0.81285         0.81285