  • One useful and automatable phylogenomics approach would be as follows: if a novel sequence has orthologs, annotation can be transferred from them (as in best BLAST analysis); if there are no orthologs, the sequence is classified as just a family member (as in Pfam/InterPro analysis) and flagged as possibly the first representative of a novel subfamily.

  • To estimate the effectiveness of this implementation of automated phylogenomics, we used the RIO procedure to analyze the A. thaliana [16] and C. elegans [17] proteomes.

  • RIO is a procedure for automated phylogenomics.

  • This approach was called "phylogenomics" by Eisen [9]. It would be desirable to automate this procedure, but the best automated methods for subfamily annotation, such as the COGs database [10], are clustering methods that do not directly use phylogenetic analysis.

  • However, a principle of phylogenomics is that orthologous sequences (that diverged by speciation) are more likely to conserve protein function than paralogous sequences (that diverged by gene duplication).
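
The annotation rule described in the first point (transfer annotation when orthologs exist, otherwise report family membership only and flag a possible novel subfamily) can be illustrated with a minimal sketch. This is not RIO's code: the `Annotation` class, the `annotate` function, and their fields are hypothetical names chosen for illustration, and the sketch assumes that the set of orthologs has already been inferred by some phylogenetic procedure.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical result record; not part of RIO or any real library."""
    family: str                      # family assignment (e.g. from a Pfam-style search)
    transferred: Tuple[str, ...]     # functions transferred from orthologs, if any
    possible_novel_subfamily: bool   # True when no orthologs were found


def annotate(family: str, ortholog_functions: Sequence[str]) -> Annotation:
    """Apply the decision rule to one query sequence.

    `ortholog_functions` holds the known functions of sequences already
    inferred to be orthologs of the query (assumption: that inference is
    done elsewhere and is not shown here).
    """
    if ortholog_functions:
        # Orthologs exist: transfer their annotation, removing duplicates
        # and sorting so the result is deterministic.
        distinct = tuple(sorted(set(ortholog_functions)))
        return Annotation(family, distinct, possible_novel_subfamily=False)
    # No orthologs: classify as a family member only and flag the query
    # as possibly the first representative of a novel subfamily.
    return Annotation(family, (), possible_novel_subfamily=True)


with_orthologs = annotate(
    "lactate/malate dehydrogenase",
    ["malate dehydrogenase", "malate dehydrogenase"],
)
without_orthologs = annotate("lactate/malate dehydrogenase", [])
```

In this sketch a query with several orthologs receives every distinct function they carry; how conflicting ortholog annotations should be resolved is left open here.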

