find_nearest_tips {castor} | R Documentation |

Given a rooted phylogenetic tree and a subset of potential target tips, for each tip and node in the tree find the nearest target tip. The set of target tips can also be taken as the whole set of tips in the tree.

find_nearest_tips(tree, only_descending_tips = FALSE, target_tips = NULL, as_edge_counts = FALSE, check_input = TRUE)

`tree` |
A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge. |

`only_descending_tips` |
A logical indicating whether the nearest tip to a node or tip should be chosen from its descending tips only. If FALSE, then the whole set of possible target tips is considered. |

`target_tips` |
Optional integer vector or character vector listing the subset of target tips to restrict the search to. If an integer vector, this should list tip indices (values in 1,..,Ntips). If a character vector, it should list tip names (in this case |

`as_edge_counts` |
Logical, specifying whether to count phylogenetic distance in terms of edge counts instead of cumulative edge lengths. This is the same as setting all edge lengths to 1. |

`check_input` |
Logical, whether to perform basic validations of the input data. If you know for certain that your input is valid, you can set this to |
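As a hedged illustration of the arguments above, the sketch below passes the same target tips once as indices and once as names, and then repeats the search with `as_edge_counts=TRUE`. The tree size, the seed and the variable names are arbitrary choices for this example.

```r
library(castor)

set.seed(42)  # only to make this sketch reproducible
tree = generate_random_tree(list(birth_rate_intercept = 1), 100)$tree

# the same five target tips, given once as tip indices and once as tip names
target_indices = 1:5
target_names = tree$tip.label[target_indices]

by_index = find_nearest_tips(tree, target_tips = target_indices)
by_name = find_nearest_tips(tree, target_tips = target_names)

# measure distances as edge counts instead of cumulative edge lengths
by_count = find_nearest_tips(tree, target_tips = target_indices,
                             as_edge_counts = TRUE)
```

Both ways of specifying the targets should give the same distances, and the edge-count distances should all be whole numbers.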

Langille et al. (2013) introduced the Nearest Sequenced Taxon Index (NSTI) as a measure for how well a set of microbial operational taxonomic units (OTUs) is represented by a set of sequenced genomes of related organisms. Specifically, the NSTI of a microbial community is the average phylogenetic distance of any OTU in the community, to the closest relative with an available sequenced genome ("target tips"). In analogy to the NSTI, the function `find_nearest_tips` provides a means to find the nearest tip (from a subset of target tips) to each tip and node in a phylogenetic tree, together with the corresponding phylogenetic ("patristic") distance.
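The NSTI-style average described above could be computed roughly as follows. This is a minimal sketch, not a reimplementation of any published pipeline: it takes a plain unweighted mean, and the split into "sequenced" tips and community OTUs (`sequenced_tips`, `community_otus`) is a hypothetical random choice made only for illustration.

```r
library(castor)

set.seed(1)  # only to make this sketch reproducible
Ntips = 200
tree = generate_random_tree(list(birth_rate_intercept = 1), Ntips)$tree

# hypothetical split: tips that have a sequenced genome (the target tips)
# and a community of OTUs drawn from the remaining tips
sequenced_tips = sample.int(n = Ntips, size = 20, replace = FALSE)
community_otus = sample(setdiff(seq_len(Ntips), sequenced_tips), 50)

# nearest sequenced tip for every tip and node in the tree
results = find_nearest_tips(tree, target_tips = sequenced_tips)

# unweighted mean distance from each community OTU to its nearest sequenced tip
nsti_like = mean(results$nearest_distance_per_tip[community_otus])
```

A smaller `nsti_like` means the community is, on average, phylogenetically closer to the sequenced tips.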

If `only_descending_tips` is `TRUE`, then only descending target tips are considered when searching for the nearest target tip of a node/tip. In that case, if a node/tip has no descending target tip, its nearest target tip is set to `NA`. If `tree$edge.length` is missing or `NULL`, then each edge is assumed to have length 1. The tree may include multi-furcations as well as mono-furcations (i.e. nodes with only one child).

The asymptotic time complexity of this function is O(Nedges), where Nedges is the number of edges in the tree.
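One way to reach a cost that grows with the number of edges is a two-pass traversal: a first pass from the tips to the root finds the nearest descending target tip of every clade, and a second pass from the root to the tips lets each clade also reach targets through its parent. The base R sketch below illustrates that idea on a hand-built four-tip tree. It is not castor's actual implementation; the function `nearest_tips_sketch`, its arguments, and the choice to leave unreachable distances at `Inf` are assumptions of this example.

```r
# Two-pass nearest-target-tip search (illustrative sketch, not castor's code).
# Tips are numbered 1..Ntips and internal nodes Ntips+1.., as in "phylo" trees.
nearest_tips_sketch = function(edge, edge_length, Ntips, target_tips,
                               only_descending_tips = FALSE) {
  Nclades = Ntips + length(unique(edge[, 1]))
  root = setdiff(edge[, 1], edge[, 2])  # the node with no incoming edge
  out_edges = split(seq_len(nrow(edge)), edge[, 1])  # edges leaving each node

  # list edges level by level, from the root towards the tips
  edge_order = integer(0)
  queue = root
  while (length(queue) > 0) {
    e = as.integer(unlist(out_edges[as.character(queue)], use.names = FALSE))
    edge_order = c(edge_order, e)
    queue = edge[e, 2]
  }

  dist = rep(Inf, Nclades)
  tip = rep(NA_integer_, Nclades)
  dist[target_tips] = 0
  tip[target_tips] = target_tips

  # pass 1 (tips to root): nearest descending target tip of each clade
  for (e in rev(edge_order)) {
    parent = edge[e, 1]; child = edge[e, 2]
    candidate = dist[child] + edge_length[e]
    if (candidate < dist[parent]) {
      dist[parent] = candidate
      tip[parent] = tip[child]
    }
  }

  # pass 2 (root to tips): also allow paths that leave through the parent
  if (!only_descending_tips) {
    for (e in edge_order) {
      parent = edge[e, 1]; child = edge[e, 2]
      candidate = dist[parent] + edge_length[e]
      if (candidate < dist[child]) {
        dist[child] = candidate
        tip[child] = tip[parent]
      }
    }
  }
  list(nearest_tip = tip, nearest_distance = dist)
}

# toy tree ((t1:1,t2:2):1,(t3:1,t4:3):2); root is 5, internal nodes are 6 and 7
edge = matrix(c(5L,6L, 5L,7L, 6L,1L, 6L,2L, 7L,3L, 7L,4L), ncol = 2, byrow = TRUE)
edge_length = c(1, 2, 1, 2, 1, 3)
targets = c(1L, 4L)

whole = nearest_tips_sketch(edge, edge_length, 4L, targets)
desc = nearest_tips_sketch(edge, edge_length, 4L, targets,
                           only_descending_tips = TRUE)
```

In this toy tree, tips 2 and 3 have no descending target tip, so `desc$nearest_tip` is `NA` for them, while `whole` assigns them to tips 1 and 4 at distances 3 and 4. Each pass visits every edge once.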

A list with the following elements:

`nearest_tip_per_tip` |
An integer vector of size Ntips, listing the nearest target tip for each tip in the tree. Hence, |

`nearest_tip_per_node` |
An integer vector of size Nnodes, listing the index of the nearest target tip for each node in the tree. Hence, |

`nearest_distance_per_tip` |
Numeric vector of size Ntips. Phylogenetic ("patristic") distance of each tip in the tree to its nearest target tip. If |

`nearest_distance_per_node` |
Numeric vector of size Nnodes. Phylogenetic ("patristic") distance of each node in the tree to its nearest target tip. If |
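Because the nearest tips are returned as indices, a common follow-up is to translate them into tip names. The sketch below does this for a search over the whole tree (the default, `only_descending_tips=FALSE`). The tree, the choice of target tips and the name `tip_summary` are assumptions made for illustration.

```r
library(castor)

set.seed(7)  # only to make this sketch reproducible
tree = generate_random_tree(list(birth_rate_intercept = 1), 50)$tree
target_tips = c(1, 2, 3)

results = find_nearest_tips(tree, target_tips = target_tips)

# one row per tip: its name, the name of its nearest target tip, and the distance
tip_summary = data.frame(
  tip = tree$tip.label,
  nearest_target = tree$tip.label[results$nearest_tip_per_tip],
  distance = results$nearest_distance_per_tip
)

# the per-node vectors have one entry per internal node of the tree
Nnodes = length(results$nearest_tip_per_node)
```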

Stilianos Louca

M. G. I. Langille, J. Zaneveld, J. G. Caporaso et al. (2013). Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology. 31:814-821.

    # generate a random tree
    Ntips = 1000
    tree = generate_random_tree(list(birth_rate_intercept=1),Ntips)$tree

    # pick a random set of "target" tips
    target_tips = sample.int(n=Ntips, size=as.integer(Ntips/10), replace=FALSE)

    # find nearest target tip to each tip & node in the tree
    results = find_nearest_tips(tree, target_tips=target_tips)

    # plot histogram of distances to target tips (across all tips of the tree)
    distances = results$nearest_distance_per_tip
    hist(distances, breaks=10, xlab="nearest distance", ylab="number of tips", prob=FALSE)

[Package *castor* version 1.7.0 Index]