  • The first n vectors that cumulatively account for greater than 90% of the dataset's variation are sometimes used to describe the dataset and reduce its dimensionality.

  • The dimensionality reduction by principal factor extraction serves visualization purposes only.

  • Similarly, the distribution of D values for the comparison of the proteomic thrA and thrB sequences is also shown in Figure 4, alongside the null model, Eq. 11, for its dimensionality (n = log2(uu = 20 possible amino acids) = 4.32), which is graphically nearly indistinguishable from that of the comparison between the stanzas, with n = log2(uu = 19 possible letters) = 4.25 (dotted gray line for the rounded value, n = 4.3).

  • It is useful to evaluate how much of the dimensionality of the gene expression variation is captured by the clusters derived from gene shaving.

  • It should be recalled that the particular dimensionality of DNA sequences, n = 2, allows a very convenient unidirectional bi-dimensional representation, which is in fact the Chaos Game Representation (CGR) procedure [5]. Consequently, CGR is a particular case of USM, obtained when n = 2 and only the forward coordinates are determined.
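
The forward iteration behind CGR, mentioned in the last item, can be sketched as follows. This is a minimal illustration and not the cited authors' implementation. It assumes the commonly used corner assignment (A, C, G, T at the corners of the unit square) and a starting point at the centre of the square. The names `CORNERS` and `cgr_coordinates` are hypothetical.

```python
# Assumed corner assignment: each nucleotide maps to one corner of the
# unit square, i.e. to a 2-bit code, which is why n = 2 for DNA.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_coordinates(sequence, start=(0.5, 0.5)):
    """Return the forward CGR coordinates, one (x, y) point per nucleotide.

    Each point is the midpoint between the previous point and the corner
    assigned to the current nucleotide. The starting point is an assumption.
    """
    x, y = start
    points = []
    for base in sequence.upper():
        corner_x, corner_y = CORNERS[base]  # raises KeyError on unknown symbols
        x = (x + corner_x) / 2.0
        y = (y + corner_y) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for base, point in zip("ACGT", cgr_coordinates("ACGT")):
        print(base, point)
```

Because every step halves the distance to a corner, the position of a point encodes the suffix of the sequence read so far, with the most recent nucleotide determining the quadrant.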

