  • In Figure 1a, the segments " very fond of" in the two stanzas are linked by solid lines to highlight that sequence similarity is reflected in the spatial proximity of USM coordinates.

  • Similarly, the distribution of D values for the comparison of the proteomic thrA and thrB sequences is also represented in Figure 4, alongside the null model, Eq. 11, for its dimensionality (n = log2(20) = 4.32, for the 20 possible amino acids). This null model is graphically nearly indistinguishable from that of the comparison between the stanzas, with n = log2(19) = 4.25 for the 19 possible letters (dotted gray line for the rounded value, n = 4.3).

  • Therefore, if there is no requirement for an integer result, the effective value of n for the two stanzas can be refined to n = log2(19) = 4.25.

  • To avoid the issue of unit inequality and highlight the general applicability of the USM procedure, stanzas of a poem were used to illustrate the implementation instead.

  • For W. Cope's stanzas, n = ceil(log2(19)) = 5. The binary reference coordinates for the unique units are defined by the numerals of the binary code; for example, the unit 'a' is assigned to the position U_'a' = [0,0,1,0,1].
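
The dimension arithmetic and the binary reference coordinates described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `usm_dimension` and `binary_corners` are hypothetical, and the ordering of the units (and therefore which code each unit receives) is an assumption made here for the example.

```python
import math


def usm_dimension(n_units, integer=True):
    """Number of dimensions needed for an alphabet of n_units unique units.

    With integer=True this is ceil(log2(n_units)); with integer=False it is
    the unrounded "effective" value log2(n_units).
    """
    n = math.log2(n_units)
    return math.ceil(n) if integer else n


def binary_corners(units):
    """Assign each unique unit the binary numeral of its index.

    Assumption: units are numbered 0, 1, 2, ... in the order given. The
    source does not state the ordering here, so the specific codes are
    illustrative only.
    """
    n = usm_dimension(len(units))
    return {
        unit: [int(bit) for bit in format(index, f"0{n}b")]
        for index, unit in enumerate(units)
    }


if __name__ == "__main__":
    # Placeholder alphabet of 19 units (labels are hypothetical).
    units = [f"u{i}" for i in range(19)]
    corners = binary_corners(units)

    print(usm_dimension(19))                            # integer dimension
    print(round(usm_dimension(19, integer=False), 2))   # effective dimension
    print(round(usm_dimension(20, integer=False), 2))   # 20 amino acids
    print(corners["u5"])                                # code of index 5
```

With 19 units, every code has five binary digits, and 13 of the 32 corners of the five-dimensional unit hypercube remain unused; the unrounded values give the effective dimensionalities quoted for the stanzas and for the protein sequences.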

