Motifs: shapes and sampling

The library supports different motif shapes and sampling modes. Here, we present a brief explanation of how these mechanisms work and how you can create your own shape or sampling mode.

Shapes

By default, the library includes five predefined motif shapes: :square, :triangle, :pair, :line, and :diagonal. These shapes are defined in the file src/rma/index.jl, and they determine how a motif is drawn by the library and how it is converted to a decimal value used as an index. Therefore, when constructing a motif shape, it is important to consider how it will be converted into a decimal value.

For example, a square motif can be converted into a decimal value using the following equation (without spatial generalization).

\[I = \sum_{r = 0}^{n - 1}\sum_{c = 0}^{n - 1} 2^{rn+c}~\mathbf{R}_{i+r, j+c},\]

where $2^{rn+c}$ is responsible for associating each position of the motif with a power of 2, converting the binary structure into a decimal value. A Julia's function to compute it can be written as:

function compute_index_square(x::AbstractArray, y::AbstractArray, parameters, structure::AbstractVector{Int}, func::F, dim::AbstractVector{Int}, fixed::Vector{Int}, itr::Vector{Int}, metric) where {F}

    ##      Let a variable to store the index.
    I = 0

    ##      Copy the values of the fixed indeces to the vector of iterative indeces.
    copy!(itr, fixed)       ##  We do it to avoid memory allocations =D

    ##      Iterate to compute the index.
    for r in 0:(structure[1] - 1)
        for c in 0:(structure[2] - 1)
            ##      Change the iterator.
            itr[1] = fixed[1] + r
            itr[2] = fixed[2] + c

            ##      Calculate the recurrence between two positions.
            if @inline func(x, y, parameters, itr, metric, dim)
                index += 2^((r * structure[1]) + c)
            end
        end
    end

    return I + 1 ##     It is necessary for Julia indexing! i = I + 1
end

Knowing an algebraic expression to convert a motif into a decimal value is not strictly necessary, but it is recommended — especially considering the importance of understanding how this process will work for any value of $n$ (if applicable). For example, consider a motif with an X-shape (for $n = 3$):

\[\begin{pmatrix} \xi_1 & & \xi_2 \\ & \xi_3 & \\ \xi_4 & & \xi_5 \end{pmatrix}\]

It is easy to convert this shape into an index using something like:

function compute_index_x(x::AbstractArray, y::AbstractArray, parameters, func::F, dim::AbstractVector{Int}, fixed::Vector{Int}, itr::Vector{Int}, metric) where {F}

    ##      Let a variable to store the index.
    I = 0

    ##      Copy the values of the fixed indeces to the vector of iterative indeces.
    copy!(itr, fixed)       ##  We do it to avoid memory allocations =D

    ##  1. \xi_1
    if @inline func(x, y, parameters, itr, metric, dim)
        I += 1
    end

    ##  2. \xi_2
    itr[2] = fixed[2] + 2
    if @inline func(x, y, parameters, itr, metric, dim)
        I += 2
    end

    ##  3. \xi_3
    itr[1] = fixed[1] + 1
    itr[2] = fixed[2] + 1
    if @inline func(x, y, parameters, itr, metric, dim)
        I += 4
    end

    ##  4. \xi_4
    itr[1] = fixed[1] + 2
    itr[2] = fixed[2]
    if @inline func(x, y, parameters, itr, metric, dim)
        I += 8
    end

    ##  5. \xi_5
    itr[2] = fixed[2] + 2
    if @inline func(x, y, parameters, itr, metric, dim)
        I += 16
    end

    return I + 1 ##     It is necessary for Julia indexing! i = I + 1
end

But it is also clear that this code is not scalable, as it relies on a fixed structure. Transforming this into a scalable version that can handle any motif structure is one of the key computational challenges in the process of defining a new motif shape.

The application of a shape is associated with a sampling mode, so it is necessary to create a sampling function for each defined shape. This is because the sampling mode determines how the fixed parameter is set.

Sampling

The sampling defines how each motif will be extracted from an RP, without computing the full RP explicitly. The library includes five predefined sampling modes: :full, :random, :triangleup, :columnwise and :columnwise_full. These modes are defined in the folder src/rma/histograms/, being called by the function distribution(...).

The difficulty of creating a sampling function is proportional to the complexity of your problem. For example, if you are sampling motifs randomly from the RP, the sampling function is simple. However, if the motifs need to be extracted from a specific structure, the difficulty increases.

The sampling code for a :random sampling using square motifs is

function vect_square_random(x::AbstractArray, y::AbstractArray, parameters, structure::AbstractVector{Int},
    space_size::AbstractVector{Int}, func::F, dim::AbstractVector{Int}, hv::Int, samples::Int, metric) where {F}

    ##
    ##      Alloc memory for the histogram and the index list.
    hg = zeros(Int, 2^hv)   ##  Our histogram!
    fixed = ones(Int, length(space_size))
    itr = zeros(Int, length(space_size))

    ##
    ##      Compute the power vector.
    p_vect = zeros(Int, hv)
    for i in 1:hv
        p_vect[i] = 2^(i - 1)
    end

    ##
    ##      Get the samples and compute the histogram.
    @inbounds for _ in 1:samples
        ##
        ##      Take a random index.
        for s in eachindex(space_size)
            fixed[s] = rand(1:space_size[s])
        end

        ##
        ##      Compute the index and register the motif.
        p = @fastmath compute_index_square(x, y, parameters, structure, func, dim, fixed, itr, p_vect, metric)
        hg[p] += 1
    end

    ##
    ##      Return the histogram.
    return hg
end

In this code, we compute the power vector (p_vect) before calling the compute_index_square function, so it is not necessary to calculate powers of 2 inside the loop. Moreover, we allocate memory for the iterator itr and the fixed indices fixed outside the loop to avoid unnecessary calls to the garbage collector. The for...

for s in eachindex(space_size)
    fixed[s] = rand(1:space_size[s])
end

... retrieves an index set $(i,j)$ to define the first recurrence $R_{ij}$, , from which a motif is constructed using the shape function compute_index_square.