Usage

Comparing protein sequences

Find the distance between fingerprints of two protein families

compare [-h] <protein_family> <protein_family> [distance_options] [output_options]
             list names

Comparing Arguments

protein_family

Name of a protein family. Provide either the name of an existing protein family or the file name of a new latent space. Each file should contain 30 floats, one per line.
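As a minimal sketch of the file format described above, the following Python snippet writes and reads a latent space file with 30 floats, one per line. The function names and the file name new_family.txt are illustrative assumptions, not part of the tool.

```python
from pathlib import Path

LATENT_DIM = 30  # number of floats per file, as stated in this README


def write_latent_space(path, values):
    """Write one float per line (hypothetical helper, not part of the tool)."""
    values = [float(v) for v in values]
    if len(values) != LATENT_DIM:
        raise ValueError(f"expected {LATENT_DIM} floats, got {len(values)}")
    # repr() keeps full precision so the values survive a round trip.
    Path(path).write_text("".join(f"{v!r}\n" for v in values))


def read_latent_space(path):
    """Read the file back and check that it holds exactly 30 floats."""
    lines = Path(path).read_text().splitlines()
    values = [float(line) for line in lines if line.strip()]
    if len(values) != LATENT_DIM:
        raise ValueError(f"expected {LATENT_DIM} floats, got {len(values)}")
    return values
```

A file produced this way, for example new_family.txt, could then be passed wherever a protein_family argument is expected.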

distance_options

Optional distance flags

output_options

Optional output flags

list names

Show available protein family names

Find the closest family to a new protein sequence or family

search [-h] lat <latent space> [distance_options] [output_options]
            seq <sequence> [output_options]
            list names

Searching Arguments

lat <filename> [distance_options] [output_options]

Provide the file names of one or more new protein family latent spaces. The closest protein family to these new latent spaces will be shown.

distance_options

Optional distance flags

output_options

Optional output flags
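The idea behind a latent space search can be sketched as a nearest-neighbour lookup: compute the distance from the new latent space to every known family fingerprint and keep the smallest. The snippet below is only an illustration under stated assumptions. The family names, the 3-dimensional toy vectors (real fingerprints have 30 floats), and the function names are hypothetical, and the tool's actual implementation may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_family(query, families, distance=euclidean):
    """Return (name, distance) of the family whose fingerprint is nearest."""
    if not families:
        raise ValueError("no families to search")
    name = min(families, key=lambda n: distance(query, families[n]))
    return name, distance(query, families[name])


# Hypothetical toy data: 3 dimensions instead of 30, made-up family names.
families = {
    "family_A": [1.0, 0.0, 0.0],
    "family_B": [0.0, 1.0, 0.0],
}
best_name, best_distance = closest_family([0.9, 0.1, 0.0], families)
print(best_name)  # prints "family_A"
```

Passing a different function as the distance argument plays the role of the -m flag in this sketch.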

seq <filename> [output_options] (Requires 64-bit Python 3.7.x)

Provide the names of one or more files, each containing a protein sequence, to get the closest protein families for those sequences.

output_options

Optional output flags

list names

Show available protein family names

Optional Flags

Output Options

-out
Output filename
-of
Output format, text or csv. Default: text
-om
Output mode, a[ppend] or w[rite]. Default: a
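The difference between the two output modes can be illustrated with Python's own file modes, where "a" adds to an existing file and "w" overwrites it. This is a sketch only: the row layout, the tab-separated text format, and the function name save_results are assumptions made for illustration, since this README does not specify what the tool writes.

```python
import csv


def save_results(path, rows, fmt="text", mode="a"):
    """Write result rows as text or csv, appending ("a") or overwriting ("w")."""
    if fmt not in ("text", "csv"):
        raise ValueError("fmt must be 'text' or 'csv'")
    if mode not in ("a", "w"):
        raise ValueError("mode must be 'a' or 'w'")
    # newline="" lets the csv module control line endings itself.
    with open(path, mode, newline="") as fh:
        if fmt == "csv":
            csv.writer(fh).writerows(rows)
        else:
            # Assumed text layout: one row per line, fields separated by tabs.
            for row in rows:
                fh.write("\t".join(str(field) for field in row) + "\n")
```

With the default append mode, repeated runs accumulate in the same file; write mode keeps only the latest run.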

Distance Options

-m
Distance metric. Default: euclidean
-p
Scalar. The order p of the norm used by the Minkowski metric, weighted and unweighted. Default: 2

Available metrics

euclidean (default), minkowski, cityblock, sqeuclidean, cosine, correlation, hamming, jaccard, chebyshev, canberra, braycurtis, yule, dice, kulsinski, rogerstanimoto, russellrao, sokalmichener, sokalsneath
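To show how the -p value relates to some of the metrics listed above, here is a small pure-Python sketch of the unweighted Minkowski distance. With p = 2 it reduces to the euclidean distance and with p = 1 to the cityblock distance; chebyshev is the limit as p grows. The function names are illustrative, and the metric names above appear to match those of scipy.spatial.distance, although this README does not confirm which library the tool uses.

```python
def minkowski(u, v, p=2.0):
    """Unweighted Minkowski distance of order p between two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def chebyshev(u, v):
    """Largest coordinate-wise difference (the limit of Minkowski as p grows)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return max(abs(a - b) for a, b in zip(u, v))


u, v = [0.0, 0.0], [3.0, 4.0]
euclidean_distance = minkowski(u, v, p=2)  # same as the euclidean metric
cityblock_distance = minkowski(u, v, p=1)  # same as the cityblock metric
chebyshev_distance = chebyshev(u, v)
```

Larger values of p give more weight to the coordinate with the biggest difference, which is why the result approaches the chebyshev distance.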