Usage
Comparing protein sequences
Find the distance between fingerprints of two protein families
compare [-h] <protein_family> <protein_family> [distance_options] [output_options]
             list names
Comparing Arguments
protein_family - Protein family's name. Provide the name of an existing protein family or the file name of a new latent space. Files should contain 30 floats, one per line.
distance_options - Optional distance flags
output_options - Optional output flags
list names - Show available protein family names
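The latent-space file format described above (30 floats, one per line) can be produced and checked with a few lines of Python. This is a minimal sketch and not part of the tool: the helper names `write_latent_space` and `read_latent_space` and the file name `new_family.txt` are hypothetical.

```python
import os
import tempfile

LATENT_DIM = 30  # number of floats per file, as stated in the argument description


def write_latent_space(path, values):
    """Write a latent space to `path`, one float per line."""
    if len(values) != LATENT_DIM:
        raise ValueError(f"expected {LATENT_DIM} values, got {len(values)}")
    with open(path, "w") as fh:
        for v in values:
            fh.write(f"{float(v)!r}\n")


def read_latent_space(path):
    """Read floats (one per line) from `path` and check the count."""
    with open(path) as fh:
        values = [float(line) for line in fh if line.strip()]
    if len(values) != LATENT_DIM:
        raise ValueError(f"expected {LATENT_DIM} values, got {len(values)}")
    return values


if __name__ == "__main__":
    # Hypothetical file name, written to a temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "new_family.txt")
    write_latent_space(path, [0.1 * i for i in range(LATENT_DIM)])
    print(len(read_latent_space(path)))
```

Validating the count before passing a file to `compare` or `search` catches truncated or malformed files early.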
Searching
Find the closest family to a new protein sequence or family
search [-h] lat <latent space> [distance_options] [output_options]
            seq <sequence> [output_options]
            list names
Searching Arguments
lat <filename> [distance_options] [output_options] - Provide the file name of one or more new protein family latent spaces. The closest protein family to these new latent spaces will be shown.
distance_options - Optional distance flags
output_options - Optional output flags
seq <filename> [output_options] - (Requires 64-bit Python 3.7.x) Provide the name of one or more files, each containing a protein sequence, to get the closest protein families for those sequences.
output_options - Optional output flags
list names - Show available protein family names
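Conceptually, a latent-space search is a nearest-neighbour lookup over the stored family fingerprints. The sketch below illustrates that idea with the default Euclidean metric. It is an assumption about the general approach, not the tool's actual implementation, and the family names and fingerprint values are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_family(query, families):
    """Return (name, distance) of the fingerprint nearest to `query`."""
    return min(
        ((name, euclidean(query, fp)) for name, fp in families.items()),
        key=lambda pair: pair[1],
    )


# Made-up 30-dimensional fingerprints, for illustration only.
families = {
    "family_a": [0.0] * 30,
    "family_b": [1.0] * 30,
}
query = [0.9] * 30

name, dist = closest_family(query, families)
print(name, dist)
```

Here `family_b` is selected because every coordinate of the query lies closer to 1.0 than to 0.0.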
Optional Flags
Output Options
-out - Output filename
-of - Output format, text or csv. Default: text
-om - Output mode, a[ppend] or w[rite]. Default: a
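The interaction between output format and output mode can be pictured with a small writer function. This is only a sketch: the function `write_result` and the row layout are hypothetical, and the tool's real output layout is not documented here. The defaults mirror the ones listed above (text format, append mode).

```python
import csv
import os
import tempfile


def write_result(path, row, fmt="text", mode="a"):
    """Write one result row to `path`.

    fmt:  "text" or "csv"            (mirrors -of)
    mode: "a"/"append" or "w"/"write" (mirrors -om)
    """
    if fmt not in ("text", "csv"):
        raise ValueError("fmt must be 'text' or 'csv'")
    if mode not in ("a", "append", "w", "write"):
        raise ValueError("mode must be a[ppend] or w[rite]")
    # Only the first letter is needed as the file-open mode.
    with open(path, mode[0], newline="") as fh:
        if fmt == "csv":
            csv.writer(fh).writerow(row)
        else:
            fh.write(" ".join(str(v) for v in row) + "\n")


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "results.txt")
    write_result(out, ["family_a", "family_b", 1.25])           # appended
    write_result(out, ["family_a", "family_c", 2.5])            # appended
    write_result(out, ["family_a", "family_d", 0.5], mode="w")  # overwrites
    print(open(out).read())
```

With append as the default, repeated runs accumulate results in the same file; write mode is the one to choose when each run should start from an empty file.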
Distance Options
-m - Distance metric. Default: euclidean
-p - Scalar. The order p of the norm used by the Minkowski metric, weighted and unweighted. Default: 2
Available metrics
euclidean (default), minkowski, cityblock, sqeuclidean, cosine, correlation, hamming, jaccard, chebyshev, canberra, braycurtis, yule, dice, kulsinski, rogerstanimoto, russellrao, sokalmichener, sokalsneath
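To make the role of `-p` concrete, the following sketch computes the unweighted Minkowski distance in plain Python. With p = 1 it reduces to the cityblock distance and with p = 2 to the Euclidean distance. The metric names listed above coincide with function names in SciPy's `scipy.spatial.distance` module, although the source does not state which library the tool uses. The function below is an illustrative stand-in, not the tool's code.

```python
def minkowski(a, b, p=2):
    """Unweighted Minkowski distance of order p between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a = [0.0, 0.0]
b = [3.0, 4.0]

print(minkowski(a, b, p=1))  # cityblock case: 7.0
print(minkowski(a, b, p=2))  # euclidean case: 5.0
```

Larger values of p give more weight to the coordinate with the largest difference; in the limit the result approaches the chebyshev distance.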