About MOLBIOWIZ Joe Couto JAN 2025
MolBioWiz can process up to thousands of DNA or protein sequences at once, for
sequence formatting, translation, reverse-complementing, reverse-transcription and motif-searches.
It can also do multiple antibody V-region sequence alignments, find antibody developability problems, and help design peptide immunogens.
[
...
]
help
Tutorial: press this to load some text lines into INPUT. Then play with the buttons.
- grep extracts lines that contain the search motif
- Optionally, check the case-sensitive box
- search motif accepts regular expressions
- findAll outputs a list of found motifs and their positions on the complete text
- replace finds search motif and replaces it with replace with
- substring extracts vertical columns of text
- press the add ruler button to help you select the indices
- note that the ruler is added only if there is some text in INPUT
- once you decide what the indices should be, press the remove ruler button and then substring
- delRepeats removes repeated lines - (flanking blank spaces are ignored)
- other text functions - At least with Chrome or Edge browsers, you can press Ctrl + F to search text, and you can right-click over the text areas for additional text functions.
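The text functions listed above can be approximated in a few lines of Python. This is only a minimal sketch of the behaviour described in the list, not the tool's own code: the function names, the 0-based positions and the half-open `start:end` column range are all assumptions made for illustration.

```python
import re

def grep(lines, motif, case_sensitive=False):
    """Keep only the lines that contain the search motif (a regular expression)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(motif, flags)
    return [line for line in lines if pattern.search(line)]

def find_all(text, motif, case_sensitive=False):
    """Return (motif, position) pairs; positions are 0-based here (an assumption)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return [(m.group(0), m.start()) for m in re.finditer(motif, text, flags)]

def substring(lines, start, end):
    """Extract a vertical column of text: characters start..end-1 of every line."""
    return [line[start:end] for line in lines]

def del_repeats(lines):
    """Drop repeated lines, comparing them with flanking blank spaces ignored."""
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique

lines = ["ACGT one", "acgt two", "  ACGT one ", "TTTT three"]
print(grep(lines, "acgt"))
print(find_all("ACGTTACGA", "ACG"))
print(substring(lines, 0, 4))
print(del_repeats(lines))
```

The ruler in the tool serves the same purpose as the `start` and `end` arguments here: it lets you read off the column indices before extracting them.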
help
Seqs works on fastA-formatted sequences entered in INPUT. You can upload up to thousands of sequences from a text file, or copy them from another file or web page and paste them into INPUT.
Tutorial: press this to load some sequences into INPUT. Then play with the buttons.
- clean outputs a single line of sequence, controls up/lower cases, removes/keeps dashes
- name batch sequence renaming
- rev reverse-complement, reverse-transcribe
- tran translate
- xtrct extract sequences that contain search motif
- show show motifs aligned with seqs
- The openFormat button in the output window opens another window for formatting (i.e. line breaks, residue numbering).
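For readers who want to reproduce the rev and tran steps outside the tool, the sketch below parses fastA text, reverse-complements DNA and translates it with the standard genetic code. The function names are hypothetical, and rendering unrecognized codons as `X` and ignoring trailing bases that do not fill a codon are assumptions, not documented MolBioWiz behaviour.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def parse_fasta(text):
    """Return a list of (name, sequence) pairs from fastA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate frame 1; unknown codons become 'X' (an assumption)."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

records = parse_fasta(">seq1\nATGGCC\nAAATAA\n")
for name, seq in records:
    print(name, reverse_complement(seq), translate(seq))
```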
Case:
%IdFr1 | %IdFr2 | %IdFr3 | %IdFr4 | truncL | trunc% |
---|---|---|---|---|---|
AbAlign help
AbAlign aligns fastA-formatted antibody sequences entered in INPUT, ignoring all but the variable regions.
- It finds VH, VK or VL frameworks using a small database of template sequences, which can only be edited at the code level (let me know if you think the template needs to be expanded). As is, it should correctly align most human, mouse, and rabbit sequences to an IMGT-like FR numbering system.
- ⚠ Sequence names are truncated to 10 characters.
- There are five extra residues in FR1 (0, 11a-d) to accommodate rare rabbit sequences.
- The first 4 boxes allow you to change the %ID threshold for finding a FR match.
- If you know that a FR is NOT present in your sequence, enter 100% in its box to avoid finding a similar but wrong FR sequence.
- If the tool can't find a complete FR, it tries to find a truncated one using the last two parameters, minimum length of truncated FR and %ID, respectively.
- The AbDev button in the output window opens another movable window with its own functions that look for sequence motifs that could be problematic for antibody developability.
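To illustrate how a %ID threshold can control framework matching, here is a rough sketch that slides a template along a sequence without gaps and keeps the best window at or above the threshold. AbAlign's actual matching algorithm is not described here, so the gapless scan, the "at or above" comparison, the function names and the FR2-like template string are all assumptions.

```python
def percent_identity(template, window):
    """Percentage of positions where two equal-length strings agree."""
    matches = sum(1 for a, b in zip(template, window) if a == b)
    return 100.0 * matches / len(template)

def find_framework(sequence, template, min_identity):
    """Return (start, identity) of the best gapless match, or None if no
    window reaches min_identity. Gapless scanning is an assumption."""
    best = None
    for start in range(len(sequence) - len(template) + 1):
        window = sequence[start:start + len(template)]
        identity = percent_identity(template, window)
        if identity >= min_identity and (best is None or identity > best[1]):
            best = (start, identity)
    return best

# Hypothetical FR2-like template and test sequences, for illustration only.
template = "WVRQAPGKGLEWV"
exact = "GFTFSSYAMSWVRQAPGKGLEWVSAISGS"
mutated = exact.replace("RQAP", "RQTP")  # one mismatch inside the framework

print(find_framework(exact, template, 100))
print(find_framework(mutated, template, 80))
print(find_framework(mutated, template, 100))
```

With the threshold at 100, the mutated sequence returns no match at all, which mirrors the advice above to enter 100% when a FR is known to be absent.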
Developability help
Antibody Developability
- These functions check for motifs that might cause problems in antibody developability.
- They work only on the aligned antibody sequences in the Ab Alignment window.
- You cannot download the output because it has colored text, but you can select it (CTRL+A), copy it (CTRL+C) and paste it into an MS Word document. If you paste with "keep source formatting", the text colors will be preserved in MS Word.
- note the regular expressions for the following motifs are more complex than usual because we need to account for possible intervening "_" in the antibody sequence alignments
- glyc - N-linked glycosylation sites N_*[^P]_*[ST]
- NxDx - CDR motifs that may lead to deamidation/isomerization N_*[GSTNH]|G_*N_*[FY]|D_*[GSDTH]
- specf - CDR3 scFv non-specificity motifs G+_*G+|R+_*R+|V_*G|V+_*V+|W+_*W+|Y+_*Y+|W_*\w_*W
- aggr - motifs in CDRs that cause aggregation F_*H_*W
- visc - motifs in CDRs that lead to viscosity H_*Y_*F|H_*W_*H
- posit - marks R,K,D,E CDR residues if Nmbr(R,K) - Nmbr(D,E) > 1
- cys - colors canonical Cys residues at positions 23 and 104 in red and others in blue
- motif - enter your own regular expression in search motif
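The gap-tolerant patterns can be tried directly with any regular expression engine. The sketch below applies the glyc and aggr patterns, copied from the list above, to a made-up aligned fragment, and adds a small check for the posit rule. The function names, the test string and the 0-based positions are assumptions, and the tool itself colors the matches rather than listing them.

```python
import re

# Patterns copied from the motif list; "_" is the alignment gap character.
MOTIFS = {
    "glyc": r"N_*[^P]_*[ST]",
    "aggr": r"F_*H_*W",
}

def find_motifs(aligned_seq, pattern):
    """Return (position, matched text) pairs; positions are 0-based."""
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, aligned_seq)]

def net_positive(cdr):
    """posit rule: True when Nmbr(R,K) - Nmbr(D,E) > 1."""
    positive = sum(cdr.count(residue) for residue in "RK")
    negative = sum(cdr.count(residue) for residue in "DE")
    return positive - negative > 1

aligned = "QVN_GSAF__HWK"  # hypothetical aligned fragment with gaps
for name, pattern in MOTIFS.items():
    print(name, find_motifs(aligned, pattern))
print(net_positive("ARDKRY"))
```

Because each residue in a pattern is followed by `_*`, a motif is still found when alignment gaps fall between its residues, as with `F__HW` in the example.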
minPepLen | maxPepLen | maxKyDo | EDRK | ILVMF | duplc | forbid | score | bestN | nextStart |
---|---|---|---|---|---|---|---|---|---|
help
PepScan helps find good peptide immunogens for fastA-formatted protein sequences entered in INPUT.
It outputs protein-peptide sequence alignments and a peptide list into the ALIGNED and PEPTIDES text areas, respectively.
The initial selection parameters are pretty strict, so you may not get any peptides until you change the parameters.
You can FORMAT the ALIGNED output to wrap the lines while maintaining the alignments.
To fastA-format the peptide list, swap the outputs (swap OUT1⇄OUT2); once the list is in the top output text area, you can toggle the format by pressing fastA⇄name::seq.
Selection Parameters
- minPepLen Exclude peps shorter than # Aas
- maxPepLen Exclude peps longer than # Aas
- maxKyDo Exclude peps with KyDo > value (KyDo = weighted arithmetic mean of Aa residue Kyte-Doolittle values, excluding N and C termini)
- EDRK Point-subtraction for excessive charges (see Scoring below)
- ILVMFAC Point-subtraction for hydrophobicity (see Scoring below)
- duplc Point-subtraction for Aa duplications (see Scoring below)
- forbid Exclude peps if they contain listed Aa residues
- score Exclude peps with scores < value
- bestN Show peps with the top N scores
- nextStart The next potential pep starts n residues after previous one
Scoring
- initial score = peptide length + number of unique residues
- + 2 pts if -1 < KyDo < -0.2
- + 1 pt if KyDo < -1 (too many +/- charges)
- - s pts for Aa duplications, where s = 2p(n-2); n = # repeated Aas, p = the entered duplc parameter
- - x pts if pep contains > 2 consecutive charged (EDRK) Aas; x = the entered EDRK parameter
- - x pts if pep contains > 2 consecutive hydrophobic (ILVMFAC) Aas; x = the entered ILVMFAC parameter
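A partial sketch of the scoring may help when choosing parameters. It covers the initial score, the KyDo bonus (read here as +2 when -1 < KyDo < -0.2) and the two consecutive-run penalties. Several points are assumptions: KyDo is computed as a plain, unweighted mean of the interior residues because the weights are not given; the duplication penalty is left out because the exact definition of n is not spelled out; the function names are hypothetical; and peptides are assumed to have at least 3 residues.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("EDRK")
HYDROPHOBIC = set("ILVMFAC")

def kydo(pep):
    """Unweighted mean over the interior residues (termini excluded).
    The tool uses a weighted mean; the weights are unknown here."""
    interior = pep[1:-1]
    return sum(KD[aa] for aa in interior) / len(interior)

def longest_run(pep, residues):
    """Length of the longest stretch of consecutive residues from the set."""
    best = run = 0
    for aa in pep:
        run = run + 1 if aa in residues else 0
        best = max(best, run)
    return best

def score(pep, edrk_penalty, hydrophobic_penalty):
    """Partial score: the duplication penalty is deliberately omitted."""
    total = len(pep) + len(set(pep))
    k = kydo(pep)
    if -1 < k < -0.2:
        total += 2
    elif k < -1:
        total += 1
    if longest_run(pep, CHARGED) > 2:
        total -= edrk_penalty
    if longest_run(pep, HYDROPHOBIC) > 2:
        total -= hydrophobic_penalty
    return total

for pep in ("GSTSGSTG", "AKKKDA", "GILVG"):
    print(pep, round(kydo(pep), 2), score(pep, 3, 2))
```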