Bioinformatics workshop, Friday, July 16, 2004.
Sequence alignment.
More technical details on each assignment can be found in the alignment
lecture handouts.
1. Global and local pair-wise alignments.
Go to http://jay.bioinformatics.ku.edu/EMBOSS/index.html
a. Perform both global (STRETCHER) and local (WATER) pair-wise alignment
of sequences ‘1_A’ and ‘1_B’ (see section ‘Example sequences’ below).
b. Copy and paste each sequence into its corresponding box. DO NOT
include sequence names; use sequences only!
c. Select the alignment matrix and gap penalties as shown in Fig. 1.
d. Perform both global (STRETCHER) and local (WATER) pair-wise alignment
of sequences ‘2_A’ and ‘2_B’.
e. Compare the results. Does global alignment seem appropriate for both
examples?
f. Repeat the alignment of ‘2_A’ and ‘2_B’ with the same scoring matrix,
but with a gap opening penalty of 2 and a gap extension penalty of 1.
Examine the output. Find the percent identity and compare it to the
previous result. What changed?
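The difference between global and local alignment in steps (a) to (e), and the percent identity asked for in step (f), can be explored offline with a minimal dynamic-programming sketch. This is not what STRETCHER or WATER run: it assumes a simple match/mismatch score and a single linear gap penalty instead of the scoring matrix and gap opening/extension penalties of Fig. 1. The function names, default values, and toy sequences are illustrative only, and `percent_identity` assumes identity is counted as identical columns over all alignment columns, which is one common convention.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b with a linear gap penalty.

    local=False: global alignment (both sequences aligned end to end).
    local=True:  local alignment (best-scoring pair of substrings).
    """
    # prev holds the previous row of the dynamic-programming table.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur
    return best if local else prev[-1]


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of all alignment columns ('-' is a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


# Hypothetical toy sequences: a shared core flanked by unrelated residues.
x = "AAAGATTACATTT"
y = "CCCGATTACAGGG"
print("global:", align_score(x, y))
print("local: ", align_score(x, y, local=True))
print("identity:", percent_identity("GATT-ACA", "GATTTACA"))
```

In the toy case the local score is higher than the global one because the global alignment must also pay for the unrelated flanks. It may be worth looking for the same effect when comparing the STRETCHER and WATER output.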
2. DotPlot.
Go to http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
a. Input sequence 1_A by clicking the ‘Input’ button and pasting the
sequence into the input box. DO NOT include sequence names in the input
box; use sequences only!
b. Enter the sequence name ‘1’ in the ‘Name’ box (see Fig. 2) and click
‘Ok’.
c. Select parameters as shown in Fig. 3 and click the ‘Compute’ button.
d. Find repeated regions in this sequence.
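To see why repeats show up as lines parallel to the main diagonal, here is a minimal dot-plot sketch. It assumes plain identity counting inside a sliding window rather than the scoring matrix and parameters of Fig. 3, and the function names and thresholds are illustrative, not Dotlet's own.

```python
from collections import Counter


def dot_plot(seq_a, seq_b, window=6, min_matches=6):
    """Return the (i, j) window starts where the two windows share
    at least min_matches identical positions."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= min_matches:
                hits.add((i, j))
    return hits


def repeat_offsets(seq, window=6, min_matches=6):
    """Compare a sequence with itself and count hits per diagonal offset.

    Only hits above the main diagonal (j > i) are counted; a repeat of
    period p produces many hits at offset p.
    """
    hits = dot_plot(seq, seq, window, min_matches)
    return Counter(j - i for i, j in hits if j > i)


# A short fragment taken from the repeat region of sequence 1_A.
fragment = "GYPHNPGYPHNPGYPHNPGYPHNP"
for offset, count in sorted(repeat_offsets(fragment).items()):
    print("offset", offset, "hits", count)
```

The offsets with the most hits correspond to the diagonals one would look for in the Dotlet window.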
3. Protein BLAST search.
Go to the BLAST web-site at http://www.ncbi.nlm.nih.gov/BLAST/.
a. Click on protein-protein BLAST. Paste sequence 1_A into the input box.
DO NOT include sequence names; use sequences only!
b. Deselect the ‘Low complexity’ filter. Use other parameters as shown in
Fig. 4.
c. Click the ‘Blast!’ button and then the ‘Format!’ button in a new
window.
d. Open a new browser window. Repeat steps (a) to (c), but this time
leave the ‘Low complexity’ filter checked.
e. Compare the output of the two searches (filter off and filter on).
Which sequences are the lowest scoring in each case? Why are they
different?
f. To check your answer, go to
http://jay.bioinformatics.ku.edu/~propensity/propensity_form.php,
paste sequence 1_A into the input box, and click ‘Submit sequence’. On the
new page click ‘Table of low and high propensity segments’. Low complexity
regions in your sequence will be marked by ‘x’ characters below the
sequence. These low complexity regions biased the BLAST search when the
low complexity filter was turned off.
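As a rough illustration of what "low complexity" means, the sketch below marks windows whose amino-acid composition has low Shannon entropy. This is a simplified stand-in chosen for brevity, not the filter that BLAST or the propensity server applies; the window size, the entropy threshold, and the function names are all assumptions made for this example.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, max_entropy=2.5):
    """Return a string of the same length as seq with 'x' under every
    residue that lies in at least one low-entropy window, ' ' elsewhere."""
    mask = [" "] * len(seq)
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) <= max_entropy:
            for k in range(i, i + window):
                mask[k] = "x"
    return "".join(mask)


# A short fragment taken from the repeat region of sequence 1_A.
fragment = "GYPHNPGYPHNPGYPHNPGYPHNP"
print(fragment)
print(mask_low_complexity(fragment))
```

Printing the mask under the sequence mimics the ‘x’ layout described in step (f), although the regions it marks need not coincide with those reported by the server.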
4. Multiple sequence alignment of protein sequences using Clustal.
Go to the Clustal web-site at http://www.ebi.ac.uk/clustalw/.
a. Copy and paste the sequences from ‘MULTIPLE’. NOTE: in this example you
need to include both the sequence names, which start with ‘>’, and the
sequences! Click the ‘Run’ button.
b. On the output page click the ‘Show colors’ and ‘Jalview’ buttons.
Inspect the alignment and identify the most conserved region of this
multiple sequence alignment.
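One way to make "most conserved region" concrete is to score every alignment column and slide a window over the scores. The sketch below assumes a very simple conservation measure (the fraction of sequences sharing the most common residue in a column). The function names and the toy alignment are hypothetical, and Jalview's own conservation display is computed differently.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ('-') never count as the shared residue but stay in the denominator.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(column))
    return scores


def most_conserved_window(aligned, width=5):
    """Return (start, end) of the window with the highest summed conservation."""
    scores = column_conservation(aligned)
    sums = [sum(scores[i:i + width]) for i in range(len(scores) - width + 1)]
    start = max(range(len(sums)), key=sums.__getitem__)
    return start, start + width


# Hypothetical toy alignment, not Clustal output.
toy = ["MK-AGAAV",
       "MRLAGAAV",
       "LK-AGAAI"]
start, end = most_conserved_window(toy, width=4)
print(start, end, toy[0][start:end])
```

The same functions could be applied to the aligned sequences copied from the Clustal output, provided every row has the same length.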
5. PSI-BLAST search.
Go to the PSI-BLAST web-site at http://www.ncbi.nlm.nih.gov/BLAST/.
a. Copy and paste sequence 1AUX. DO NOT include sequence names; use
sequences only! Select all the options as shown in Fig. 5. Click the
‘Blast!’ and then the ‘Format!’ buttons.
b. On the output page that shows the results of the 1st iteration, click
the ‘Run PSI-BLAST iteration 2’ button and then the ‘Format!’ button.
c. Repeat step (b) until you find sequence 1GLV (look for a red square
with the letter ‘S’ inside, on the right side of each database sequence
reported in the PSI-BLAST output).
d. Click on the red square next to 1GLV. It will take you to the page
where you can find the structure of 1GLV. Click on the ‘1GLV’ link and
then on ‘PDB: 1GLV’. This will take you to the Protein Data Bank, which
contains structural information about this protein.
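Each PSI-BLAST iteration differs from a plain BLAST search in that the hits from the previous round are turned into a position-specific scoring matrix, which is then used as the query. The sketch below shows the idea only, under strong simplifying assumptions: a uniform background frequency, a flat pseudocount, no sequence weighting, and ungapped toy hits. The names `build_pssm` and `score` and the toy data are hypothetical, and the real program computes its matrix differently.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits, pseudocount=1.0):
    """Log-odds score (bits) of each amino acid at each aligned column,
    measured against a uniform background (a simplifying assumption)."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned_hits):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum the position-specific scores of a segment of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Hypothetical ungapped hits from a previous iteration.
hits = ["GAVV", "GAVI", "GSVV"]
pssm = build_pssm(hits)
print(score(pssm, "GAVV"), score(pssm, "WWWW"))
```

A segment that resembles the previous hits scores higher than an unrelated one, which is how later iterations can pick up more distant relatives of the query.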
Example sequences
1_A
MARLLTTCCLLALLLAACTDVALSKKGKGKPSGGGWGAGSHRQPSYPRQP
GYPHNPGYPHNPGYPHNPGYPHNPGYPHNPGYPQNPGYPHNPGYPGWGQG
YNPSSGGSYHNQKPWKPPKTNFKHVAGAAAAGAVVGGLGGYAMGRVMSGM
NYHFDSPDEYRWWSENSARYPNRVYYRDYSSPVPQDVFVADCFNITVTEYSIG
PAAKKNTSEAVAAANQTEVEMENKVVTKVIREMCVQQYREYRLASGIQLHPAD
TWLAVLLLLLTTLFAMH
1_B
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNR
YPPQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGG
THSQWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIHFGSDYE
DRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFT
ETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPVILLISFLIFLIVG
2_A
GPYKLLRVKIENEEIEQPLNRRTFLISKDKPFTEKTDVLMFKVDQEIYQAHKNILR
KGVFKISKSLKIYP
2_B
EWTYCEIDDVLINLVVQRWKDLEISGVIDRLDNKSKFEWFQRADILIALEKVPMD
FADGSSIGDGIDYES
1AUX
GAAARVLLVIDEPHTDWAKYFKGKKIHGEIDIKVEQAEFSDLNLVAHANGGF
SVDMEVLRNGVKVVRSLKPDFVLIRQHAFSMARNGDYRSLVIGLQYAGIPSINS
LHSVYNFCDKPWVFAQMVRLHKKLGTEEFPLINQTFYPNHKEMLSSTTYPVVV
KMGHAHSGMGKVKVDNQHDFQDIASVVALTKTYATTEPFIDAKYDVRIQKIGQN
YKAYMRTSVSGNWKTNTGSAMLEQIAMSDRYKLWVDTCSEIFGGLDICAVEAL
HGKDGRDHIIEVVGSSMPLIGDHQDEDKQLIVELVVNKMAQALPR
MULTIPLE
>PRIO_ATEGE
MLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGWG
QPHGGGWGQPHGGGWGQPHGGGWGQGGGTHNQWNKPSKPKTNMKHMAG
AAAAGAVVGGLGGYMLGSAMSRPLIHFGNDYEDRYYRENMYRYPNQVYYR
PVDQYNNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQ
YERESQAYYQRGSSMVLFSSPPVILLISFLI
>PRIO_MOUSE
MANLGYWLLALFVTMWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYP
PQGGTWGQPHGGGWGQPHGGSWGQPHGGSWGQPHGGGWGQGGGTHNQ
WNKPSKPKTNLKHVAGAAAAGAVVGGLGGYMLGSAMSRPMIHFGNDWEDRY
YRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETD
VKMMERVVEQMCVTQYQKESQAYYDGRRSSSTVLFSSPPVILLISFLIFLIVG
>PRIO_turtle
MGRYRLTCWIVVLLVVMWSDVSFSKKGKGKGGGGGNTGSNRNPNYPSNPGY
PQNPGYPRNPSYPHNPAYPPNPAYPPNPGYPHNPSYPRNPSYPQNPGYPGG
GGQHYNPAGGGTNFKNQKPWKPDKPKTNMKAMAGAAAAGAVVGGLGGYAL
GSAMSGMRMNFDRPEERQWWNENSNRYPNQVYYKEYNDRSVPEGRFVRDC
LNNTVTEYKIDPNENQNVTQVEVRVMKQVIQEMCMQQYQQYQLASGVKLLSD
PSLMLIIMLVIFFVMH
>PRIO_CHICK
MARLLTTCCLLALLLAACTDVALSKKGKGKPSGGGWGAGSHRQPSYPRQPGY
PHNPGYPHNPGYPHNPGYPHNPGYPHNPGYPQNPGYPHNPGYPGWGQGYN
PSSGGSYHNQKPWKPPKTNFKHVAGAAAAGAVVGGLGGYAMGRVMSGMNY
HFDSPDEYRWWSENSARYPNRVYYRDYSSPVPQDVFVADCFNITVTEYSIGPA
AKKNTSEAVAAANQTEVEMENKVVTKVIREMCVQQYREYRLASGIQLHPADTW
LAVLLLLLTTLFAMH
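The MULTIPLE block above is written in FASTA style: a name line starting with ‘>’ followed by one or more sequence lines. Some tools in this workshop want the names included and others want bare sequences, so a minimal parser sketch may help when preparing input. The function name `parse_fasta` and the shortened records in the example are illustrative only.

```python
def parse_fasta(text):
    """Split FASTA-style text into a {name: sequence} dictionary.

    Sequence lines that belong to one record are joined into one string;
    lines that appear before the first '>' line are ignored.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Shortened, hypothetical records in the same layout as the MULTIPLE block.
example = """>seq_one
MLVLFVATWS
DLGLCKK
>seq_two
MANLGYWLLA
"""
for name, sequence in parse_fasta(example).items():
    print(name, len(sequence), sequence)
```

The dictionary values are the bare sequences needed for the tools that reject names, while the original text with its ‘>’ lines is what Clustal expects.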

Figure 1. Input form for Stretcher.

Figure 2. Adding sequences to DotPlot.

Figure 3. Parameters used to create a DotPlot.

Figure 4. BLAST input form.

Figure 5. Options for PSI-BLAST.