Bioinformatics workshop, Friday, July 16, 2004. Sequence alignment.

 

More technical details on each assignment can be found in the alignment lecture handouts.

 

 

1. Global and local pairwise alignments.

Go to http://jay.bioinformatics.ku.edu/EMBOSS/index.html

 

a. Perform both global (STRETCHER) and local (WATER) pairwise alignments of sequences ‘1_A’ and ‘1_B’ (see section ‘Example sequences’ below).

b. Copy and paste both sequences into the corresponding boxes. DO NOT include sequence names; use sequences only!

c. Select the alignment matrix and gap penalties as shown in Fig. 1.

d. Perform both global (STRETCHER) and local (WATER) pairwise alignments of sequences ‘2_A’ and ‘2_B’.

e. Compare the results. Does global alignment seem appropriate for both examples?

f. Repeat the alignment of ‘2_A’ and ‘2_B’ with the same scoring matrix, but with a gap opening penalty of 2 and a gap extension penalty of 1. Examine the output. Find the percent identity and compare it to the previous result. What changed?
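
To see why gap penalties matter in steps (a) to (f), here is a minimal sketch of global alignment by dynamic programming. It is not what STRETCHER does internally: it assumes a simple match/mismatch score instead of a substitution matrix and a single linear gap penalty instead of separate opening and extension penalties. The function names and default values are illustrative only. Percent identity is computed here as identical positions divided by alignment length, gaps included.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (simplified sketch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


def percent_identity(top, bottom):
    """Identical aligned residues as a percentage of alignment length."""
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)
```

Calling global_align twice on the same pair, once with a strict gap value and once with a lenient one, and comparing percent_identity for the two results mirrors the comparison asked for in step (f). Cheap gaps let the algorithm open many gaps to line up chance matches, so a higher identity does not by itself mean a more meaningful alignment.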

2. DotPlot.

Go to http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html

a. Input sequence 1_A by clicking the ‘Input’ button and pasting the sequence into the input box.

b. Enter the sequence name ‘1’ in the ‘Name’ box (see Fig. 2) and click ‘Ok’. DO NOT include sequence names in the sequence input box; use sequences only!

c. Select the parameters as shown in Fig. 3 and click the ‘Compute’ button.

d. Find the repeated regions in this sequence.
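
The idea behind the dot plot in steps (a) to (d) can be sketched in a few lines. This is a simplified illustration, not Dotlet's method: it assumes windows are compared by counting identical residues rather than by scoring with a substitution matrix, and the function names, window size and threshold are hypothetical. In a self-comparison, repeats show up as diagonals parallel to the main diagonal, and the distance of each diagonal from the main one (the offset) gives the repeat spacing.

```python
from collections import Counter


def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) where the windows starting at i in seq_a
    and j in seq_b share at least min_matches identical positions."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def repeat_offsets(seq, window=6, min_matches=6):
    """Compare a sequence with itself and count dots per diagonal offset.
    Offsets with many dots indicate the spacing of repeated segments."""
    dots = dot_plot(seq, seq, window, min_matches)
    return Counter(j - i for i, j in dots if j > i)


# A fragment of the repeat region of sequence 1_A.
fragment = "GYPHNPGYPHNPGYPHNPGYPHNP"
print(repeat_offsets(fragment).most_common(3))
```

For this fragment the offsets with dots are multiples of 6, which corresponds to the six-residue GYPHNP unit repeated in 1_A.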

3. Protein BLAST search.

Go to the BLAST web-site at http://www.ncbi.nlm.nih.gov/BLAST/.

a. Click on protein-protein BLAST. Paste sequence 1_A into the input box. DO NOT include sequence names; use sequences only!

b. Deselect the ‘Low complexity’ filter. Use the other parameters as shown in Fig. 4.

c. Click the ‘Blast!’ button and then the ‘Format!’ button in the new window.

d. Open a new browser window. Repeat steps (a) to (c), but this time leave the ‘Low complexity’ filter checked.

e. Compare the output of the two searches (filter off in steps a to c, filter on in step d). Which sequences are the lowest-scoring ones in each case? Why are they different?

f. To check your answer, go to http://jay.bioinformatics.ku.edu/~propensity/propensity_form.php, paste sequence 1_A into the input box and click ‘Submit sequence’. On the new page click ‘Table of low and high propensity segments’. Low-complexity regions in your sequence will be marked by ‘x’ characters below the sequence. These low-complexity regions biased the BLAST search when the low-complexity filter was turned off.
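
To get a feel for what "low complexity" means in step (f), here is a rough sketch that marks windows with low Shannon entropy, that is, windows dominated by only a few residue types. It is not the algorithm used by the BLAST filter or by the propensity server. The function names, the window length of 12 and the threshold of 2.2 bits are assumptions chosen for illustration and would need tuning on real sequences.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Return a string of the same length as seq with 'x' under every
    residue covered by at least one window whose entropy is below the
    threshold, and '.' elsewhere."""
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for k in range(start, start + window):
                mask[k] = True
    return "".join("x" if m else "." for m in mask)
```

Printing the sequence with the mask on the line below it gives a display similar in spirit to the ‘x’ marks in the table of low and high propensity segments, although the marked regions will not match exactly.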


 

4. Multiple sequence alignment of protein sequences using Clustal.

Go to the Clustal web-site at http://www.ebi.ac.uk/clustalw/.

a. Copy and paste the sequences from ‘MULTIPLE’. NOTE: in this example you need to include both the sequence names (the lines that start with ‘>’) and the sequences! Click the ‘Run’ button.

b. On the output page, click the ‘Show colors’ and ‘Jalview’ buttons. Inspect the alignment and identify the most conserved region of this multiple sequence alignment.
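
One simple way to make "most conserved region" in step (b) concrete is to score each alignment column by the fraction of sequences that share its most common residue, then look for the window of columns with the highest total. This is a sketch under stated assumptions: the input is a list of already aligned sequences of equal length with ‘-’ for gaps, the function names and the window width are hypothetical, and the measure ignores the chemical similarity between residues that Clustal and Jalview take into account when coloring.

```python
from collections import Counter


def column_conservation(aligned):
    """For each column, return the fraction of sequences that carry the
    most common residue. Gaps never count as the conserved residue."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for col in zip(*aligned):
        residues = [c for c in col if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(col))
    return scores


def most_conserved_window(aligned, width=10):
    """Return (start, end) column indices of the window of the given width
    with the highest summed conservation; the first such window on ties."""
    scores = column_conservation(aligned)
    best_start = max(range(len(scores) - width + 1),
                     key=lambda s: sum(scores[s:s + width]))
    return best_start, best_start + width
```

The aligned sequences can be copied from the Clustal output into a list of strings, one string per sequence, to compare the window this finds with the region picked by eye.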

5. PSI-BLAST search.

Go to the PSI-BLAST web-site at http://www.ncbi.nlm.nih.gov/BLAST/.

a. Copy and paste sequence 1AUX. DO NOT include sequence names; use sequences only! Select all the options as shown in Fig. 5. Click the ‘Blast!’ button and then the ‘Format!’ button.

b. On the output page that shows the results of the 1st iteration, click the ‘Run PSI-BLAST iteration 2’ button and then the ‘Format!’ button.

c. Repeat step (b) until you find sequence 1GLV (look for the red square with the letter ‘S’ inside, to the right of each database sequence reported in the PSI-BLAST output).

d. Click on the red square next to 1GLV. It will take you to a page where you can find the structure of 1GLV. Click on the ‘1GLV’ link and then on ‘PDB: 1GLV’. This will take you to the Protein Data Bank, which contains structural information about this protein.


Example sequences

 

1_A

 

MARLLTTCCLLALLLAACTDVALSKKGKGKPSGGGWGAGSHRQPSYPRQP

GYPHNPGYPHNPGYPHNPGYPHNPGYPHNPGYPQNPGYPHNPGYPGWGQG

YNPSSGGSYHNQKPWKPPKTNFKHVAGAAAAGAVVGGLGGYAMGRVMSGM

NYHFDSPDEYRWWSENSARYPNRVYYRDYSSPVPQDVFVADCFNITVTEYSIG

PAAKKNTSEAVAAANQTEVEMENKVVTKVIREMCVQQYREYRLASGIQLHPAD

TWLAVLLLLLTTLFAMH

 

1_B

 

MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNR

YPPQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGG

THSQWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIHFGSDYE

DRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFT

ETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPVILLISFLIFLIVG

 

2_A

 

GPYKLLRVKIENEEIEQPLNRRTFLISKDKPFTEKTDVLMFKVDQEIYQAHKNILR

KGVFKISKSLKIYP

 

2_B

 

EWTYCEIDDVLINLVVQRWKDLEISGVIDRLDNKSKFEWFQRADILIALEKVPMD

FADGSSIGDGIDYES

 

1AUX

 

GAAARVLLVIDEPHTDWAKYFKGKKIHGEIDIKVEQAEFSDLNLVAHANGGF

SVDMEVLRNGVKVVRSLKPDFVLIRQHAFSMARNGDYRSLVIGLQYAGIPSINS

LHSVYNFCDKPWVFAQMVRLHKKLGTEEFPLINQTFYPNHKEMLSSTTYPVVV

KMGHAHSGMGKVKVDNQHDFQDIASVVALTKTYATTEPFIDAKYDVRIQKIGQN

YKAYMRTSVSGNWKTNTGSAMLEQIAMSDRYKLWVDTCSEIFGGLDICAVEAL

HGKDGRDHIIEVVGSSMPLIGDHQDEDKQLIVELVVNKMAQALPR

 

 


MULTIPLE

 

>PRIO_ATEGE

MLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGWG

QPHGGGWGQPHGGGWGQPHGGGWGQGGGTHNQWNKPSKPKTNMKHMAG

AAAAGAVVGGLGGYMLGSAMSRPLIHFGNDYEDRYYRENMYRYPNQVYYR

PVDQYNNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQ

YERESQAYYQRGSSMVLFSSPPVILLISFLI

 

>PRIO_MOUSE

MANLGYWLLALFVTMWTDVGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYP

PQGGTWGQPHGGGWGQPHGGSWGQPHGGSWGQPHGGGWGQGGGTHNQ

WNKPSKPKTNLKHVAGAAAAGAVVGGLGGYMLGSAMSRPMIHFGNDWEDRY

YRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETD

VKMMERVVEQMCVTQYQKESQAYYDGRRSSSTVLFSSPPVILLISFLIFLIVG

 

>PRIO_turtle

MGRYRLTCWIVVLLVVMWSDVSFSKKGKGKGGGGGNTGSNRNPNYPSNPGY

PQNPGYPRNPSYPHNPAYPPNPAYPPNPGYPHNPSYPRNPSYPQNPGYPGG

GGQHYNPAGGGTNFKNQKPWKPDKPKTNMKAMAGAAAAGAVVGGLGGYAL

GSAMSGMRMNFDRPEERQWWNENSNRYPNQVYYKEYNDRSVPEGRFVRDC

LNNTVTEYKIDPNENQNVTQVEVRVMKQVIQEMCMQQYQQYQLASGVKLLSD

PSLMLIIMLVIFFVMH

>PRIO_CHICK

MARLLTTCCLLALLLAACTDVALSKKGKGKPSGGGWGAGSHRQPSYPRQPGY

PHNPGYPHNPGYPHNPGYPHNPGYPHNPGYPQNPGYPHNPGYPGWGQGYN

PSSGGSYHNQKPWKPPKTNFKHVAGAAAAGAVVGGLGGYAMGRVMSGMNY

HFDSPDEYRWWSENSARYPNRVYYRDYSSPVPQDVFVADCFNITVTEYSIGPA

AKKNTSEAVAAANQTEVEMENKVVTKVIREMCVQQYREYRLASGIQLHPADTW

LAVLLLLLTTLFAMH

 

Figure 1. Input form for Stretcher


 


Figure 2. Adding sequences to DotPlot.

Figure 3. Parameters used to create a DotPlot.


Figure 4. BLAST input form.


 

Figure 5. Options for PSI-BLAST.