Build Your Own Protein
# Our genetic code dictionary
genetic_code_dict = {'ATG': 'methionine', 'TTT': 'phenylalanine', 'CGG': 'arginine'}

# DNA sequence, already grouped into codons
dna_sequence = ['ATG', 'CGG', 'TTT', 'CGG', 'TTT']

# An empty list to hold the sequence of amino acids specified by the DNA sequence
amino_acids = []

# Loop through each DNA codon
for codon in dna_sequence:
    # Look up the corresponding amino acid in the genetic code dictionary
    amino_acid = genetic_code_dict[codon]
    # Append the correct amino acid to our growing amino acid list (protein)
    amino_acids.append(amino_acid)

print(amino_acids)
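One thing to watch for: looking up a codon with square brackets raises a KeyError if that codon is not in the dictionary, and our small dictionary holds only three codons. A minimal sketch of one way to handle this uses the dictionary's get method with a fallback value. The 'AAA' codon and the 'unknown' placeholder are assumptions made up for this illustration.

```python
genetic_code_dict = {'ATG': 'methionine', 'TTT': 'phenylalanine', 'CGG': 'arginine'}

# 'AAA' is deliberately missing from our small dictionary (illustration only)
dna_sequence = ['ATG', 'CGG', 'AAA']

amino_acids = []
for codon in dna_sequence:
    # get() returns the fallback ('unknown') instead of raising a KeyError
    amino_acid = genetic_code_dict.get(codon, 'unknown')
    amino_acids.append(amino_acid)

print(amino_acids)
```

Whether a placeholder or an error is the better choice depends on the task; an error is often safer when a missing codon means the dictionary is incomplete.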
In the last example, we started with a DNA sequence already grouped into codon triplets. That is pretty unrealistic, though: a DNA sequence that you work with when coding is usually just a string of As, Cs, Ts, and Gs. The good news is that we can easily convert a string of DNA nucleotides into a list of codons. To do this, we need one more tool in our toolbox: modulo.
Modulo, although it sounds complicated, is just another word for the remainder left after division. If one number divides evenly into another, with nothing left over, the modulo is 0. For example, 4 modulo 2 is 0 because 4 is divisible by 2 with no remainder. 5 modulo 2, however, is 1, because 5 divided by 2 is 2 with a remainder of 1.
We can calculate modulo in Python with the % symbol:
print(4 % 2)  # 4 modulo 2 is 0
print(5 % 2)  # 5 modulo 2 is 1
print(0 % 2)  # 0 modulo any nonzero number is 0!
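To make the remainder idea concrete, here is a small sketch that pairs % with integer division (//). The divisor 3 and the variable names are chosen only for illustration. Notice how the remainders cycle through 0, 1, 2 and then start over.

```python
divisor = 3
remainders = []

for dividend in range(7):
    quotient = dividend // divisor   # whole-number part of the division
    remainder = dividend % divisor   # what is left over
    remainders.append(remainder)
    # The quotient and remainder always rebuild the original number
    print(dividend, '=', quotient, '*', divisor, '+', remainder)

print(remainders)
```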
So why are we on this modulo tangent? Because we want to divide a string of DNA nucleotides into a list of triplets! We will use modulo 3 to check whether we've reached the beginning of a new codon when reading through the DNA string. Remember that a string can be treated just like a list of characters: we can index into the string and slice pieces out of it. So, we can convert a string of nucleotides into a list of codons:
dna_sequence = 'ATGCGGTTTCGGTTT'

codons = []  # Create an empty list to eventually hold the codons

# Move through each index in the DNA sequence string
for i in range(0, len(dna_sequence)):
    if i % 3 == 0:  # If the index is divisible by 3, then...
        # Slice the DNA sequence from i up to (but not including) i + 3 to get 3 nucleotides
        codons.append(dna_sequence[i:i+3])

print(codons)
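To see what the modulo check is doing, the sketch below first collects the indices that pass the test, then builds the same codon list by letting range() step by 3 instead. The step version is offered only as an equivalent alternative, not as the lesson's method, and the names codon_starts and codons_by_step are made up for this example.

```python
dna_sequence = 'ATGCGGTTTCGGTTT'

# Indices where a new codon begins: those with remainder 0 when divided by 3
codon_starts = []
for i in range(len(dna_sequence)):
    if i % 3 == 0:
        codon_starts.append(i)
print(codon_starts)

# Alternative: range() can jump ahead 3 indices at a time,
# which visits exactly the same starting positions
codons_by_step = []
for i in range(0, len(dna_sequence), 3):
    codons_by_step.append(dna_sequence[i:i+3])
print(codons_by_step)
```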
What would happen if we didn't use the modulo check? We would start a new codon at every single nucleotide, so most nucleotides would be counted three times and we would end up with the wrong protein!
dna_sequence = 'ATGCGGTTTCGGTTT'

codons = []  # Create an empty list to eventually hold the codons

# Move through each index in the DNA sequence string
for i in range(0, len(dna_sequence)):
    # No modulo check here, so a slice is taken at every index
    codons.append(dna_sequence[i:i+3])

print(codons)
Clearly, this isn't what we were trying to do: the list now holds 15 overlapping fragments instead of the 5 codons in the original DNA sequence, and the last two fragments aren't even full triplets. Thank goodness for the modulo operator!
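Putting the pieces together, here is a minimal sketch that goes from a raw DNA string all the way to a list of amino acids. The function names split_into_codons and translate are hypothetical helpers written for this example, and skipping an incomplete trailing codon is an assumption about how leftover nucleotides should be handled.

```python
genetic_code_dict = {'ATG': 'methionine', 'TTT': 'phenylalanine', 'CGG': 'arginine'}

def split_into_codons(dna_string):
    """Split a DNA string into a list of 3-letter codons."""
    codons = []
    for i in range(len(dna_string)):
        # Start a codon only at indices divisible by 3, and only
        # if a full triplet is still available (assumption: drop leftovers)
        if i % 3 == 0 and i + 3 <= len(dna_string):
            codons.append(dna_string[i:i+3])
    return codons

def translate(dna_string, code):
    """Look up each codon of dna_string in the code dictionary."""
    amino_acids = []
    for codon in split_into_codons(dna_string):
        amino_acids.append(code[codon])
    return amino_acids

protein = translate('ATGCGGTTTCGGTTT', genetic_code_dict)
print(protein)
```

With this small dictionary, any codon other than ATG, TTT, or CGG would raise a KeyError, so a complete genetic code dictionary would be needed for real sequences.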