Loops & If Statements
For Loops & If Statements
Now that we know what a for loop is, how can we use a for loop to better understand our protein? For example, the amino acid lysine is positively charged, so to get a sense of the overall positive charge of a protein, we might want to know the number of lysines it contains. We can use a number of tricks that we've already learned to count the number of lysines in our protein.
To do this, we can "loop through" each element in our list and check if each element (amino acid) is a lysine or not. We use a variable to keep track of the number of lysines that we encounter. Then, we can update this count as we move through the loop, adding 1 each time we observe a lysine.
With that basic approach in mind, there are a couple of ways to structure a for loop.
First, we can use range() to generate the numbers from 0 up to (but not including) the length of the protein. These numbers can then serve as indices to access each amino acid in our protein, one by one. All together, here's what it looks like (don't worry if it seems confusing; we'll step through it together below):
my_protein = ['methionine', 'valine', 'leucine', 'serine', 'proline', 'alanine', 'lysine', 'threonine', 'asparagine', 'valine', 'lysine', 'alanine', 'alanine', 'tryptophan', 'glycine', 'lysine', 'valine', 'glycine', 'alanine']
# Way 1
lysine_count = 0
# Since we haven't looked at the protein yet, we have observed 0 lysines so far
for i in range(0, len(my_protein)):
    amino_acid = my_protein[i]
    if amino_acid == 'lysine':
        lysine_count = lysine_count + 1
        # For each lysine that we observe, we add 1 to the value of the variable keeping track of the number of lysines in the protein
print('The number of lysines in our protein is', lysine_count)
amino_acids = ['alanine', 'arginine', 'asparagine', 'aspartic acid', 'cysteine', 'glutamic acid', 'glutamine', 'glycine', 'histidine', 'isoleucine', 'leucine', 'lysine', 'methionine', 'phenylalanine', 'proline', 'serine', 'threonine', 'tryptophan', 'tyrosine', 'valine']
Okay, so what's happening here? We create a variable called lysine_count and set its initial value to 0, since we haven't encountered any lysines in our protein so far.
Now comes the for loop. Here, we create another variable, i, that will hold the index of each amino acid in our protein list in turn. We generate these indices using range() for the entire length of our protein list (remember 0 indexing!). Now that we have these indices, we step into the body of the for loop.
The first time we enter the for loop, the variable i takes on the value 0 (the first index). So, when we index into our protein list with i = 0, we get the first element of the list (methionine), which we store in a third variable, amino_acid. Then, we use an if statement to check whether this amino acid is a lysine (remember that == is how we check if something is equal to another value). If the amino acid is a lysine, we update lysine_count by adding 1 to our running count. Because our first amino acid is a methionine, our count remains at 0.
And then the loop starts all over again, this time with i = 1. The loop continues in this manner until we reach i = 18, the very end of our protein list.
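To make the indexing concrete, here is a small sketch that prints the indices range() produces for our protein and looks up the first and last amino acids. The names indices, last_index, first_amino_acid, and last_amino_acid are introduced here only for illustration.

```python
my_protein = ['methionine', 'valine', 'leucine', 'serine', 'proline', 'alanine',
              'lysine', 'threonine', 'asparagine', 'valine', 'lysine', 'alanine',
              'alanine', 'tryptophan', 'glycine', 'lysine', 'valine', 'glycine',
              'alanine']

# Wrapping range() in list() lets us print all of the numbers it produces
indices = list(range(0, len(my_protein)))
print(indices)

# The last valid index is always one less than the length of the list
last_index = len(my_protein) - 1

first_amino_acid = my_protein[0]
last_amino_acid = my_protein[last_index]
print('Index 0 holds', first_amino_acid)
print('Index', last_index, 'holds', last_amino_acid)
```

Notice that the indices stop one short of len(my_protein). That is why the loop never tries to read past the end of the list.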
If this still seems a little confusing, try what programmers do all the time to figure out how code works: insert print statements into the body of the for loop to print out the values of variables (maybe try i, amino_acid, or lysine_count). If you get errors doing this, try moving the print statements to other parts of the loop!
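As one possible way to do this, the sketch below runs the same loop on a shortened, made-up protein (short_protein is a hypothetical list used only to keep the output small) and prints all three variables on every pass through the loop.

```python
# A shortened, made-up protein so the printed output stays short
short_protein = ['methionine', 'lysine', 'valine', 'lysine']

lysine_count = 0
for i in range(0, len(short_protein)):
    amino_acid = short_protein[i]
    if amino_acid == 'lysine':
        lysine_count = lysine_count + 1
    # This print is indented to sit inside the loop (but outside the if),
    # so it runs once for every amino acid
    print('i =', i, '| amino_acid =', amino_acid, '| lysine_count =', lysine_count)

print('The number of lysines in our protein is', lysine_count)
```

Watching lysine_count change only on the passes where amino_acid is 'lysine' is a quick way to confirm that the if statement is doing what we expect.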
But that's only one way to skin the cat. We can also side-step using indices by looping directly through the amino acids in the protein: