List Methods

Because lists are so useful in programming, Python provides built-in functions and list methods that save us from writing loops by hand. For example, to find the length of our protein and count the number of lysines in it, we can simply write:

my_protein = ['methionine','valine','leucine','serine', 'proline', 'alanine', 'lysine', 'threonine','asparagine', 'valine','lysine','alanine', 'alanine', 'tryptophan', 'glycine','lysine', 'valine', 'glycine','alanine']

print(len(my_protein))

print('There are', my_protein.count('lysine'), 'lysines in my protein')

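Note the difference between the two: len() is a built-in function that takes the list as its argument, while .count() is a method that is called on the list itself with a dot.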
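
To make the distinction concrete, here is a small sketch that counts lysines by hand with a loop and compares the result with .count(). The names protein_length, lysine_count and lysine_total are only illustrative and are not used elsewhere in this lesson.

```python
my_protein = ['methionine', 'valine', 'leucine', 'serine', 'proline',
              'alanine', 'lysine', 'threonine', 'asparagine', 'valine',
              'lysine', 'alanine', 'alanine', 'tryptophan', 'glycine',
              'lysine', 'valine', 'glycine', 'alanine']

# len() is a built-in function: the list is passed in as an argument.
protein_length = len(my_protein)

# .count() is a list method: it is called on the list with a dot.
lysine_count = my_protein.count('lysine')

# The same count written out by hand with a loop (illustrative only).
lysine_total = 0
for amino_acid in my_protein:
    if amino_acid == 'lysine':
        lysine_total += 1

print(protein_length)  # 19
print(lysine_count)    # 3
print(lysine_total)    # 3
```

The loop and the method give the same answer, so .count() is simply the shorter way to write it.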

Another common list method is append, which adds a new element to the end of an existing list. For example, let's say that we discover the next amino acid in the hemoglobin protein is a histidine, and we want to add this amino acid to our protein list. This is accomplished with a simple one-liner:

my_protein = ['methionine','valine','leucine','serine', 'proline', 'alanine', 'lysine', 'threonine','asparagine', 'valine','lysine','alanine', 'alanine', 'tryptophan', 'glycine','lysine', 'valine', 'glycine','alanine']

my_protein.append('histidine')

print(my_protein)
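
One detail that often trips people up: append changes the list in place and does not hand back a new list. The sketch below uses a shortened three-residue list purely for illustration; length_before and result are hypothetical names.

```python
# Shortened list, for illustration only.
my_protein = ['methionine', 'valine', 'leucine']

length_before = len(my_protein)

# append modifies my_protein itself; the value it returns is None.
result = my_protein.append('histidine')

print(result)           # None
print(len(my_protein))  # 4
print(my_protein)       # ['methionine', 'valine', 'leucine', 'histidine']
```

This is why we write my_protein.append('histidine') on its own line rather than my_protein = my_protein.append('histidine'), which would replace our list with None.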

If we want to see the last element in our list, to make sure that append worked, we can use indexing. Of course, we could index using the length of the protein minus 1 (don't forget the 0-indexing!) to get the index of the last element.

There is an easier way, though! Python allows us to use negative numbers to index in from the back of the list. For example, to print the last element of our list, we can simply index in with -1!

Both methods are shown below:

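my_protein = ['methionine','valine','leucine','serine', 'proline', 'alanine', 'lysine', 'threonine','asparagine', 'valine','lysine','alanine', 'alanine', 'tryptophan', 'glycine','lysine', 'valine', 'glycine','alanine']

# Add the new amino acid so we can check that append worked
my_protein.append('histidine')

# Method 1: index with the length of the list minus 1
print(my_protein[len(my_protein)-1])

# Method 2: index from the back with -1
print(my_protein[-1])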
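
Negative indices are not limited to -1. As a rough sketch, using a shortened five-residue list and illustrative variable names, index -k picks out the same element as len(my_protein) - k, and counting back past the start of the list raises an IndexError:

```python
# Shortened list, for illustration only.
my_protein = ['methionine', 'valine', 'leucine', 'serine', 'proline']

last = my_protein[-1]            # last element
second_to_last = my_protein[-2]  # one step in from the end

# Index -2 refers to the same element as index len(my_protein) - 2.
same_element = my_protein[-2] == my_protein[len(my_protein) - 2]

# Counting back further than the list is long is an error.
out_of_range = False
try:
    my_protein[-6]
except IndexError:
    out_of_range = True

print(last)            # proline
print(second_to_last)  # serine
print(same_element)    # True
print(out_of_range)    # True
```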

In biology, proteins are not symmetrical. That is, there is a defined start and end to each chain of amino acids at the structural level. Amino acids are linked in a chain that goes like this:

[Figure: ProteinStructure.png, a chain of amino acids with an N at the start and a C at the end]

Notice that there is an N (which stands for the nitrogen atom of the amino group) at the start of the amino acid chain and a C (which stands for the carbon of the carboxyl group) at the end of the chain. For this reason, we often refer to the start of a protein as the N-terminus and the end of a protein as the C-terminus. Negative indexing can be really useful for looking at the C-terminus of a protein, where a lot of interesting biology can happen!

Indexing to multiple elements

So far, we've only looked at the entire contents of the list or a single element. But what if we wanted to look at several consecutive elements of a list? It's easier than you think! To print the first three elements in a list, we can just index in with 0:3. This is called slicing. With this syntax, the first number is the starting index and the second number is the last index of interest plus 1: the element at the first index is included in the result, but the element at the second index is not.

my_protein = ['methionine','valine','leucine','serine', 'proline', 'alanine', 'lysine', 'threonine','asparagine', 'valine','lysine','alanine', 'alanine', 'tryptophan', 'glycine','lysine', 'valine', 'glycine','alanine']

# Slicing - what if we wanted to see the first three amino acids?

print(my_protein[0:3])
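
A few more slices may help the start and stop rule sink in. This is only a sketch on a hypothetical five-residue list called peptide (not the protein from this lesson); it also shows that either number can be left out, and that a negative number can be used as the start.

```python
# Hypothetical short list, used only to keep the output easy to read.
peptide = ['methionine', 'valine', 'leucine', 'serine', 'proline']

first_three = peptide[0:3]  # indices 0, 1 and 2; index 3 is excluded
also_first_three = peptide[:3]  # leaving out the start means "from index 0"
middle = peptide[1:4]       # indices 1, 2 and 3
from_three = peptide[3:]    # leaving out the stop means "to the end"
last_two = peptide[-2:]     # a negative start counts in from the back

print(first_three)  # ['methionine', 'valine', 'leucine']
print(middle)       # ['valine', 'leucine', 'serine']
print(from_three)   # ['serine', 'proline']
print(last_two)     # ['serine', 'proline']
```

A handy check: the slice [start:stop] always contains stop minus start elements, as long as both indices are inside the list.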

Now it's your turn!

Print out the last 6 amino acids of our protein.

my_protein = ['methionine','valine','leucine','serine', 'proline', 'alanine', 'lysine', 'threonine','asparagine', 'valine','lysine','alanine', 'alanine', 'tryptophan', 'glycine','lysine', 'valine', 'glycine','alanine']
