Back to basics: DNA – deoxyribo-whaaat

I seem to have a big mix of people reading my blog, which is exciting, but I’ve realised it means that some things I talk about might seem straightforward to more scientific readers but pretty confusing for the non-sciencey bunch (i.e. my science noob dad). I’m starting to realise this is one of the biggest difficulties in talking about science, so as well as the current research posts, I’m going to start doing some ‘back to basics’ blogs to explain the concepts that are fundamental to understanding how our body works.

The first one is going to be DNA (or deoxyribonucleic acid if you’re feeling fancy). As you know, DNA is the molecule that determines just about everything about us, from our hair colour to the things that are common to all living organisms like how we reproduce. But how does it do this? This post will try to unpick what it means when we say that DNA ‘codes’ for things and how this links in with genetics.

You’ve probably heard claims that we share a big chunk of our DNA with plants (figures of around half are often quoted for bananas). That’s because many of our genes ‘code’ for proteins that carry out basic tasks in our cells that we’re not even aware are being done, and plant cells need many of the same jobs doing. When scientists talk about the proteins in our diet that turn us into muscle men, that’s just one tiny part of the protein story. Dietary protein is broken down into amino acids, the building blocks our cells use to assemble their own proteins, and a few of those building blocks can’t be made by our body, so they have to come from food. Proteins are required for virtually all functions in our cells. I could do hundreds of blogs on proteins, so will leave it at that for now – all we need to remember for this post is that proteins are at the heart of just about everything that happens in our bodies.

DNA is made up of two strands which wrap together to form the famous double helix. These strands contain 3 main parts: phosphate groups and deoxyribose sugar groups, which alternate to form the backbone of each strand, and bases, which hang off the sugars and do the all-important coding (I’ll explain this in a minute).

DNA structure (Image created by Madeleine Price Ball)

There are four types of bases – adenine, thymine, cytosine and guanine, or A, T, C and G. These pair up with each other to join two strands together to form the double helix.
Each strand is ‘complementary’ to the other, meaning they pair up neatly. A can only join to T while C can only join to G. The order in which these bases appear determines what that particular strand of DNA encodes for.
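
For anyone who finds it easier to see a rule written out as code, here is a minimal Python sketch of the pairing rule just described. The names `PAIRS` and `complementary_strand` are made up for this illustration, and it simply matches bases position by position, ignoring the fact that the two real strands run in opposite directions.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand for a DNA sequence, base by base."""
    return "".join(PAIRS[base] for base in strand.upper())


print(complementary_strand("CTGGTC"))  # prints GACCAG
```

Running the function on its own output gives you back the strand you started with, which is one way of seeing what ‘complementary’ means.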

How does this code fit into genes and chromosomes?
DNA is found in almost every cell, coiled up into chromosomes. As you probably know, there are 23 pairs of chromosomes (so 46 altogether) in humans. Each carries anywhere from under a hundred to around 2,000 genes, and because chromosomes come in pairs, there are two copies of almost every gene (there is one long strand of DNA in each chromosome and sections of this strand make up the genes – the picture shows this a bit more clearly). I’ll talk in a future post about how these genes are inherited and why certain genes are expressed rather than others.

Now, nearly every cell has the identical full set of chromosomes, but not every cell needs all the DNA to code for every single protein all the time. For example, you don’t need the DNA for the pigment that causes a specific eye colour to be expressed in your big toe. Whole genes (i.e. whole segments of DNA) are either switched on or off. These switches are controlled by a set of proteins called ‘transcription factors’, the regulators of DNA expression.


DNA to chromosome (from BBC Bitesize)

DNA to protein
So how does DNA ‘code’ for a protein? DNA is found in the nucleus, whereas proteins are generally required in the rest of the cell – the cytoplasm. The problem is, DNA is too big and bulky to get out of the nucleus to code for proteins directly. That is one of the reasons why there is a step in between. This is where the molecule RNA, or ribonucleic acid, comes in. RNA is very similar to DNA, but with a few differences. For example, RNA only has a single strand, uses ribose sugars rather than deoxyribose sugars in its backbone, and uses a base called uracil (U) wherever DNA would use thymine (T).

So when a certain protein is required, transcription factors (mentioned above) activate the DNA in a specific gene. This DNA undergoes a process called ‘transcription’ – the formation of a new strand of RNA, using the DNA as a template. The RNA contains the exact ‘complementary’ sequence to the DNA. So for example if the DNA sequence was CTGGTC, then the corresponding RNA would be GACCAG (remember how the bases match up). But chromosomes have anywhere between roughly 50 million and 250 million base pairs each, and a single gene can run to thousands or even millions of bases, so the strand being transcribed would be a lot longer than 6 letters!
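
As a rough sketch of transcription in Python, the snippet below builds the RNA that pairs with a DNA template. The names `DNA_TO_RNA` and `transcribe` are invented for this example, and the one twist compared with DNA-to-DNA pairing is that RNA uses U (uracil) where DNA would use T, so an A in the template pairs with a U.

```python
# Template DNA base -> RNA base it pairs with.
# RNA has no T: an A in the DNA template pairs with U (uracil) instead.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template: str) -> str:
    """Return the RNA strand complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


print(transcribe("CTGGTC"))  # prints GACCAG (the example from the text)
print(transcribe("TACAGT"))  # prints AUGUCA (an A in the DNA becomes a U)
```

The first call reproduces the six-letter example above; the second is an extra made-up sequence chosen to show where the U comes in.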

So that’s transcription – the conversion of DNA to RNA in the nucleus of cells. This RNA is smaller and more transportable, so it then moves out of the nucleus through nuclear pores into the cytoplasm, where it can finally start making proteins.

The conversion of RNA to protein is called ‘translation’ – you can think of it as being converted into a whole new ‘language’. This process occurs on a piece of machinery called the ribosome. The ribosome feeds the RNA strand through its ‘reading apparatus’ and ‘reads’ the code. The ribosome can then recruit the correct amino acids required, depending on what it has read on the RNA. Every three bases code for a particular amino acid, depending on the order that they appear. For example, AGU codes for serine while AGA codes for arginine (RNA has U where DNA has T, so the matching DNA letters for serine would be AGT). These 3 letter codes are called codons. As there are only 20 biological amino acids used to make up all proteins, but 64 possible codons (different combinations of the 4 bases to produce a 3 letter code), there is more than one codon that can code for a particular amino acid. And not all 64 codons actually code for an amino acid: three of them code for a ‘stop’, which signals to the ribosome that the amino acid strand is complete.
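
Here is a small Python sketch of the ‘reading in threes’ idea. It is only an illustration: `CODON_TABLE` holds a handful of entries from the standard genetic code rather than all 64, the codons are written in RNA letters (so U appears where DNA would have T), and the function name `translate` and the example sequence are made up for this post.

```python
from itertools import product

# A deliberately incomplete slice of the standard genetic code (RNA codons).
CODON_TABLE = {
    "AUG": "Met", "AGU": "Ser", "AGA": "Arg", "GGC": "Gly", "UUU": "Phe",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def translate(rna: str) -> list[str]:
    """Read an RNA string three bases at a time until a stop codon appears."""
    amino_acids = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        residue = CODON_TABLE.get(codon, "???")  # "???" = not in this mini table
        if residue == "STOP":
            break
        amino_acids.append(residue)
    return amino_acids


# Every possible 3-letter combination of the 4 RNA bases.
all_codons = ["".join(letters) for letters in product("ACGU", repeat=3)]
print(len(all_codons))  # prints 64

print(translate("AUGAGUAGAUAAGGC"))  # prints ['Met', 'Ser', 'Arg']
```

Notice that the final GGC never gets read: the UAA stop codon just before it tells the loop, like the ribosome, that the chain is finished.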

Once the amino acid strand has been made, it is transported to various other structures in the cell to be processed into a correctly folded and complete protein.

Things can go wrong at any stage of the process I’ve just described. It requires so many things to go right that it’s not surprising that problems occur from time to time. For example, cancer is often caused by mutated genes, i.e. mutated DNA sequences. UV light or carcinogens may cause a base pair to be deleted or swapped for another, causing a whole different amino acid to be coded for. Luckily, as you may have realised, our bodies are extremely clever and well adapted to deal with these problems. There are various repair mechanisms that spot and fix damaged DNA before it can produce faulty proteins. Often, it is only when these repair mechanisms themselves are mutated that diseases occur.
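
To make the ‘one swapped base’ idea concrete, here is a tiny Python sketch. It uses just three real entries from the standard genetic code, written as RNA codons; the helper name `substitute` is invented for this example. It shows that a single swapped letter sometimes changes the amino acid and sometimes doesn’t, because several codons can stand for the same amino acid.

```python
# Three entries from the standard genetic code (RNA codons).
CODON_TABLE = {"AGU": "Ser", "AGC": "Ser", "AGA": "Arg"}


def substitute(sequence: str, position: int, new_base: str) -> str:
    """Return a copy of the sequence with one base swapped for another."""
    return sequence[:position] + new_base + sequence[position + 1:]


original = "AGU"
silent = substitute(original, 2, "C")   # AGU -> AGC
changed = substitute(original, 2, "A")  # AGU -> AGA

print(original, CODON_TABLE[original])  # prints AGU Ser
print(silent, CODON_TABLE[silent])      # prints AGC Ser (no change to the protein)
print(changed, CODON_TABLE[changed])    # prints AGA Arg (a different amino acid)
```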

So hopefully, if you’ve made it to the end of this long post, you understand why DNA is so important. It’s quite hard to get your head around how complicated all these tiny processes are. Plus they’re taking place every second in the trillions of cells in your body.