10.3.1 Gene expression and protein synthesis


The coded information for enzymic, regulatory and structural proteins resides in the genomic DNA. Information flows from DNA → RNA → protein (Figure 10.24), a scheme sometimes called the ‘Central Dogma of Biology’. In some viruses, RNA can serve instead as a template for DNA synthesis, so information flow is not always DNA → RNA. The enzyme that allows this reverse flow, reverse transcriptase, has become an extremely useful tool, enabling molecular biologists to synthesise DNA artificially using RNA templates.


Figure 10.24 Flow of genetic information in relation to plant cell ultrastructure. Information in chromosomal DNA is transcribed into RNA in the nucleus and then processed to form mature messenger RNA (mRNA) which is then translocated into the cytoplasm, where it is translated on free or membrane-bound ribosomes into polypeptides. Several components are visible under a transmission electron microscope: (a) nucleus, nucleolus (densely stained region) and perichromatin granules (inset, x 44 000) which are probably pre-RNA molecules, x 7700; (b) free ribosomes; (c) ribosomes on the surface of rough endoplasmic reticulum; (d) polyribosomes or polysomes (arrowed), which represent active translation of mRNA in a 'bead on a string' configuration, indicating simultaneous synthesis of several polypeptides from a single mRNA, all x 30 000, from Helianthus tuberosus.

(Photographs courtesy R.J. Rose)


Table 10.2

(a)  The genetic code

The genetic code operates as triplet combinations of the four nitrogenous bases adenine (A), cytosine (C), guanine (G) and thymine (T), which are carried on the sugar-phosphate backbone of DNA molecules. Each triplet is a codon (e.g. ATT, GTC, CGA), and codons specify the different amino acids that are polymerised to form proteins. Between DNA and protein comes the intermediate RNA stage. RNA is synthesised using base pairing rules but has uracil in place of thymine, so A pairs with U, and G with C. Each of the 20 amino acids found in protein is specified by this code (Table 10.2). With four bases, there are 4 × 4 × 4 = 64 possible triplets, so most amino acids are coded for by more than one triplet. Because of this, we describe the genetic code as ‘degenerate’. In addition, three triplets do not code for any amino acid and serve as stop codons.
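The arithmetic behind degeneracy can be checked with a short sketch. It assumes the standard genetic code, written in the compact form often used for codon tables (bases ordered T, C, A, G; one amino acid letter per codon; `*` for a stop codon). The names `CODON_TABLE` and `degeneracy` are illustrative only and do not come from the text.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"  # conventional ordering for compact codon tables
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table (DNA alphabet, coding strand).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(table):
    """Count how many codons specify each amino acid (or stop, '*')."""
    return Counter(table.values())


counts = degeneracy(CODON_TABLE)
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
```

Under this table, `counts` shows that leucine (L), serine (S) and arginine (R) each have six codons, whereas methionine (M) and tryptophan (W) have only one.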

(b)  Transcription: ‘DNA →  RNA’ in the nucleus

Three different kinds of RNA are transcribed from DNA: messenger RNA (mRNA), ribosomal RNA (rRNA) and transfer RNA (tRNA). Messenger RNA carries the protein code, rRNA is an integral part of ribosome structure, and tRNA is an adaptor RNA, which in association with the ribosome aligns amino acids with the mRNA code and facilitates synthesis of polypeptide chains.

A gene coding for mRNA typically consists of three regions — promoter, protein coding and terminator (Figure 10.25). Messenger RNA synthesis occurs in transcriptionally active chromatin sites within the nucleus, sometimes called ‘euchromatin’ regions. The promoter region is ‘upstream’ of the 5' end of the protein coding region. A transcription enzyme called RNA polymerase binds to the gene at this 5' end and facilitates transcription of the protein coding regions, often together with other processing sequences that may later be excised. Various regulatory factors determine whether the RNA polymerase can bind to the promoter. Binding of this enzyme is a key component of transcriptional regulation, and is discussed further below.
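The base pairing that underlies transcription can be sketched in a few lines. This is a minimal illustration that ignores promoters, regulatory factors and processing; the function name `transcribe` and the example sequence are hypothetical. It assumes the template strand is written 5' to 3' and relies on the fact that RNA polymerase reads the template 3' to 5' while building the RNA 5' to 3'.

```python
# Base-pairing rules for copying a DNA template into RNA (U replaces T).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3):
    """Return the RNA (5'->3') made from a DNA template strand written 5'->3'.

    The template is read 3'->5', so we walk it in reverse and pair each base.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_5to3))


template = "GGCCAT"          # hypothetical template strand
mrna = transcribe(template)  # RNA copy, identical to the coding strand with U for T
```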


Figure 10.25 Structure and transcription of a typical plant nuclear gene. Genes contain more than just the DNA sequences that code polypeptide chains. Those that are transcribed into mRNA consist of three main components – promoter, protein coding and terminator. Coding sequences (exons) are usually interrupted by non-coding segments called introns. Promoters contain two main regions. The first is the 'TATA box' (marked **), about 30 bases upstream of the transcription start site. The second region, further upstream, contains one or more sequences (diagonal hatching; often known as cis regulatory elements) to which regulatory proteins (known as transcription or trans acting factors) bind. Transcription factors enhance or repress transcription depending on how they affect RNA polymerase binding. After transcription, pre-mRNA molecules are processed into mature mRNA, which includes splicing out introns, addition of a polyA (chain of adenosyl residues) tail to the 3' end, and capping at the 5' end. There are also untranslated regions at both the 3' and 5' ends of mature mRNA. Intron splicing can be seen in electron micrographs, represented diagrammatically here, of the interaction of mature mRNA with its complementary DNA, which shows introns as loops but exons hybridised to the mRNA. TBP = TATA-binding protein; TAF = TBP-associated factor.

The coding region is usually interrupted by non-coding segments called ‘introns’ (Figure 10.25). While the mRNA is still within the nucleus, the introns are spliced out to leave a mature mRNA molecule. Mature mRNA also has a methylated guanine nucleotide added to the 5' end of the strand. This is known as ‘capping’. The other (3') end has a stretch of A residues attached, called a ‘polyA’ tail. Many of these post-transcriptional changes to mRNA facilitate transport from the nucleus to sites of translation in the cytoplasm.
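Splicing and polyadenylation can be pictured as simple string operations. The sketch below is only a cartoon: the sequences, exon coordinates and the function name `process_pre_mrna` are hypothetical, and the 5' cap is a chemical modification that is not modelled.

```python
def process_pre_mrna(pre_mrna, exons, poly_a_length=20):
    """Join exon segments and append a polyA tail.

    exons: list of (start, end) half-open index pairs, in 5'->3' order.
    The 5' cap is a chemical modification and is not represented here.
    """
    spliced = "".join(pre_mrna[start:end] for start, end in exons)
    return spliced + "A" * poly_a_length


# Hypothetical transcript: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"
pre = exon1 + intron + exon2
mature = process_pre_mrna(pre, [(0, 6), (18, 24)], poly_a_length=5)
```

Here `mature` contains the two exons joined end to end followed by five A residues, and the intron sequence no longer appears.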

Ribosomal RNA synthesis and processing occur in the nucleolus (Figure 10.24), which is the site of the rRNA genes. An rRNA transcript has no cap or polyA tail. Instead, the molecules are processed in a series of steps to yield mature 28S and 18S rRNA (S is the Svedberg unit, which refers to the rate of sedimentation in a high-speed centrifuge and therefore gives a measure of particle size). These two sizes are associated with the large and small ribosomal subunits, respectively. A third rRNA type is called 5S, and also becomes part of the large subunit. The rest of the ribosome structure consists of a number of proteins linked around the central rRNA molecules. Complete cytoplasmic ribosomes have a sedimentation coefficient of 80S, with the large and small subunits being 60S and 40S, respectively (Svedberg values are not additive because sedimentation depends on shape as well as mass). Prokaryote-type ribosomes found in plastids and mitochondria are somewhat smaller at 70S, but function in a similar manner.

Transfer RNA synthesis takes place in the nucleus. The molecules have no cap or polyA tail, but do undergo other processing, followed by a very precise molecular folding into a clover leaf structure containing an anticodon triplet. The anticodon is complementary to the mRNA codon, while the adaptor-binding properties of the tRNA allow it to collect the appropriate amino acid and make it available for linkage into a polypeptide chain.

(c)  Translation: ‘RNA → protein’ in the cytoplasm

Translation of nuclear mRNA occurs on ribosomes located in the cytoplasm. The intricate structure of ribosomes enables them to carry out this task. Initiation of protein synthesis occurs on the small subunit where an initiator tRNA carrying a methionine molecule interacts with an mRNA molecule (Figure 10.26). The 5' end of the mRNA then forms an initiation complex with an rRNA molecule on the large ribosomal subunit to generate two sites which are sequentially occupied by ‘anticodon’ aminoacyl-tRNAs. In this way amino acids associated with the two attached tRNAs are aligned to allow peptide bond formation. The initiator tRNA then exits and the second tRNA moves into its position, allowing a third tRNA to enter ready for formation of the second peptide bond. The cycle continues until the last tRNA enters and the final peptide bond forms. The completed polypeptide is released and the ribosome is free to initiate translation of another mRNA. When translation is rapid, many ribosomes are moving along a single mRNA at any one time. This ‘beads on a chain’ configuration is called a polyribosome or polysome (Figure 10.24) and allows simultaneous synthesis of several copies of the polypeptide. Post-translational modification of polypeptides leads to formation of mature enzymes, structural proteins and regulatory proteins. Many of the last category are themselves active in regulation of transcription and mRNA stability. Perhaps surprisingly, most mRNA molecules have a very short half-life, averaging 30 min. This means that sustained active translation normally requires continuous production of new mRNA, but also gives cells a mechanism for rapidly changing the types of proteins they produce.
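The codon-by-codon logic of translation can be sketched as follows. This is a deliberate simplification: it assumes the standard genetic code, starts at the first AUG it finds, and stops at the first in-frame stop codon, without modelling ribosomal subunits, tRNA sites or initiation factors. The function name `translate` and the example sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"  # RNA alphabet, conventional codon-table ordering
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Returns the polypeptide as one-letter amino acid codes, N-terminus first.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: release the polypeptide
        peptide.append(amino_acid)
    return "".join(peptide)


peptide = translate("GGAUGGCCUUUUAAGG")  # codons read: AUG GCC UUU UAA
```

In this example the bases before AUG and after UAA play the part of untranslated regions and contribute nothing to the polypeptide.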


Figure 10.26 Translation of mRNA into polypeptides occurs on ribosomes as a three-step cycle which repeats until the stop codon is reached. The polypeptide is synthesised from the N-terminal end based on the mRNA triplet codes matching complementary anticodon triplets of aminoacyl-tRNA molecules. In step 1, the aminoacyl-tRNA binds to its codon, lining up its amino acid with the previous one added to the polypeptide chain. In step 2, a peptide bond forms between these two amino acids, and in step 3 the ribosome moves three nucleotides along the mRNA, which ejects the previous tRNA and resets the system for acquisition of the next aminoacyl-tRNA.

(Based on Alberts et al. 1994; reproduced with permission of Garland Publishing Inc.)

We can picture the sequential translation process as either the ribosome running along the mRNA or the mRNA passing through the ribosome. In the case of ribosomes bound to endoplasmic reticulum, the latter is more accurate because the growing polypeptide enters the endoplasmic reticulum lumen. These proteins are typically destined to be part of Golgi apparatus or plasma membrane structure, or may ultimately be secreted outside the cell, for example the α-amylase enzyme from cereal seed aleurone cells.

(d)  Regulation of gene expression

Every living cell type has a particular set of expressed genes, in effect a blueprint, but this complement changes during development. For example, the genes for enzymes required for cell division differ from those needed to sustain photosynthesis in mature leaf mesophyll. In addition, expression of many plant genes is highly responsive to a wide range of environmental signals. The physiological effects of these factors were discussed earlier (see Chapter 8) and later in this chapter we describe the molecular mechanisms. How is differential gene expression coordinated and regulated precisely and reliably for each cell type, for each stage of development and as adjustments in response to external factors? Presence or absence of the protein encoded by any given gene is generally due to regulation at the transcriptional or post-transcriptional levels. The former means that the gene is effectively switched off and no mRNA is produced. The latter relates to processes acting on the mRNA that prevent it from proceeding to translation. In addition, many polypeptides are processed further during post-translational modification before they become mature functional proteins. This may include excision of part of the amino acid sequence, complexing with other polypeptides to form multi-subunit proteins, or covalent linkage to other molecules such as carbohydrates to generate glycoproteins.

Transcriptional regulation

Transcription is dependent on the rate of initiation of synthesis of RNA molecules by RNA polymerase, which in turn depends on access to the 5' end of the gene. As described earlier, this is the promoter region, which binds a diverse class of proteins called transcription (or trans acting) factors that determine whether RNA polymerase can operate.

Promoters contain two main regions (Figure 10.25). The first, the ‘TATA box’, is where RNA polymerase binds and lies about 30 bases upstream of the transcription start site (many important regions of genes are called ‘boxes’, probably because scientists annotate such sequences by highlighting them with a box outline). Several transcription factors interact to determine whether and for how long transcription proceeds. One key component is a complex called TFIID, which consists of the TATA binding protein (TBP) and other TBP-associated factors (TAFs). The second region is further upstream and consists of a modular series of regulatory DNA sequences, often called cis elements, each of which is recognised by different transcription factors. Within each transcription factor protein is a highly specific DNA binding domain and sometimes also an activation domain that interacts with other proteins to influence the basal transcription machinery.
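Locating a TATA box in a promoter sequence amounts to a pattern search. The sketch below assumes the commonly cited consensus TATA(A/T)A(A/T), which is not given in the text, and uses an invented promoter sequence; `find_tata_boxes` and `start_site_index` are hypothetical names. Real promoters vary, and many genes lack a recognisable TATA box.

```python
import re

# Commonly cited TATA-box consensus, TATA(A/T)A(A/T); an assumption, not from the text.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def find_tata_boxes(promoter, start_site_index):
    """Return positions of TATA-like motifs relative to the start site.

    Negative values are upstream; only sequence before the start site is searched.
    """
    upstream = promoter[:start_site_index]
    return [m.start() - start_site_index for m in TATA_PATTERN.finditer(upstream)]


# Hypothetical promoter: 10 filler bases, a TATA-like motif, 23 filler bases, then the start.
promoter = "GCGCGCGCGC" + "TATAAAT" + "C" * 23 + "ATGGCC"
positions = find_tata_boxes(promoter, start_site_index=40)
```

With this invented sequence the single match begins 30 bases upstream of the chosen start position.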

Molecular biologists ‘dissect’ promoters (see Section 10.3.3) into their component elements and often construct artificial promoters with known segments deleted or additional ones inserted. Any particular deletion that influences transcription tells us that that sequence is important for regulation of expression, that is, we can identify precisely which are the control sequences within the whole promoter region and ultimately exactly where each transcription factor will bind. Genes often need to be expressed in groups, for example to generate all the enzymes required for a biosynthetic pathway. We often find that these genes have parts of their promoters in common, allowing a single switch to regulate a suite of genes. Enhancer sequences distant from the promoter region bind specific proteins that also interact with promoter sequences and thus influence ease of transcription by means of forming lengthy DNA loops. Finding enhancer sequence modules is not simple because it is hard to predict how far away they are from the relevant gene.
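The reasoning behind deletion analysis can be expressed as a simple comparison. In the sketch below, each promoter element is deleted in turn and the resulting reporter activity is compared with that of the intact promoter. All element names and activity values are invented for illustration, and the 50% threshold and the function name `important_elements` are arbitrary assumptions.

```python
def important_elements(full_activity, deletion_activity, threshold=0.5):
    """Flag elements whose deletion changes reporter activity substantially.

    deletion_activity maps element name -> activity measured with that element deleted.
    A change is flagged when it exceeds `threshold` as a fraction of full_activity.
    """
    flagged = {}
    for element, activity in deletion_activity.items():
        change = (activity - full_activity) / full_activity
        if abs(change) > threshold:
            # Loss of activity implies the element normally enhances transcription,
            # and a gain implies it normally represses transcription.
            flagged[element] = "activator site" if change < 0 else "repressor site"
    return flagged


# Hypothetical reporter readings (arbitrary units), not real data.
full = 100.0
deletions = {"element_A": 20.0, "element_B": 95.0, "element_C": 260.0}
result = important_elements(full, deletions)
```

In this invented data set, deleting `element_A` removes most of the activity and deleting `element_C` raises it, so both are flagged as control sequences, whereas `element_B` appears dispensable.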

Post-transcriptional control — mRNA translation and stability

Between transcription and translation are several further potential control points, including RNA splicing and regulating mRNA exit from the nucleus. There are also untranslated regions at both the 3' and 5' ends of mature mRNA (Figure 10.25). Specific proteins binding to the 5' end can block translation, whereas proteins binding to the 3' end may cover ribonuclease (RNase)-sensitive sites and hence protect the mRNA from enzymatic degradation. This may be one reason why half-lives differ between mRNA types.
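The consequence of a short half-life is easy to quantify. The sketch below assumes simple first-order decay with no new transcription, which is an idealisation, and uses the 30 min average half-life mentioned earlier as its default. The function name `fraction_remaining` is illustrative.

```python
def fraction_remaining(minutes, half_life_min=30.0):
    """Fraction of an mRNA pool left after `minutes`.

    Assumes first-order decay and no new transcription.
    """
    return 0.5 ** (minutes / half_life_min)


after_one_hour = fraction_remaining(60)        # two half-lives at 30 min
stable_message = fraction_remaining(60, 120)   # hypothetical mRNA with a 2 h half-life
```

Under these assumptions a quarter of a typical mRNA pool survives for one hour, whereas about 71% of the hypothetical longer-lived message remains, which illustrates how protection from RNase could alter protein output.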

Protein turnover

Any one protein will normally be synthesised in the ‘right’ place at the ‘right’ time, but still may only be required for a brief period, perhaps a transient phase of cell differentiation. Control of protein stability is therefore important, and in the case of multi-subunit complex enzymes, there need to be appropriate amounts of each component. Often proteins that do not assemble properly are degraded more rapidly. Protein degradation is influenced by the type of N-terminal amino acid and by the ubiquitin class of proteins. Ubiquitin targets proteins that are not linked into their normal complexes, or are incorrectly folded, and so have exposed ubiquitin binding sites. Ubiquitin binding acts as a ‘tag’ for proteolysis (see Section 10.3.4 and Figure 10.35). On the other hand, chaperonins are proteins that facilitate protein assembly and folding. They are synthesised in response to stress when cellular proteins are generally more vulnerable to degradation.