Although many expression systems are available, ranging from bacterial to eukaryotic and cell-free translation systems, the overproduction of most proteins is problematic and success is often a matter of trial and error. In particular, eukaryotic membrane proteins and large oligomeric complexes are difficult to produce in functional form and in quantities sufficient for structural analysis. In cases where functional overexpression of membrane proteins fails, the proteins are usually not targeted to the plasma membrane and may be misfolded. Knowledge of the bottlenecks in expression will aid the design of new systems and the optimization of existing ones, for instance by tailoring particular organisms. A number of recent developments in the use of Saccharomyces cerevisiae as a host for recombinant protein expression, e.g. the construction of tuneable integration vectors and the elucidation of protein quality control mechanisms in the endoplasmic reticulum, suggest rational approaches for engineering yeast strains for the high-level expression of functional membrane proteins. Reporter strains have already been constructed that use protein quality control pathways of yeast, such as the unfolded protein response, to monitor the levels of unfolded or misfolded proteins in the endoplasmic reticulum. Other strategies may include increasing the secretory capacity, or co-expressing chaperones, foldases or ribosome receptors, to overcome limitations in the folding of recombinant soluble and membrane proteins and/or in their targeting to the endoplasmic reticulum. Similar approaches may lead to improved biosynthesis of functional proteins in bacterial, insect or animal cells. The goal should be to identify the bottlenecks in gene expression and to generate technologies for producing individual proteins, protein complexes and hydrophobic proteins at a high rate.
The structure determination of membrane proteins, large multi-domain eukaryotic proteins and protein complexes is lagging behind, not only because of difficulties in the over-production, purification and stability of these proteins, but also because of specific problems in their crystallization and labelling. Generic methods, based on new approaches to protein expression, need to be developed to address these problems. For instance, matrix- and interface-assisted 3D crystallization methods have strong potential for yielding high-quality crystals of some proteins for X-ray analysis, but a major bottleneck is the visualization of non-coloured protein. This could be solved by chromophore-labelling of the protein, either biosynthetically (using amino acid analogues or tagging at the gene level) or post-translationally.
Major hurdles also exist in finding conditions for expression and in obtaining the media and isotopes required for site-specific labelling for use in NMR (13C, 15N, 2H) and other spectroscopic approaches (e.g. 2H for FTIR). Many sources of labels are scarce, still under development and often expensive. Even less readily available are amino acids for highly specific isotopic labelling of proteins for structural studies, and the price of these compounds and the quantities required are often wholly prohibitive. In addition, although expression may be successful on defined media, expression on minimal media that include isotopically labelled compounds may not be. Each of these hurdles requires breakthroughs in the production of the protein, which may have to come from entirely new expression systems.
For high-level expression, the intended sub-cellular localization of the protein is an important consideration. The simplest strategy is often expression in the cytoplasm, either as a soluble protein or in inclusion bodies, but in certain situations it can be advantageous to choose other locations. For example, disulphide bonds do not normally form in the cytoplasm, N-linked glycosylation and many other post-translational modifications cannot take place there, and purification may be easier if the protein is secreted into the medium. For membrane proteins, re-folding from inclusion bodies is often a difficult step, and it is better if the protein can be expressed in a functional state in a suitable membrane.
There are, however, considerable difficulties involved in trying to express proteins in non-cytoplasmic compartments. Not all proteins can be efficiently translocated across cellular membranes or moved through the secretory pathway, and high-level over-expression may saturate the transport machineries and compromise cell viability. In general, the cellular reactions to over-expression of secretory and integral membrane proteins are poorly understood, and more work is needed to provide a firm foundation for the rational engineering of strains that can cope with such stress situations.
A major problem in the production of recombinant proteins is their low solubility and stability. For instance, many eukaryotic proteins are found in inclusion bodies when over-produced in Escherichia coli. In some cases, expression as inclusion bodies and subsequent refolding may be advantageous when large amounts of a labile, toxic or disulphide-bonded protein are needed, especially in labelled form. Here, folding screens similar to those used in crystallography have to be established so that this methodology can be applied on a routine basis. In addition, the roles that the different bacterial chaperones and foldases play in the folding of newly synthesized polypeptide chains have been well studied, and this knowledge could be used to develop new protocols for the production of soluble proteins in E. coli.
The situation is very different for the production of proteins in eukaryotic cells. First, much less is known about the protein folding processes in eukaryotic cells. Some chaperones and foldases have been found, but their precise role in the folding processes is often not fully understood. Second, recent work has shown that most proteins in the eukaryotic cell are found in one or more multi-protein complexes. Little is known about the proteins (e.g. chaperones) and processes involved in the in vivo formation of these complexes. Studying the functional mechanisms of known chaperones and foldases, and discovering new ones, will increase the understanding of the folding processes in the eukaryotic cell and lead to the development of new protocols for the production of soluble proteins in yeast and in insect and mammalian cell lines. Further, the proteins involved in the formation and stabilization of protein complexes will need to be identified and their roles studied. This knowledge will help to establish protocols for the successful over-production of multi-protein complexes.
Finally, to obtain crystals of a purified recombinant protein, it is important to have mono-disperse samples of fully functional protein at (relatively) high concentration that are stable for long periods of time, and ways of achieving such conditions using genetic strategies need to be made more routine. Catalytic activity is an indirect but important indicator of the structural integrity of a protein, but for many complex systems it is not easily determined. More work is needed on the development of methods to screen the functional state(s) of proteins and protein complexes in the soluble, surface-associated or membrane-embedded state.