Discussions of protein structure formation usually begin with the primary structure, i.e. the amino acid sequence of the protein, not with amino acid composition. Although we happened to use amino acid composition to investigate protein structure formability, this approach led to interesting conclusions, as described below.
Structure formability is the same for any protein assembled by randomly selecting amino acids from the same amino acid composition. This means that every protein synthesized by random peptide bond formation among the amino acids in that composition could fold into a similar, though not identical, structure. Such proteins have the same amino acid composition but different sequences. We call a specific amino acid composition that is favorable for protein structure formation the “protein 0th-order structure”.
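The idea of proteins that share a composition but differ in sequence can be illustrated with a minimal sketch. It assumes that "random peptide bond formation" can be modeled as drawing each residue independently from the composition; the function names, the equal-fraction [GADV] composition, and the chain length of 100 are all illustrative choices, not values taken from the text.

```python
import random
from collections import Counter

def random_protein(composition, length, rng):
    """Draw a sequence residue by residue from an amino acid composition.

    Assumption: each position is sampled independently, with probability
    equal to the residue's fraction in the composition.
    """
    residues = list(composition)
    weights = [composition[r] for r in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))

def observed_composition(sequence):
    """Return the fraction of each residue actually present in a sequence."""
    counts = Counter(sequence)
    return {r: counts[r] / len(sequence) for r in sorted(counts)}

# Illustrative composition only: equal fractions of Gly, Ala, Asp, and Val.
composition = {"G": 0.25, "A": 0.25, "D": 0.25, "V": 0.25}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
proteins = [random_protein(composition, 100, rng) for _ in range(3)]

for p in proteins:
    print(p[:30] + "...", observed_composition(p))
```

Each run of `random_protein` yields a different sequence, while the observed compositions stay close to the one that was specified.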
Since the genetic code occupies a core position connecting genetic function with catalytic function in the fundamental life system, the origin and evolution of the genetic code are quite important for understanding the formation process of the fundamental life system, which is composed of genes, the genetic code, and proteins. In fact, the GNC-SNS primitive genetic code hypothesis provided the opportunity to propose the [GADV]-protein world hypothesis (GADV hypothesis) on the origin of life.
The GNC-SNS primitive genetic code hypothesis assumes that the universal (standard) genetic code originated from the GNC primeval genetic code, which encodes the four [GADV]-amino acids with four codons, and evolved through the SNS primitive genetic code, which codes for 10 amino acids with 16 codons. According to the hypothesis, the present genetic code, which is triplet both substantially and formally, evolved from the GNC code, which is substantially singlet but formally triplet, through the SNS code, which is substantially doublet but formally triplet.
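The codon counts quoted above can be checked with a short sketch that expands the GNC and SNS patterns against the standard genetic code. It assumes the usual IUPAC meanings (N is any base, S is G or C) and writes codons in the DNA alphabet (T in place of U); the helper names `expand` and `subcode` are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC symbols needed for the two patterns.
IUPAC = {"G": "G", "C": "C", "S": "GC", "N": "TCAG"}

def expand(pattern):
    """List every concrete codon matching a three-letter IUPAC pattern."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in pattern))]

def subcode(pattern):
    """Map each codon matching the pattern to its standard-code amino acid."""
    return {codon: STANDARD_CODE[codon] for codon in expand(pattern)}

gnc = subcode("GNC")
sns = subcode("SNS")

print(len(gnc), "GNC codons ->", sorted(set(gnc.values())))
print(len(sns), "SNS codons ->", sorted(set(sns.values())))
```

Under these assumptions, the GNC pattern gives four codons for Gly, Ala, Asp, and Val, and the SNS pattern gives sixteen codons for ten amino acids, with no stop codon in either set.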
I started from a study of entirely new ancestor genes, i.e. the first ancestor genes in gene families consisting of homologous genes. From analyses of microbial genes and proteins obtained from the GenomeNet Database, I found that the first ancestor genes could be produced from non-stop frames on the antisense strands of GC-rich, not AT-rich, microbial genes [GC-NSF(a)].
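As a rough illustration of what a non-stop frame on an antisense strand means, the sketch below takes the reverse complement of a sense-strand sequence and reports which of its three reading frames contain no stop codon. This is not the analysis pipeline used in the study; the function names and the short test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the antisense strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def antisense_nonstop_frames(sense_seq):
    """For each frame offset (0, 1, 2) on the antisense strand,
    report True if no complete codon in that frame is a stop codon."""
    anti = reverse_complement(sense_seq)
    result = {}
    for offset in range(3):
        codons = [anti[i:i + 3] for i in range(offset, len(anti) - 2, 3)]
        result[offset] = not any(c in STOP_CODONS for c in codons)
    return result

# Hypothetical GC-rich and AT-rich toy sequences.
print(antisense_nonstop_frames("GCCGGCGACGTC"))
print(antisense_nonstop_frames("TTATTA"))
```

In the toy GC-rich sequence all three antisense frames are free of stop codons, whereas the AT-rich one has stops in frame 0.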
This conclusion was based mainly on two facts: hypothetical proteins encoded by GC-NSF(a)s satisfied the six conditions for folding of polypeptide chains into water-soluble globular proteins (hydropathy; α-helix, β-sheet, and turn/coil structure formabilities; and acidic and basic amino acid compositions), and the probability of stop codon appearance on GC-NSF(a)s is small enough to produce non-stop frames.
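Why stop codons become rare in GC-rich sequences can be seen from a simple estimate. The sketch below assumes that bases occur independently, with G and C each at half the GC fraction and A and T each at half the remainder; this random-sequence model is a simplifying assumption and not the calculation reported for real genes. All three stop codons (TAA, TAG, TGA) consist mostly of A and T, so their combined probability falls as GC content rises.

```python
def stop_codon_probability(gc):
    """Probability that a random codon is TAA, TAG, or TGA.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2.
    """
    at_base = (1.0 - gc) / 2.0
    gc_base = gc / 2.0
    # TAA contributes at_base**3; TAG and TGA each contribute at_base**2 * gc_base.
    return at_base ** 3 + 2 * at_base ** 2 * gc_base

def nonstop_frame_probability(gc, n_codons):
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1.0 - stop_codon_probability(gc)) ** n_codons

for gc in (0.3, 0.5, 0.7):
    print(gc, stop_codon_probability(gc), nonstop_frame_probability(gc, 100))
```

At 50% GC content the model gives 3/64 per codon, as expected for three stop codons out of 64 equally likely triplets.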
The six conditions were obtained as the average values for extant proteins plus or minus their standard deviations. These values were calculated from amino acid structural indexes and the amino acid compositions of extant microbial proteins encoded by seven microbial genomes with different GC contents. For most proteins, the average values stayed at nearly constant levels regardless of GC content.
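A composition-based index of this kind is a weighted average of per-residue values. The sketch below shows the calculation for hydropathy and for the acidic and basic fractions. The text does not say which index scales were used, so the Kyte-Doolittle hydropathy scale, the definition of acidic residues as Asp and Glu, and the definition of basic residues as Lys, Arg, and His are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
ACIDIC = {"D", "E"}       # assumed definition
BASIC = {"K", "R", "H"}   # assumed definition

def weighted_index(composition, index):
    """Composition-weighted average of a per-residue index."""
    total = sum(composition.values())
    return sum(frac * index[aa] for aa, frac in composition.items()) / total

def class_fraction(composition, residues):
    """Fraction of the composition belonging to a residue class."""
    total = sum(composition.values())
    return sum(f for aa, f in composition.items() if aa in residues) / total

# Illustrative composition: equal fractions of the four [GADV]-amino acids.
gadv = {"G": 0.25, "A": 0.25, "D": 0.25, "V": 0.25}
print("hydropathy:", weighted_index(gadv, KYTE_DOOLITTLE))
print("acidic fraction:", class_fraction(gadv, ACIDIC))
print("basic fraction:", class_fraction(gadv, BASIC))
```

The same `weighted_index` function could be applied to any other per-residue scale, such as secondary-structure propensities, by passing a different dictionary.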
A possible evolutionary process for the emergence of life based on the GADV hypothesis is as follows. [GADV]-amino acids were synthesized on the primitive Earth; it is well known that [GADV]-amino acids are easily synthesized in Miller-type experiments. [GADV]-proteins were produced, for example, by repeated heat-drying of [GADV]-amino acids in tide pools on the primitive Earth, and were further accumulated by pseudo-replication to form the [GADV]-protein world. Subsequently, nucleotides and oligonucleotides were synthesized through the high catalytic activities of [GADV]-proteins in that world. The accumulation of oligonucleotides triggered the generation of the GNC primeval genetic code through stereospecific complex formation between the four [GADV]-amino acids and the four corresponding GNC-containing oligonucleotides.
Because [GADV]-proteins were synthesized more efficiently with these complexes than by direct synthesis from individual [GADV]-amino acids, the complexes helped to establish the GNC primeval genetic code. Next, GNC-repeating sequences were produced by random phosphodiester bond formation on chiral [GADV]-proteins or by linear arrangement of GNC codons in the complexes of GNC-containing oligonucleotides and [GADV]-amino acids.
Thus, the first single-stranded (GNC)n gene was created when one (GNC)n sequence encoding a [GADV]-protein with the required function was selected from a pool of (GNC)n polynucleotides, leading to the emergence of the first life. How the “chicken and egg relationship” between genes and proteins was formed on the primitive Earth can also be explained from the standpoint of the GADV hypothesis, as a process that went upstream in the genetic flow, from [GADV]-protein synthesis (downstream) to the creation of genes (upstream). In the RNA world hypothesis, it seems difficult to find a reasonable strategy for the creation of the first gene. The notion of the GNC primeval genetic code motivated the introduction of the new concept of pseudo-replication of [GADV]-proteins.
Genetic information in the form of DNA base sequences, or codon sequences, is transcribed into mRNA and then translated into the amino acid sequences of proteins according to the genetic code. However, double-stranded DNA, which carries genetic information, cannot be replicated without enzyme proteins, whereas proteins cannot be reproduced without genes. This dilemma, the so-called chicken and egg relationship between genes and proteins in the life system, has made it difficult to account for the origin of life.
The RNA world hypothesis on the origin of life is generally considered the key to solving the “chicken and egg dilemma” concerning the evolution of genes and proteins as observed in modern organisms. This hypothesis, however, has several serious weak points, as follows. (i) Nucleotides would never be synthesized under prebiotic conditions through a random combinatorial process from simple chemical compounds such as water, carbon dioxide, and methane, without proteinaceous enzymes. (ii) The existence of four hydroxyl groups on ribose also makes it difficult to synthesize RNA by joining nucleotides in the absence of enzyme catalysts. (iii) Self-replication of RNA must be practically impossible because of the following self-contradiction: RNA would have to lack any stable tertiary structure to exhibit its genetic function as a template and, simultaneously, would have to be folded into a stable tertiary structure to exhibit its catalytic function.
As an alternative, I have proposed the [GADV]-protein world hypothesis, abbreviated as the GADV hypothesis, which suggests that life originated from a [GADV]-protein world comprising proteins composed of four amino acids: Gly [G], Ala [A], Asp [D], and Val [V]. A new concept, “pseudo-replication”, is crucial for describing the emergence of life. The hypothesis plausibly explains not only how life originated from the initial chaotic protein world, but also how genes, the genetic code, and proteins originated and co-evolved.