Genomic libraries are constructed by isolating the complete chromosomal DNA from a cell, then digesting it into fragments of the desired average length with restriction endonucleases. Fragments of a suitable size can be obtained by partial restriction digestion using an enzyme that recognises tetranucleotide sequences. Complete digestion with such an enzyme would produce a large number of very short fragments, but if the enzyme is allowed to cleave only a few of its potential restriction sites before the reaction is stopped, each DNA molecule will be cut into relatively large fragments (Table 1). The average fragment size depends on the relative concentrations of DNA and restriction enzyme and, in particular, on the conditions and duration of incubation (Figure 1). It is also possible to produce fragments of DNA by physical shearing, although the ends of the fragments may need to be repaired to make them flush-ended. This repair can be carried out with the Klenow fragment of DNA polymerase I, which lacks 5′→3′ exonuclease activity and will fill in any recessed 3′ ends on the sheared DNA using the appropriate dNTPs.
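As a rough illustration of why a tetranucleotide cutter gives very short fragments on complete digestion, the sketch below estimates the mean fragment length. It rests on simplifying assumptions that are not in the text: the four bases are equally frequent and randomly distributed, and in a partial digest every site is cut independently with the same probability. The function name and parameters are hypothetical.

```python
def mean_fragment_length(site_length: int = 4, fraction_cut: float = 1.0) -> float:
    """Estimate the mean fragment length (bp) after a restriction digest.

    Assumes equal, random base composition, so a recognition site of
    `site_length` bases occurs on average once every 4**site_length bp.
    `fraction_cut` is the assumed fraction of sites actually cleaved
    (1.0 corresponds to complete digestion).
    """
    if site_length < 1:
        raise ValueError("site_length must be at least 1")
    if not 0.0 < fraction_cut <= 1.0:
        raise ValueError("fraction_cut must be in the interval (0, 1]")
    mean_site_spacing = 4 ** site_length  # 256 bp for a 4-base site
    return mean_site_spacing / fraction_cut


if __name__ == "__main__":
    # Complete digestion with a 4-base cutter
    print(mean_fragment_length(4, 1.0))
    # Partial digestion in which about 1 site in 78 is cut
    print(mean_fragment_length(4, 0.0128))
```

Under this simple model, cutting only a small fraction of the available sites moves the average fragment from a few hundred base pairs into the tens of kilobases. Real genomes deviate from random base composition, so the figures are indicative only.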

Table 1. Numbers of clones required for representation of DNA in a genome library

Figure 1. Comparison of (a) partial and (b) complete digestion of DNA molecules at restriction enzyme sites (E).

The mixture of DNA fragments is then ligated with a vector, and subsequently cloned. If enough clones are produced, there will be a very high chance that any particular DNA fragment, such as a gene, will be present in at least one of the clones. To keep the number of clones to a manageable size, fragments about 10 kb in length are needed for prokaryotic libraries, but the length must be increased to about 40 kb for mammalian libraries. It is possible to calculate the number of clones that must be present in a gene library to give a chosen probability of obtaining a particular DNA sequence. The formula is:

$$N = \frac{\ln(1 - P)}{\ln(1 - f)}$$

where N is the number of recombinants, P is the probability and f is the fraction of the genome in one insert. Thus, for the E. coli chromosome of 5 × 10⁶ bp and an insert size of 20 kb (f = 0.004), the number of clones needed (N) would be about 1 × 10³ (roughly 1150) for a probability of P = 0.99.
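
The calculation can be sketched in a few lines of Python, using the relationship N = ln(1 − P) / ln(1 − f). The function names are illustrative, and rounding N up to a whole number of clones is an assumption made here for convenience.

```python
import math


def insert_fraction(genome_size_bp: float, insert_size_bp: float) -> float:
    """Fraction of the genome carried by a single insert (f)."""
    if genome_size_bp <= 0 or insert_size_bp <= 0:
        raise ValueError("sizes must be positive")
    if insert_size_bp >= genome_size_bp:
        raise ValueError("insert must be smaller than the genome")
    return insert_size_bp / genome_size_bp


def clones_required(probability: float, fraction: float) -> int:
    """Number of recombinants N = ln(1 - P) / ln(1 - f), rounded up."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    n = math.log(1.0 - probability) / math.log(1.0 - fraction)
    return math.ceil(n)


if __name__ == "__main__":
    # E. coli example from the text: 5 x 10^6 bp genome, 20 kb inserts
    f = insert_fraction(5e6, 20e3)
    print(f)                         # 0.004
    print(clones_required(0.99, f))  # 1149
```

Because N scales roughly with 1/f, doubling the insert size approximately halves the number of clones needed for the same probability, which is why larger inserts are preferred for large genomes.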