Sequencing the immune response

August 11, 2011

Our immune system has quite a problem on its hands: it needs to notice and fight off invaders of all kinds, including bacteria and viruses that evolve extremely rapidly relative to us. There are two obvious strategies for dealing with such attackers: the first is to look for a hard-to-change tag that the attacker usually carries, rather as one army recognizes and attacks the uniform of another. This is the strategy a neutrophil uses in recognizing the formylated peptides produced by bacteria. The second is rather like the method used by the inhabitants of an isolated village when a visitor from the big city arrives: a local person knows everyone who “belongs”, and if you’re not recognized as belonging then you must be foreign. The immune system uses both strategies: the innate immune system, generally speaking, recognizes tags, while the adaptive immune system takes the “you’re not from around here” approach. To tell the difference between locals and invaders, the adaptive immune system uses a method that once upon a time seemed counterintuitive, but perhaps will not seem so to today’s readers. The method depends on exploration and selection: first, the cells of the adaptive immune system produce an array of recognition proteins with the widest possible range of reactivities, each of which could be helpful, or useless, or harmful. An individual cell expresses only one of these recognition proteins. Next, each recognition protein is tested for whether it reacts to “self”. If it does, the cell expressing it is killed. What’s left after this rather brutal procedure is a set of cells expressing recognition proteins that could react to almost anything; the only thing they won’t do, at least in theory, is attack the person or animal producing them. And thus, if you’re “not from around here”, you run into a rather violent reception, while if you’re a local you’re benignly ignored.

The recognition proteins of the adaptive immune system come in two flavors, T cell receptors and antibodies. Both use the same principle to create a wide range of recognition proteins: combinatorial gene rearrangement. The business end of an antibody, the part that has the potential to bind to a specific target (if the correct target comes along), is built of three separate segments: the V (variable), D (diversity) and J (joining) segments. There are many copies of each of these segments in the genome, and to produce an antibody the B cell carves up its DNA and rearranges it so that just one V segment sits next to one D segment and one J segment. Each antibody is made up of two copies each of two chains of different sizes, called “heavy” and “light”, and each chain uses its own set of gene segments (VDJ for heavy, VJ for light). For the human heavy chain there are between 55 and 65 Vs, 27 Ds and 6 Js, i.e. roughly 9,000 to 10,500 possible combinations (55 × 27 × 6 = 8,910; 65 × 27 × 6 = 10,530). On top of this, there are special mechanisms that randomly chew up and add back nucleotides at the junctions between segments (adding junctional diversity), and that encourage specific regions of the immunoglobulin gene to mutate rapidly (called somatic hypermutation). When a particular antibody turns out to be useful, the cell producing it is stimulated to divide (this is when somatic hypermutation happens) and the sequence of the gene producing the antibody becomes more strongly represented in the population.
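To make the combinatorial arithmetic concrete, here is a minimal Python sketch that multiplies out the heavy-chain segment counts quoted above. The function and variable names are my own, and the calculation deliberately ignores junctional diversity and somatic hypermutation, which push the real number far higher.

```python
# Segment counts for the human heavy chain, as quoted in the text.
V_COUNT_RANGE = (55, 65)  # the number of V segments is given as a range
D_COUNT = 27
J_COUNT = 6


def vdj_combinations(n_v, n_d, n_j):
    """Number of distinct V-D-J segment choices (one of each segment)."""
    return n_v * n_d * n_j


low = vdj_combinations(V_COUNT_RANGE[0], D_COUNT, J_COUNT)
high = vdj_combinations(V_COUNT_RANGE[1], D_COUNT, J_COUNT)
print(low, high)  # 8910 10530
```

So segment choice alone gives somewhere between about 9,000 and 10,500 heavy-chain combinations, depending on how many V segments are counted.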

One implication of all this is that you should be able to monitor immune responses by sequencing the antibody genes found in the cells that circulate in the blood. In fact, sequencing might be the only way to get a comprehensive, detailed picture of an antibody response. But there’s some question about whether even sequencing can look deeply enough, at least given current technology. A recent paper (Arnaout et al. 2011. High-resolution description of antibody heavy-chain repertoires in humans. PLoS One 6 e22365) now tackles this question directly. And the results are rather hopeful.

The theoretical diversity of antibody binding sites produced by combining all the mechanisms I’ve described above is practically infinite. Arnaout et al. were interested in asking how diverse the real repertoire in a human actually is, and whether the combinations of segments that they see are random or biased. Using a fairly complex primer design strategy, they were able to amplify and then sequence the VDJ junctional regions of the immunoglobulin heavy chain genes found in blood samples from two human subjects, and in the spleens of four mice. One of the human samples yielded almost 3/4 of a million reads. By comparing these sequences with the known genome sequence, the authors were able to determine whether some segments were preferentially used relative to others, or whether specific combinations are favored. Often, it was not so easy to identify the D segments in a given read: D segments are the shortest, and the combination of junctional diversity and somatic hypermutation can lead to a D segment sequence that is somewhere in between the genomic sequences of two or more candidates. But V and J segments could usually be assigned unambiguously.
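The D segment problem is easy to see in a toy example. The sketch below is not the authors' assignment method: it uses plain mismatch counting against candidate genomic sequences, and the segment names and sequences are invented for illustration. A read that has drifted by a single base can end up equally close to two candidates.

```python
def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_segment(read, germline):
    """Return the names of all candidates tied for the fewest mismatches."""
    scores = {name: mismatches(read, seq) for name, seq in germline.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


# Hypothetical D segments (invented names and sequences, not real genes).
germline_d = {"D-a": "GGTATAGC", "D-b": "GGTACAGT"}

print(assign_segment("GGTATAGC", germline_d))  # exact match to D-a
print(assign_segment("GGTATAGT", germline_d))  # one mismatch from each: a tie
```

The shorter the segment, the fewer positions there are to break a tie like this, which is presumably why the much longer V segments are easier to call.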

It turns out that, far from being random, the VDJ segment usage has a distinct skew. Just 10 V segments were found in about half of the reads from human samples, and 2 of the 6 J segments were highly over-represented. The authors also found a few D/J and V(D)J pairings that were over- or under-represented relative to expectations, but this was a relatively small effect. The picture was broadly similar in mice. Now, the process of VDJ recombination sometimes goes wrong (producing a stop codon somewhere in the rearranged gene, for example), and so it’s possible to ask whether the bias towards specific V segments is due to selection. If selection is important, you should see that some V segments are over-represented in productive rearrangements and not in the non-productive rearrangements. But you don’t, indicating that the preference for certain V regions is built into the way that the genes are structured. It’s not clear why this is: it’s known that the signals for recombination vary at the different segments, so the obvious explanation is that some of these sequences are more efficient than others. But Arnaout et al. found a poor correlation between the sequence of the recombination signal and the over-representation of segments.
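The logic of the productive versus non-productive comparison can be sketched in a few lines of Python. This is a simplified illustration, not the paper's pipeline: it treats a rearrangement as productive if the sequence is a whole number of codons with no stop codon, and the reads and V segment labels are made up.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(coding_seq):
    """True if the sequence is in frame and contains no stop codon."""
    if len(coding_seq) % 3 != 0:
        return False  # out of frame
    codons = (coding_seq[i:i + 3] for i in range(0, len(coding_seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def v_usage(reads):
    """Fraction of reads that use each V segment."""
    counts = Counter(v_name for v_name, _ in reads)
    total = sum(counts.values())
    return {v_name: n / total for v_name, n in counts.items()}


# Hypothetical reads: (V segment label, rearranged coding sequence).
reads = [
    ("V-a", "ATGGCCAAA"),  # productive
    ("V-a", "ATGTAAGCC"),  # stop codon (TAA)
    ("V-b", "ATGGCC"),     # productive
    ("V-a", "ATGGC"),      # out of frame
    ("V-b", "ATGTGAGCC"),  # stop codon (TGA)
    ("V-a", "ATGGCCGGG"),  # productive
]

productive = [r for r in reads if is_productive(r[1])]
nonproductive = [r for r in reads if not is_productive(r[1])]

print(v_usage(productive))
print(v_usage(nonproductive))
```

In this toy data the segment labelled V-a makes up two thirds of both groups. That is the pattern you would expect if the bias arises during recombination itself: selection can only act on antibodies that are actually made, so it cannot skew the non-productive reads.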

Since the authors can not only identify the segments used, but also read the entire sequence from V to J, they can be fairly confident about identifying antibodies that share similar reactivities. The sequence from the V/D junction to the D/J junction is the most diverse part of the antibody, and the most important determinant of the antibody’s binding specificity; it’s called the Complementarity Determining Region 3 (CDR3). If VDJ recombination and selection were entirely random, you would never expect to see the same CDR3 twice. But there have been previous hints that the same CDR3 can come up repeatedly in different individuals. The authors wanted to check this in their dataset, which is much more extensive than previous datasets of this kind, and also to ask how unlikely a pattern like this is to come up by chance. To ask the “how likely” question, they built a statistical model of the whole process of VDJ rearrangement, including the changes produced by the processes that lead to junctional diversity and somatic hypermutation. Their conclusion is that the same CDR3 does indeed come up in different individuals with a significant probability, and that this happens much more frequently than can be explained solely by chance. The most obvious explanation for the fact that antibodies of similar specificities arise in different individuals is that both individuals have for some reason found a particular antibody to be useful, and have selected it from the ever-shifting pool of available sequences. The size of that pool in the blood of an adult human, the authors estimate, is between 3,000,000 and 9,000,000 CDR3 sequences. A large number, but a lot less than the number that’s theoretically possible. And thus, perhaps, a number we can get to grips with.
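For a rough feel for why shared CDR3s are surprising, here is a back-of-the-envelope sketch in Python. It is not the authors' statistical model, which accounts for the biases in rearrangement; it assumes every possible CDR3 is equally likely, and the size of the theoretical space (10^15) is a placeholder I picked purely for illustration. Only the repertoire size of 9,000,000 comes from the estimate above.

```python
def shared_cdr3(repertoire_a, repertoire_b):
    """CDR3 sequences observed in both repertoires."""
    return set(repertoire_a) & set(repertoire_b)


def expected_shared_by_chance(n_a, n_b, n_possible):
    """Expected overlap of two random subsets drawn uniformly from n_possible.

    Assumes all sequences are equally likely, which real repertoires are not.
    """
    return n_a * n_b / n_possible


# Observed overlap for two tiny, made-up repertoires.
print(shared_cdr3(["CARDYW", "CAKGGW"], ["CAKGGW", "CTRSSW"]))

# Chance expectation: 9,000,000 CDR3s per person (upper estimate in the text),
# drawn from an ASSUMED space of 10**15 possible sequences.
print(expected_shared_by_chance(9_000_000, 9_000_000, 10**15))
```

Under these assumptions the expected overlap is well below one sequence, so any repeatedly shared CDR3 needs an explanation beyond a uniform random draw: either biased rearrangement or selection.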

Arnaout, R., Lee, W., Cahill, P., Honan, T., Sparrow, T., Weiand, M., Nusbaum, C., Rajewsky, K., & Koralov, S. (2011). High-Resolution Description of Antibody Heavy-Chain Repertoires in Humans. PLoS ONE, 6 (8). DOI: 10.1371/journal.pone.0022365

It Takes 30