The differents between ends-paired and mate-paired

bengua1985 2012-11-18

展开全文

"paired end" or "mate pair" refers to how the library is made, and then how it is sequenced. Both are methodologies that, in addition to the sequence information, give you information about the physical distance between the two reads in your genome.

For example, you shear up some genomic DNA, and cut a region out at ~500bp. Then you prepare your library, and sequence 35bp from each end of each molecule. Now you have three pieces of information:

--the tag 1 sequence
--the tag 2 sequence
--that they were 500bp ± (some) apart in your genome

This gives you the ability to map to a reference (or de novo for that matter) using that distance information. It helps dramatically to resolve larger structural rearrangements (insertions, deletions, inversions), as well as helping to assemble across repetitive regions.

Structural rearrangements can be deduced when your read pairs map to a reference at a distance that is substantially different from how that library was constructed (~500bp in the above example). Let's say you had two reads that mapped to your reference 1000bp apart...this suggests there has been a deletion between those two sequence reads within your genome. Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.

Mapping over repeats is similar...if one read is unmappable because it falls in a very repetitive region (eg. LINE, LTR, SINE), but the other is unique, you can again use that distance information to map both reads. The first read would likely come from the repeat that is ~500bp away from your unique second read.

Hope that helps. It's a weird concept at first, but very useful for all types of sequencing. It's been around at some levels since the days of shotgun sequencing (http://en./wiki/Shotgun_sequencing).

And lastly, the terminology between "paired end" and "mate pair" is typically that "paired end" refers to sequencing both ends of the same molecule, while "mate pair" (in ABI's case) refers to sequencing only two tags (made by Type IIS restriction enzymes a la SAGE) from the ends of a typically much larger molecule. I could be wrong here though... :D