How Autosomal Matching Works
Your Raw Data File
The raw data file you downloaded from your testing lab is a spreadsheet with four or five columns. Files with four columns simply combine the last two columns into one; the meaning is the same. Each file has between 500,000 and 1,000,000 rows.
- RSID: Reference SNP cluster ID. This number identifies the SNP uniquely. All labs use the same set of RSID numbers.
- Chromosome: The chromosome number, 1-23. Some formats use "X" instead of the number 23.
- Position: The position of the SNP within the chromosome. Most labs today use the position numbers of Build 37 (GRCh37), published by the Genome Reference Consortium.
- Allele: This is the value read by the lab at the given position. The valid values are A, C, G, or T. Some labs use other values to indicate that the equipment failed to read a particular location. Our matching system only looks at the four valid values, ignoring anything else. Each SNP has two alleles; the order is not significant.
Example:
RSID | Chromosome | Position | Result |
rs4477212 | 1 | 82154 | AA |
rs4970383 | 1 | 838555 | CC |
rs4475691 | 1 | 846808 | CT |
rs7537756 | 1 | 854250 | AA |
rs13302982 | 1 | 861808 | GG |
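The column rules above can be sketched in a few lines of Python. This is only an illustration, not the matching system's actual code: the function name `normalize_row` is hypothetical, the input is assumed to be one row already split into a list of strings, and chromosome labels other than 1-23 and "X" are not handled.

```python
VALID_ALLELES = set("ACGT")

def normalize_row(fields):
    """Turn one raw-data row into (rsid, chromosome, position, alleles).

    `fields` is a list of 4 or 5 strings. In the five-column layout the
    last two columns hold one allele each; in the four-column layout the
    two alleles are combined into a single column.
    """
    if len(fields) == 5:
        rsid, chrom, pos, allele_1, allele_2 = fields
        result = allele_1 + allele_2
    elif len(fields) == 4:
        rsid, chrom, pos, result = fields
    else:
        raise ValueError(f"expected 4 or 5 columns, got {len(fields)}")

    # Some formats write "X" instead of the number 23.
    chrom_num = 23 if chrom.strip().upper() == "X" else int(chrom)

    # Keep only A, C, G and T; any other value (a failed read) is ignored.
    # Sorting makes "TC" and "CT" equal, because allele order is not significant.
    alleles = "".join(sorted(c for c in result.upper() if c in VALID_ALLELES))
    return rsid.strip(), chrom_num, int(pos), alleles
```

For example, `normalize_row(["rs4475691", "1", "846808", "TC"])` returns `("rs4475691", 1, 846808, "CT")`, and a five-column row whose second allele could not be read keeps only the allele that was read.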
Definitions
- SNP: A single nucleotide polymorphism. Each row of the raw data file represents one SNP.
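- Centimorgan (cM): A centimorgan is a unit of genetic distance. It measures the probability that the region between two SNPs will recombine: one centimorgan corresponds to a 1% chance of recombination between the two positions in a single generation.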
- Identical SNP: Two kits "match" for a given SNP when both kits have exactly the same result and, within that result, both alleles are the same letter. If both kits being compared have AA for rs4477212, they match at that location.
- Half-Identical SNP: A "half match" is when only one of the two alleles matches. Examples of half matches include AT and AC, AT and AT, and AT and CA. In some cases, only one of the two alleles can be read, so AT and A are also a half match.
- Mismatch: A "mismatch", or "error", is a SNP where the two results don't match at all. Examples include AC and TG, and CC and GG.
- Stitch: A "stitch" (an unofficial term) is a string of matching SNPs. These can be either identical or half-identical matches. To count as part of a matching segment, a stitch must contain at least 50 identical matches without any mismatches. Half-identical matches are allowed within a stitch, but they do not count towards the required 50 in a row.
- Match Count: In a string of SNPs, each identical SNP adds one to the match count. A half-identical SNP leaves the match count unchanged. A mismatch stops the match count and resets it to zero.
- Matching Segment: To qualify as a matching segment, a series of matching "stitches" must total at least 400 identical SNPs, with no more than one mismatch between adjacent stitches. Half-identical matches are allowed, but do not count towards the required 400 SNPs. In addition, the matching segment must span at least 7 cM (centimorgans) to qualify.
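One possible reading of the stitch and segment thresholds, sketched in Python. Several things here are assumptions rather than part of the description above: the function names are hypothetical, the input is a list of per-SNP labels ("Match", "Half Match", "Mismatch") for a candidate segment that begins and ends on a matching SNP, every stitch in the candidate is required to reach 50 identical matches, and the segment length in cM is supplied by the caller because computing it requires a genetic map that is not covered here.

```python
MIN_STITCH_SNPS = 50     # identical matches needed in one stitch
MIN_SEGMENT_SNPS = 400   # identical matches needed across the segment
MIN_SEGMENT_CM = 7.0     # minimum segment length in centimorgans

def identical_runs(statuses):
    """Count the identical matches in each mismatch-free run, in order."""
    runs, count = [], 0
    for status in statuses:
        if status == "Mismatch":
            runs.append(count)   # a mismatch closes the current run
            count = 0
        elif status == "Match":
            count += 1           # half matches are allowed but not counted
    runs.append(count)
    return runs

def is_matching_segment(statuses, length_cm):
    """Check a candidate segment against the three thresholds."""
    runs = identical_runs(statuses)
    # Two mismatches in a row produce an empty run, which fails this check,
    # so at most one mismatch can separate adjacent stitches.
    every_stitch_ok = all(n >= MIN_STITCH_SNPS for n in runs)
    return (every_stitch_ok
            and sum(runs) >= MIN_SEGMENT_SNPS
            and length_cm >= MIN_SEGMENT_CM)
```

Under these assumptions, a candidate with stitches of 250 and 200 identical SNPs separated by one mismatch qualifies at 8.2 cM but not at 6.5 cM, and a candidate whose second stitch has only 30 identical SNPs does not qualify at any length.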
Kit 1 | | | | | | Kit 2 | | | |
RSID | Chromosome | Position | Result | Status | Count | RSID | Chromosome | Position | Result |
rs4477212 | 1 | 82154 | AA | Match | 1 | rs4477212 | 1 | 82154 | AA |
rs4970383 | 1 | 838555 | CC | Match | 2 | rs4970383 | 1 | 838555 | CC |
rs4475691 | 1 | 846808 | CT | Half Match | 2 | rs4475691 | 1 | 846808 | CC |
rs7537756 | 1 | 854250 | AA | Match | 3 | rs7537756 | 1 | 854250 | AA |
rs13302982 | 1 | 861808 | GG | Mismatch | 0 | rs13302982 | 1 | 861808 | TT |
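The Match, Half Match and Mismatch labels and the Count column in the table above can be reproduced with a short Python sketch. The function names are hypothetical, and two details are assumptions: a SNP where either kit has no readable allele is labeled "No Call" and left out of the count, and a result with only one readable allele can be at most a half match.

```python
VALID = set("ACGT")

def classify(result_1, result_2):
    """Label one SNP comparison between two kits."""
    a = [c for c in result_1.upper() if c in VALID]
    b = [c for c in result_2.upper() if c in VALID]
    if not a or not b:
        return "No Call"      # assumption: nothing readable to compare
    if not set(a) & set(b):
        return "Mismatch"     # no allele in common, e.g. CC and GG
    # Identical: same result in both kits, and both alleles the same letter.
    if len(a) == 2 and len(b) == 2 and len(set(a)) == 1 and set(a) == set(b):
        return "Match"
    return "Half Match"       # e.g. AT and AC, AT and AT, AT and A

def running_counts(pairs):
    """Return (status, match count) for each pair of results, in order."""
    count, rows = 0, []
    for result_1, result_2 in pairs:
        status = classify(result_1, result_2)
        if status == "Match":
            count += 1        # identical SNPs add to the match count
        elif status == "Mismatch":
            count = 0         # a mismatch stops the count
        rows.append((status, count))
    return rows

# The Result columns of Kit 1 and Kit 2 from the table above.
table = [("AA", "AA"), ("CC", "CC"), ("CT", "CC"), ("AA", "AA"), ("GG", "TT")]
results = running_counts(table)
```

Here `results` holds Match 1, Match 2, Half Match 2, Match 3 and Mismatch 0, which is the same sequence shown in the table.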