When you mail your saliva sample to AncestryDNA™ or another DNA collector,
they use a nifty device called a microarray to map your genome. They don’t look
at your entire genetic code, because our genomes are by and large 99.9% the same
across all humans (I bet you feel soooo special now). Instead, they focus on
the 0.1% of known common genetic variation between us. These variations, also
known as markers or SNPs (“snips”, short for single nucleotide polymorphisms),
account for most of the genetic differences you observe between people,
including eye color, skin pigmentation, and many other traits.
“Raw DNA data” sounds like a mouthful but it’s simply a computer file that
lists out these SNPs, along with other useful information which we’ll get to
later. It’s a text file you can save to your desktop, or double-click to open.
AncestryDNA™ and 23andMe provide raw data in a .txt format (compressed during
download as a .zip); FTDNA provides it as a .CSV file (compressed as a .gz).
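If you would rather peek inside the download with a script than unzip it by hand, here is a minimal Python sketch. The function name `open_raw_dna` is made up for illustration, and the code assumes a .zip archive holds a single data file, which may not be true for every collector.

```python
import gzip
import io
import zipfile


def open_raw_dna(path):
    """Return a text-mode file object for a .zip, .gz, or plain-text raw data file."""
    if path.endswith(".zip"):
        archive = zipfile.ZipFile(path)
        # Assumption: the archive contains one data file, so take the first member.
        member = archive.namelist()[0]
        return io.TextIOWrapper(archive.open(member), encoding="utf-8")
    if path.endswith(".gz"):
        # "rt" decompresses and decodes to text in one step.
        return gzip.open(path, mode="rt", encoding="utf-8")
    # Already uncompressed (.txt or .csv).
    return open(path, encoding="utf-8")
```

You could then read the first few lines with `for line in open_raw_dna("my_raw_data.zip"): ...`, where the file name is a placeholder for whatever your collector gave you.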
More about Your Raw DNA Data
While raw DNA data doesn’t contain your entire genome, it does include, oh,
just several hundred thousand SNPs. AncestryDNA™’s and FTDNA’s raw data files
include about 700,000 markers; 23andMe’s include anywhere from about 600,000 to
1 million markers depending on when you were tested. Here’s an approximate SNP
count for the various DNA collectors:
|Approx. # SNPs|DNA collector and version|
|---|---|
|701,480|AncestryDNA v1 (pre-May 2016)|
|668,940|AncestryDNA v2|
|571,430|23andMe v2 (really old)|
|949,460|23andMe v3 (pre-Nov 2013)|
|552,540|23andMe v4 (pre-Sep 2017)|
|620,290|23andMe v5|
|700,000|FamilyTreeDNA (various versions)|
|720,710|MyHeritage v1 & v2|
|606,130|Living DNA v1|
|536,070|Genes for Good v1|
|178,600|Geno 2.0|
If I had a dollar for every SNP I have…
DNA collectors use only a very small portion of your DNA to generate their
results. The big draw of third-party tools like Gene Heritage is that they give
you a second chance to mine your unused raw DNA data for even more information
about yourself. Because each DNA collector tests a different set of SNPs, how
much of a third-party tool’s analysis your file can support (its coverage)
depends on who tested you and when.
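One way to picture coverage: take the list of SNPs a tool looks for and count how many of them appear in your file. The sketch below is only an illustration of that idea, not how Gene Heritage or any other tool actually computes it. The `coverage` function is hypothetical, and every rsID except rs12913832 is a made-up placeholder.

```python
def coverage(file_rsids, tool_rsids):
    """Fraction of the SNPs a tool looks for that are present in a raw data file."""
    wanted = set(tool_rsids)
    if not wanted:
        return 0.0
    return len(wanted & set(file_rsids)) / len(wanted)


# Made-up example: the tool wants four SNPs and the file contains three of them.
tool_panel = ["rs12913832", "rs0000001", "rs0000002", "rs0000003"]
my_file = ["rs12913832", "rs0000001", "rs0000003", "rs0000009"]
print(coverage(my_file, tool_panel))  # 0.75
```

A file from a collector that tests fewer SNPs will tend to miss more of a tool's list, so its coverage comes out lower.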
Typically all you need to do with your raw DNA data is download it from your
DNA collector and upload it to a third-party tool.
But if you want to get in touch with your inner geek by ogling a long list of
your single nucleotide polymorphisms, or SNPs, open your raw DNA file in a text
editor and you’ll see something like this:
This is a screenshot from my very own AncestryDNA™ raw DNA data (please don’t
use it to clone me; I’m not that special, trust me). Raw DNA from other
collectors looks very similar, typically with five columns corresponding to:
- Your SNPs (each identified by a code called an rsID number)
- The chromosomes on which each SNP is located
- The position of each SNP on the chromosome
- Your two alleles for each SNP, listed in two separate columns (one comes from
your father, one from your mother). Possible allele letters are A (adenine), C
(cytosine), G (guanine), T (thymine), or 0 (for missing data).
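For the geeks who want to go one step further, the column layout described above can be read with a few lines of Python. This is a rough sketch under stated assumptions: the file is tab-separated, comment lines start with `#`, and there is a header row naming the five columns. Your collector's file may differ (a .CSV file would need a comma delimiter, for instance). The names `Snp` and `parse_raw_dna` are invented for this example, and the sample rows use placeholder positions.

```python
import csv
import io
from typing import Dict, Iterable, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chromosome: str  # kept as text, since some files use labels like X or Y
    position: int
    allele1: str
    allele2: str


def parse_raw_dna(lines: Iterable[str]) -> Dict[str, Snp]:
    """Parse five-column, tab-separated raw DNA lines into a dict keyed by rsID."""
    snps = {}
    # Assumption: blank lines and lines starting with '#' carry no SNP data.
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        # Skip the header row and anything that does not look like a SNP row.
        if len(row) != 5 or not row[2].isdigit():
            continue
        rsid, chrom, pos, a1, a2 = row
        snps[rsid] = Snp(rsid, chrom, int(pos), a1, a2)
    return snps


# Made-up sample in the assumed layout; positions are placeholders.
sample = (
    "#Raw data sample (not a real file)\n"
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs12913832\t15\t12345678\tA\tA\n"
    "rs0000001\t1\t12345\t0\t0\n"
)
snps = parse_raw_dna(io.StringIO(sample))
print(snps["rs12913832"].allele1, snps["rs12913832"].allele2)  # A A
```

Keying the result by rsID makes it easy to look up a single marker, which is usually what you want when checking one trait.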
In this screenshot, the highlighted line corresponds to a variation influencing my eye color.
Each SNP result is a pair of letters drawn from A, C, G, and T. Since you get
one letter (or allele) from each parent, a SNP whose two variants are A and T
has three possible combinations: AA, AT, or TT.
From this highlighted line I can see that:
- The rsID of the SNP is rs12913832
- The SNP is located on my 15th chromosome (each human has 23 pairs of chromosomes)
- I inherited one A allele from my mother and another A allele from my father.
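To make the allele-pair idea concrete, here is a small Python sketch that lists the possible pairs for a SNP and labels a result like mine. The helper names are invented for this example. "Homozygous" and "heterozygous" are the standard genetics terms for two identical or two different alleles.

```python
from itertools import combinations_with_replacement


def possible_genotypes(variants):
    """List every unordered allele pair for a SNP's variants, e.g. 'AT' gives AA, AT, TT."""
    return ["".join(pair) for pair in combinations_with_replacement(sorted(variants), 2)]


def describe(allele1, allele2):
    """Label an allele pair; '0' marks missing data in the raw file."""
    if "0" in (allele1, allele2):
        return "missing"
    return "homozygous" if allele1 == allele2 else "heterozygous"


print(possible_genotypes("AT"))  # ['AA', 'AT', 'TT']
print(describe("A", "A"))        # homozygous
```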