What is Raw DNA Data?

By Gene Heritage

When you mail in your saliva sample to AncestryDNA™ or another DNA collector, they use a nifty device called a microarray to map your genome. They don’t look at your entire genetic code, because our genomes are about 99.9% the same across all humans (I bet you feel soooo special now). Instead, they focus on the 0.1% of known common genetic variation between us. These variations, also known as markers or SNPs (“snips”), account for most of the human genetic variation you observe. They contribute to differences in eye color, skin pigmentation, and many other traits.

“Raw DNA data” sounds like a mouthful but it’s simply a computer file that lists out these SNPs, along with other useful information which we’ll get to later. It’s a text file you can save to your desktop, or double-click to open.

AncestryDNA™ and 23andMe provide raw data in a .txt format (compressed during download as a .zip); FTDNA provides it as a .CSV file (compressed as a .gz).
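If you'd rather peek inside the download with a script than unpack it by hand, here is a minimal Python sketch. It assumes the .zip holds a single text file and the .gz wraps a single CSV file; the function name `read_lines` is made up for illustration and isn't part of any collector's tooling.

```python
import gzip
import io
import zipfile


def read_lines(path):
    """Yield the text lines of a .zip, .gz, or plain-text raw data file."""
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            # Assumption: the first entry in the archive is the raw data file.
            first_member = archive.namelist()[0]
            with archive.open(first_member) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    yield line.rstrip("\r\n")
    elif path.endswith(".gz"):
        with gzip.open(path, mode="rt", encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\r\n")
    else:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\r\n")
```

Because the function yields one line at a time, it never needs to hold the whole file in memory, which is handy when a file lists several hundred thousand SNPs.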

More about Your Raw DNA Data

While raw DNA data doesn’t contain your entire genome, it does include, oh, just several hundred thousand SNPs. AncestryDNA™’s and FTDNA’s raw data files include about 700,000 markers; 23andMe’s include anywhere from about 600,000 to 1 million markers depending on when you were tested. Here’s an approximate SNP count for the various DNA collectors:

Approx. # SNPs   DNA Collector
701,480          AncestryDNA v1 (pre-May 2016)
668,940          AncestryDNA v2
571,430          23andMe v2 (really old)
949,460          23andMe v3 (pre-Nov 2013)
552,540          23andMe v4 (pre-Sep 2017)
620,290          23andMe v5
700,000          FamilyTreeDNA (various versions)
720,710          MyHeritage v1 & v2
606,130          Living DNA v1
536,070          Genes for Good v1
178,600          Geno 2.0

If I had a dollar for every SNP I have…

DNA collectors only use a very small portion of your DNA to generate their results. The big draw of third-party tools like Gene Heritage is they give you a second chance to mine your unused raw DNA data for even more information about yourself. The difference in SNP counts between DNA collectors explains why coverage may vary with your third-party tool.
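To make the coverage idea concrete, here is a small Python sketch. It treats coverage as the share of SNPs a tool looks for that actually appear in your file. That definition, the function name `coverage`, and the rsIDs below are all assumptions made up for illustration, not how any particular tool measures it.

```python
def coverage(file_rsids, tool_rsids):
    """Return the fraction of the tool's SNPs that appear in the raw data file."""
    tool = set(tool_rsids)
    if not tool:
        return 0.0
    return len(tool & set(file_rsids)) / len(tool)


# Hypothetical rsIDs, purely for illustration.
tool_needs = ["rs0000001", "rs0000002", "rs0000003", "rs0000004"]
in_my_file = ["rs0000001", "rs0000003", "rs0000009"]

share = coverage(in_my_file, tool_needs)  # 2 of the 4 SNPs are present
print(f"{share:.0%} of the tool's SNPs are in this file")
```

A file from a collector that tests fewer of the tool's SNPs simply produces a smaller number here.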

Typically all you need to do with your raw DNA data is download it from your DNA collector and upload it to a third-party tool. But if you want to get in touch with your inner geek by ogling a long list of your single nucleotide polymorphisms, or SNPs, open your raw DNA file in a text editor and you’ll see something like this:

This is a screenshot from my very own AncestryDNA™ raw DNA data (please don’t use it to clone me; I’m not that special, trust me). Raw DNA from other collectors looks very similar, typically with five columns corresponding to:

  • Your SNPs (each identified by a code called a rsID number)
  • The chromosomes on which each SNP is located
  • The position of each SNP on the chromosome
  • Your two alleles for each SNP, one per column in the last two columns (one comes from your father, one from your mother). Possible allele letters are A (adenine), C (cytosine), G (guanine), T (thymine), or 0 (for missing data).
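
For the scripting-inclined, the column layout above can be read with a few lines of Python. This is only a sketch: it assumes tab-separated values, comment lines starting with `#`, and a header row starting with `rsid`, which may not hold for every collector. The `Snp` and `parse_snps` names and the sample rows are invented for illustration and are not real data.

```python
from typing import Iterable, Iterator, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chromosome: str  # kept as text, since some files use letters such as X or Y
    position: int
    allele1: str
    allele2: str


def parse_snps(lines: Iterable[str]) -> Iterator[Snp]:
    """Yield one Snp per data row, skipping comments, headers, and odd rows."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, '#' comments, and the column header row.
        if not line or line.startswith("#") or line.lower().startswith("rsid"):
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            continue  # not the five-column layout this sketch assumes
        rsid, chromosome, position, allele1, allele2 = fields
        yield Snp(rsid, chromosome, int(position), allele1, allele2)


# Made-up sample rows, not real data.
sample = [
    "# example comment line",
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs0000001\t1\t12345\tA\tG",
    "rs0000002\t2\t67890\t0\t0",
]
snps = list(parse_snps(sample))

# Index by rsID so a single marker can be looked up directly.
by_rsid = {snp.rsid: snp for snp in snps}
```

With the index in place, `by_rsid["rs0000001"]` returns that row's chromosome, position, and both alleles in one go.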

In this screenshot, the highlighted line corresponds to a variation influencing my eye color.

From this highlighted line I can see that:

  • The rsID of the SNP is rs12913832
  • The SNP is located on my 15th chromosome (each human has 23 pairs of chromosomes)
  • I inherited one A allele from my mother and another A allele from my father.

Your result for each SNP is a pair of letters, drawn from A, C, G, and T. Since you get one letter (or allele) from each parent, a SNP whose two variants are A and T has three possible combinations: AA, AT, or TT.
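
Here is a short Python sketch of why there are three combinations: you pick two letters from the SNP's two variants, and the order doesn't matter (AT and TA are the same result). The function name `possible_genotypes` is made up for illustration.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """List every unordered pair of alleles, allowing the same letter twice."""
    return ["".join(pair) for pair in combinations_with_replacement(sorted(alleles), 2)]


genotypes = possible_genotypes("AT")
print(genotypes)  # ['AA', 'AT', 'TT']
```

The same function works for any two-variant SNP, so a SNP with variants A and G would give AA, AG, and GG.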
