How to use DNA databases to catch the perpetrator of a crime
DNA testing is helping to catch more criminals than ever: here's how it's done.
PERP: Crime scene investigators collect biological material such as blood, semen, hair or skin. The DNA molecule decays over time: in bone it has an estimated half-life of 521 years (a figure based on fossils preserved at around 13ºC), and it degrades quickly when exposed to heat, light, water and air. Whether DNA stays viable also depends on how well it’s stored. The oldest DNA recorded was found in Greenland ice, and was estimated to be between 450,000 and 800,000 years old.
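To make the half-life figure concrete, here is a minimal Python sketch that assumes simple exponential decay at a constant rate, using the 521-year value as its default. Real samples degrade unevenly depending on heat, light, water and storage, so this only illustrates the arithmetic: after one half-life half the bonds remain, after two a quarter, and so on.

```python
def fraction_remaining(years: float, half_life: float = 521.0) -> float:
    """Fraction of DNA bonds expected to survive after `years`.

    Assumes simple exponential decay with a constant half-life (a simplification)."""
    return 0.5 ** (years / half_life)

for t in (521, 1_042, 10_000):
    print(f"{t:>6} years: {fraction_remaining(t):.6f}")
```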
USER: When someone orders a £100 DNA testing kit from direct-to-consumer genomics companies like 23andMe, Ancestry or MyHeritage, they spit into a collection tube or take cells from a cheek swab, then post the sample to the company. After four to eight weeks, they log into their account for a report on their genetic variants.
PERP: DNA molecules are cut into fragments and added to a ‘genotyping chip’. Genotyping chips are covered in an array of 700,000 microscopic wells, each containing a probe that matches one genetic variant, which may or may not be present in the DNA sample. If a fragment binds to its matching probe, it can be labelled with one of several fluorescent dyes that enable a computer to read the associated DNA letter.
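The reading step can be pictured with a toy genotype caller. This Python sketch assumes each well reports the brightness of two dyes, one for each possible letter at that position; the function name, the allele letters and the 0.2/0.8 cut-offs are hypothetical, and real genotyping software instead clusters intensities from many samples.

```python
def call_genotype(signal_a, signal_b, allele_a="A", allele_b="G",
                  low=0.2, high=0.8):
    """Toy genotype call from two dye intensities at one well.

    allele_a/allele_b and the low/high cut-offs are hypothetical defaults."""
    total = signal_a + signal_b
    if total <= 0:
        return "--"  # no usable signal: report a 'no call'
    frac_b = signal_b / total  # share of the light coming from the allele_b dye
    if frac_b < low:
        return allele_a * 2  # mostly allele_a dye: two copies of allele_a
    if frac_b > high:
        return allele_b * 2  # mostly allele_b dye: two copies of allele_b
    return allele_a + allele_b  # balanced signal: one copy of each

print(call_genotype(950, 40))   # AA
print(call_genotype(500, 480))  # AG
print(call_genotype(30, 900))   # GG
```

Because each person carries two copies of every chromosome, a balanced signal means one copy of each letter, while a signal dominated by one dye means two copies of the same letter.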
USER: Genomics companies use different genotyping chips, depending on which genetic variants each company believes are the most informative. Most companies allow users to download a text file listing their genetic variants, or ‘genotypes’, which users can then upload to GEDmatch, a searchable genealogy website.
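The downloadable file is usually plain tab-separated text. The sketch below assumes a layout similar to 23andMe's raw-data export (comment lines starting with '#', then rsid, chromosome, position and genotype columns); other companies' exports differ, and the rsids in the sample are made up.

```python
import io

def read_genotypes(lines):
    """Parse tab-separated raw genotype data into {rsid: genotype}.

    Assumed layout: '#' comment lines, then rsid, chromosome, position, genotype."""
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        rsid, chromosome, position, genotype = line.split("\t")
        genotypes[rsid] = genotype  # '--' marks a no-call
    return genotypes

# Made-up sample standing in for a downloaded file.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0001\t1\t1000\tAG\n"
    "rs0002\t1\t2000\tCC\n"
    "rs0003\t2\t3000\t--\n"
)
print(read_genotypes(sample))  # {'rs0001': 'AG', 'rs0002': 'CC', 'rs0003': '--'}
```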
PERP: The genotype profile of one person, the criminal, is compared against the profiles of other people in GEDmatch’s public database. Each genotype records the DNA letters a person carries at a ‘single nucleotide polymorphism’ (SNP): one of the roughly 700,000 positions tested on the chip where a single DNA letter varies within the human population. The number of SNPs any two people share, and how those matches cluster into long unbroken stretches, is used to calculate their genetic similarity.
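One way to turn two genotype lists into shared stretches is sketched below in Python. It is a simplified illustration, not GEDmatch's actual algorithm: it walks the SNPs in order, marks positions where the two people share at least one letter, and keeps unbroken runs above a minimum length. Unrelated people share a letter at many individual SNPs by chance, so it is the long runs that point to DNA inherited from a common ancestor. The three-SNP minimum and the SNP labels are purely for the toy data; real tools demand far longer segments and measure them in centimorgans rather than SNP counts.

```python
def shares_allele(g1, g2):
    """True if two genotypes such as 'AG' and 'GG' have a letter in common."""
    if "-" in g1 or "-" in g2:
        return True  # treat no-calls as non-breaking (a common simplification)
    return bool(set(g1) & set(g2))

def shared_segments(p1, p2, snp_order, min_snps=3):
    """Return half-open (start, end) index ranges of unbroken shared runs."""
    segments, start = [], None
    for i, rsid in enumerate(snp_order):
        match = shares_allele(p1.get(rsid, "--"), p2.get(rsid, "--"))
        if match and start is None:
            start = i  # a new run begins
        elif not match and start is not None:
            if i - start >= min_snps:  # keep runs that are long enough
                segments.append((start, i))
            start = None
    if start is not None and len(snp_order) - start >= min_snps:
        segments.append((start, len(snp_order)))  # run reaching the end
    return segments

order = [f"s{i}" for i in range(1, 9)]
crime_scene = dict(zip(order, ["AG", "AA", "CT", "GG", "AA", "CC", "TT", "AG"]))
relative = dict(zip(order, ["GG", "AC", "TT", "GA", "CC", "CT", "TT", "AA"]))
segs = shared_segments(crime_scene, relative, order)
# Toy similarity: fraction of SNPs that fall inside shared segments.
similarity = sum(end - start for start, end in segs) / len(order)
print(segs, similarity)  # [(0, 4), (5, 8)] 0.875
```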
USER: A database search is unlikely to return matches with high similarity. As a person inherits half their DNA from each parent, they’re 50 per cent similar to their mother and father, and 25 per cent similar to a grandparent. For each extra generation back to the ancestors two cousins last shared, the expected similarity falls to a quarter of its previous value. So first cousins share roughly 12.5 per cent of their DNA, second cousins 3.125 per cent, and so on.
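The quartering follows a simple formula. A minimal Python sketch, assuming the textbook average (real cousins vary around it because DNA is shuffled randomly each generation): nth cousins share two ancestors, each reached through 2n + 2 parent-to-child transmissions, so the expected share is 2 × (1/2)^(2n+2), which simplifies to 0.5 × 0.25^n.

```python
def cousin_share(n):
    """Expected fraction of DNA shared by nth cousins (n=1: first cousins).

    Two shared ancestors, each 2n + 2 transmissions away, each transmission
    passing on half: 2 * 0.5 ** (2 * n + 2), i.e. 0.5 * 0.25 ** n."""
    return 0.5 * 0.25 ** n

for n in range(1, 6):
    print(f"cousin degree {n}: {cousin_share(n) * 100:.3f} per cent")
```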
Genetic similarity can show whether two people are related, but not necessarily how. You share half your DNA with each parent, and full siblings also share about half their DNA with each other, so a 50 per cent match could be your mum or dad, or your sister or brother. Going back to fifth cousins (sharing great-great-great-great-grandparents), the overlap is just 0.05 per cent, so you’re effectively unrelated.
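The ambiguity can be shown with a lookup table of expected shares. In this Python sketch the relationship list is limited to those mentioned above, and the 20 per cent relative tolerance is a hypothetical choice; an observed figure close to 50 per cent returns both a parent and a sibling.

```python
# Expected shares for the relationships mentioned above (textbook averages).
EXPECTED_SHARE = {
    "parent/child": 0.5,
    "full sibling": 0.5,
    "grandparent/grandchild": 0.25,
    "first cousin": 0.125,
    "second cousin": 0.03125,
    "third cousin": 0.0078125,
    "fourth cousin": 0.001953125,
    "fifth cousin": 0.00048828125,
}

def candidate_relationships(observed, tolerance=0.2):
    """Relationships whose expected share is within `tolerance` (relative) of `observed`."""
    return sorted(name for name, share in EXPECTED_SHARE.items()
                  if abs(observed - share) <= tolerance * share)

print(candidate_relationships(0.5))   # ['full sibling', 'parent/child']
print(candidate_relationships(0.03))  # ['second cousin']
```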
The chances of matching many first or second cousins in a database the size of GEDmatch, which contains one million DNA profiles, are extremely low (unless your family’s keen on ancestry). But there’s a high probability that you’ll find tens or even hundreds of third and fourth cousins: people have far more of these distant cousins than close ones, and their DNA is still similar enough to be identified through genetic similarity.
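A rough way to see why distant cousins dominate the results: if each relative independently has some small chance of being in the database, the expected number found grows with how many relatives of that kind you have. The Python sketch below uses that simplifying assumption; the coverage (0.3 per cent) and the cousin counts in the example are hypothetical round numbers chosen only for illustration, not real figures.

```python
def expected_matches(num_relatives, coverage):
    """Expected number of relatives in the database, and the chance of at least one.

    Assumes each relative is in the database independently with probability `coverage`."""
    expected = num_relatives * coverage
    p_at_least_one = 1 - (1 - coverage) ** num_relatives
    return expected, p_at_least_one

# Hypothetical inputs: coverage and cousin counts are illustrative, not measured.
COVERAGE = 0.003
for label, count in [("first cousins", 8), ("second cousins", 50),
                     ("third cousins", 1_000), ("fourth cousins", 10_000)]:
    expected, p = expected_matches(count, COVERAGE)
    print(f"{label:>15}: expect {expected:.2f}, P(at least one) = {p:.1%}")
```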
Genetic genealogists build a family tree by applying traditional techniques to database information, such as the names of two people and their DNA similarity. This includes finding records like census data, newspaper obituaries and birth and marriage certificates, and interviewing living relatives. Nowadays it also involves figuring out relationships via Facebook and other social networks.
Once links between cousins are confirmed, a genetic genealogist works backwards to find where the separate branches of a tree connect through long-dead ancestors. Recent twigs of the family tree, its living relatives, are then added by building forwards, a process expert CeCe Moore calls ‘reverse genealogy’. This can sometimes succeed through ‘triangulation’: spotting where two distant branches are joined by a marriage, whose children would be related to matches on both sides.
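Working backwards and then forwards is essentially a search over a family tree. The Python sketch below uses a tiny, entirely made-up tree stored as a child-to-parents mapping: it collects each match's ancestors, intersects them to find where the branches meet, then lists everyone descended from those shared ancestors. In real cases every link comes from records and interviews rather than a ready-made dictionary.

```python
from collections import defaultdict

# Hypothetical tree: child -> (parent, parent). All names are made up.
PARENTS = {
    "Cat": ("Ann", "Bob"),
    "Dan": ("Ann", "Bob"),
    "Fay": ("Cat", "Eve"),
    "Hal": ("Dan", "Gus"),
    "Jo": ("Dan", "Gus"),
}

# Invert the mapping so we can walk from parents to children.
CHILDREN = defaultdict(set)
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].add(child)

def ancestors(person):
    """All recorded ancestors of `person` (working backwards)."""
    found, stack = set(), [person]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def descendants(person):
    """All recorded descendants of `person` (building forwards)."""
    found, stack = set(), [person]
    while stack:
        for child in CHILDREN.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# Two database matches, Fay and Hal: where do their branches meet?
common = ancestors("Fay") & ancestors("Hal")
living_side = set().union(*(descendants(a) for a in common))
print(sorted(common))       # ['Ann', 'Bob']
print(sorted(living_side))  # ['Cat', 'Dan', 'Fay', 'Hal', 'Jo']
```

Here Jo is the kind of person building forwards uncovers: related to both matches but never tested.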
Police, the FBI and other law enforcement agencies still use conventional investigative methods before arresting a suspect. For the 2018 arrest of the Golden State Killer, investigators had only a list of third cousins (sharing great-great-grandparents and under 1 per cent genetic similarity), so they had to use an extensive process of elimination (the offender was about 5’9” and 75kg, for example) to narrow it down to Joseph DeAngelo.
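The elimination step amounts to filtering the people on a reconstructed tree against what is known about the offender. A Python sketch, assuming the height and weight quoted above with hypothetical tolerances (±5 cm, ±10 kg) and made-up candidate records; other clues, such as age or where someone lived, could be added as further conditions.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    height_cm: float
    weight_kg: float

def fits_description(c, height_cm=175, weight_kg=75, height_tol=5, weight_tol=10):
    """True if a candidate is within the (hypothetical) tolerances of the description.

    175 cm is roughly 5'9"."""
    return (abs(c.height_cm - height_cm) <= height_tol
            and abs(c.weight_kg - weight_kg) <= weight_tol)

# Made-up candidates from a hypothetical reconstructed tree.
candidates = [
    Candidate("Relative A", 176, 74),
    Candidate("Relative B", 190, 80),
    Candidate("Relative C", 172, 95),
    Candidate("Relative D", 178, 70),
]
shortlist = [c.name for c in candidates if fits_description(c)]
print(shortlist)  # ['Relative A', 'Relative D']
```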
Identifying a suspect requires putting all the pieces of a puzzle together. In Moore’s first genetic genealogy case, GEDmatch returned a list of possible relatives that included two distinct matches, each sharing about 3 per cent of their DNA with the killer. This suggested they were second cousins from different branches of the family tree, enabling Moore to identify William Talbott II by triangulation.
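Triangulation can be expressed as an intersection: find everyone descended from the ancestor shared with the first match, do the same for the second, and keep the people on both lists. In this Python sketch the tree is hypothetical and records only the parent that matters for each link; each match is a second cousin of the children born where the two branches meet.

```python
from collections import defaultdict

# Hypothetical tree: child -> parents (only the relevant parent is recorded).
PARENTS = {
    # Branch 1: descendants of great-grandparent P0
    "P1a": ("P0",), "P1b": ("P0",),
    "P2a": ("P1a",), "P2b": ("P1b",),
    "match_1": ("P2a",),
    # Branch 2: descendants of great-grandparent M0
    "M1a": ("M0",), "M1b": ("M0",),
    "M2a": ("M1a",), "M2b": ("M1b",),
    "match_2": ("M2a",),
    # The branches meet: P2b and M2b have children together
    "child_x": ("P2b", "M2b"),
    "child_y": ("P2b", "M2b"),
}

CHILDREN = defaultdict(list)
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)

def descendants(person):
    """All recorded descendants of `person`."""
    found, stack = set(), [person]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# People related to both matches: descended from both branches.
both = sorted(descendants("P0") & descendants("M0"))
print(both)  # ['child_x', 'child_y']
```

Both children of the marriage fit, so further evidence is still needed to tell siblings apart.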