We Are Data – Part One of a Two-Part Series

The mysteries of unknown fate and unexplored lineage weave through the investigations of forensic genealogist Colleen Fitzpatrick, Ph.D. Also an engineer and physicist, she outlines the need for collaboration and data sharing between forensics and genealogy, offers thoughts on future developments, and touches on two historic cold cases.

“I’ve been applying forensic genetic genealogy to cold case work since 2011. It seemed logical to me as an engineer and scientist, and as someone interested in history, people, and psychology, to apply genetic genealogical data in this novel way.”

Colleen Fitzpatrick, Ph.D., is often asked how a nuclear physicist and laser scientist found her way to genetic genealogy. For her, it’s all about data. “It’s the common thread running through all of my careers: data, and how to look at it, how to make sense of what it means,” Dr. Fitzpatrick said. “When you have a lot of data, as with autosomal DNA and SNP testing, it’s a challenge to work with a poor signal-to-noise ratio. How do you find the ‘signal’ as the indicator of what the data means, in an ocean of data that doesn’t really matter?”

Only in a “perfect world” does Dr. Fitzpatrick believe every cold case that has DNA available can be solved. “Not only do you need the DNA from a victim or a killer, you need a database to compare it to. If that database is small or unavailable, you may not solve a case. Right now, the genetic genealogy databases are limited,” she said, “but those databases are growing rapidly, so there’s hope.”

Forensics, genetic genealogy, and direct-to-consumer testing companies

The forensic side of DNA identification is by nature a slow and meticulous process. “Forensics is often concerned with life-and-death situations, so the methods of DNA analysis must be validated and standardized,” Dr. Fitzpatrick said.

“On the other hand, the development of genetic genealogy is market driven,” she continued. “Companies like Ancestry.com and 23andMe promote tests that satisfy personal curiosity, so genetic genealogy product lines have expanded much more rapidly than forensic testing methods.”

Dr. Fitzpatrick believes there is a way for forensics to make use of genealogy data that may avoid ethical and legal issues. “Genetic genealogical information is to be regarded only as a lead, not as a legal means of identification,” she said. “Legal identification is based on CODIS, fingerprints, or other evidence.

“The fact that forensics uses public data is also important. The initial markers developed by genetic genealogists were Y-STRs borrowed from the forensics community. This makes it possible to find matches to a forensic Y-profile in the genetic genealogy Y-STR databases that have been posted on public websites. Each genealogical profile usually includes a kit number, the most distant ancestor, and other information, but nothing that can be used immediately for personal identification. Since the Y-chromosome is handed down the direct male line of the family along with the family name, a match to a genealogical Y-DNA profile can produce a possible last name for an unknown. The purpose of the search is to find that last name. The identity and the family pedigree of the genealogist who posted that genealogical Y-DNA profile are irrelevant.”
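The surname search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method Dr. Fitzpatrick or any database actually uses: the kit numbers, surnames, and repeat counts are invented, the function names are hypothetical, and the comparison simply counts how many shared Y-STR markers differ.

```python
from typing import Dict, List, Optional, Tuple

# A Y-STR profile maps a marker name to its repeat count.
Profile = Dict[str, int]


def count_mismatches(a: Profile, b: Profile) -> Optional[int]:
    """Count markers that differ among those both profiles report.

    Returns None when the profiles share no markers, because then
    nothing can be said about a match either way.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(1 for marker in shared if a[marker] != b[marker])


def candidate_surnames(
    unknown: Profile,
    database: Dict[str, Tuple[str, Profile]],
    max_mismatches: int = 1,
) -> List[Tuple[str, str]]:
    """Return (surname, kit number) pairs for close matches, closest first."""
    hits = []
    for kit, (surname, profile) in database.items():
        distance = count_mismatches(unknown, profile)
        if distance is not None and distance <= max_mismatches:
            hits.append((distance, surname, kit))
    return [(surname, kit) for _, surname, kit in sorted(hits)]


# Invented example data: placeholder kits, surnames, and repeat counts.
unknown_profile = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
public_profiles = {
    "KIT-001": ("Surname A", {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}),
    "KIT-002": ("Surname B", {"DYS393": 12, "DYS390": 22, "DYS19": 15, "DYS391": 10}),
    "KIT-003": ("Surname A", {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10}),
}

print(candidate_surnames(unknown_profile, public_profiles))
```

As in the interview, the output is only a lead. The sketch surfaces a possible last name and says nothing about the person who posted the matching profile.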

Recent autosomal SNP tests developed by direct-to-consumer (DTC) DNA testing companies such as Ancestry have eclipsed Y-DNA tests in popularity among genetic genealogists. The reason is that autosomal SNP testing can provide information on all lines of a family. Genealogists have found it to be a powerful tool to confirm known genealogies and to fill in gaps in incomplete family pedigrees. This has prompted the growth of enormous DTC proprietary customer databases of autosomal SNP results.

However, because autosomal SNP tests used for genetic genealogy were derived from the biomedical industry and not from forensics, the application of genealogical autosomal SNP data to forensic cases has not been as simple as the application of Y-STR data. “Forensically certified DNA laboratories do not have the capability to produce autosomal SNP data, so law enforcement must rely on private laboratories or service centers for SNP testing,” Dr. Fitzpatrick said. “Furthermore, even if a law enforcement agency has developed a SNP profile from DNA taken from a crime scene, it is still necessary to have a database to which to compare that SNP profile, but DTC DNA testing companies do not allow forensic use of their proprietary databases, citing privacy concerns.”

GEDmatch: the gate in the data fence

To circumvent the DTC prohibition on the use of their company databases, law enforcement uses GEDmatch, a free, online database that is not affiliated with any DTC companies but which accepts data from customers of all DTC companies. “GEDmatch has opened the gate between the forensics and genetic genealogy communities for data sharing,” Dr. Fitzpatrick continued. “You still can’t access the data from the 20 million or so people in the DTC company databases, but you can access the one-million-plus of those who have uploaded their results to GEDmatch.”

One such instance of data sharing between the forensic and genealogy communities is the DNA Doe Project, an ongoing collaboration of law enforcement and genetic genealogists, dedicated to restoring identities to the unknown dead. “We have been very careful to position John and Jane Doe cases as a humanitarian effort that would be compatible with the genealogy community’s privacy concerns,” Dr. Fitzpatrick said. “We proceed carefully and our efforts have been well received.

“After we had our first two success stories, the announcement was made that the Golden State Killer had been identified by Paul Holes of the Contra Costa County District Attorney’s Office,” she continued. “It was not clear at first that Dr. Barbara Rae-Venter was involved. Considering that genetic genealogists are nervous about their data being used by law enforcement, had it been announced that one of our own had used GEDmatch to identify the killer, there may have been a backlash in the community and GEDmatch may have been shut down. But the fact that the initial announcement came from law enforcement changed the game. By the time the news emerged that Dr. Rae-Venter had used GEDmatch to help capture one of the most prolific serial killers of all time, there was not much for genetic genealogists to argue about. Brilliant, because the GEDmatch gate has remained open since then for use on both suspect and Doe cases.”

Your DNA horse has left the barn

Dr. Fitzpatrick is blunt when it comes to DNA and privacy in modern times. “Data is data,” she said. “We may tag it with significance because it’s our DNA. But to others, it’s just data. If you put your DNA on the internet, you’ve lost your expectation of privacy. Look how many surveillance cameras you walk under when you shop. You can be ticketed by a camera when there’s no police officer within 50 miles. Facebook can tag you in a photograph on a friend’s page, when all you did was show up for lunch. If we’re worried about privacy, we should have worried 20 years ago, especially before the internet was created.

“We are at a point where autosomal DNA (atDNA) databases can connect hundreds if not thousands of people through their shared DNA,” she continued. “If your relative posts his data on GEDmatch, you are virtually on GEDmatch with him because of the DNA you have in common with him, even if you don’t know him. We’re all networked — it’s the world we live in. In my opinion, the data is there and it’s going to be used. It may be personal and special to you, but it’s not to someone else. So yes, let’s worry about it — but let’s be realistic about what we can expect.”

Lincoln’s Mother — and a Rare mtDNA Haplogroup

Abraham Lincoln’s mother, Nancy Hanks Lincoln, as envisioned by artist and Lincolniana collector Lloyd Ostendorf, based on his research. – Indiana State Museum and Historic Sites

The Abraham Lincoln DNA Project was started by Zach Spigelman, M.D., a cancer specialist with an interest in orphan diseases. “Lincoln has no living descendants,” Dr. Fitzpatrick explained. “His parents have no living descendants either. His sister died in childbirth with her son, and his brother died at the age of three days. Since Lincoln’s remains are buried under concrete, the best we could do was to try to get his DNA from relics, although they had to be authenticated before they could be used for our project.”

For authentication, Dr. Fitzpatrick needed to find a family member in Lincoln’s maternal line who could be used as an mtDNA reference. The problem: Almost nothing is known about the family of Lincoln’s mother, Nancy Hanks, or where she came from before she moved to Kentucky in the mid-1780s.

“I started with two possible Nancy Hankses and traced maternal descendants of their sisters,” Dr. Fitzpatrick said. “We were hoping that mtDNA from those descendants would match the mtDNA of the relics, which would be an indication that the relics were authentic and that we had the right Nancy.

“Unfortunately, all the Hanks descendants matched each other, none of the relics matched the other relics, and none of the Hanks descendants matched any of the relics,” she continued. “We were not able to distinguish which Nancy Hanks, if either, was Lincoln’s mother, nor were we able to authenticate any of the relics.

“However, we did have one major discovery: the haplogroup, or the population group associated with the mtDNA of the Hanks descendants. The haplogroup, known as X1c, had never been observed in the Western Hemisphere. Only two other cases had ever been observed in the world, one in southern Italy and the other in Tunisia. There’s reason to believe his mother’s line goes back to the Mediterranean Basin,” Dr. Fitzpatrick said. “It’s still a puzzle how that DNA got into colonial Virginia.

“There are several ways this could have happened,” she explained. “An interesting but far-fetched possibility is that the connection could be through Sir Francis Drake, the famous privateer hired by Queen Elizabeth I to destroy King Philip of Spain’s New World settlements. Logbooks show he destroyed Cartagena in the late 1500s and rescued 200 Moorish captives, promising to bring them back home to the Mediterranean. His last stop for resupplying before returning to England was Roanoke, Virginia. When Drake arrived home, records indicate that only 100 galley slaves were on board.

“Where were the other 100? In the chaos, maybe some of those Moorish slaves decided to jump ship and stay in Virginia. We can’t confirm that because the colony of Roanoke disappeared. Yet there were laws on Virginia’s books in the 1600s governing how Islamic persons could inherit land if they married Christians. Where did these Islamic people come from? Is it possible that Lincoln’s mother descended from a female Moorish galley slave who stayed behind in Virginia?”

Imputation: doing more with less

Dr. Fitzpatrick described what’s next on the horizon. “We have developed an interest in working with degraded DNA because we get so many Doe cases in the form of skeletal remains,” she said. “The 20 CODIS markers used for legal identification are not as robust under degradation because they have characteristic lengths and occupy real estate on the genome. If you lose part of a CODIS marker, you lose the whole marker. But considering that Ancestry uses 600,000 SNPs, if you lose half of them, you still have 300,000 SNPs. Even if you lose 90 percent, you still have 60,000 pieces of data. We had a case where the DNA was so damaged, there was only 12 percent of the genome left. We were still able to use GEDmatch to solve the case. Because we use whole genome sequencing, even if there’s a lot of missing data, we can sometimes fill in the gaps using a bioinformatics process called imputation. This allows us to work cases that have been thought impossible.”
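The arithmetic in that comparison is easy to check. The short Python sketch below reproduces the two figures from the quote, taking the 600,000-SNP starting point as given. The function name is hypothetical, and the model assumes losses are a simple percentage of the panel, which real degradation is unlikely to follow exactly.

```python
def remaining_snps(total_snps: int, percent_lost: int) -> int:
    """Return how many SNPs are still readable after a percentage is lost.

    Integer arithmetic is used so the results are exact.
    """
    if not 0 <= percent_lost <= 100:
        raise ValueError("percent_lost must be between 0 and 100")
    return total_snps * (100 - percent_lost) // 100


PANEL_SIZE = 600_000  # figure quoted in the interview

for lost in (50, 90):
    kept = remaining_snps(PANEL_SIZE, lost)
    print(f"Lose {lost}% of {PANEL_SIZE:,} SNPs -> {kept:,} remain")
```

Each SNP is an independent data point, so the panel degrades gradually. By contrast, as the quote notes, a CODIS marker that is only partly recovered is lost entirely.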

“If you put your DNA on the internet, you’ve lost your expectation of privacy. Facebook can tag you in a photograph on a friend’s page, when all you did was show up for lunch. If we’re worried about privacy, we should have worried 20 years ago, especially before the internet was created.”

– Colleen Fitzpatrick, Ph.D., forensic genetic genealogist, Identifinders International

The Amelia Earhart Project

Colleen Fitzpatrick, Ph.D., is an internationally recognized forensic genealogist, and the founder of Identifinders International. She has been involved in high-profile forensic identification cases, including the identification of the Unknown Child on the Titanic and the Amelia Earhart Project; she is now the forensic genealogist on the Abraham Lincoln DNA Project. She is a past fellow of the Society of Photo-Optical Instrumentation Engineers, an associate member of the American Academy of Forensic Sciences, and a past adjunct professor at Boston University. Her book, Forensic Genealogy, redefined the field for both amateurs and professionals.

In 1937, aviator Amelia Earhart’s Lockheed Electra disappeared over the south Pacific near Nikumaroro Island (then known as Gardner Island), prompting an intensive search for her and navigator Fred Noonan. Curiosity over their fate has lasted more than 80 years. There have been many leads over the decades, but one by one, they have fizzled. “Until the crash site can be found, and a DNA analysis can confirm the remains are those of Earhart and Noonan, it’s in the realm of opinion what happened to her,” Dr. Fitzpatrick said.

If Earhart’s crash site is ever found, Dr. Fitzpatrick is ready. “Amelia Earhart has living family. Obtaining DNA from her family will not be hard to do. Fred Noonan is another story,” she said. “He was an only child whose mother died when he was 4 in 1897. He was married twice, with no children. When working on degraded remains, mtDNA is used for identification because it’s so much more abundant. But since it is inherited along the exclusively female line, it is necessary to find someone maternally related to the deceased who can provide an mtDNA reference sample.”

Dr. Fitzpatrick spent most of a year tracing Noonan’s family pedigree to find such a relative. “Fred Noonan’s mother was from the U.K.,” she said. “Her maiden name was Egan, and she was related to all kinds of Greens, Joneses, Smiths. I had to research his family back to the 1600s and then forward again, trying to find an exclusively female line that had survived to the present. That took about a year. I finally located a maternally linked relative and obtained a DNA sample from that person.

“This wasn’t easy, but it was worth it. If anyone finds Earhart’s wreckage, and the DNA in the remains does not match Amelia’s family, the solution to the mystery of what happened to Amelia Earhart could be in my freezer in Southern California.”

Following the data to the answers

An mtDNA reference sample from Fred Noonan’s maternal line waits for the day the Earhart crash site is found.

Dr. Fitzpatrick, fascinated by engineering, history, science, and genealogy, is objective in her approach to cold cases, whether they involve John Does or American presidents. Like Melinde Byrne and Barbara Rae-Venter, Ph.D., J.D., in part two of this series, she follows the leads wherever they go, often to places she doesn’t expect, and warns that passionately wanting a particular result can lead to wrong conclusions. “People who love genealogy and DNA, especially those who aren’t scientists, sometimes don’t tick the right boxes and cross-check their results, so they come up with the wrong answers,” Dr. Fitzpatrick said. “Forensic genealogy is a thoughtful process. It takes experience and hard work to reach the right answer — no matter what it is.”

Read part two of this series — our interviews with two experts who weigh in on new developments transforming how cold cases are solved. Please see “The Finders: Cracking Cold Cases with Genealogy, Forensics, and DNA.”


Tools and Terms

Mitochondrial DNA (mtDNA): DNA passed from mother to children of both sexes. Boys have their mother’s mtDNA but don’t pass it down. Girls do, to their male and female children. mtDNA has extremely slow mutation rates.

Y-Chromosome DNA (Y-DNA): DNA passed from fathers to sons.

Autosomal DNA (atDNA): Inherited from the autosomal chromosomes (humans have 22 pairs of autosomes and two sex chromosomes). atDNA is highly specific because it is recombinant from one generation to the next.

Combined DNA Index System (CODIS): An FBI database containing millions of offender DNA profiles, searchable at local, state, and national levels. Analysts use CODIS to search DNA profiles obtained from crime-scene evidence against DNA profiles from other crime scenes and from convicted offenders and arrestees.

GEDmatch: A free, volunteer-run website for people who have already tested their DNA for genealogical purposes. GEDmatch contains public atDNA kits from more than 1 million individuals.