The Crowdsourcing Site That Wants to Pool Our Genomes

A new non-profit site aims to open up genetic data for scientific research.

In 2010, I posted a vial of my finest spit to the genetic-testing company 23andMe. In return, I got to see what my genes reveal about my ancestry, how they affect my risk of diseases or my responses to medical drugs, and even what they say about the texture of my earwax. (It’s dry.) 23andMe now has around a million users, as do other similar companies like Ancestry.com.

But these communities are largely separated from one another, a situation that frustrated Yaniv Erlich from the New York Genome Center and Columbia University. “Tens of millions of people will soon have access to their genomes,” he says. “Are we just going to let these data sit in silos, or can we partner with these large communities to enable some really large science? That’s why we developed DNA.LAND.”

DNA.LAND, which Erlich developed together with colleague Joe Pickrell, is a website that allows customers of other genetic-testing services to upload files containing their genetic data. Scientists can then use this data for research, to the extent that each user consents. “DNA.LAND is a way for getting the general public to participate in large-scale genetic studies,” says Erlich. “And we’re not a company. We’re a non-profit website, run by scientists.”

Erlich has a track record of similar projects: Two years ago, he assembled what is probably the world’s largest ever family tree by pooling 43 million profiles from a publicly available genealogy site.

Getting people to actively participate in DNA.LAND will be harder, but Daniel MacArthur from Massachusetts General Hospital is optimistic. “Efforts like this are all about building momentum, and getting them going is hard. But at least in this case they have the advantage of working with an existing community that’s actively excited about sharing their own genetic data.”

Erlich was encouraged that two sites, openSNP and GedMatch, have successfully crowdsourced genetic data from hundreds and thousands of people. But neither of these sites includes much in the way of privacy protection. DNA.LAND, by contrast, promises to not release any information without explicit permission. And they designed a consent form that could be read in 5 minutes, that contains minimal scientific or legal jargon, and that links to Erlich’s and Pickrell’s own uploaded genomes. “We have a skin-in-the-game philosophy,” says Erlich.

To encourage users to participate, the site offers several free services, including an ancestry report and a relative finder (which Erlich himself used to find a fourth cousin). Other companies provide these services, but within their own corrals. “What if you get tested by 23andMe, and you have relatives in Ancestry.com?” asks Erlich. “We allow you to find relatives outside the silo of the company where you’ve been tested.”

These companies also analyze just hundreds of thousands of markers across a person’s genome, leaving huge tracts uncharted; DNA.LAND fills some of these gaps through a process called imputation. Erlich explains this by pulling up the sentence: Ba_ _ _ _  O_ _ ma  i_  t_ _  Pr_ _ _ _ _ _ t. You can probably quickly read that as “Barack Obama is the President,” because you’re familiar with English and there are only so many options for the letters in the spaces. The genome is similar. DNA is inherited in chunks, so based on what’s there, you can make educated guesses about what’s not. “If they test 700,000 markers, we can get to million,” says Erlich.
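
For readers who like to see the fill-in-the-blanks idea in action, here is a toy sketch in Python. It is not DNA.LAND's method: real imputation tools use statistical models over large reference panels, while this version simply copies the missing letters from whichever reference sequence agrees best with the letters that were measured. The function name `impute`, the `_` placeholder, and the three made-up reference sequences are all assumptions invented for this illustration.

```python
def impute(observed, reference_panel, missing="_"):
    """Toy imputation: fill untyped positions from the best-matching reference.

    `observed` is a string in which `missing` marks untested positions.
    `reference_panel` is a list of fully known sequences of the same length.
    Both are hypothetical stand-ins for real genotype data.
    """
    candidates = [ref for ref in reference_panel if len(ref) == len(observed)]
    if not candidates:
        raise ValueError("no reference sequence matches the observed length")

    def matches(ref):
        # Count agreements only at positions that were actually measured.
        return sum(1 for o, r in zip(observed, ref) if o != missing and o == r)

    best = max(candidates, key=matches)
    # Keep every measured letter; borrow only the missing ones from `best`.
    return "".join(r if o == missing else o for o, r in zip(observed, best))


# Made-up reference sequences, analogous to chunks of DNA seen in other people.
panel = ["ACGTTAGC", "ACCTTGGC", "TCGATAGA"]

# Four of eight positions were tested; the rest are blanks to be guessed.
print(impute("A_G__A_C", panel))
```

The first reference sequence agrees with all four measured letters, so the blanks are filled from it. The guess is only as good as the panel: if nobody in the reference set carries the true sequence, the filled-in letters can be wrong.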

Next, he wants to infuse DNA.LAND with data from other sources. A person’s tweets might reveal their sleeping patterns, whether they’re sick and what symptoms they have, and the ebb and flow of their moods. Fitbit data could say even more about their health and fitness. If such sources could be connected to DNA.LAND, it would be an effortless way to connect people’s genotype (their genes) with their phenotype (the physical traits that those genes affect).

When I spoke to Erlich last Thursday, he was nervous. The site was due to launch on Friday morning and 24 hours later, he would present the site to his colleagues at the American Society of Human Genetics 2015 conference. “I thought maybe we’d have 30 genomes, and I’d have to wave my hands and talk about how awesome it is,” he said.

In fact, users uploaded 1,250 genomes within DNA.LAND’s first day. “We’re getting a genome a minute right now,” he told the crowd. “My programmer was awake all night.” That impressive pace has naturally slowed, but at the time of writing, there are 5,485 genomes on the site. The team are hoping for thousands more.

Ed Yong is a former staff writer at The Atlantic. He won the Pulitzer Prize for Explanatory Reporting for his coverage of the COVID-19 pandemic.