1000 Genomes Project
The International Genome Sample Resource (IGSR) and the 1000 Genomes Project
IGSR was set up to ensure the future usability and accessibility of data from the 1000 Genomes Project and to extend that data set to include new data generated from the 1000 Genomes Project samples, as well as new populations sampled in line with IGSR sampling principles.
The 1000 Genomes Project ran between 2008 and 2015, creating the largest public catalogue of human variation and genotype data. As the project ended, the Data Coordination Centre at EMBL-EBI received funding from the Wellcome Trust to create IGSR with the following aims:
Ensure the future access to and usability of the 1000 Genomes reference data
Incorporate additional published genomic data on the 1000 Genomes samples
Expand the data collection to include new populations not represented in the 1000 Genomes Project
Ensure the future access to and usability of the 1000 Genomes reference data
In 2014, the Genome Reference Consortium (GRC) released an update of the human genome reference assembly, GRCh38. This update increased the number of alternative loci represented. GRCh38 contains 178 genomic regions with associated alternative loci, covering 2% of chromosomal sequence (61.9 Mb). These regions are represented by 261 alternative loci, which contain 3.6 Mb of sequence that is novel relative to the chromosomes. The GRC were also able to resolve more than 1000 issues from the previous version of the assembly, providing a better basis for alignment and subsequent analysis.
As part of its work to maintain the 1000 Genomes Project data, IGSR realigned the original project’s sequence data to GRCh38 and used these alignments to call biallelic SNVs and INDELs.
Subsequent work by the New York Genome Center (NYGC), funded by NHGRI, generated new high-coverage data for the 1000 Genomes samples and has also analysed the data on GRCh38.
Incorporate published genomic data on the 1000 Genomes samples
Along with the high-coverage genomic sequence data from NYGC, the cell lines generated by the 1000 Genomes Project have been used by other researchers to generate further data sets. One example is the GEUVADIS project, which generated RNA-Seq data on the 1000 Genomes samples of European ancestry and the YRI population. Groups such as the Human Genome Structural Variation Consortium (HGSVC) have also generated a wide variety of data from the 1000 Genomes Project cell lines.
Expand the data collection to include new populations
The sample collections used in the 1000 Genomes Project do not capture all populations. Recognising this, IGSR’s sample collection principles enable the sharing of further samples collected with consent similar to that of the samples in the 1000 Genomes Project.
During the lifetime of IGSR, various populations have been added, mainly from the Human Genome Diversity Project (HGDP), the Simons Genome Diversity Project (SGDP) and the Gambian Genome Variation Project (GGVP).
Learning more
Further information on the 1000 Genomes Project
IGSR’s Nucleic Acids Research publication
Our data portal, listing the publications associated with the data collections
Information on using data provided by IGSR
Please email questions about any of the above to info@1000genomes.org.