An Update on NCTC 3000: A Type Culture Reference Genome Project
The NCTC 3000 project is a collaborative bacterial Whole Genome Sequencing (WGS) project between Public Health England (PHE), Pacific Biosciences (PacBio) and the Wellcome Trust Sanger Institute (WTSI), which aims to generate 3000 high-quality reference genomes from strains within the National Collection of Type Cultures (NCTC). Assembled and annotated genomes are being made publicly available to the scientific community in real time via public databases, as well as via a Biological Resource information Centre (BRiC), which will link the sequencing data to all other available strain metadata (provenance, taxonomy, phenotypic characteristics and authentication data).
To date, high-molecular-weight DNA has been extracted from over 3000 NCTC strains (representing over 850 bacterial species from 86 different families) and sent to the WTSI, where WGS has been performed using the PacBio Single Molecule, Real-Time (SMRT®) DNA Sequencing technology, followed by genome assembly using an automated assembly pipeline and annotation with Prokka. Sequencing has now been completed for 2700 strains. Of the genomes sequenced and assembled to date, 55.1% have been closed into a single contig and 92% have been assembled into fewer than 5 contigs. For many of the more unusual strains within NCTC, and particularly the Type strains, where the availability of reference genomes is often sparse, the NCTC 3000 project represents a significant advancement within the field.
Resources are currently being channelled into ensuring that the remaining sequences and associated metadata will be publicly available by the end of July 2018. It is intended that the combined availability of reference genomes, viable freeze-dried cultures and/or genomic DNA from the NCTC strain bank will be used by scientists to support and enhance genetic and phenotypic studies of these important pathogens.