The NCTC3000 project reaches a milestone
The NCTC was established in 1920 and recently began its second century of operation. Since its inception, almost 6,000 bacterial strains have been deposited by numerous scientists across the globe. Strains accessioned to the NCTC are almost always chosen for a unique property: each may be a type strain of a newly described species, a strain isolated during an infectious disease outbreak, or one frequently used as a control or reference strain. Together, the strains represent a unique snapshot of bacteria of historical, medical and veterinary importance.
Genome sequencing technologies have revolutionised biology over the past few decades. For culture collections, they provide an opportunity to uncover new information about historic strains, perhaps enabling us to delve deeper into the uniqueness that led to their collection. In 2013, the NCTC joined forces with the Wellcome Sanger Institute, securing Wellcome Trust grant funding to sequence the genomes of approximately half of the strains within the collection. Pacific Biosciences kindly offered significant additional financial support and know-how, enabling the generation of long-read sequence data for all strains within the study. The project, which became known as NCTC3000, began shortly thereafter and involved a large interdisciplinary team across the three partners.
In total, 3,305 PacBio long-read sequence datasets were successfully generated for 2,915 NCTC strains, all of which have been made publicly available under BioProject PRJEB6403. The NCTC3000 dataset has further enabled the generation of high-quality genome assemblies and annotations. To date, 2,228 genome assemblies and annotations have been published within the ENA/GenBank/DDBJ databases, around a third of which have achieved ‘Complete Genome’ status. The data are already being widely used by the bacteriology community for a broad array of projects and tasks.
The NCTC3000 project is now entering a new phase. Alongside the completion of dataset publication and validation, a series of bespoke projects is producing detailed annotation of many of its strains. In the coming years, we aim to add value to the NCTC3000 datasets by publishing this information in detail, enabling users of the collection to choose strains of interest more precisely, for example because they carry a particular gene or plasmid, or on the basis of geographical or lineage information. A manuscript describing the first outputs of the NCTC3000 project has recently been published in Microbial Genomics.
Dicks J, Fazal MA, Oliver K, et al. NCTC3000: a century of bacterial strain collecting leads to a rich genomic data resource. Microb Genom. 2023;9(5):mgen000976. doi:10.1099/mgen.0.000976
Written by Jo Dicks - Culture Collections Lead Bioinformatician