
Microbial Genome Similarity Search with FASTQ (MGSS-FASTQ)
Project goal
Develop a tool for rapid identification of microorganism genomes from raw sequencing data generated by NGS/MPS and stored in FASTQ files. The tool builds on the RefSeqMasher algorithm but extends searches to the entire GenBank database, and it is available online through an intuitive web interface for query management.
Description of Activities (Stages)
1. Generation of NGS reads from various bacterial species and strains common in the human microbiome.
2. Reconstruction of the genomes into sets of contigs of varying lengths, each contig representing a fragment of a studied genome and covering up to 20% of that genome's length.
3. Identification of a bioinformatic tool capable of searching for the largest possible region of similarity between full-genome data.
4. Creation of a new tool, based on RefSeqMasher, capable of searching the entire microbial genome part of GenBank; RefSeqMasher itself is limited to reference genome sequences (RefSeq).
5. Creation of a website for testing the new algorithm.
6. Validation of the obtained results using available NGS sequences from different microorganisms (ongoing).
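The similarity search in stages 3 and 4 builds on RefSeqMasher, which relies on Mash-style MinHash sketching. The following is a minimal illustrative sketch of that general idea, not the project's or RefSeqMasher's actual code. All function names are hypothetical, the BLAKE2b hash from the Python standard library stands in for the MurmurHash3 function used by Mash, and the FASTQ reader assumes plain four-line records.

```python
import hashlib
import io
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fastq(handle):
    """Yield the sequence of each record, assuming strict 4-line FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (ignored in this sketch)
        yield seq


def canonical_kmers(seq, k):
    """Yield each k-mer or its reverse complement, whichever sorts first."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers with N or other symbols
            rc = kmer.translate(_COMPLEMENT)[::-1]
            yield min(kmer, rc)


def hash64(kmer):
    """64-bit hash of a k-mer (assumption: BLAKE2b instead of MurmurHash3)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seqs, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes."""
    hashes = {hash64(km) for s in seqs for km in canonical_kmers(s, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(a, b, size):
    """Estimate Jaccard similarity from two bottom-s sketches."""
    sa, sb = set(a), set(b)
    merged = sorted(sa | sb)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in sa and h in sb)
    return shared / len(merged)


def mash_distance(j, k):
    """Mash distance D = -ln(2j / (1 + j)) / k, clamped to [0, 1]."""
    if j <= 0:
        return 1.0
    return min(1.0, max(0.0, -math.log(2 * j / (1 + j)) / k))


if __name__ == "__main__":
    # Toy in-memory FASTQ data; real input would be a file of NGS reads.
    fastq = "@r1\nACGTACGTACGGATTACA\n+\nIIIIIIIIIIIIIIIIII\n"
    reads = list(read_fastq(io.StringIO(fastq)))
    query = sketch(reads, k=5, size=50)
    reference = sketch(["ACGTACGTACGGATTACATTT"], k=5, size=50)
    j = jaccard_estimate(query, reference, 50)
    print("Jaccard estimate:", j, "Mash distance:", mash_distance(j, 5))
```

In a search of this kind, a sketch of the query reads would be compared against precomputed sketches of the database genomes, and the genomes with the smallest distances reported as the closest matches. The small `k` and `size` values in the demo are chosen only to suit the toy data.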
Resources used
Data
The entire set of bacterial genomes from GenBank (NCBI), consisting of more than 3,000,000 complete genome sequences and updated regularly.
In Vitro Confirmation
Not applicable.

