Researchdata.se

Arthropod Kraken2 Database v1

https://doi.org/10.17044/SCILIFELAB.29666605
Kraken2 Arthropod Reference Database v.1

Kraken2 (v2.1.2) database containing all 2,593 reference assemblies for Arthropoda available on NCBI as of March 2023. This database was built for and used in the analysis of shotgun sequencing data of bulk DNA from Malaise trap samples collected by the Insect Biome Atlas, in the context of the manuscript "Small Bugs, Big Data: Metagenomics for arthropod biodiversity monitoring" by López Clinton Samantha, Iwaszkiewicz-Eggebrecht Ela, Miraldo Andreia, Goodsell Robert, Webster Mathew T, Ronquist Fredrik, van der Valk Tom (for submission to Ecology and Evolution).

For custom database building, Kraken2 requires all headers in reference assembly fasta files to be annotated with "kraken:taxid|XXX" at the end of each header, where "XXX" is the corresponding National Center for Biotechnology Information (NCBI) taxID of the species. The code used to add the taxID information to each fasta file header, and to update the accession2taxid.map file required by Kraken2 for database building, is available in this GitHub repository (https://github.com/SamanthaLop/Small_Bugs_Big_Data) (also linked under "Related Materials" below).

Content

Below is a list of the files in this item (in addition to the README and MANIFEST files), and their descriptions. The first three files (marked with a *) are required to run Kraken2 classifications using the database.

- * hash.k2d.gz - A hash file with all minimiser to taxon mappings (855 GB).
- * opts.k2d - A file containing all options used when building the Kraken2 database (64 B).
- * taxo.k2d - A file containing the taxonomy information used to build the database (385.9 KB).
- seqid2taxid.map.gz - A file containing contig accession numbers and their corresponding taxids (810.6 MB). Kraken2 needs this file when building the database. Because it was updated during custom building, it is included for reference, but it is not required to use the database for classification.
- genome_assembly_metadata.tsv - NCBI-generated table (tsv format, gzipped) of all reference assemblies for Arthropoda as of March 2023, which were used in the database construction. It includes the columns: Assembly Accession, Assembly Name, Organism Name, Organism Infraspecific Names Breed, Organism Infraspecific Names Strain, Organism Infraspecific Names Cultivar, Organism Infraspecific Names Ecotype, Organism Infraspecific Names Isolate, Organism Infraspecific Names Sex, Annotation Name, Assembly Stats Total Sequence Length, Assembly Level, Assembly Submission, and WGS project accession.

How to use the database

- Download the hash.k2d.gz, opts.k2d, and taxo.k2d files to the same directory (e.g. /PATH/TO/DATABASE/).
- Unzip the hash.k2d.gz file.
- Install or load Kraken2 to run classification on sequencing data using the database.
- When running Kraken2, indicate the path to the directory (not the individual files) with the --db flag (e.g. kraken2 --db /PATH/TO/DATABASE/ ...).

Note that Kraken2 must load the whole database into memory before it can classify any sequencing reads, so ensure you have access to enough memory before running (the uncompressed hash file is around 1.1 TB). We also recommend using the Kraken2 option --memory-mapping, as it ensures the database is loaded once for all samples, instead of once for each individual sample, saving considerable time and resources. For more information on using Kraken2, see the Kraken2 wiki manual (https://github.com/DerrickWood/kraken2/wiki/Manual).

This database was built by Samantha López Clinton (samantha.lopezclinton@nrm) and Tom van der Valk (tom.vandervalk@nrm.se).
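The usage steps above can be sketched in Python as follows. This is a minimal illustration, not part of the published item: the helper names (decompress_hash, missing_files, build_kraken2_command), the default thread count, and the input and output file names are all hypothetical. The Kraken2 flags used (--db, --memory-mapping, --threads, --report, --output) are documented in the Kraken2 manual.

```python
import gzip
import shutil
from pathlib import Path

# Files the description marks as required for classification.
REQUIRED_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def decompress_hash(db_dir):
    """Unzip hash.k2d.gz next to itself if hash.k2d is not there yet."""
    db_dir = Path(db_dir)
    source = db_dir / "hash.k2d.gz"
    target = db_dir / "hash.k2d"
    if not target.exists() and source.exists():
        # Stream in chunks: the uncompressed file is around 1.1 TB.
        with gzip.open(source, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout, length=16 * 1024 * 1024)
    return target


def missing_files(db_dir):
    """Return the required database files that are absent from db_dir."""
    db_dir = Path(db_dir)
    return [name for name in REQUIRED_FILES if not (db_dir / name).is_file()]


def build_kraken2_command(db_dir, reads, report, output, threads=8):
    """Build the kraken2 argument list; --db points at the directory."""
    return [
        "kraken2",
        "--db", str(db_dir),          # directory, not the individual files
        "--memory-mapping",           # option recommended in the description
        "--threads", str(threads),    # hypothetical default
        "--report", str(report),
        "--output", str(output),
        str(reads),
    ]
```

Once missing_files("/PATH/TO/DATABASE/") returns an empty list, the command list can be passed to subprocess.run(cmd, check=True) on a machine where Kraken2 is installed or loaded.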
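The header annotation described above ("kraken:taxid|XXX" at the end of each fasta header) can be illustrated with a short sketch. This is not the authors' script, which lives in the linked GitHub repository. The function names are hypothetical, and joining the tag to the header with a "|" character is an assumption. The Kraken2 manual describes the tag as part of the sequence ID, so headers that contain whitespace may need the tag placed before the first space; that case is not handled here.

```python
def annotate_header(header, taxid):
    """Append the Kraken2 taxid tag to one FASTA header line."""
    header = header.rstrip("\n")
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: " + header)
    if "kraken:taxid|" in header:
        return header  # already annotated, leave unchanged
    # Assumption: the tag is joined to the end of the header with "|".
    return f"{header}|kraken:taxid|{taxid}"


def annotate_fasta(lines, taxid):
    """Yield FASTA lines with every header annotated; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        yield annotate_header(line, taxid) if line.startswith(">") else line
```

For example, with the illustrative header ">contig_1" and taxID 7227 (Drosophila melanogaster), annotate_header returns ">contig_1|kraken:taxid|7227".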


scilifelab
Naturhistoriska riksmuseet