Researchdata.se

Supplemental data from the genome assembly and annotation of the Clouded Apollo Butterfly (Parnassius mnemosyne)

https://doi.org/10.17044/SCILIFELAB.25908748
This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (Parnassius mnemosyne), published in:

Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., & Backström, N. (2024). A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): A Species of Global Conservation Concern. Genome Biology and Evolution, 16(2), evae031. https://doi.org/10.1093/gbe/evae031

Previous data from the project has been deposited at the European Nucleotide Archive (ENA) in the umbrella project PRJEB76269 (https://www.ebi.ac.uk/ena/browser/view/PRJEB76269). The data contained in this archive at SciLifeLab Data Repository describe the genome assembly (ENA accession: GCA_963668995.1, https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1) and the mitochondrial genome assembly (ENA accession: OZ075093.1, https://www.ebi.ac.uk/ena/browser/view/OZ075093.1).

Below follows a brief description of each file. The information on the methods used to generate the files was adapted from Höglund et al. 2024.

- pmne_functional_edit1.gff.gz contains the functional annotation (protein-coding genes) of the primary genome assembly (GCA_963668995.1, https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1). This is the original file that was submitted to ENA. A derived version of the file is available from NCBI; the NCBI version was generated from the EMBL records of each annotated gene and differs in that it, for instance, uses a different naming scheme for the seqid column and the locus tags. The NCBI version is available at https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/963/668/995/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11_genomic.gff.gz. The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated by the NBIS nextflow pipeline v2.0.0 (https://github.com/NBISweden).
- pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz contains a transcript assembly of the Illumina RNAseq reads (ENA accession: ERX11559451, https://www.ebi.ac.uk/ena/browser/view/ERX11559451). The reads were aligned to the genome with HiSat2 (v2.1.0) and then assembled with StringTie (v2.2.1).
- pmne_mtdna.gff.gz contains the functional annotation of the mitochondrial genome assembly (ENA accession: OZ075093.1, https://www.ebi.ac.uk/ena/browser/view/OZ075093.1). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).
- pmne_ncRNAs.gff.gz contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.
- pmne_tRNAs_and_pseudogenes.gff.gz contains the annotation of putative tRNA genes and pseudogenes. The prediction was done with tRNAscan-SE (v2.0.12).
- pmne_PacBio_isoseq.sorted.bam contains the PacBio IsoSeq transcripts (ENA accession: ERX11559436, https://www.ebi.ac.uk/ena/browser/view/ERX11559436) aligned to the primary genome assembly.
- pmne_repeat_library.fa.gz contains the nucleotide sequences of the predicted repeats in fasta format. The prediction was done with RepeatModeler2 (v2.0.2a).

Available variables

For a description of the column headers of the files, please see the following links to the documentation of the different file formats.

The GFF3 format (.gff) is described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
The BAM format (.bam) is a compressed version of the SAM format, both of which are described here: https://samtools.github.io/hts-specs/SAMv1.pdf
The fasta (.fa) format is described here: https://www.ncbi.nlm.nih.gov/genbank/fastaformat

Contact

For questions about this dataset, please contact:
jacob.hoglund@ebc.uu.se
niclas.backstrom@ebc.uu.se
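As an illustration of the GFF3 column layout referred to under "Available variables", the following is a minimal Python sketch (not part of the dataset or of the authors' pipeline) that reads a gzipped GFF3 file and counts how many features of each type it contains. The function names are hypothetical and chosen for this example only; the nine column names follow the GFF3 specification, and the sketch assumes the file has been downloaded to the working directory.

```python
import gzip
from collections import Counter

# The nine tab-separated GFF3 columns, in the order given by the GFF3 specification.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Return a dict for one feature line, or None for comments,
    directives, blank lines, and lines without exactly nine columns."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        return None  # e.g. sequence lines in an embedded FASTA section
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])  # 1-based, inclusive
    record["end"] = int(record["end"])
    return record


def count_feature_types(lines):
    """Count features per value of the 'type' column (gene, mRNA, exon, ...)."""
    counts = Counter()
    for line in lines:
        record = parse_gff3_line(line)
        if record is not None:
            counts[record["type"]] += 1
    return counts


def count_feature_types_in_file(path):
    """Hypothetical convenience wrapper: open a gzipped GFF3 file as text."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_feature_types(handle)
```

With the annotation file downloaded, a call such as `count_feature_types_in_file("pmne_functional_edit1.gff.gz")` would return a `Counter` keyed by feature type. The same function should work for the other `.gff.gz` files in the archive, assuming they follow the nine-column layout.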


scilifelabuu