Researchdata.se

Gene annotation of Blastobotrys mokoenaii, Blastobotrys illinoisensis, and Blastobotrys malaysiensis

https://doi.org/10.17044/SCILIFELAB.28606814
This dataset contains the gene annotation data for three species of Blastobotrys yeasts: B. mokoenaii, B. illinoisensis, and B. malaysiensis. The genome assemblies for B. mokoenaii (NRRL Y-27120) and B. malaysiensis (NRRL Y-6417) were publicly available from the National Center for Biotechnology Information (NCBI) under accessions GCA_003705765.3 and GCA_030558815.1, respectively. The genome assembly for B. illinoisensis (NRRL YB-1343) was generated by SciLifeLab's National Genomics Infrastructure (NGI) using PacBio long-read data and deposited in the European Nucleotide Archive (ENA) under accession GCA_965113335.1.

File description

- bmokoenaii_annotation.gff This file contains the gene models predicted for B. mokoenaii (GCA_003705765.3).
- billinoisensis_annotation.gff This file contains the gene models predicted for B. illinoisensis (GCA_965113335.1).
- bmalaysiensis_annotation.gff This file contains the gene models predicted for B. malaysiensis (GCA_030558815.1).

Gene annotation methods

Repeat Masking

Prior to annotation, a repeat library was built for each species using RepeatModeler2 v2.0.2, and the genomes were soft-masked using RepeatMasker v4.1.5.

$ RepeatModeler -database ${DB} -engine ncbi -pa 16
$ RepeatMasker -dir . -gff -u -no_is -xsmall -e ncbi -lib ${LIBRARY} -pa 16 genome.fasta

Structural Annotation

Structural annotation was performed on the soft-masked genomes using Braker3 v3.0.3, incorporating external evidence in the form of all fungal proteins from OrthoDB v11 (available at https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11).

$ braker.pl --genome="$genome" \
    --prot_seq=${protein} --workingdir=${PWD} \
    --gff3 --threads=16 --verbosity=3 \
    --nocleanup --species=${i}

Functional Annotation

The predicted genes were functionally annotated using the National Bioinformatics Infrastructure Sweden (NBIS) functional_annotation Nextflow pipeline v2.0.0 (https://github.com/NBISweden/pipelines-nextflow). Briefly, this pipeline uses the Basic Local Alignment Search Tool (BLAST) to run similarity searches between the annotated proteins and the UniProtKB/Swiss-Prot database (downloaded 2023-12), then uses InterProScan to query the proteins against the InterPro v59-91 databases, and merges the results using AGAT v1.2.0.

tRNAs and rRNAs

Transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were annotated using tRNAscan-SE v2.0.12 and barrnap v0.9, respectively. Other ncRNAs, such as SRP RNA, RNase P RNA, and spliceosomal ncRNAs, were not predicted.

$ tRNAscan-SE -E --gff ${output}_trnas.gff --thread 16 ${genome}.fasta
$ barrnap --kingdom euk --threads 6 ${genome}.fasta > ${output}_rrna.gff

Annotation integration

Finally, the functionally annotated protein-coding genes, tRNAs, and rRNAs were combined into a single GFF file using AGAT v1.2.0.

$ agat_sp_complement_annotations.pl --ref ${protein_coding} --add ${trna} --add ${rrna} --out full_annotation.gff
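
As a quick sanity check after the integration step, one could tally the feature types in the combined GFF to confirm that protein-coding genes, tRNAs, and rRNAs are all present. The following is a minimal sketch and not part of the published workflow: the function name count_gff_features is hypothetical, and it assumes a tab-separated GFF3 file with the feature type in column 3. The exact type labels that appear depend on what each tool wrote.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the dataset's pipeline):
# count features per type (GFF3 column 3), skipping comment and directive lines.
count_gff_features() {
    local gff="$1"
    # Keep only well-formed 9-column records that do not start with '#',
    # tally column 3, and print "type<TAB>count" sorted by type.
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (t in n) print t "\t" n[t] }' "$gff" | LC_ALL=C sort
}
```

For example, after sourcing the function, running count_gff_features full_annotation.gff prints one line per feature type with its count. A type that is missing altogether would suggest that one of the inputs passed via --add was not merged.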

