1. Comparative population transcriptomics in krill: reference transcriptomes (FASTA, GFF, TSV files)
https://doi.org/10.17044/SCILIFELAB.22722361
This item holds one major gzipped tar archive that contains 20 nested tar archives, each of which contains reference transcriptomes and associated metadata for one species of krill (20 species in total).
Archive:
krill.transcriptomes.tar.gz
Contents of major archive (FILE,TAG,SPECIES,SIZE):
- earm.transcriptomes.tar,earm,Euphausia similis var. armata,491.6M
- ecry.transcriptomes.tar,ecry,Euphausia crystallorophias,89.7M
- edin.transcriptomes.tar,edin,Euphausia distinguenda,496.5M
- efri.transcriptomes.tar,efri,Euphausia frigida,345.2M
- elam.transcriptomes.tar,elam,Euphausia lamelligera,515.9M
- elos.transcriptomes.tar,elos,Euphausia longirostris,234M
- emuc.transcriptomes.tar,emuc,Euphausia mucronata,360.4M
- epac.transcriptomes.tar,epac,Euphausia pacifica,357.1M
- erec.transcriptomes.tar,erec,Euphausia recurva,114.8M
- esim.transcriptomes.tar,esim,Euphausia similis,417.9M
- espi.transcriptomes.tar,espi,Euphausia spinifera,425.1M
- esup.transcriptomes.tar,esup,Euphausia superba,520.6M
- etri.transcriptomes.tar,etri,Euphausia triacantha,396M
- eval.transcriptomes.tar,eval,Euphausia vallentini,635.1M
- mnor.transcriptomes.tar,mnor,Meganyctiphanes norvegica,469M
- nmeg.transcriptomes.tar,nmeg,Nematoscelis megalops,429M
- tine.transcriptomes.tar,tine,Thysanoessa inermis,594.6M
- tlon.transcriptomes.tar,tlon,Thysanoessa longicaudata,328.8M
- tmac.transcriptomes.tar,tmac,Thysanoessa macrura,253.4M
- trac.transcriptomes.tar,trac,Thysanoessa raschii,231.2M
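Unpacking the two archive levels can be sketched as follows. This is a minimal example using Python's standard tarfile module; the function name unpack_nested and the choice of one output directory per species tag are illustrative assumptions, and the nested archives are located by filename pattern because the directory layout inside the major archive is not described here.

```python
import tarfile
from pathlib import Path


def unpack_nested(outer_path, dest_dir):
    """Extract the major archive, then every nested *.transcriptomes.tar.

    Returns a dict mapping species tag -> directory holding that species' files.
    (Hypothetical helper; the per-tag directory layout is an assumption.)
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(outer_path, "r:*") as outer:
        outer.extractall(dest)

    species_dirs = {}
    # Search recursively, since the internal layout of the major archive is unknown.
    for inner_path in sorted(dest.rglob("*.transcriptomes.tar")):
        tag = inner_path.name.split(".")[0]  # e.g. "esup"
        target = dest / tag
        target.mkdir(exist_ok=True)
        with tarfile.open(inner_path, "r:*") as inner:
            inner.extractall(target)
        species_dirs[tag] = target
    return species_dirs
```

A call such as unpack_nested("krill.transcriptomes.tar.gz", "krill") would then return one directory per species tag. Note that the unpacked data will need considerably more disk space than the sizes listed above suggest for any single species.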
Contents of nested archives:
Each nested tar archive contains the following set of files (each filename is prefixed with the species "TAG" from the list of species tags above):
TAG.trinity.fasta
The full Trinity transcriptome, including non-coding transcripts and alternative isoforms.
TAG.trinity.longest_isoforms.fasta.renamed.list.tsv
A TSV table to translate between original Trinity transcript sequence names (field 3) and names used throughout the analyses (field 2). This table contains the longest isoforms, i.e. the resulting transcripts after removing redundant shorter isoforms.
- field 1: number
- field 2: species-specific transcript sequence names used in analyses. The sequence names follow the format "TAG_NUMBER" for non-coding transcripts and "TAG_NUMBER_OTHER_NUMBER" for coding transcripts (the last number indicates which reading frame TransDecoder selected as the best).
- field 3: original Trinity transcript sequence names
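The translation table described above can be read with a few lines of Python. This is a sketch under stated assumptions: the file is tab-separated with no header line, and coding names can be told apart from non-coding ones simply because they carry more underscore-separated parts than "TAG_NUMBER". The function names load_name_table and is_coding are hypothetical.

```python
import csv


def load_name_table(tsv_path):
    """Map analysis names (field 2) to original Trinity names (field 3).

    Assumes a tab-separated file without a header line.
    """
    mapping = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            mapping[row[1]] = row[2]
    return mapping


def is_coding(name):
    """Return True when a name has more parts than "TAG_NUMBER".

    Assumes the tag and the numbers themselves contain no underscores.
    """
    return len(name.split("_")) > 2
```

If the file turns out to have a header line, that row would need to be skipped before building the mapping.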
TAG.trinity.longest_isoforms.coding.fasta
The filtered transcriptome, including only the longest isoform of each coding transcript.
TAG.trinity.longest_isoforms.coding.fasta.transdecoder.gff3
A GFF3 coordinate file that specifies where features such as CDSs and UTRs start and stop along the coding transcripts.
TAG.trinity.longest_isoforms.fasta.transdecoder.cds.fasta
The CDS of the open reading frame of coding transcripts, as specified by the TAG.trinity.longest_isoforms.coding.fasta.transdecoder.gff3 GFF file and the TAG.trinity.longest_isoforms.coding.fasta file.
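The relationship between the coding transcripts, the GFF3 file and the CDS file can be illustrated with a small sketch that cuts CDS sequences out of a transcript FASTA using GFF3 coordinates. It relies only on the GFF3 conventions (nine tab-separated columns, 1-based inclusive coordinates, strand in column 7). It assumes that column 1 of the GFF3 file matches the FASTA sequence identifiers and that each CDS is a single contiguous feature; read_fasta and extract_cds are hypothetical helper names, not part of the dataset.

```python
def read_fasta(path):
    """Return {sequence_id: sequence}; the id is the header up to the first space."""
    seqs, name, parts = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(parts)
                name, parts = line[1:].split()[0], []
            elif line:
                parts.append(line)
    if name is not None:
        seqs[name] = "".join(parts)
    return seqs


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_cds(transcripts, gff_path):
    """Yield (transcript_id, cds_sequence) for each CDS row of a GFF3 file."""
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # comments, directives and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9 or cols[2] != "CDS":
                continue
            seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
            # GFF3 coordinates are 1-based and inclusive.
            seq = transcripts[seqid][start - 1:end]
            if strand == "-":
                # Minus-strand features are reverse-complemented.
                seq = seq.translate(_COMPLEMENT)[::-1]
            yield seqid, seq
```

Comparing the output of extract_cds with the sequences in the CDS FASTA file is one way to check that the three files are being interpreted consistently.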
TAG.trinity.longest_isoforms.fasta.transdecoder.pep.fasta
The peptide sequence encoded by each CDS.
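How a peptide sequence follows from its CDS can be sketched with a plain codon-table translation. The example below assumes the standard genetic code and an in-frame CDS starting at position 1; whether the peptide files write the stop codon as a trailing "*" is not stated here, so the sketch simply keeps it. The names CODON_TABLE and translate are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds):
    """Translate an in-frame CDS; unknown codons (e.g. containing N) become X."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
```

For example, translate("ATGCCCTAA") gives "MP*", with "*" marking the stop codon.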
The GFF files follow the GFF3 standard:
https://www.ensembl.org/info/website/upload/gff3.html
The FASTA files follow the FASTA standard:
https://www.ncbi.nlm.nih.gov/genbank/fastaformat
Note: Compared to the files used in analyses, these files have been edited to reflect the species names and abbreviations used in publication figures.
