3. Comparative population transcriptomics in krill: orthogroups (FASTA, TSV files)
https://doi.org/10.17044/SCILIFELAB.24039510
This item contains a gzipped archive with ~13,000 orthogroups used to study molecular evolution in this project.
Archive:
krill.orthogroups.tar.gz
Contents of archive:
- krill.proteinortho.tsv - the primary output table from Proteinortho. It describes which protein sequences from which species belong to the same orthogroup, in the program's standard output format.
- krill.proteinortho.tsv.seqs.csv - a processed table that also contains the actual sequences, one sequence per line (see below).
- the alignments directory, which contains every orthogroup (OG) as unaligned and aligned files in FASTA format (see below).
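As a quick check after download, the archive members can be sorted into the three categories listed above. The following is a minimal sketch, not part of the dataset: the function names are hypothetical, and it assumes the alignment files sit under a path component literally named "alignments", which has not been verified against the archive.

```python
import tarfile
from pathlib import PurePosixPath


def classify_member(name: str) -> str:
    """Assign an archive member to one of the categories described above."""
    path = PurePosixPath(name)
    # Check the longer suffix first: the processed table also contains ".tsv".
    if path.name.endswith(".tsv.seqs.csv"):
        return "sequence_table"
    if path.name.endswith(".proteinortho.tsv"):
        return "proteinortho_table"
    # Assumption: alignment files live under a directory named "alignments".
    if "alignments" in path.parts:
        return "alignment"
    return "other"


def summarize_archive(archive_path: str) -> dict[str, int]:
    """Count the regular files in a gzipped tar archive, per category."""
    counts: dict[str, int] = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                kind = classify_member(member.name)
                counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Calling summarize_archive("krill.orthogroups.tar.gz") on the downloaded file would return a dictionary of counts per category without extracting anything to disk.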
Format of the krill.proteinortho.tsv.seqs.csv table
The fields are:
- NR = orthogroup number
- ORTHO_GROUP = orthogroup ID
- N_SPECIES = the number of species
- N_GENES = the number of genes/sequences in this orthogroup
- N_MATCHING[o] = number of sequences matching outgroup species for this orthogroup
- N_NON_MATCHING = number of sequences matching ingroup species for this orthogroup
- HEADER = the name of this particular sequence
- SEQ = the protein sequence
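The fields above can be used to collect the sequences of each orthogroup into one lookup. This is a minimal sketch under two unverified assumptions: that the table has a header row using the field names listed above, and that it is comma-delimited (the delimiter is a parameter in case it is not). The function name and the demo rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def group_sequences(handle, delimiter=","):
    """Map each ORTHO_GROUP to a {HEADER: SEQ} dictionary.

    Assumes a header row with the column names ORTHO_GROUP, HEADER and SEQ.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    groups = defaultdict(dict)
    for row in reader:
        groups[row["ORTHO_GROUP"]][row["HEADER"]] = row["SEQ"]
    return dict(groups)


# Hypothetical rows, not taken from the dataset.
demo = io.StringIO(
    "NR,ORTHO_GROUP,HEADER,SEQ\n"
    "1,OG_A,species1_gene1,MKTAY\n"
    "1,OG_A,species2_gene7,MKSAY\n"
    "2,OG_B,species1_gene9,MLLVA\n"
)
demo_groups = group_sequences(demo)
```

For the real file, open krill.proteinortho.tsv.seqs.csv with newline="" and pass the handle in; if a KeyError is raised, the header names or delimiter differ from what is assumed here.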
Contents of the alignments directory
Each orthogroup is represented by up to four FASTA files:
- OG*.cds.ginsi.fasta.orig = the original, unaligned and unfiltered sequences
- OG*.cds.ginsi.fasta = the aligned and filtered sequences
- OG*.cds.ginsi.fasta.without_cold_euphausia.fasta = the aligned and filtered sequences after removing cold-associated Euphausia species
- OG*.cds.ginsi.fasta.without_cold_thysanoessa.fasta = the aligned and filtered sequences after removing cold-associated Thysanoessa species
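The naming scheme above makes it possible to derive the expected file names for one orthogroup and to read each file with a small FASTA parser. The sketch below uses only the suffixes listed above. The function names are hypothetical, and the check that all sequences in an aligned file have equal length is a general property of alignments rather than something stated by the dataset.

```python
import io

# Suffixes as listed above; the orthogroup ID replaces the "OG*" part.
SUFFIXES = {
    "original": ".cds.ginsi.fasta.orig",
    "aligned": ".cds.ginsi.fasta",
    "without_cold_euphausia": ".cds.ginsi.fasta.without_cold_euphausia.fasta",
    "without_cold_thysanoessa": ".cds.ginsi.fasta.without_cold_thysanoessa.fasta",
}


def expected_files(og_id: str) -> dict[str, str]:
    """Return the up to four file names that may exist for one orthogroup."""
    return {kind: og_id + suffix for kind, suffix in SUFFIXES.items()}


def read_fasta(handle) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def is_aligned(records: dict[str, str]) -> bool:
    """True if all sequences share one length, as expected in an alignment."""
    return len({len(seq) for seq in records.values()}) <= 1


# Hypothetical two-sequence alignment, not taken from the dataset.
demo_records = read_fasta(io.StringIO(">seq1\nATG\nC--\n>seq2\nATGCAA\n"))
```

Since each orthogroup has up to four files, any name returned by expected_files should be checked for existence before it is opened.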
Citation and access
Administrative information
Topic and keywords
Metadata
