[BioPython] SeqRecord - understanding of id, name and description arguments
Peter
biopython at maubp.freeserve.co.uk
Thu Jan 11 18:50:13 UTC 2007
Jan Kosinski wrote:
> Hi,
>
> I would like to ask for the intended meaning for SeqRecord "id", "name"
> and "description" arguments.
>
> In the "id" we put accession numbers (Entrez GI numbers, swiss-prot
> accession numbers etc)
>
> In the "description" - any description, could be a name or more information
>
> And what about the "name" ?
>
> When I am reading fasta sequences with SeqIO, where should I put the
> sequence name, which I understand as everything that comes after ">" up to
> the end of the line?
>
> Janek
An example I just made up, almost at random, from SwissProt might look
something like this:
id: P0A738
name: moaC
description: Molybdenum cofactor biosynthesis protein C
If you are creating your own SeqRecord objects, you can fill in as much
or as little information as you like.
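As a rough illustration, here is a minimal sketch of building SeqRecord objects by hand with the Biopython `Seq` and `SeqRecord` classes. It assumes a reasonably recent Biopython install. The sequence string is a made-up placeholder, not the real P0A738 sequence. It fills in all three fields from the example above for one record and only the sequence for the other.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder protein sequence: NOT the real P0A738 sequence.
seq = Seq("MKTAYIAKQRQISFVK")

# Fully annotated record, using the made-up SwissProt-style example.
full = SeqRecord(
    seq,
    id="P0A738",  # accession number
    name="moaC",  # short name, here the gene name
    description="Molybdenum cofactor biosynthesis protein C",
)

# Minimal record: only the sequence is required; id, name and description
# fall back to placeholder strings such as "<unknown id>".
minimal = SeqRecord(seq)

print(full.id, full.name, full.description)
print(minimal.id, minimal.name, minimal.description)
```

Filling in only what you actually know, and leaving the rest at the defaults, makes it obvious later which fields were never set.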
If you are reading sequences from well defined files (e.g. SwissProt or
GenPept/GenBank) then the annotation is nicely defined, so the parser
should be able to tell which part is the gene name, which is the accession
number, which is the description, and so on.
For Fasta files this is tricky. In general you get:
>identifier free format text
ACTGCTGA...
i.e. the first "word" when split with white space is normally an ID or
name of some sort, and the rest of the line is some sort of description.
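The first-word/rest-of-line convention above can be sketched in plain Python. `split_fasta_title` is a hypothetical helper written only for illustration. It is not a Biopython function.

```python
def split_fasta_title(title_line):
    """Split a FASTA title line into (identifier, description).

    Hypothetical helper: the first whitespace-delimited word is taken as
    the identifier and the remainder of the line as free-format text.
    """
    title = title_line.rstrip("\n")
    if title.startswith(">"):
        title = title[1:]
    parts = title.split(None, 1)  # split once, on any run of whitespace
    if not parts:
        return "", ""  # empty title line
    identifier = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


print(split_fasta_title(">identifier free format text"))
print(split_fasta_title(">P0A738"))
```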
It's impossible to do much better than this first-word/rest-of-line split
unless you know in advance exactly what style of Fasta annotation you are
dealing with.
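To answer the original question of where the title line ends up, here is a small sketch that reads an in-memory FASTA record with `SeqIO.parse`. It assumes a recent Biopython release. The exact mapping may differ from the 2007 version discussed in this thread. In recent releases, the first word becomes both `id` and `name`, and `description` holds the whole title line, identifier included.

```python
from io import StringIO

from Bio import SeqIO

# A tiny in-memory FASTA file in the generic layout described above.
fasta_text = ">identifier free format text\nACTGCTGA\n"

for record in SeqIO.parse(StringIO(fasta_text), "fasta"):
    print(repr(record.id))           # first word of the title line
    print(repr(record.name))         # same as the id for FASTA input
    print(repr(record.description))  # whole title line, id included
    print(record.seq)
```

So "everything after >" is found in `record.description`, while `record.id` gives just the first word.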
Peter