[BioPython] SeqRecord - understanding of id, name and description arguments

Thu Jan 11 18:50:13 UTC 2007

Jan Kosinski wrote:
> Hi,
> 
> I would like to ask about the intended meaning of the SeqRecord "id", 
> "name" and "description" arguments.
> 
> In the "id" we put accession numbers (Entrez GI numbers, swiss-prot 
> accession numbers, etc.)
> 
> In the "description" - any description, could be a name or more information
> 
> And what about the "name" ?
> 
> When I am reading fasta sequences with SeqIO, where should I put the 
> sequence name, which I understand as everything that comes after ">" to 
> the end of the line?
> 
> Janek

An example I just made up, almost at random, from SwissProt might look 
something like this:

id: P0A738
name: moaC
description: Molybdenum cofactor biosynthesis protein C

If you are creating your own SeqRecord objects, you can fill in as much 
or as little information as you like.
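As a rough sketch of both cases, assuming a reasonably recent Biopython 
install: a fully annotated record using the SwissProt-style example above, 
and a minimal one where only the sequence and an id are given. The 
sequences here are made-up placeholders (not the real moaC protein), and 
the variable names are just for illustration. Fields you leave out fall 
back to SeqRecord's placeholder defaults.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Fully annotated record, following the SwissProt-style example.
# NOTE: the sequence is a placeholder, NOT the real moaC sequence.
full = SeqRecord(
    Seq("MPLACEHOLDER"),
    id="P0A738",        # accession number
    name="moaC",        # short name (here the gene name)
    description="Molybdenum cofactor biosynthesis protein C",
)

# Minimal record: just a sequence and an id. name and description are
# left unset, so SeqRecord fills in its "<unknown ...>" defaults.
minimal = SeqRecord(Seq("ACTGCTGA"), id="my_seq")

print(full.id, full.name, full.description)
print(minimal.id, minimal.name, minimal.description)
```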

If you are reading sequences from well-defined files (e.g. SwissProt or 
GenPept/GenBank) then the annotation is nicely defined, so the parser 
should be able to tell which field is the gene name, which is the 
accession number, what the description is, etc.

For Fasta files this is tricky.  In general you get:

 >identifier free format text
ACTGCTGA...

i.e. the first "word" when split on white space is normally an ID or 
name of some sort, and the rest of the line is some sort of description. 
It's impossible to do much better than this unless you know in advance 
exactly what style of Fasta annotation you are dealing with.
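The "first word is the ID, the rest is description" convention can be 
sketched in plain Python as below. split_fasta_title is a hypothetical 
helper written for illustration, not a Biopython function. Recent Biopython 
releases' FASTA parser uses the same first-word rule for id and name, 
although it keeps the whole title line as the description.

```python
def split_fasta_title(title: str) -> tuple[str, str]:
    """Split a FASTA title line into (identifier, description).

    Hypothetical helper: the first whitespace-delimited word is taken as
    the identifier, and the rest of the line as free-format description.
    """
    # Accept either the raw header line (">...") or just the text after '>'.
    if title.startswith(">"):
        title = title[1:]
    # Split once on the first run of whitespace; maxsplit=1 keeps any
    # internal spacing of the description intact.
    parts = title.strip().split(None, 1)
    if not parts:
        return "", ""  # empty title: no identifier, no description
    identifier = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


print(split_fasta_title(">P0A738 Molybdenum cofactor biosynthesis protein C"))
```

Anything smarter (e.g. pulling a GI number out of a "gi|...|" style 
identifier) would need to know the header style in advance, as noted above.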

Peter