[Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry?
Peter
biopython at maubp.freeserve.co.uk
Sun Mar 14 20:30:45 UTC 2010
On Fri, Mar 12, 2010 at 8:29 PM, Martin MOKREJŠ wrote:
>
> Finally, the remaining differences are here (probably the first is in bug #2578):
>
> --- /tmp/orig.gb 2010-03-12 21:09:24.000000000 +0100
> +++ /tmp/new.gb 2010-03-12 21:09:38.000000000 +0100
> @@ -1,4 +1,4 @@
> -LOCUS CR603932 1625 bp mRNA linear HTC 16-OCT-2008
> +LOCUS CR603932 1625 bp DNA HTC 16-OCT-2008
> DEFINITION full-length cDNA clone CS0DK007YH24 of HeLa cells Cot 25-normalized
> of Homo sapiens (human).
> ACCESSION CR603932
> @@ -29,39 +29,39 @@
> division of Invitrogen.
> FEATURES Location/Qualifiers
> source 1..1625
> - /organism="Homo sapiens"
> /mol_type="mRNA"
> - /db_xref="taxon:9606"
> /clone="CS0DK007YH24"
> + /db_xref="taxon:9606"
> /tissue_type="HeLa cells Cot 25-normalized"
> /plasmid="pCMVSPORT_6"
> + /organism="Homo sapiens"
> ORIGIN
>
Yes, the LOCUS line issue would be part of Bug 2578.
As to the order of the feature qualifiers, these are stored
in a Python dictionary, which does not preserve the order.
I personally don't think the order of the qualifiers is
important, and thus don't care that it can change like
this. If the NCBI have a defined sort order for
the qualifiers (I'm not aware of one), then we could sort
the feature qualifiers on output. Another option would
be to store the qualifiers in an ordered dictionary. Or
just leave it as it is ;)
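
To make the first two options concrete, here is a rough sketch in plain
Python. It is not Biopython code: format_qualifiers is a hypothetical
helper, the qualifier values are plain strings for brevity, and
alphabetical order is only a stand-in since I don't know of an official
NCBI ordering.

```python
from collections import OrderedDict


def format_qualifiers(qualifiers, key_order=None):
    """Return GenBank-style qualifier lines in a deterministic order.

    Hypothetical helper, not part of Biopython. With no key_order the
    keys are sorted alphabetically; otherwise keys follow key_order,
    and any key missing from it is appended alphabetically at the end.
    """
    if key_order is None:
        keys = sorted(qualifiers)
    else:
        rank = {key: i for i, key in enumerate(key_order)}
        keys = sorted(qualifiers, key=lambda k: (rank.get(k, len(rank)), k))
    return ['/%s="%s"' % (key, qualifiers[key]) for key in keys]


# Option 1: keep an ordinary dict and sort the qualifiers on output.
quals = {
    "db_xref": "taxon:9606",
    "organism": "Homo sapiens",
    "mol_type": "mRNA",
    "clone": "CS0DK007YH24",
}
alphabetical = format_qualifiers(quals)

# Option 2: store the qualifiers in an ordered dictionary as they are
# parsed, then write them back out in that same order.
parsed = OrderedDict(
    [
        ("organism", "Homo sapiens"),
        ("mol_type", "mRNA"),
        ("db_xref", "taxon:9606"),
        ("clone", "CS0DK007YH24"),
    ]
)
as_parsed = format_qualifiers(parsed, key_order=list(parsed))
```

Sorting on output gives a stable result but not necessarily the order in
the original file; only the ordered dictionary approach would round-trip
the qualifier order seen in the diff above.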
Peter