[Biopython-dev] GSoC python variant update 2

Lenna Peterson arklenna at gmail.com
Fri May 18 04:35:10 UTC 2012


On Wed, May 16, 2012 at 4:47 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> It could be very difficult to make PyVCF compatible with Python 2.5.
>
> What makes you worry? You mention argparse in the blog post,
> but that is for parsing command line arguments - and so is
> not really relevant for a library like Biopython (unless you are
> planning a bunch of command line tools too?).
>
> Peter

The gaps that caught my eye were the `with` statement and the built-in
`next()`, neither of which Python 2.5 provides by default. The PyVCF
developers aren't planning to implement 2.5 compatibility
(https://github.com/jamescasbon/PyVCF/issues/30), and I don't have
experience with that kind of backporting.
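
For reference, a minimal sketch of what bridging those two gaps might
look like. In Python 2.5 the `with` statement is only available through
a `__future__` import, and the `next()` built-in arrived in 2.6. The
names `compat_next` and `first_data_line` are made up for illustration
and are not part of PyVCF or Biopython.

```python
# On Python 2.5 the module would have to begin with:
#     from __future__ import with_statement
# It is a no-op on 2.6 and later, so it is left as a comment here.

try:
    compat_next = next  # built-in from Python 2.6 onward
except NameError:
    # Python 2.5 fallback: call the iterator's own .next() method.
    def compat_next(iterator):
        return iterator.next()


def first_data_line(path):
    """Return the first line of a file that does not start with '#'.

    Hypothetical helper, only here to exercise `with` and `next()`.
    Raises StopIteration if the file has no such line.
    """
    handle = open(path)
    with handle:
        data_lines = (line for line in handle if not line.startswith("#"))
        return compat_next(data_lines)
```

This only covers the two features named above; a real port would still
need an audit of everything else PyVCF uses.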


On Wed, May 16, 2012 at 8:19 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
>
>> I don't think `SeqFeature` or an extension thereof would be
>> appropriate for storing Variant data; therefore, I intend to make a
>> new structure based on `_Record` and `_Call` in PyVCF. I'm not sure if
>> this structure should be associated with `Seq`, i.e. by naming it
>> `SeqVariant`, and would like feedback on this question.
>
> I'm agreed about SeqFeature. Would you consider using _Record/_Call
> directly? Then you could provide functionality to convert this to/from
> basic SeqFeatures if needed. An advantage of using these structures
> explicitly is that you could plug in compatible APIs, like Aaron
> Quinlan's CyVCF:
>
> https://github.com/arq5x/cyvcf
>
> I don't think we should add a new representation class unless we
> explicitly need to store additional information.
>

The reason I suggested a new representation class is so data from all
parsers can be stored in the same way. As far as I can tell, GVF
doesn't store all of the information stored in VCF (for example, the
headers). My concern was unexpected behavior if I tried to store GVF
data in the exact same object used by VCF. On the other hand, your GFF
parser outputs to SeqRecords/SeqFeatures, so if the PyVCF wrapper can
output to SeqRecords as well, I probably wouldn't have to worry about
an intermediate structure.
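
To make the "same storage for all parsers" idea concrete, here is a
rough sketch of a parser-neutral record. Everything in it is an
assumption for illustration: the `Variant` class, the two `from_*`
helpers, and the choice of 0-based half-open coordinates are all
hypothetical, and the GVF side assumes the alleles arrive in
`Reference_seq` and `Variant_seq` attributes.

```python
class Variant(object):
    """Hypothetical parser-neutral variant; not an existing Biopython class."""

    def __init__(self, chrom, start, end, ref, alts, source, extra=None):
        self.chrom = chrom
        self.start = start          # 0-based, inclusive
        self.end = end              # 0-based, exclusive
        self.ref = ref
        self.alts = tuple(alts)
        self.source = source        # which format the data came from
        self.extra = dict(extra or {})  # format-specific leftovers


def from_vcf_fields(chrom, pos, ref, alts, info=None):
    """Build a Variant from VCF-style fields (POS is 1-based)."""
    start = pos - 1
    return Variant(chrom, start, start + len(ref), ref, alts, "VCF", info)


def from_gvf_fields(seqid, start, end, attributes):
    """Build a Variant from GVF-style fields (1-based, inclusive range)."""
    ref = attributes.get("Reference_seq")
    alts = [a for a in attributes.get("Variant_seq", "").split(",") if a]
    extra = dict((key, value) for key, value in attributes.items()
                 if key not in ("Reference_seq", "Variant_seq"))
    return Variant(seqid, start - 1, end, ref, alts, "GVF", extra)
```

Whatever one format carries and the other lacks (VCF headers, for
example) would have to live in `extra` or somewhere similar, which is
exactly the mismatch I was worried about.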

I'll start by having the PyVCF wrapper use _Record and _Call to keep
things simple. In any case, if I do end up writing an interface/new
structure, I would definitely write it to allow substitution of CyVCF
or other parsers.
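
A minimal sketch of what "allow substitution" could mean in practice:
the wrapper takes the reader as an argument and relies only on the
record attributes it actually uses. `iter_variants`, `FakeRecord`, and
`fake_reader` are hypothetical names; the fake reader stands in for
`vcf.Reader` so the example runs without PyVCF installed.

```python
def iter_variants(handle, reader_factory):
    """Yield (CHROM, POS, REF, ALT) from any PyVCF-like reader.

    `reader_factory` is any callable that takes a file-like object and
    returns an iterable of records exposing CHROM, POS, REF and ALT.
    """
    for record in reader_factory(handle):
        yield record.CHROM, record.POS, record.REF, record.ALT


class FakeRecord(object):
    """Stand-in for a parser's record object (illustration only)."""

    def __init__(self, chrom, pos, ref, alt):
        self.CHROM, self.POS, self.REF, self.ALT = chrom, pos, ref, alt


def fake_reader(lines):
    """Toy reader: skips '#' lines and splits the first five VCF columns."""
    for line in lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        yield FakeRecord(chrom, int(pos), ref, alt.split(","))
```

With PyVCF installed, the call would presumably look like
`iter_variants(open("example.vcf"), vcf.Reader)`; swapping in CyVCF
assumes its reader accepts the same argument and yields records with
the same attribute names.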

Lenna


