[Biopython] Variant interface

Chris Mitchell chris.mit7 at gmail.com
Sat Jun 30 14:47:10 UTC 2012


Hi Lenna,

Here are some features of the VCF/GFF parser I wrote that I use:

It can either be used as an iterator for parsing huge files or store the
entire VCF in memory.
isVariant(chromosome, position, *arg) -- returns whether the position is a
variant; the optional arg tests for a specific user-supplied variant
writeVariant(file) -- writes the vcf object to a given file handle
getVariants() -- generator over all the VCF objects; a similar method
exists for GFF
getAttribute(attribute, *arg) -- get objects with a given attribute,
optional arg is to get objects with a given attribute equal to arg
getChildren() -- gets child objects of GFF if exists
getParent() -- gets parent object of GFF if exists
getXXX() -- gets the standard info for any VCF/GFF object like SeqId,
Start, End, Alt, Ref, VCF type, etc.
addAttribute(key, value) -- adds a feature to a given GFF/VCF object
removeAttribute(key) -- removes feature
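
To make the list above concrete, here is a minimal sketch of how a few of
these methods might fit together for the in-memory case. It is not the actual
parser: the class names VcfRecord and VariantStore, the helper iter_records,
and the internal layout are assumptions for illustration, and it expects
well-formed tab-separated VCF lines with at least 8 columns.

```python
import io


class VcfRecord:
    """One VCF data line; INFO is parsed into a dict (hypothetical layout)."""

    def __init__(self, line, key_delim="="):
        cols = line.rstrip("\n").split("\t")
        self.chrom, self.pos = cols[0], int(cols[1])
        self.ref, self.alt = cols[3], cols[4]
        self.info = {}
        for item in cols[7].split(";"):
            if item and item != ".":
                key, _, value = item.partition(key_delim)
                self.info[key] = value  # flag-style keys get an empty value


def iter_records(handle):
    """Iterator mode: yield records one at a time, skipping header lines."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        yield VcfRecord(line)


class VariantStore:
    """In-memory mode: load every record and index it by (chrom, pos)."""

    def __init__(self, handle):
        self.records = list(iter_records(handle))
        self._by_site = {(r.chrom, r.pos): r for r in self.records}

    def isVariant(self, chrom, pos, alt=None):
        rec = self._by_site.get((chrom, pos))
        if rec is None:
            return False
        # With no alt given, any record at the site counts as a variant.
        return alt is None or alt in rec.alt.split(",")

    def getVariants(self):
        yield from self.records

    def getAttribute(self, key, *value):
        # One optional positional value restricts matches to key == value.
        for rec in self.records:
            if key in rec.info and (not value or rec.info[key] == value[0]):
                yield rec
```

For huge files, iter_records(handle) can be used on its own so that only one
record is held in memory at a time.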

Optional keywords:
filter = [string,string...] -- only keep variants with the keys in filter
filterOnly = Bool -- only keep features specified in filter in our object
(so if we have 20 key-value attributes, just keep the keys in filter)
keyDelim = string -- for compatibility with non-standard vcf/GFF formats
that don't use '=' for the key-value separator
fast = [string, string...] -- if we parse the file in memory, keep the keys
in this list in a dictionary for immediate access to the vcf/GFF objects
exclude = [string, string...] -- exclude entries with these keys
cols = (int,int) -- which columns to use -- useful for parsing GFF files
that have been merged with bedtools
random = Bool -- if we're treating the file as an iterable, stores the
object's file position coordinates in a dictionary for random access to
objects
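
As a rough illustration of how the filter, filterOnly, exclude and keyDelim
options could interact, here is a small sketch that works on a single
attribute column. The function names parse_attributes and select_attributes
are hypothetical, filter is spelled filter_keys to avoid shadowing the Python
builtin, and "keep variants with the keys in filter" is assumed to mean "at
least one of the keys is present".

```python
def parse_attributes(column, key_delim="="):
    """Split a ';'-separated attribute column into a dict.

    key_delim covers non-standard files that separate key and value
    with something other than '='.
    """
    attrs = {}
    for item in column.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(key_delim)
        attrs[key.strip()] = value.strip()
    return attrs


def select_attributes(attrs, filter_keys=None, filter_only=False, exclude=None):
    """Return the attributes to keep, or None if the entry is dropped."""
    # exclude: drop any entry that carries one of these keys.
    if exclude and any(key in attrs for key in exclude):
        return None
    if filter_keys:
        # filter: keep the entry only if it has at least one wanted key.
        if not any(key in attrs for key in filter_keys):
            return None
        # filterOnly: also discard every attribute not named in the filter.
        if filter_only:
            return {k: v for k, v in attrs.items() if k in filter_keys}
    return attrs
```

Returning None for a dropped entry keeps the calling loop simple: it can skip
the record when the result is None and otherwise store or write the trimmed
attribute dict.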

Some use cases this helps me with:
Parsing through a file and adding/removing annotations (for instance if I
want to add the coding transcript affected by a VCF to the file itself)
Trimming files down by several criteria to a smaller, more informative file
Being able to read a file only to the point I care about (random access
methods that can index attributes as well as the normal
identifiers/iterator)
Immediate access to an attribute I care about
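
The random-access idea (the random option above) can be sketched with plain
file offsets. This is only an assumption about how such an index might work,
not the actual implementation: build_offset_index and fetch are hypothetical
names, and the index is keyed on (chromosome, position) taken from the first
two tab-separated columns of a VCF line.

```python
import io


def build_offset_index(handle):
    """Map (chrom, pos) to the file offset where that line starts.

    readline() is used instead of iterating over the handle because
    tell() is not available while iterating over a text-mode file.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        index[(chrom, int(pos))] = offset
    return index


def fetch(handle, index, chrom, pos):
    """Seek straight to one record without re-reading the file."""
    offset = index.get((chrom, pos))
    if offset is None:
        return None
    handle.seek(offset)
    return handle.readline().rstrip("\n")
```

Only the offsets are held in memory, so the file itself can stay on disk; the
same dictionary could just as well be keyed on an attribute value instead of
the position.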

Hope that helps
Chris


On Sat, Jun 30, 2012 at 1:50 AM, Lenna Peterson <arklenna at gmail.com> wrote:

> Hi all,
>
>
> I'm working on Biopython for Google summer of code; my project is to create
> an interface between Biopython and various existing tools for handling
> sequence variants (including VCF format).
>
>
> I am seeking feedback from variant users. What could my interface offer
> that would make it easier to use variants with Biopython? For example, I am
> planning on a function that will essentially skim through a large file to
> give a general overview of its contents. More specifically, in what ways
> should variant data be able to interact with existing parts of Biopython
> (such as SeqFeature, SeqRecord)?
>
>
> Looking forward to any thoughts you share.
>
>
> Cheers,
>
>
> Lenna
>
> github.com/lennax
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


