[Biopython] fetching chromosome sizes without gff file?

Tommy Carstensen tommy.carstensen at gmail.com
Wed Mar 22 19:15:17 UTC 2017


Hi Peter,

I ended up doing it like this:

import urllib.request
import operator

url = 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.14_GRCh37.p13/GCA_000001405.14_GRCh37.p13_assembly_report.txt'

with urllib.request.urlopen(url) as response:
    d_lengths = {}
    for l in filter(
        ## Skip if Sequence-Role is not assembled-molecule.
        lambda x: x[1] == 'assembled-molecule',
        ## Split line string into a list.
        map(operator.methodcaller('split', '\t'),
            ## Skip header/comment lines and strip newline characters.
            map(str.rstrip, filter(
            lambda x: x[0] != '#',
            ## Decode with utf-8 from bytes to string.
            map(bytes.decode, response))))):
        chrom = l[0]
        length = l[9]
        d_lengths[chrom] = length
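
The fixed positions l[0], l[1] and l[9] depend on the column order of the report, so it may be worth checking them against the header line (in particular that l[9] really is Sequence-Length and not the neighbouring UCSC-style-name column). A minimal sketch of an alternative that looks the columns up by name is given here. It assumes the last '#' line before the data rows is a tab-separated header containing Sequence-Name, Sequence-Role and Sequence-Length; the function name parse_assembly_report is made up for illustration and is not part of Biopython.

```python
def parse_assembly_report(lines, role='assembled-molecule'):
    """Return {sequence name: length} for rows whose Sequence-Role matches.

    `lines` is any iterable of already-decoded text lines.
    Assumption: the last comment line before the data names the columns.
    """
    header = None
    lengths = {}
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue
        if line.startswith('#'):
            # Remember the most recent comment line as the candidate header.
            header = line.lstrip('# ').split('\t')
            continue
        if header is None:
            raise ValueError('no header line found before the data rows')
        # Pair each field with its column name instead of a fixed index.
        row = dict(zip(header, line.split('\t')))
        if row['Sequence-Role'] == role:
            lengths[row['Sequence-Name']] = int(row['Sequence-Length'])
    return lengths
```

Inside the with block above it could be called as parse_assembly_report(map(bytes.decode, response)). Lengths come back as integers rather than strings.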

On Wed, 22 Mar 2017 at 16:59 Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hmm.
>
> Using the NCBI Entrez API, you could certainly download these as
> FASTA or GenBank files, either of which would give you the length.
> But I don't think that offers GFF files.
>
> I don't work on model organisms, but I'd suggest ENSEMBL might
> be a good bet - but we don't yet have a Biopython module for that?
>
> http://www.ensembl.org/
> https://github.com/biopython/biopython/issues/512
>
> It might be worth looking at bioservices for this?
>
> https://github.com/cokelaer/bioservices
>
> Peter
>
> On Wed, Mar 22, 2017 at 4:24 PM, Tommy Carstensen
> <tommy.carstensen at gmail.com> wrote:
> > Is it possible to get the chromosome lengths in maize (Zea mays), zebra fish
> > and humans with Biopython without having the relevant gff files? How would I
> > go about doing that? Basically I just want to be able to fetch the gff by
> > typing in species='homo sapiens' and build=37 or something like that without
> > having to worry about URLs.
> >
> > Could Biopython also give me the position of the pseudoautosomal regions on
> > the X chromosome in Homo sapiens?
> >
> > Thanks,
> > Tommy
> >
> > _______________________________________________
> > Biopython mailing list  -  Biopython at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/biopython
>