helloworld | Rosalind #5 GC content

Surprisingly difficult for me XD. After I wasted a lot of time trying to cleverly read in lines using loops, J suggested using split(), although he conceded that it'd be too slow for someone trying to read in gigs and gigs of data (they'd need to cook up something that reads character-by-character). Still, this worked, so I'll take it for now.


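# Read the whole file, keeping line breaks so each header can be told apart from its sequence
with open("C:/Users/Ireneying/Workspace/rosalind_gc.txt") as raw:
    data = raw.read()

# Each record starts with ">"; drop the empty string before the first one
fasta = [record for record in data.split(">") if record.strip()]

ids = []
seqs = []
for item in fasta:
    # First line is the ID (whatever its length), the remaining lines are the sequence
    header, *seq_lines = item.splitlines()
    ids.append(header.strip())
    seqs.append("".join(line.strip() for line in seq_lines))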

GC = "GC"
gccontent = 0
best_id = ""
# Walk IDs and sequences together so identical sequences can't overwrite each other's ID
for seq_id, subseq in zip(ids, seqs):
    count = sum(1 for ch in subseq if ch in GC)
    temp_gcc = count / len(subseq)
    if temp_gcc > gccontent:
        gccontent = temp_gcc
        best_id = seq_id

# Convert the fraction to a percentage
gccontent = gccontent * 100

print(best_id, gccontent)
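
For the gigs-and-gigs case J mentioned, here is a rough sketch of one way to skip split() entirely: read one line at a time and only ever hold the current record in memory. This is not what I actually submitted. The function names (read_fasta, gc_fraction, highest_gc) are made up for illustration, and it assumes every record is a ">" header line followed by its sequence lines.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines, one record at a time."""
    seq_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id = line[1:]
            chunks = []
        else:
            chunks.append(line)
    # The last record has no following header, so emit it here
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def gc_fraction(seq):
    """Fraction of characters in seq that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def highest_gc(lines):
    """Return (id, GC percentage) for the record with the highest GC content.

    Raises ValueError if there are no records at all.
    """
    best_id, best_gc = max(
        ((seq_id, gc_fraction(seq)) for seq_id, seq in read_fasta(lines)),
        key=lambda pair: pair[1],
    )
    return best_id, best_gc * 100


# Usage (hypothetical file name): a file object is already an iterable of lines
# with open("rosalind_gc.txt") as raw:
#     print(*highest_gc(raw))
```

Since an open file hands out lines one by one, only the current record and the best score so far are kept around, no matter how big the file is.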