Rosalind #5 GC content
Oct. 11th, 2015 02:16 pm
Surprisingly difficult for me XD. After I wasted a lot of time trying to cleverly read in lines using loops, J had to suggest using split(), although he conceded that it'd be too slow for someone trying to read in gigs and gigs of data (they'd need to cook up something that reads character-by-character). Still, this worked, so I'll take it for now.
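# Read the whole FASTA file into one string, dropping line breaks.
with open("C:/Users/Ireneying/Workspace/rosalind_gc.txt") as raw:
    data = "".join(line.rstrip() for line in raw)

# Each record starts with ">", so splitting on it gives one chunk per record.
fasta = data.split(">")
fasta = list(filter(None, fasta))  # drop the empty chunk before the first ">"

ids = []
seqs = []
for item in fasta:
    l = len(item)
    # Assumes every ID is exactly 13 characters, e.g. "Rosalind_" plus four digits.
    ids.append(item[0:13])
    seqs.append(item[13:l])
pairs = dict(zip(seqs, ids))  # look up an ID by its sequence

GC = ['G', 'C']
gccontent = 0
seq_id = ""  # despite the name, holds the sequence with the highest GC content so far
for subseq in seqs:
    count = 0
    for ch in subseq:
        if ch in GC:
            count += 1
    temp_gcc = count / len(subseq)
    if temp_gcc > gccontent:
        gccontent = temp_gcc
        seq_id = subseq
gccontent = gccontent * 100  # fraction to percentage
print(pairs[seq_id], gccontent)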
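
For the "gigs and gigs" case, a line-by-line version might look something like the sketch below. It isn't the solution above or anything J wrote: the names iter_fasta, gc_percent and highest_gc are made up for illustration, the sample records are invented, and it takes the ID from the ">" header line instead of assuming 13 characters. It still holds one whole sequence in memory at a time, just not the whole file.

```python
import io


def iter_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:], []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def gc_percent(seq):
    """Percentage of characters in seq that are G or C."""
    if not seq:
        return 0.0
    return 100 * sum(ch in "GC" for ch in seq) / len(seq)


def highest_gc(lines):
    """Return (id, GC percentage) for the record with the highest GC content."""
    return max(((sid, gc_percent(seq)) for sid, seq in iter_fasta(lines)),
               key=lambda pair: pair[1])


# Invented sample data standing in for an open file; a real file object
# can be passed to highest_gc() the same way.
sample = io.StringIO(">seq_a\nATAT\nGC\n>seq_b\nGGCC\nAT\n")
best_id, best_gc = highest_gc(sample)
print(best_id, round(best_gc, 2))
```

With a real dataset, the io.StringIO stand-in would be swapped for the file handle from open(). Note that max() raises ValueError if the input has no records at all.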