I wrote some Python code while practicing the munging of proteomic mass spec output data.
I am posting the code for posterity, but this is specifically for Xcalibur output, esp. the way that Xcalibur exports modifications! Do NOT use on your data unless you know it comes out in the same format!
For some reason one of the .csv files contained non-ASCII characters. I couldn't track them down, but I suspect whoever made the original .xlsx file typed a smart quote or smart dash that the Python code kept choking on. I simply removed that column from the file, which solved the problem in the short term.
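A less destructive way to deal with stray characters like these is to locate them first and then decide what to do. The following is only a sketch: `find_non_ascii` is a hypothetical helper name, and it assumes the file is UTF-8 (Excel exports are often cp1252 instead, so the `encoding` argument may need changing).

```python
def find_non_ascii(path, encoding='utf-8'):
    """Return (line_number, position, character) for each non-ASCII character."""
    hits = []
    # errors='replace' keeps the scan going if the encoding guess is wrong;
    # undecodable bytes then show up as U+FFFD in the results.
    with open(path, 'r', encoding=encoding, errors='replace', newline='') as handle:
        for line_number, line in enumerate(handle, start=1):
            for position, char in enumerate(line, start=1):
                if ord(char) > 127:
                    hits.append((line_number, position, char))
    return hits
```

Calling `find_non_ascii('ET4.csv')` returns an empty list for a clean file. Otherwise each hit gives the line and the character position within that line, which is usually enough to find the offending cell.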
I used a dictionary of lists: each key is a protein name, and its value is the list of counts for that protein.
I have two conditions, each with a few thousand peptides, and each peptide has three forms (unmodified, oxidized, and phosphorylated), giving 3 columns per condition. The two dictionaries need to be joined column-wise (cbind, in R terms), after preprocessing so that peptides that appear in only one condition get 0 counts in the other.
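The zero-fill and column-wise join described above could look roughly like this. It is a sketch, not the code I actually ran: `cbind_counts`, `et`, and `wt` are hypothetical names, the counts are made-up values, and it assumes every list holds exactly `width` counts.

```python
def cbind_counts(cond_a, cond_b, width=3):
    """Join two {name: [counts]} dicts side by side, zero-filling missing names."""
    zeros = [0] * width
    combined = {}
    # Iterate over the union of keys so names seen in only one condition
    # still get a row; sorted() just gives a stable row order.
    for key in sorted(set(cond_a) | set(cond_b)):
        # .get() does not insert into a defaultdict, so the inputs stay unchanged.
        combined[key] = list(cond_a.get(key, zeros)) + list(cond_b.get(key, zeros))
    return combined


# Made-up example data: two conditions with three counts per entry.
et = {'PEPTIDEA': [5, 1, 0], 'PEPTIDEB': [2, 0, 2]}
wt = {'PEPTIDEB': [4, 0, 1], 'PEPTIDEC': [3, 3, 0]}
merged = cbind_counts(et, wt)
```

Here `merged['PEPTIDEA']` ends with three zeros because that entry is missing from `wt`, and `merged['PEPTIDEC']` starts with three zeros because it is missing from `et`.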
from collections import defaultdict
import csv

# Maps each protein name to its list of counts: [all, oxidized, phosphorylated].
allcounts = defaultdict(list)


def munge2(input_file='ET4.csv'):
    # Running totals for the rows read since the last protein entry.
    counts = 0
    counts_oxi = 0
    counts_phos = 0
    with open(input_file, 'r') as infile:
        # THIS IS ONLY HERE TO SKIP A BLANK LINE!!!
        infile.readline()
        for line in infile:
            entry = line.split(',')
            # Rows with fewer than 23 columns carry neither a count nor a protein.
            if len(entry) < 23:
                continue
            if entry[16] != "":
                counts = counts + int(entry[16])
                # Column 7 flags modifications: "1" is counted as oxidized,
                # "2" as phosphorylated.
                if "1" in entry[6]:
                    counts_oxi = counts_oxi + int(entry[16])
                if "2" in entry[6]:
                    counts_phos = counts_phos + int(entry[16])
            if entry[22] != "":
                # A protein name in column 23 closes the group: store the
                # totals under that protein and start over.
                protein = entry[22].strip()
                allcounts[protein].append(counts)
                allcounts[protein].append(counts_oxi)
                allcounts[protein].append(counts_phos)
                counts = 0
                counts_oxi = 0
                counts_phos = 0


# Fill allcounts, then write one row per protein.
munge2()

with open('ETcounts.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    for key, values in allcounts.items():
        writer.writerow([key] + values)