How do I modify and run this script?

Hello world, I am new to MATLAB and don't know much about it, so I wasn't able to understand and run a script that I have. Below is the script, which I need to run a few thousand times. Can anybody help me correct/modify the script so that it runs with an input FASTA file (the file contains the Uniprot_ID and sequence) and writes the output to new files named according to the Uniprot_ID? Thank you in advance for your help and time.
data.Sequence='MDULSQ…….'
data.Header='header file'
fastawrite('my_test.txt',data)
type('my_test.txt')
FASTAData=fastaread('my_test.txt')
[Sequence]=fastaread('my_test.txt')
data=proteinpropplot(Sequence,'property','hydrophobicity(Kyte & Doolittle)')
zero_crossings_indices=data.Indices(diff(sign(data.Data))~=0)
plot(data.Indices,data.Data,'-')
hold on
plot(zero_crossings_indices,0,'ro')
Input file (inputfile.txt):
>tr|D6RGD4|D6RGD4_HUMAN Amyloid-beta A4 precursor protein-binding family B member 2 (Fragment) OS=Homo sapiens OX=9606 GN=APBB2 PE=1 SV=1
MAERKNAKALACSSLQERANVNLDVPLQVDFPTPKTELVQKFHVQYLGMLPVDKPVGMDI
LNSAIENLMTSSNKEDWLSVNMNVADA
>tr|G3V3P0|G3V3P0_HUMAN Presenilin-1 (Fragment) OS=Homo sapiens OX=9606 GN=PSEN1 PE=1 SV=1
MTELPAPLSYFQNAQMSEDNHLSNTNDNRERQEHNDRRSLGHPEPLSNGRPQGNSRQVVE
QD
>tr|A0A0A0MRG2|A0A0A0MRG2_HUMAN Amyloid-beta A4 protein OS=Homo sapiens OX=9606 GN=APP PE=1 SV=1
MFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYPELQITNVVEANQPVTIQ
NWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQERMDVCETHLHWHTVAK
ETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLAEESDNVDSADAEEDDSDVWWGGADT
DYADGSEDKVVEVAEEEEVAEVEEEEADDDEDDEDGDEVEEEAEEPYEEATERTTSIATT
Expected result (one output file per Uniprot_ID):
D6RGD4.file_extension
G3V3P0.file_extension
A0A0A0MRG2.file_extension

Accepted Answer

OCDER on 25 Jun 2018
Edited: OCDER on 26 Jun 2018


FileName = 'inputfile.txt';
S = fastaread(FileName);
for f = 1:length(S)
    % Pull the Uniprot_ID out of a header such as 'tr|D6RGD4|D6RGD4_HUMAN ...'.
    FileNameExp = regexp(S(f).Header, 'tr\|(\w+)\|', 'tokens');
    SaveName = [FileNameExp{1}{1} '.png'];
    % Skip IDs whose plot file already exists, in case you run this multiple times.
    if exist(SaveName, 'file'); continue; end
    data = proteinpropplot(S(f).Sequence,'propertytitle','hydrophobicity (Kyte & Doolittle)');
    zero_crossings_indices = data.Indices(diff(sign(data.Data))~=0);
    plot(data.Indices,data.Data,'-');
    hold on
    plot(zero_crossings_indices,0,'ro')
    hold off
    print(gcf, SaveName, '-dpng', '-r300', '-painters');
end
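
A note on the ID extraction in the loop above: the pattern 'tr\|(\w+)\|' only matches headers that contain tr| (TrEMBL entries). Below is a minimal sketch of the same extraction that also accepts sp| (Swiss-Prot) headers. The sp| handling, the error check, and the variable names header, tok, id, and saveName are assumptions added for illustration and are not part of the original answer. It uses only base MATLAB.

```matlab
% Example header, written as fastaread is expected to return it
% (assumption: the leading '>' has already been stripped).
header = 'tr|D6RGD4|D6RGD4_HUMAN Amyloid-beta A4 precursor protein-binding family B member 2 (Fragment)';

% Capture the accession between the first two pipes; accept tr| or sp|.
tok = regexp(header, '^(?:tr|sp)\|(\w+)\|', 'tokens', 'once');

% Fail with a clear message instead of an indexing error on odd headers.
if isempty(tok)
    error('Could not find a UniProt accession in header: %s', header);
end

id = tok{1};              % 'D6RGD4'
saveName = [id '.png'];   % 'D6RGD4.png'
```

With the 'once' option, regexp returns the tokens of the first match directly, so the ID is tok{1} rather than the nested FileNameExp{1}{1} used in the answer.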

11 Comments

Takshan on 26 Jun 2018
Thanks bro, but it is not working exactly how I expected. The output file must contain the plot of the Kyte-Doolittle analysis result, not the sequence.
This code needs to be run for each sequence, with the plot saved as the output.
data=proteinpropplot(Sequence,'property','hydrophobicity(Kyte & Doolittle)')
zero_crossings_indices=data.Indices(diff(sign(data.Data))~=0)
plot(data.Indices,data.Data,'-')
hold on
plot(zero_crossings_indices,0,'ro')
OCDER on 26 Jun 2018
I edited the answer above.
Takshan on 26 Jun 2018
It worked perfectly. Thanks bro.
OCDER on 26 Jun 2018
You're welcome!
Takshan on 26 Jun 2018
Is it possible to write the value of "zero_crossings_indices" for all sequences to a single CSV/Excel file with a header? Like:
IDXXXXX 1234
IDXXXXX 1211
IDXXXXX 1222
OCDER on 26 Jun 2018
You'll have to use fprintf or xlswrite.
FileName = 'inputfile.txt';
CsvName = 'summary.csv';
FID = fopen(CsvName, 'w');
S = fastaread(FileName);
for f = 1:length(S)
    FileNameExp = regexp(S(f).Header, 'tr\|(\w+)\|', 'tokens');
    SaveName = [FileNameExp{1}{1} '.png'];
    data = proteinpropplot(S(f).Sequence,'propertytitle','hydrophobicity (Kyte & Doolittle)');
    zero_crossings_indices = data.Indices(diff(sign(data.Data))~=0);
    plot(data.Indices,data.Data,'-');
    hold on
    plot(zero_crossings_indices,0,'ro')
    hold off
    print(gcf, SaveName, '-dpng', '-r300', '-painters');
    GeneName = FileNameExp{1}{1};
    for k = 1:length(zero_crossings_indices)
        sprintf('%s, %d', GeneName, zero_crossings_indices(k))
        fprintf(FID, '%s, %d\n', GeneName, zero_crossings_indices(k));
    end
end
fclose(FID);
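
The request above asked for a header row, which the loop above does not write. A minimal sketch of adding one with fprintf follows. The column names, the index values in crossings, and the temporary file name are made-up placeholders for illustration and are not results from the real sequences.

```matlab
% Placeholder data standing in for the per-sequence results (not real output).
ids = {'D6RGD4', 'G3V3P0'};
crossings = {[12 40], 7};          % made-up zero-crossing indices per ID

csvName = [tempname '.csv'];       % temporary file, assumed to be writable
fid = fopen(csvName, 'w');
if fid == -1
    error('Could not open %s for writing.', csvName);
end

% Write the header row once, before any data rows.
fprintf(fid, 'Uniprot_ID,ZeroCrossingIndex\n');
for f = 1:numel(ids)
    for k = 1:numel(crossings{f})
        fprintf(fid, '%s,%d\n', ids{f}, crossings{f}(k));
    end
end
fclose(fid);

txt = fileread(csvName);           % read the file back to inspect it
delete(csvName);                   % clean up the temporary file
```

In the real script the header line would go right after the fopen call and before the for loop, so it is written only once.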
Takshan on 26 Jun 2018
Edited: Takshan on 26 Jun 2018
For one IDXXXX, multiple (100+) values are printed. I was expecting the total number (count) of "zero_crossings_indices" for each sequence IDXXXX, and not the total sum of the zero_crossings values.
OCDER on 26 Jun 2018
Perhaps it might be worth learning MATLAB?
Replace this
for k = 1:length(zero_crossings_indices)
    sprintf('%s, %d', GeneName, zero_crossings_indices(k))
    fprintf(FID, '%s, %d\n', GeneName, zero_crossings_indices(k));
end
with
fprintf(FID, '%s, %d\n', GeneName, sum(zero_crossings_indices));
Takshan on 26 Jun 2018
Yeah, I should start learning it. I feel so helpless with such a simple task. By the way, thanks for your help and time. Also, that last code gives the sum of the values, not the count of the values.
IDXXX 10 20 30 40
Expecting (total number of values):
IDXXX 4
but getting (total sum of values):
IDXXX 100
OCDER on 26 Jun 2018
Oops, it should be length() instead.
fprintf(FID, '%s, %d\n', GeneName, length(zero_crossings_indices));
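
To see why sum gave the wrong number, here is a small sketch on made-up values. The vector vals is a placeholder and not real hydrophobicity data, and the names idx, changed, total, and count are introduced for illustration only. It applies the same diff(sign(...)) test as the answer and compares sum with numel on the result.

```matlab
vals = [1.2 0.4 -0.3 -0.8 0.5 0.9 -0.1];   % made-up data with three sign changes
idx  = 1:numel(vals);

% True where consecutive values differ in sign.
% The mask has one element fewer than vals, which logical indexing allows.
changed = diff(sign(vals)) ~= 0;
zero_crossings_indices = idx(changed);      % [2 4 6]

total = sum(zero_crossings_indices);        % 12, adds up the index values
count = numel(zero_crossings_indices);      % 3, the number of crossings
```

For a vector, length and numel return the same count. One caveat: sign returns 0 for a value that is exactly zero, so such a point registers as two changes with this test.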
Takshan on 26 Jun 2018
Thanks bro. I used numel instead of length and it worked.


Asked: 25 Jun 2018
Commented: 26 Jun 2018
