Please explain this code to me!

Question

Hello, I'm trying to understand a program that finds restriction enzyme recognition sites in a DNA sequence and outputs the resulting fragments.

This is the main program. I included sample values for s and recs, but in the real program they will be entered by the user.

    s={'CCCAGCTCCCGAATTCCCCAGCTGGGGGGCCCCCAGCTTTATGATGAATTCCCTGTATAAAAA'}; 
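    recs={'AG^CT'};   % recognition sites; '^' marks where the enzyme cuts
    sout={};
    for j=1:numel(recs)
        rec=recs{j};
        sout={};      % fragments produced by the current enzyme
        for i=1:numel(s)
            cutsite=rebase_hassite(s{i},rec);   % cut positions in fragment i
            stemp=str_cut(s{i}, cutsite);       % split fragment i at those positions
            sout(end+1:end+numel(stemp),:)=stemp;
        end
        s=sout        % the next enzyme cuts the fragments left by this one
    end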

This is the rebase_hassite function (I understand it):

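    function cut_position= rebase_hassite(s, rec)
    x=strfind(rec,'^');   % index of the cut marker in the recognition site
    y=rec;
    y(x)=[];              % recognition site with the marker removed
    cut_position= strfind(s,y)+x-2;   % last index to the left of each cut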
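
To make the index arithmetic concrete, here is a minimal sketch of the same steps on a short made-up sequence (the sequence is an assumption chosen so the positions are easy to check by eye, not the one from the main program):

```matlab
% Hypothetical short sequence, not the one from the post.
s   = 'CCAGCTGGAGCTTT';
rec = 'AG^CT';

x = strfind(rec, '^');          % x = 3, position of the cut marker
y = rec;
y(x) = [];                      % y = 'AGCT', the site without the marker
starts = strfind(s, y);         % starts = [3 9], where each site begins
cut_position = starts + x - 2;  % [4 10], last index to the left of each cut
disp(cut_position)
```

Each value is the index of the last base that stays in the left-hand fragment, which is why the offset is x-2 rather than x-1.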

This is the str_cut function, which I don't understand:

    function ss= str_cut(s,loc)
    %coded by John Rozolis
    %{
    takes locations from 'rebasehassite1.m' and creates a matrix with all of
    the cuts at G^AATTC. One section per cell.
    %}
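    I=[0 loc numel(s)];   % fragment boundaries: start, each cut, end of string
    ss=[];
    if numel(I)>=4
        %two or more cut sites
        for i=1:numel(I)-2
            ss{end+1}=s(I(i)+1:I(i+1));   % fragment between boundaries i and i+1
        end
        ss{end+1}=s(I(end-1)+1:I(end));   % last fragment, after the final cut
        ss=ss';                           % return a column cell array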
    elseif numel(I)==3
        %one cut site
        ss= {s(1:loc); s(loc+1:end)};
    elseif numel(I)==2
        %no cut sites
        ss={s};
    end

Here loc is the vector of cut positions returned by rebase_hassite (called cutsite in the main program). I want to understand how this function works. The author of the code said:

"The third function, ‘str_cut.m’, takes the user’s string (s) as well as cut locations (loc) generated from ‘rebase_hassite.m’ as input arguments. It uses the locations to create a vector holding positional values: I=[0 loc numel(s)]. This vector will always have at least two elements. If this vector only has two elements, then the string contains no cutting sites. For every number greater than two is another cutting site found in the user’s string. A for-loop is activated in the function when the vector has four or more elements. Each cut fragment is stored within a new variable, ss. This variable grows with each cycle of the for-loop. The output for one cut site would be in the form of two fragments of the initial string. The first fragment would span from the first element of the string to the location of the cut site. The second output fragment would start from the location after the cut site and continue to the end. "
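
The description above can be traced on a short example. This is a minimal sketch, assuming a made-up sequence and cut positions (neither comes from the original program). It also uses a single loop over all boundaries instead of the three branches in str_cut, which, as far as I can tell, gives the same fragments in every case:

```matlab
% Hypothetical inputs, not taken from the post.
s   = 'CCAGCTGGAGCTTT';
loc = [4 10];                  % cut positions in the form rebase_hassite returns

I  = [0 loc numel(s)];         % I = [0 4 10 14], the fragment boundaries
ss = cell(numel(I)-1, 1);      % one cell per fragment (boundaries minus one)
for i = 1:numel(I)-1
    % fragment i runs from just after boundary i up to boundary i+1
    ss{i} = s(I(i)+1 : I(i+1));
end
disp(ss)                       % 'CCAG', 'CTGGAG', 'CTTT'
```

With no cut sites, loc is empty, I becomes [0 numel(s)], and the loop runs once and returns the whole string, which matches the "no cut sites" branch.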

Thank you.