extract part of a string with an extension

Andrea on 3 Dec 2014
Edited: per isakson on 4 Dec 2014
Hi, I have a long string and I want to extract just the names that have "hdf" as an extension.
I just want to get "MOD11C1.A2013001.005.2013015221704.hdf"
My string is:
U.S. GOVERNMENT COMPUTER
This US Government computer is for authorized users only. By accessing this
system you are consenting to complete monitoring with no expectation of privacy.
Unauthorized access or use may subject you to disciplinary action and criminal
prosecution.
********************************************************************************
</pre>
<pre><img src="/icons/blank.gif" alt="Icon "> Name Last modified Size Description<hr><img src="/icons/back.gif" alt="[DIR]"> Parent Directory -
<img src="/icons/image2.gif" alt="[IMG]"> BROWSE.MOD11C1.A2013001.005.2013015221704.1.jpg 15-Jan-2013 16:29 3.2M
<img src="/icons/image2.gif" alt="[IMG]"> BROWSE.MOD11C1.A2013001.005.2013015221704.2.jpg 15-Jan-2013 16:29 3.3M
<img src="/icons/unknown.gif" alt="[ ]"> MOD11C1.A2013001.005.2013015221704.hdf 15-Jan-2013 16:29 46M
<img src="/icons/unknown.gif" alt="[ ]"> MOD11C1.A2013001.005.2013015221704.hdf.xml 16-Jan-2013 02:15 32K
<hr></pre>
</body></html>
Thanks,
Zeinab
  3 Comments
Andrea on 3 Dec 2014
Edited: Andrea on 3 Dec 2014
Thanks. The file always has exactly the same extension, "hdf", and its name always starts with MOD. As you can see, the name I am interested in is MOD11C1.A2013001.005.2013015221704.hdf, but it will change in other loop iterations according to the date. For instance, MOD11C1.A2013001.005.2013015221704.hdf will become MOD11C1.A2013001.005.2013015221705.hdf.
The reason I need it is that I want to read the files at a web address (which changes inside a loop) with urlread, which gives me the content as a string. I then need to use urlwrite to save the files I want according to their filenames (those which have the hdf extension).
Please see this: str=urlread(path1);
Many thanks. I have really spent more than 6 hours on it so far!
farz
Star Strider on 3 Dec 2014


Accepted Answer

per isakson on 3 Dec 2014
Edited: per isakson on 4 Dec 2014
Here is a solution(?) based on regexp
>> cac = cssm;
>> cac{:}
ans =
MOD11C1.A2013001.005.2013015221704.hdf
ans =
MOD11C1.A2013001.005.2013015221704.hdf
>>
where
function cac = cssm()
str = fileread( 'cssm.txt' );
name_xpr = '[\w\.]+\.hdf';
cac = regexp( str, name_xpr, 'match' );
end
and cssm.txt contains the text of your question. Returning two identical names appears to be correct, because ".hdf.xml" also contains ".hdf". You might want to apply unique.
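As a minimal sketch of the unique suggestion above: the sample text below is a shortened stand-in for the contents of cssm.txt (an assumption made so the snippet runs without the file), and the variable name `names` is introduced only for illustration.

```matlab
% Shortened stand-in for the listing in the question (assumption: the real
% text would come from fileread or urlread instead).
str = sprintf('%s\n%s\n', ...
    '[ ] MOD11C1.A2013001.005.2013015221704.hdf 15-Jan-2013 16:29 46M', ...
    '[ ] MOD11C1.A2013001.005.2013015221704.hdf.xml 16-Jan-2013 02:15 32K');

% The first pattern also matches the '.hdf' inside '.hdf.xml',
% so the same name is returned twice.
cac = regexp(str, '[\w\.]+\.hdf', 'match');

% unique removes the duplicate entry from the cell array of names.
names = unique(cac);
```

Note that unique also sorts its output, which should not matter here but could if the order of the listing is important.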
In response to comments:
My mistake illustrates a problem with regular expressions: they often match unexpected strings. I missed the case where ".hdf" is part of the base name rather than the extension. I have now added the requirement that ".hdf" be followed by "\s" (from the documentation: "Any white-space character; equivalent to [ \f\n\r\t\v]"). That white-space character is not included in the output, however.
>> cssm
ans =
'MOD11C1.A2013001.005.2013015221704.hdf'
function cac = cssm()
str = fileread( 'cssm.txt' );
name_xpr = '[\w\.]+\.hdf(?=\s)'; % <<<<<<< modified
cac = regexp( str, name_xpr, 'match' );
end
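The effect of the lookahead can be checked on a small sample. The sketch below uses a shortened stand-in for the listing (an assumption), and it also probes one edge case that the thread does not discuss: a name sitting at the very end of the text with no trailing white space. The `(?=\s|$)` variant is a suggestion for that case, not part of the answer above.

```matlab
% Shortened stand-in for the listing in the question.
str = sprintf('%s\n%s\n', ...
    '[ ] MOD11C1.A2013001.005.2013015221704.hdf 15-Jan-2013 16:29 46M', ...
    '[ ] MOD11C1.A2013001.005.2013015221704.hdf.xml 16-Jan-2013 02:15 32K');

% Without the lookahead, the '.hdf' inside '.hdf.xml' is matched as well.
withoutLookahead = regexp(str, '[\w\.]+\.hdf', 'match');

% With the lookahead, '.hdf' must be followed by white space, so the
% '.hdf.xml' entry is skipped. The white space itself is not captured.
withLookahead = regexp(str, '[\w\.]+\.hdf(?=\s)', 'match');

% Edge case (assumption, not from the thread): the name is the last thing
% in the text, so there is no white space after '.hdf'.
tail = 'MOD11C1.A2013001.005.2013015221704.hdf';
atEndStrict  = regexp(tail, '[\w\.]+\.hdf(?=\s)', 'match');    % nothing follows '.hdf'
atEndRelaxed = regexp(tail, '[\w\.]+\.hdf(?=\s|$)', 'match');  % also accepts end of text
```

If the lookahead version returns an empty cell on real data, comparing the two variants on the actual string is one way to check whether missing trailing white space is the cause.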
Stephen Cobeldick already proposed this modification to the expression. I like Stephen's list, which helps to pinpoint the unique characteristics of the string. It triggers thinking: does the filename always start with "MOD"? Could "MOD" appear in the middle of the name? It is risky to deduce rules from small samples. If the name always starts with "MOD", then
name_xpr = '(?<=\s)MOD[\w\.]+\.hdf(?=\s)';
is a better expression.
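To illustrate why the lookbehind matters, here is a small sketch. The first sample line uses a hypothetical file name (a BROWSE entry ending in .hdf, which does not occur in the question) so that "MOD" appears in the middle of a name.

```matlab
% First line: hypothetical name with 'MOD' in the middle and an .hdf extension.
% Second line: the real target name from the question.
str = sprintf('%s\n%s\n', ...
    '[IMG] BROWSE.MOD11C1.A2013001.005.2013015221704.1.hdf 15-Jan-2013 16:29 3.2M', ...
    '[ ] MOD11C1.A2013001.005.2013015221704.hdf 15-Jan-2013 16:29 46M');

% Without the lookbehind, the match may start at a 'MOD' inside a longer name.
loose = regexp(str, 'MOD[\w\.]+\.hdf(?=\s)', 'match');

% With the lookbehind, 'MOD' must be directly preceded by white space,
% so only names that actually begin with 'MOD' are returned.
strict = regexp(str, '(?<=\s)MOD[\w\.]+\.hdf(?=\s)', 'match');
```

One consequence of `(?<=\s)` is that a name at the very first position of the text, with nothing before it, will not match.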
  4 Comments
Andrea on 3 Dec 2014
Thank you. I tried the one with "s" as you suggested, but it did not work. The previous one worked fine for me but returned all the files with the hdf extension, which was not a big problem. The one you suggested should give me a unique answer, but it isn't working: it returns an empty cell.
per isakson on 4 Dec 2014
I've added to my answer.


More Answers (1)

Stephen23 on 3 Dec 2014
Edited: Stephen23 on 3 Dec 2014
Why not all on one line?
str = fileread('temp.txt');
C = regexp(str,'MOD[\w\.]+\.hdf(?=\s)','match');
C =
'MOD11C1.A2013001.005.2013015221704.hdf'
This matches all substrings that meet the following conditions:
  • starts with 'MOD'
  • ends with '.hdf'
  • contains any combination of alphanumeric characters, underscores, and periods
  • is followed by a whitespace character (i.e. excludes '....hdf.xml')
As suggested by per isakson, you might also want to apply unique to the output.
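For the workflow described in the question comments (read a directory listing with urlread, then save each .hdf file with urlwrite), the pieces could be combined roughly as follows. This is only a sketch: `baseUrl` is a placeholder address, `doDownload` is a hypothetical switch added so the snippet runs without network access, and the listing is a shortened stand-in for what urlread would return.

```matlab
% Placeholder directory URL (assumption: replace with the real address).
baseUrl = 'http://example.com/data/';

% Stand-in for the listing; in practice: listing = urlread(baseUrl);
listing = sprintf('%s\n%s\n%s\n', ...
    '[IMG] BROWSE.MOD11C1.A2013001.005.2013015221704.1.jpg 15-Jan-2013 16:29 3.2M', ...
    '[ ] MOD11C1.A2013001.005.2013015221704.hdf 15-Jan-2013 16:29 46M', ...
    '[ ] MOD11C1.A2013001.005.2013015221704.hdf.xml 16-Jan-2013 02:15 32K');

% Extract the names with one of the patterns suggested in this thread,
% then drop any duplicates.
names = unique(regexp(listing, '(?<=\s)MOD[\w\.]+\.hdf(?=\s)', 'match'));

% Build one full URL per file name.
urls = strcat(baseUrl, names);

% Hypothetical switch: set to true to actually fetch the files.
doDownload = false;
if doDownload
    for k = 1:numel(names)
        urlwrite(urls{k}, names{k});   % save each file under its own name
    end
end
```

In newer MATLAB releases, webread and websave are the recommended replacements for urlread and urlwrite; the extraction step stays the same.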
