Regular expressions on uint8 or single byte characters
I have a 200 MB text file encoded in UTF-8. My maximum array size is around 350 MB, so I can safely read it in using fread(fid,'*uint8'), with fid returned by fopen. To use regular expressions, I need to convert this into a char array, which increases the array size by at least a factor of two (depending on the encoding, but for my application I can ignore all fancy characters) and thus leads to an "out of memory" error.
I wrote some code that breaks up the original array so that the regular expressions are matched on smaller chunks, but I am still wondering: can I somehow run regular expressions directly on the uint8 array? Or is there a char-like variable type that uses only 1 byte per character?
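For readers who want to see what such chunking might look like, here is a minimal sketch. It is not the code mentioned above. The names bytes, pattern and maxChunk are made up for illustration. It assumes ASCII-only content, that no match spans a newline, and that every line is shorter than maxChunk. Only one chunk at a time is converted to char, so the extra memory stays bounded by the chunk size.

```matlab
% Stand-in for the file contents. With a real file, fread(fid, '*uint8')
% returns a column vector, so transpose it to a row first.
bytes    = uint8(sprintf('id=12 foo\nbar id=345\nid=6789 baz\n'));
pattern  = 'id=\d+';   % hypothetical pattern
maxChunk = 16;         % tiny for demonstration; much larger for real data

n = numel(bytes);
matchStart = [];
matchEnd   = [];
first = 1;
while first <= n
    last = min(first + maxChunk - 1, n);
    if last < n
        % Back up to the last newline (byte 10) in the window so that
        % no line is split between two chunks.
        nl = find(bytes(first:last) == 10, 1, 'last');
        if ~isempty(nl)
            last = first + nl - 1;
        end
        % If the window holds no newline, the chunk is cut mid-line,
        % which violates the assumption stated above.
    end
    % Only this chunk is converted to char (2 bytes per element).
    [s, e] = regexp(char(bytes(first:last)), pattern, 'start', 'end');
    % Shift chunk-relative indices to positions in the full array.
    matchStart = [matchStart, s + first - 1]; %#ok<AGROW>
    matchEnd   = [matchEnd,   e + first - 1]; %#ok<AGROW>
    first = last + 1;
end
```

The matched text can then be recovered one match at a time, for example with char(bytes(matchStart(k):matchEnd(k))), without ever holding the whole file as char.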
5 Comments
dpb
26 Aug 2013
Instead of 'uint8', try 'uchar'. Not sure it'll help, but it is at least a character class, not an integer.
Cedric
27 Aug 2013
Edited: Cedric
27 Aug 2013
Actually, it would be simpler if you told us what you are trying to match rather than the pattern itself (copy/paste a chunk of the file content or a string, and explain what you want to extract). With a little luck, we can do this using STRFIND (which works on uint8 arrays) or some numeric test on the uint8 values.
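As a rough illustration of both ideas, here is a small sketch on made-up data. The byte string and the key 'id=' are hypothetical, since the actual file content is not known. It relies on strfind accepting numeric row vectors, as stated in the comment above; as far as I know this is not among its documented text inputs, so treat it as an assumption. No char copy of the data is created.

```matlab
bytes = uint8('id=12 foo id=345');   % stand-in for the file contents (row vector)
key   = uint8('id=');                % literal byte sequence to look for

% Literal search directly on the uint8 data.
idx = strfind(bytes, key);

% Numeric test: logical mask of the ASCII digits '0'..'9' (bytes 48..57).
isDigit = bytes >= uint8('0') & bytes <= uint8('9');

% Start and end positions of each run of consecutive digits.
d        = diff([false, isDigit, false]);
runStart = find(d == 1);
runEnd   = find(d == -1) - 1;
```

The logical mask costs one byte per element, the same as the uint8 data itself, so this stays well below the footprint of a full char conversion.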
Answers (0)