How do I efficiently make a large SQL query into Matlab?

Question

1 投票

I have a SQL table that contains data that I need for analysis which is about 60 million records and 5 fields. What is the best way to get that into Matlab variables or a single variable. I have plenty of RAM (64GB) but I find that the import is taking a very long time and often encounters errors.

Is the quickest way to import all date into a single cell? a single table?
With a single pull allowing 'FetchInBatches' to be set to yes to avoid JVM errors?
Should I loop through sections of the table and import it in batches that way?
Should I loop and do each field individually so that I can download to the right data type?

Thanks in advance for your help, Brian

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

サインインしてコメントする。

サインインしてこの質問に回答する。

Follow Question

Answer 1

Rahul Goel 2015 年 10 月 29 日

MATLAB Online で開く

1 投票

Hi Brian,

It will help the community assist you better if you could share more details about the setup you are using. For example, the database and the driver, also bitness: 32 or 64-bit(in some cases).

Following are some observations with MySQL database using JDBC driver:

MySQL JDBC driver by default retrieves the complete 'ResultSet' and stores it in the memory. This can lead to the heap space issues if you are working with the 'ResultSets' that have large number of rows or large values which cannot be allocated a heap space in your JVM for the memory required.

One of the workaround to this limitation is to reconstruct the SQL query to include 'LIMIT' and 'OFFSET' constructs.

The LIMIT clause is used to limit the number of results returned in a SQL statement. So if you have 1000 rows in a table, but only want to return the first 10, you would do something like this:

SELECT column FROM table
LIMIT 10

Now suppose you wanted to show results 11-20. With the OFFSET keyword its just as easy, the following query will do:

SELECT column FROM table LIMIT 10 OFFSET 10

Use the following MATLAB code which makes use of these SQL constructs:

>> LIMIT = <limit>; %Change the LIMIT to select the one with better response time
>> for fetchCount = 1: N
        conn = database(instance,username,password,Name,Value);
        query = ['select * from <table > ' int2str(LIMIT) ' OFFSET ' int2str((fetchCount-1) * LIMIT)];
        curs = exec(conn, query);
        rowlimit = <row_limit>; %Change rowlimit to select the one with a better response time
        while ~strcmp(curs.Data, 'No Data')
            curs = fetch(curs,rowlimit);
            data = curs.Data;
        end
        close(conn);
   end

As mentioned in the comments from the above code, please select the limit and row_limit which gives a better performance. The values for limit and row_limit follows the bell curve trend and there exists the optimal value which gives better performance. For instance, you can use multiples of 10^N depending on the number of rows.

Hope this helps.

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

KAE 2021 年 4 月 7 日

編集済み: KAE 2021 年 4 月 7 日

MATLAB Online で開く

Should the query be

query = ['SELECT * from <table > LIMIT ' int2str(LIMIT) ' OFFSET ' int2str((fetchCount-1) * LIMIT)];   

How would a WHERE clause be added?

サインインしてコメントする。

How do I efficiently make a large SQL query into Matlab?

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

回答 (1 件)

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

カテゴリ

製品

タグ

Community Treasure Hunt

How do I efficiently make a large SQL query into Matlab?

0 件のコメント -2 件の古いコメントを表示 -2 件の古いコメントを非表示

回答 (1 件)

1 件のコメント -1 件の古いコメントを表示 -1 件の古いコメントを非表示

カテゴリ

製品

タグ

参考

Community Treasure Hunt

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示