Where to Find a Dataset of High-Quality MATLAB Code for Training an LLM?

Ivan Rodionov, April 17, 2025
Answered: John D'Errico, April 22, 2025
Hello,
I’m working on fine-tuning an open-source LLM for MATLAB code generation, aiming to reach a performance level similar to ChatGPT. I haven’t been impressed with the results of existing tools so far.
Could anyone point me toward quality datasets or resources specifically for training an LLM on MATLAB code? I’m particularly interested in datasets that cover a wide range of MATLAB applications, from basic scripts to more advanced numerical computations, optimization, and data analysis.
Any guidance or pointers would be greatly appreciated!

Answers (1)

John D'Errico, April 22, 2025
Looking at your question a second time, it is about MATLAB in a sense, in that you are looking for a repository of code to train an LLM on.
You might look at the File Exchange (FEX), which is probably the largest repository of MATLAB code out there besides MATLAB itself. The problem is that the FEX includes a lot of poorly written code. Sorry, but it does. It has some truly great code too, written by many superb authors, but there is also a lot of novice code. And some of the code there is pretty old. I'll admit that some of my own FEX contributions are at least 25 years old, which makes them less useful for training purposes, since MATLAB has grown in that time.
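If you do pull code from the FEX, one rough way to flag older submissions is to look for calls to functions that the MATLAB documentation now marks as "not recommended". The sketch below is an assumption-laden heuristic, not a quality filter: the two-entry `LEGACY_FUNCTIONS` list is illustrative and far from exhaustive, and `flag_legacy_calls` is a hypothetical helper that skips only full-line `%` comments.

```python
import re

# Illustrative (hypothetical) list: functions the MATLAB docs mark as
# "not recommended", mapped to suggested replacements. Not exhaustive.
LEGACY_FUNCTIONS = {
    "findstr": "strfind",
    "strmatch": "strncmp or startsWith",
}


def flag_legacy_calls(source: str) -> dict[str, int]:
    """Count calls to legacy functions in MATLAB source text."""
    counts: dict[str, int] = {}
    for line in source.splitlines():
        if line.lstrip().startswith("%"):
            continue  # crude: skips full-line comments only, not trailing ones
        for name in LEGACY_FUNCTIONS:
            # Word boundary avoids matching e.g. 'strfind' when looking for 'findstr'
            hits = len(re.findall(rf"\b{name}\s*\(", line))
            if hits:
                counts[name] = counts.get(name, 0) + hits
    return counts


if __name__ == "__main__":
    sample = """% old-style string search
idx = findstr(s, 'abc');
k = strfind(s, 'abc');
m = strmatch('ab', names);
"""
    print(flag_legacy_calls(sample))  # {'findstr': 1, 'strmatch': 1}
```

A nonzero count only hints that a file predates newer idioms; it says nothing about whether the code is correct or well written, so it would at most be one signal among several.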
You might also need to consider licensing issues if you decide to use code from the FEX, or any similar source, to train an LLM. In my case, for example, while I am quite happy to see my code used with attribution, I'm not so sure how happy I would be at the idea of an LLM effectively using my code with no attribution at all.
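One practical step toward keeping attribution is to store provenance alongside every training sample rather than just the raw code. The sketch below assumes a hypothetical `CodeSample` schema and a hypothetical `ALLOWED_LICENSES` set that you would choose yourself; the license strings and example URLs are placeholders. It does not settle whether a given license actually permits training use, so each submission's real terms still need checking.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CodeSample:
    """One MATLAB file plus the provenance needed for attribution (hypothetical schema)."""
    path: str
    author: str
    source_url: str
    license: str
    code: str


# Assumption: licenses you have decided are acceptable for training use.
ALLOWED_LICENSES = {"BSD-3-Clause", "BSD-2-Clause", "MIT"}


def to_training_records(samples: list[CodeSample]) -> list[str]:
    """Keep samples with an allowed license and emit JSON lines that retain attribution."""
    records = []
    for s in samples:
        if s.license not in ALLOWED_LICENSES:
            continue  # skip anything without clearly acceptable terms
        records.append(json.dumps(asdict(s)))
    return records


if __name__ == "__main__":
    samples = [
        CodeSample("a.m", "Author A", "https://example.com/a", "BSD-3-Clause", "y = x.^2;"),
        CodeSample("b.m", "Author B", "https://example.com/b", "unknown", "z = 1;"),
    ]
    for line in to_training_records(samples):
        print(line)
```

Keeping author and source URL in each record means attribution can at least be traced back later, even if the model itself never reproduces it.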

Category: Get Started with MATLAB

Release: R2024b
