
17 Great Machine Learning Libraries

 非同寻常 2014-12-09

After wonderful feedback on my previous post on Scikit-learn from the guys at /r/MachineLearning, I decided to collect the list of machine learning libraries into this separate note. Let me know if there’s a library that should be included here.


Python

  • Scikit-learn: comprehensive and easy to use; I wrote a whole article on why I like this library.
  • PyBrain: neural networks are one thing that is missing from Scikit-learn, but this module makes up for it.
  • nltk: really useful if you’re doing anything NLP or text mining related.
  • Theano: efficient computation of mathematical expressions using GPU. Excellent for deep learning.
  • Pylearn2: machine learning toolbox built on top of Theano - in very early stages of development.
  • MDP (Modular toolkit for Data Processing): a framework that is useful when setting up workflows.

Java

  • Spark: Apache’s new upstart, supposedly up to a hundred times faster than Hadoop. It now includes MLlib, which contains a good selection of machine learning algorithms, including classification, clustering and recommendation generation. It is currently undergoing rapid development, and you can develop in Python as well as JVM languages.
  • Mahout: Apache’s machine learning framework built on top of Hadoop. This looks promising, but comes with all the baggage and overhead of Hadoop.
  • Weka: this is a Java-based library with a graphical user interface that allows you to run experiments on small datasets. This is great if you restrict yourself to playing around to get a feel for what is possible with machine learning. However, I would avoid using this in production code at all costs: the API is very poorly designed, the algorithms are not optimised for production use and the documentation is often lacking.
  • Mallet: another Java-based library with an emphasis on document classification. I’m not so familiar with this one, but if you have to use Java this is bound to be better than Weka.
  • JSAT: stands for “Java Statistical Analysis Tool”. It was created by Edward Raff and was born out of his frustration with Weka (I know the feeling). Looks pretty cool.

.NET

  • Accord.NET: this seems to be pretty comprehensive, and comes recommended by primaryobjects on Reddit. There is perhaps a slight slant towards image processing and computer vision, as it builds on the popular library AForge.NET for this purpose.
  • Another option is to use one of the Java libraries compiled to .NET using IKVM - I have used this approach with success in production.

C++

  • Vowpal Wabbit: designed for very fast learning and released under a BSD license, this comes recommended by terath on Reddit.
  • MultiBoost: a fast C++ framework implementing some boosting algorithms as well as some cascades (like the Viola-Jones cascades). It’s mainly focused on AdaBoost.MH, so it handles multi-class and multi-label problems.
  • Shogun: large machine learning library with a focus on kernel methods and support vector machines. Bindings to Matlab, R, Octave and Python.

General

  • LibSVM and LibLinear: these are C libraries for support vector machines; there are also bindings or implementations for many other languages. These are the libraries used for support vector machine learning in Scikit-learn.
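
To make the Scikit-learn connection concrete, here is a minimal sketch, assuming scikit-learn is installed. In Scikit-learn, `SVC` is the estimator backed by LibSVM and `LinearSVC` is the one backed by LibLinear. The toy data and variable names are made up for illustration only.

```python
# Minimal sketch: assumes scikit-learn is installed.
# The toy data below is invented purely for illustration.
from sklearn.svm import SVC, LinearSVC

# Two well-separated clusters in 2-D, labelled 0 and 1.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
     [3.0, 3.0], [3.2, 2.9], [2.9, 3.3]]
y = [0, 0, 0, 1, 1, 1]

# SVC wraps LibSVM and supports kernels (RBF here).
kernel_clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# LinearSVC wraps LibLinear and fits linear models only.
linear_clf = LinearSVC(C=1.0, random_state=0).fit(X, y)

# Classify one point near each cluster.
new_points = [[0.1, 0.1], [3.1, 3.1]]
kernel_pred = kernel_clf.predict(new_points).tolist()
linear_pred = linear_clf.predict(new_points).tolist()
print("SVC (LibSVM):", kernel_pred)
print("LinearSVC (LibLinear):", linear_pred)
```

Both estimators share the same `fit`/`predict` interface, so you can swap one for the other without touching the rest of your code.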

Conclusion

This article is a work in progress, so please send me your comments or criticisms!

