Machine learning & Data science (DS/ML) tooling
An end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.
An API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear & actionable error messages.
An interpreted high-level general-purpose programming language. Has a huge community in machine learning and computer vision.
A Python-based scientific computing package that uses the power of graphics processing units. A deep learning research platform built to provide maximum flexibility and speed. Offers tensor computation with strong GPU acceleration and deep neural networks built on a tape-based autograd system.
- Jupyter hub
Language-agnostic interactive computing that supports execution environments (aka kernels) in several dozen languages.
A Python-based ecosystem of open-source software for mathematics, science, and engineering. Provides more utility functions for optimization, stats and signal processing.
A Python library that supports large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The core of NumPy is well-optimized C code.
Open source machine learning software that can be accessed through a graphical user interface, standard terminal applications, or a Java API. Contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
A real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. Druid is most often used as a database for powering use cases where real-time ingest, fast query performance, and high uptime are important.
A machine learning library with a range of supervised and unsupervised learning algorithms. Provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
Comprehensive library for creating static, animated, and interactive visualizations in Python. Offers a viable open source alternative to MATLAB.
A visualization toolkit designed for machine learning. Enables you to track various metrics, such as accuracy and log loss, on the training or validation set.
A high-level, high-performance, dynamic general-purpose programming language. Designed to solve the “two language problem” by providing ease of use and speed. Targeting big data analytics, high-performance computing and running simulations for scientific and engineering research.
A language and environment for statistical computing and graphics. Popular language in the world of Data Science. It is heavily used in analyzing data that is both structured and unstructured.
A fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
An interface for Apache Spark in Python. Provides a shell for analyzing your data interactively in a distributed environment.
An open source computer vision and machine learning software library written in C++. Contains more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms.
- Spark ML
A fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs. Supports higher-level tools for SQL and structured data processing, machine learning, graph processing and data streaming.
An optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way.
A Python data visualization library based on matplotlib. Provides a high-level interface for drawing attractive and informative statistical graphics.
A programming platform designed specifically for engineers and scientists to analyze and design systems and products that transform our world. Takes your ideas from research to production by deploying to enterprise applications and embedded devices, as well as integrating with Simulink® and Model-Based Design.
- TF Privacy
A Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. Contains analysis tools for measuring privacy guarantees.
- TF Cloud
The TensorFlow Cloud repository provides APIs that ease the transition from local model building and debugging to distributed training and hyperparameter tuning on Google Cloud.
- GNU Octave
A scientific programming language, primarily intended for numerical computations, with built-in 2D/3D plotting and visualization tools. Highly compatible with MATLAB.
- TF Model Analysis
A library for performing TensorFlow model evaluation. Evaluates models on large amounts of data in a distributed manner.
An open-source AI layer for existing databases that allows you to effortlessly develop, train and deploy state-of-the-art machine learning models using SQL queries. Provides both a graphical user interface and plain SQL for building and deploying ML models.
A Python library to benchmark machine learning systems' vulnerability to adversarial examples.
- Google Charts
A set of AI-centric extensions to JupyterLab Notebooks. Provides a Pipeline Visual Editor for building AI pipelines from JupyterLab notebooks.
An open-source Python framework for creating reproducible, maintainable and modular data science code. Follows the concepts of modularity, separation of concerns and versioning.
A framework with wide support for deep learning algorithms. Open-source, distributed, deep learning library for the JVM. Works with Spark and Hadoop on top of distributed CPUs or GPUs.
A Python library for topic modelling, document indexing and similarity retrieval with large corpora. Memory efficient by design. Highly optimized and parallelized C routines are used at the core level.
A suite of libraries that implement machine learning algorithms and mathematical primitive functions that share compatible APIs with other RAPIDS projects. Runs ML tasks on a single GPU, on multiple GPUs, and on multi-node GPU clusters.
A library for advanced Natural Language Processing in Python and Cython. Comes with pretrained pipelines and currently supports tokenization and training for 60+ languages.
GPU-Accelerated Libraries for AI and HPC. A collection of libraries, tools, and technologies that deliver dramatically higher performance than alternatives across multiple application domains—from artificial intelligence to high performance computing.
- Google's Differential Privacy
A tool that helps protect private information. Securely draws insights from datasets that contain the private and sensitive personal information of their users.
- WIT Tool
An easy-to-use interface for expanding understanding of a black-box classification or regression ML model. Can perform inference on a large set of examples and immediately visualize the results in a variety of ways. Contains tooling for investigating model performance and fairness over subsets of a dataset.
A machine learning platform for Kubernetes. Designed for building, training, and monitoring large-scale deep learning applications. Supports multi-GPU environments and all the major deep learning frameworks.
An open-source deep learning software framework used to train and deploy deep neural networks. It is scalable, allowing for fast model training, and supports a flexible programming model and multiple programming languages.
Geographic information systems use GeoTIFF and other formats to organize and store gridded, or raster, datasets. Rasterio reads and writes these formats and provides a Python API based on N-D arrays.
- Intel MKL-DNN
Math Kernel Library for Deep Neural Networks. An open-source performance library written in C++ for improving deep learning applications on Intel CPUs and GPUs.
Designing for Production-Grade AI
We are highly selective when we add new technologies to our machine learning stack, because we have learned that the real test of AI applications is evaluating them in production environments.
Key lessons from our ML practitioners
- Transfer learning with DSM
Using multiple regularization techniques while optimizing the feature map for target tasks helps avoid an overall drop in performance and negative transfer when applying transfer learning to Domain Specific Models.
- Model fine-tuning is key
Tuning models during and after training through quantization and pruning can reduce model size by a factor of 4 and improve speed without impacting precision, which is critical to AI on edge devices.
- The inevitable dimensionality
Having a good sense for the data and working with business domain experts to analyze and evaluate every move could help in avoiding the curse of dimensionality while training classifiers in high-dimensional spaces.
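As a rough illustration of where the factor of 4 in the fine-tuning lesson above can come from, here is a minimal NumPy sketch. It assumes the weights start as 32-bit floats and are stored as 8-bit integers after quantization (4 bytes versus 1 byte per weight), and it uses simple magnitude-based pruning. The function names, the random weight matrix, and the 50% sparsity level are hypothetical choices for the example, not a description of our production pipeline.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    max_abs = float(np.max(np.abs(weights)))
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale


def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


# Hypothetical layer: a 256 x 256 float32 weight matrix.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(weights)
size_ratio = weights.nbytes / q.nbytes  # float32 bytes divided by int8 bytes
max_error = float(np.max(np.abs(weights - dequantize(q, scale))))

pruned = prune_by_magnitude(weights, sparsity=0.5)
zero_fraction = float(np.mean(pruned == 0.0))

print(f"size ratio: {size_ratio:.1f}x")
print(f"max quantization error: {max_error:.6f} (scale = {scale:.6f})")
print(f"fraction of weights pruned: {zero_fraction:.2f}")
```

The storage saving comes purely from the narrower data type. Whether accuracy is preserved depends on the model and data, so the quantization error and the pruned model should be checked against a validation set before deploying to an edge device.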
From model-centric to data-centric Machine Learning
AI systems today consist mainly of machine learning algorithms that manipulate and process data to build predictive models. While continuously working on ML models and trying different techniques to optimize neural networks is critical to the success of AI products, improving data quality and the data labeling process has been found to be of greater importance to AI initiatives.
Consistency of data collection and the manual labeling of the data could make all the difference in setting "the Ground Truth" in data sets for machine learning algorithms.
While our team puts a lot of emphasis on every step of the lifecycle of our machine learning products, including model architecture, model training, error analysis and iterative improvements, we also design project-specific frameworks and processes for data acquisition, augmentation and labeling that significantly impact model accuracy when tested in production against real-world scenarios.
Driving the next evolution of AI
The ongoing advancements in AI are anticipated to create significant business opportunities and societal value in both the short and long term.
At area99 we take up the mantle and charge ahead, working with business partners and research institutes to commercialize scientific discoveries, and create AI solutions that serve society and bring more radical innovations for humans in the future.