2. Comprehensive Overview of Data Types and Machine Learning Approaches

In this second article of our AI and ML series, we will discuss the different types of data and machine learning methods. Understanding these concepts is essential, as data is the backbone of machine learning. The effectiveness of any model largely depends on the quality and type of data it is trained on.

Types of Data

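Data is broadly classified into four main types, which fall along two separate distinctions (labeled versus unlabeled, and structured versus unstructured):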

Unlabeled Data
- Definition: Data that has no accompanying labels. Nothing in the data states what each data point represents, so it is harder to interpret without further processing.
- Example (Technical): A dataset of numerical values with no column indicating the category or value each row belongs to.
- Example (Non-Technical): Imagine a collection of photos without any descriptions or tags. You wouldn't know whether a given photo shows a cat, a dog, or a landscape.
Labeled Data
- Definition: Data that comes with labels, making it clear what each data point represents.
- Example (Technical): A dataset where each row of data has an accompanying label indicating the category or value of the data.
- Example (Non-Technical): A collection of photos where each photo is tagged with a description, such as 'cat', 'dog', or 'landscape'.
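
To make the labeled versus unlabeled distinction concrete, here is a minimal Python sketch. The photo sizes and tags are made-up illustration data, not taken from a real dataset.

```python
# Hypothetical photo dataset: each photo is described only by (width, height).
unlabeled_photos = [(640, 480), (1920, 1080), (800, 800)]

# The same photos as labeled data: every data point is paired with a tag.
labeled_photos = [
    ((640, 480), "cat"),
    ((1920, 1080), "landscape"),
    ((800, 800), "dog"),
]

# Training code usually wants the inputs and the labels as two separate lists.
features = [photo for photo, _ in labeled_photos]
labels = [tag for _, tag in labeled_photos]

print(features == unlabeled_photos)  # the inputs are identical; only the tags differ
print(labels)
```

The only difference between the two collections is the extra tag attached to each data point, which is exactly what a supervised model learns from.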


Structured Data
- Definition: Data that is organized in rows and columns, typically found in databases and spreadsheets.
- Example (Technical): A table in a database where each row represents a record and each column represents a field (e.g., name, age, address).
- Example (Non-Technical): An Excel spreadsheet listing customer information, with each row representing a different customer and each column representing a different attribute (name, email, phone number).
Unstructured Data
- Definition: Data that is not organized in a predefined manner. This includes text, images, audio, and video files.
- Example (Technical): A folder containing various text documents, images, and audio recordings.
- Example (Non-Technical): A collection of emails, photos, and audio messages that are not sorted or labeled in any particular way.
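
The practical difference shows up as soon as you try to read a value out of the data. The sketch below is only an illustration: the customer rows, the message text, and the email pattern are all assumptions made up for this example.

```python
import csv
import io
import re

# Structured: rows and columns with named fields (hypothetical customer table).
csv_text = (
    "name,email,phone\n"
    "Ada,ada@example.com,555-0100\n"
    "Bob,bob@example.com,555-0101\n"
)
rows = list(csv.DictReader(io.StringIO(csv_text)))
emails = [row["email"] for row in rows]  # a field is reached directly by column name

# Unstructured: free text with no predefined fields.
message = "Hi, this is Ada. You can reach me at ada@example.com or call 555-0100."
# There is no "email" column here, so the value has to be searched for.
# The pattern is a deliberately simple assumption, not a full email validator.
found = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", message)

print(emails)
print(found)
```

With structured data the schema tells you where each attribute lives. With unstructured data you first need some form of parsing or feature extraction.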
  

Types of Machine Learning

Machine learning can be categorized into three main types:

Supervised Learning
- Definition: In supervised learning, models are trained using labeled data. The model learns to associate input data with the correct output.
- Example (Technical): Training a model to classify emails as spam or not spam by providing it with a dataset of labeled emails.
- Example (Non-Technical): Teaching a child to recognize fruits by showing them images of fruits with their names. The child learns to identify apples, bananas, and oranges based on the labeled images.
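
As a rough illustration of the spam example, here is a toy word-counting classifier in plain Python. The four training emails are invented, and the scoring rule (pick the label whose training emails share the most words with the new email) is a simplifying assumption chosen for readability, not how production spam filters work.

```python
from collections import Counter

# Tiny hypothetical labeled dataset: (email text, label).
training_emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Return the label whose training emails share the most words with text."""
    scores = {
        label: sum(word_counts[word] for word in text.split())
        for label, word_counts in counts.items()
    }
    return max(scores, key=scores.get)

model = train(training_emails)
print(predict(model, "free prize inside"))
print(predict(model, "agenda for the team meeting"))
```

The key point is that `train` only works because every example arrives with its correct answer attached.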


Unsupervised Learning
- Definition: In unsupervised learning, models are trained using unlabeled data. The model tries to identify patterns and relationships within the data on its own.
- Example (Technical): Using clustering algorithms to group similar customers based on their purchasing behavior without predefined labels.
- Example (Non-Technical): Imagine sorting a box of mixed buttons by size and color without any prior knowledge. You group the buttons based on their characteristics.
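
The customer clustering example can be sketched with a very small version of k-means on one-dimensional data. The spending figures are made up, and starting the two centers at the smallest and largest value is an assumption made to keep the example deterministic; real implementations usually pick starting points randomly.

```python
def kmeans_1d(values, centers, iterations=10):
    """Minimal k-means for one-dimensional data with given starting centers."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers, groups

# Hypothetical monthly spend per customer; no labels are provided.
monthly_spend = [12, 15, 14, 90, 95, 88, 13, 92]
centers, groups = kmeans_1d(
    monthly_spend, centers=[min(monthly_spend), max(monthly_spend)]
)
print(centers)
print(groups)
```

Nobody told the algorithm which customers are "low spenders" or "high spenders". The two groups emerge from the values alone.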


Reinforcement Learning
- Definition: In reinforcement learning, models learn by taking actions and receiving feedback in the form of rewards or penalties. The goal is to maximize the cumulative reward.
- Example (Technical): Training a robot to navigate a maze where it receives a reward for reaching the end and a penalty for hitting obstacles.
- Example (Non-Technical): Think of teaching a dog tricks by giving it treats for performing the correct action and withholding treats for incorrect actions. Over time, the dog learns to perform the tricks correctly to receive the reward.
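
The maze example can be sketched with tabular Q-learning, one common reinforcement learning algorithm. Everything here is an assumption made for illustration: the "maze" is reduced to a five-cell corridor, the reward values are +1 for reaching the exit and -1 for bumping into the left wall, and the learning rate, discount, and exploration settings are arbitrary choices.

```python
import random

N_STATES = 5                 # corridor cells 0..4; cell 4 is the exit
ACTIONS = ["left", "right"]

def step(state, action):
    """Return (next_state, reward, done) for the hypothetical corridor."""
    if action == "right":
        next_state = state + 1
        if next_state == N_STATES - 1:
            return next_state, 1.0, True   # reward for reaching the exit
        return next_state, 0.0, False
    if state == 0:
        return state, -1.0, False          # penalty for hitting the wall
    return state - 1, 0.0, False

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _episode in range(episodes):
        state = 0
        for _step in range(100):           # cap on steps per episode
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            future = 0.0 if done else max(q[next_state].values())
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            if done:
                break
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

In this corridor the best action in every cell is to move right, toward the exit. The agent is never told this directly; it only sees rewards and penalties and adjusts its value estimates accordingly.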


Data Splitting in Machine Learning

In machine learning, data is typically split into three sets:

Training Data: Used to train the model.
Validation Data: Used to tune the model's hyperparameters and to check that it is not overfitting the training data.
Test Data: Used only at the end to evaluate the final model on data it has never seen.

By dividing the data this way, the model is never evaluated on the examples it learned from, so the test score gives a more honest estimate of how it will perform on new data.
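
A three-way split can be sketched in a few lines of plain Python. The 70/15/15 proportions and the fixed seed are assumptions chosen for this example; the right ratios depend on how much data you have.

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle a copy of items and cut it into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # shuffle so each set is a random sample
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]   # whatever remains becomes the test set
    return train_set, val_set, test_set

data = list(range(100))                     # stand-in for 100 data points
train_set, val_set, test_set = split_dataset(data)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before cutting matters: if the data is sorted (for example by date or by class), slicing it without shuffling would give the three sets very different contents.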

