Machine Learning Basics
by Dr. Jane Smith
This book provides a comprehensive introduction to machine learning, covering both theoretical foundations and practical implementations. Throughout this book, we'll explore various concepts with embedded code examples and related articles.
Introduction
Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It focuses on developing computer programs that can access data and use it to learn for themselves.
Key Concepts
Before diving into machine learning, it's essential to understand data processing. Data preprocessing is a critical step in any machine learning project: it involves cleaning the data (for example, filling in missing values), transforming it (for example, scaling numeric columns and encoding categories), and preparing it for analysis. The snippet in the Python Data Processing section demonstrates these steps.
Types of Machine Learning
There are three main types of machine learning:
- Supervised Learning: Learning from labeled data
- Unsupervised Learning: Finding patterns in unlabeled data
- Reinforcement Learning: Learning through interaction with an environment
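The three types above can be contrasted with a small sketch in plain Python. Everything in it is an illustrative assumption rather than a standard implementation: the toy points, the helper names (`fit_centroids`, `two_means`, `run_bandit`), and the simplified "try each action once, then act greedily" strategy used for the reinforcement case.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean_point(points):
    """Component-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Supervised: labels are given, so we learn one centroid per label.
def fit_centroids(points, labels):
    grouped = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {label: mean_point(members) for label, members in grouped.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))


# Unsupervised: no labels, so we search for two groups ourselves (2-means).
def two_means(points, iterations=10):
    centers = [points[0], points[-1]]  # naive initialization (an assumption)
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearest].append(p)
        centers = [mean_point(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Reinforcement: the agent only observes rewards for the actions it takes.
def run_bandit(reward_fn, n_actions, steps):
    estimates = [0.0] * n_actions
    counts = [0] * n_actions
    for step in range(steps):
        if step < n_actions:
            action = step  # try every action once
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = reward_fn(action)
        counts[action] += 1
        # Incremental running average of the rewards seen for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(points, labels)
print(predict(centroids, (2, 2)))   # classify a new point using the labels
print(two_means(points))            # group centers found without labels
estimates, counts = run_bandit(lambda a: [0.2, 1.0][a], n_actions=2, steps=20)
print(counts)                       # how often each action was chosen
```

The supervised and unsupervised parts work on the same points; the only difference is whether `labels` is available. The reinforcement part has no dataset at all and builds its estimates from the rewards it collects.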
Mathematical Foundations
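Machine learning relies heavily on mathematical concepts from linear algebra (vectors and matrices that represent data and model parameters), calculus (gradients used to optimize those parameters), and statistics (estimating quantities from data and measuring error).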
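As one illustration of how these areas meet, the following sketch fits a straight line by gradient descent on the mean squared error. The data, the function name `fit_line`, the learning rate, and the step count are all assumptions chosen for the example; it is a minimal sketch, not a recommended training routine.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors for the current parameters
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

The sums over data points are the linear algebra, the gradients are the calculus, and the mean squared error is the statistical measure being minimized.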
Python Data Processing
Python Data Processing Example
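This snippet demonstrates data processing using pandas, with scikit-learn's StandardScaler handling the scaling step. The scaler returns a NumPy array, which pandas assigns back into the DataFrame.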
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Create sample data
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'salary': [50000, 60000, 70000, 55000, 65000],
    'department': ['IT', 'HR', 'Finance', 'IT', 'Marketing']
}

# Create DataFrame
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Data preprocessing
# 1. Handle missing values (the sample has none; shown for completeness)
df['salary'] = df['salary'].fillna(df['salary'].mean())

# 2. Group by department and calculate mean salary.
#    This runs before scaling so the averages stay in the original units.
dept_salary = df.groupby('department')['salary'].mean()

# 3. Standardize numerical columns (zero mean, unit variance)
scaler = StandardScaler()
numerical_cols = ['age', 'salary']
df[numerical_cols] = scaler.fit_transform(df[numerical_cols])

# 4. One-hot encode categorical columns
df_encoded = pd.get_dummies(df, columns=['department'])

print("\nProcessed DataFrame:")
print(df_encoded)

print("\nAverage salary by department:")
print(dept_salary)
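To make the standardization step in the snippet above less of a black box, here is a minimal sketch that computes the same kind of z-scores by hand for the age column, using only the standard library. The variable names are illustrative; the one assumption worth stating is that the population standard deviation (dividing by n) is used, which is the convention scikit-learn's StandardScaler follows.

```python
import statistics

ages = [25, 30, 35, 28, 32]  # the 'age' column from the sample data

mean_age = statistics.mean(ages)
# Population standard deviation (divides by n, not n - 1)
std_age = statistics.pstdev(ages)

# z-score: how many standard deviations each value lies from the mean
z_scores = [(age - mean_age) / std_age for age in ages]
print([round(z, 3) for z in z_scores])
# → [-1.468, 0.0, 1.468, -0.587, 0.587]
```

After standardization the column has mean 0 and unit variance, so values from columns with very different ranges, such as age and salary, become directly comparable.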