Data Analysis Methods and Code Examples: From Data Cleaning to Machine Learning - Fcodenotes

Data cleaning:

Method: removing duplicate rows from a dataset with the Python pandas library.

import pandas as pd

# Remove duplicates from a DataFrame
df = pd.DataFrame({'A': [1, 2, 2, 3, 4],
                   'B': ['a', 'b', 'b', 'c', 'd']})

df_cleaned = df.drop_duplicates()
print(df_cleaned)
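
By default, drop_duplicates() compares entire rows. As a minimal sketch, assuming you want to treat rows as duplicates when only some columns match, the `subset` and `keep` parameters can narrow the comparison. The sample data below is hypothetical and chosen so that two rows share a value in column 'A' but differ in column 'B'.

```python
import pandas as pd

# Hypothetical sample data: rows 1 and 2 share the same 'A' but differ in 'B'
df = pd.DataFrame({'A': [1, 2, 2, 3],
                   'B': ['a', 'b', 'x', 'c']})

# Full-row comparison finds no duplicates in this data
full_row = df.drop_duplicates()

# Compare on column 'A' only and keep the last occurrence of each value
by_column = df.drop_duplicates(subset=['A'], keep='last')

# Count rows whose 'A' value repeats an earlier row
n_repeats = int(df.duplicated(subset=['A']).sum())

print(by_column)
print(n_repeats)
```

Checking duplicated() before dropping anything is a cheap way to see how many rows would be affected.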

Exploratory data analysis (EDA):

Method: generating descriptive statistics with the Python pandas library.
```
import pandas as pd

# Calculate descriptive statistics
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

stats = df.describe()
print(stats)
```
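
The result of describe() is itself a DataFrame indexed by statistic name, so individual values can be read back from it. The following sketch reuses the same data and also requests custom percentiles. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# describe() returns a DataFrame whose index holds the statistic names
stats = df.describe()
mean_a = stats.loc['mean', 'A']
std_a = stats.loc['std', 'A']  # sample standard deviation (ddof=1)

# Request the 10th and 90th percentiles instead of the default quartiles
custom = df.describe(percentiles=[0.1, 0.9])

print(mean_a, std_a)
print(custom)
```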

Data visualization:

Method: creating a scatter plot with the Python matplotlib library.

import matplotlib.pyplot as plt

# Create a scatter plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.scatter(x, y)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot')
plt.show()
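
plt.show() opens an interactive window, which is not available in every environment. As a rough sketch, assuming the script runs without a display, the same plot can be built with matplotlib's object-oriented interface and written to a file. The 'Agg' backend choice and the output file name are assumptions made for this illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; assumed here for headless runs
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Build the figure through explicit Figure and Axes objects
fig, ax = plt.subplots()
points = ax.scatter(x, y)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_title('Scatter Plot')

# Hypothetical output path in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), 'scatter.png')
fig.savefig(out_path, dpi=150)
plt.close(fig)  # release the figure's memory
```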

Statistical modeling:

Method: building a linear regression model with the Python scikit-learn library.

from sklearn.linear_model import LinearRegression

# Build a linear regression model
X = [[1], [2], [3], [4], [5]]
y = [2, 4, 6, 8, 10]

model = LinearRegression()
model.fit(X, y)

# Predict the output for a new input
X_new = [[6]]
y_pred = model.predict(X_new)
print(y_pred)
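
Besides predicting, a fitted LinearRegression exposes the learned parameters. The sketch below fits the same toy data and reads the slope, the intercept, and the R² score. Since the data lies exactly on y = 2x, the slope should come out close to 2 and the intercept close to 0, up to floating-point error.

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]
y = [2, 4, 6, 8, 10]

model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # one coefficient per input feature
intercept = model.intercept_  # constant term of the fitted line
r2 = model.score(X, y)        # R^2 on the training data

print(slope, intercept, r2)
```

Note that R² computed on the training data only shows how well the line fits those points, not how the model behaves on unseen data.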

Machine learning:

Method: training a decision tree classifier with the Python scikit-learn library.

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train a decision tree classifier
model = DecisionTreeClassifier()
model.fit(X, y)

# Make predictions for new data
X_new = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.8, 4.8, 1.8]]
y_pred = model.predict(X_new)
print(y_pred)
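
The example above trains on the full dataset, so it says nothing about how well the classifier generalizes. One common way to check, sketched here under assumed settings (a 30% test split and `random_state=0` for reproducibility, neither taken from the original), is to hold out part of the data and measure accuracy on it.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Hold out 30% of the samples, keeping class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target,
    test_size=0.3, random_state=0, stratify=iris.target)

# Fix the seed so the tree is built the same way on every run
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly
accuracy = accuracy_score(y_test, model.predict(X_test))
print(accuracy)
```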