Convolution 在中文翻譯中,有稱為捲積、疊積、褶積、旋積、或卷積,也許是據其操作方法所做的翻譯,輸入向量如同卷軸般的分段移動,而後再將這些小段與Kernel向量做內積運算。 所以CNN有時被譯為「卷積(類)神經網絡」,因為CNN的字元數較少,我們偏好使用CNN,但當語意需要時,有時仍會使用「卷積(類)神經網絡」。
在此文我們將以MNIST的資料集為例子來展示使用CNN的做多類別的影像辨識工作。相對來說MNIST資料集的資料量小(訓練組僅有60,000筆),且影像內容簡單(28 x 28),即使使用簡單的MLP模型也可以達到超過90%的成功率,此文的主要目的有二:
- 示範如何使用CNN進行影像辨識,
- 比較MLP和CNN模型的總參數數量。
預處理MNIST資料集
首先我們先檢視40張MNIST的圖檔。
雖然MNIST資料集已伴隨在常用的機器學習套件中,在這裡我們練習如何直接從影像的壓縮檔中載入資料,以及前處理的方法,我們直接讀取原始資料集的gz
檔,將之重新整理為28 x 28的矩陣,之後再將之分為訓練組、驗證組及測試組,
from utils import mnist_reader
from utils.helper import invert_grayscale
import sys
import numpy as np
import pickle as pkl
DATA_DIR = "../data/MNIST"
def convert2image_format(image):
mnist_digits = image
X0 = np.reshape(mnist_digits, (-1, 28, 28))
X0_ = invert_grayscale(X0)
out = X0_[0, :, :]
return out
def split_train_and_valid(data_path, train_ratio=0.80):
# Mitigate the Python rounding error problem
val_ratio = round(1 - train_ratio, 2)
X0_test, Y0_test = mnist_reader.load_mnist(path=data_path, kind="t10k")
X0, Y0 = mnist_reader.load_mnist(path=data_path, kind="train")
X1_test = np.array([convert2image_format(element) / 255 for element in X0_test])
X1 = np.array([convert2image_format(element) / 255 for element in X0])
nsample = X0.shape[0]
train_size = int(train_ratio * nsample)
val_size = int(val_ratio * nsample)
X_train = X1[:train_size, :]
X_val = X1[train_size : train_size + val_size, :]
Y_train = Y0[:train_size]
Y_val = Y0[train_size : train_size + val_size]
X_test = X1_test
Y_test = Y0_test
print("The size of the train, validation and test are: ")
print(X_train.shape, X_val.shape, X_test.shape)
print(Y_train.shape, Y_val.shape, Y_test.shape)
return X_train, X_val, X_test, Y_train, Y_val, Y_test
if __name__ == "__main__":
# python3 preprocess_data.py
if len(sys.argv) > 1:
train_ratio = float(sys.argv[1])
print(f"Argument provided: {train_ratio}")
else:
print("No arguments provided.")
X_train, X_val, X_test, Y_train, Y_val, Y_test = split_train_and_valid(
DATA_DIR, train_ratio
)
with open("tmp_data/mnist_data.pkl", "wb") as file_obj:
pkl.dump((X_train, X_val, X_test, Y_train, Y_val, Y_test), file_obj)
下一步,得到預處理的的資料後,我們分別訓練一組MLP模型及一組CNN模型。兩組程序都使用30次遞迴。
python3 fit_cnn_model.py 30
python3 fit_mlp_model.py 30
完成後,我們使用pickle
及keras.model的套件將訓練結果載入分析。
import pickle as pkl
from keras.models import load_model
# history
with open('models/cnn_mnist_history.pkl', 'rb') as file_obj:
cnn_history = pkl.load(file_obj)
with open('models/mlp_mnist_history.pkl', 'rb') as file_obj:
mlp_history = pkl.load(file_obj)
cnn_model = load_model("models/cnn_mnist_model.keras")
mlp_model = load_model("models/mlp_mnist_model.keras")
with open("tmp_data/mnist_data.pkl", "rb") as file:
X_train, X_val, X_test, Y_train, Y_val, Y_test = pkl.load(file)
CNN 使用了6040個參數,而MLP使用了 16020個參數。
cnn_model.count_params(), mlp_model.count_params()
兩組模型都使用了四層神經層。
cnn_model.summary()
mlp_model.summary()
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d │ (None, 28, 28, 3) │ 150 │
├─────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d │ (None, 14, 14, 3) │ 0 │
├─────────────────┼────────────────────────┼───────────────┤
│ flatten │ (None, 588) │ 0 │
├─────────────────┼────────────────────────┼───────────────┤
│ dense │ (None, 10) │ 5,890 │
└─────────────────┴────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten │ (None, 784) │ 0 │
├─────────────────┼────────────────────────┼───────────────┤
│ dense │ (None, 20) │ 15,700 │
├─────────────────┼────────────────────────┼───────────────┤
│ dense_1 │ (None, 10) │ 210 │
├─────────────────┼────────────────────────┼───────────────┤
│ dense_2 │ (None, 10) │ 110 │
└─────────────────┴────────────────────────┴───────────────┘
雖然使用了較少的參數,CNN表現比MLP高了約3.5個百分點;在10,000張測試組的圖片中,CNN誤認了233圖片,而MLP誤認了555張圖片。
cnn_model.evaluate(X_test, Y_test)
mlp_model.evaluate(X_test, Y_test)
# accuracy: 0.9710 - loss: 0.0913
# accuracy: 0.9360 - loss: 0.2251
最後讓我們來檢視100張誤認的影像,左上角的2被CNN辨識為3,而MLP辨識為9,第一行第二欄的9被兩個模型都辨識為8,第一行第三欄的4,兩模型都辨識為2。第一行第四欄的6,兩模型都辨識為0,若請一位有經驗的銀行行員來辨識,或許他(她)可以正確的辨識這些數字。
有趣的是第三行第八欄的7被兩模型都辨識為1,這個例子似乎有些困難,或許有經驗的工作人員將之判定為7。
結語
在面對類似的影像辨識作業時CNN所需要最佳化的參數總量小於相近的MLP模型,因此當分析影像愈來愈複雜時,單一影像檔所輸入的特徵值必然增多,因此在可用的方法及工具不變的情況下,若要達到類似的高辨識率,我們必須設置更多層的類神經網絡,且每一層網絡必然會有更多的節點,因此採用CNN成為較經濟的方法。
即使在此資料集中,一個簡易的CNN模型已經達到了超過97%的正確率,但在某些要求極高的應用場景,例如銀行匯兌,即使只有不到3%的錯誤率,也可能造成巨大的損失。我們可增加CNN模型的複雜度並採用Dropout的方法可以達到99.1% (驗證組)及98.8%(測試組)的正確率。
model = keras.models.Sequential(
[
keras.layers.Conv2D(
filters=32,
kernel_size=7,
activation="relu",
padding="SAME",
input_shape=[28, 28, 1],
),
keras.layers.MaxPooling2D(2),
keras.layers.Conv2D(
filters=128, kernel_size=3, activation="relu", padding="SAME"
),
keras.layers.MaxPooling2D(pool_size=2),
keras.layers.Flatten(),
keras.layers.Dense(units=128, activation="relu"),
keras.layers.Dropout(0.5),
keras.layers.Dense(units=64, activation="relu"),
keras.layers.Dropout(0.5),
keras.layers.Dense(units=10, activation="softmax"),
]
)
附錄
一、MLP及CNN的 Python 程式碼
import numpy as np
import argparse
from tensorflow import keras
import pickle as pkl
if __name__ == "__main__":
# python3 fit_mlp_model.py "tmp_data/mnist_data.pkl" "models/mlp_mnist_history.pkl" "models/mlp_mnist_model.keras" --nepoch 30
# python3 fit_mlp_model.py "tmp_data/fashion_mnist_data.pkl" "models/mlp_fashion_mnist_history.pkl" "models/mlp_fashion_mnist_model.keras" --nepoch 30
parser = argparse.ArgumentParser(description="Process MNIST data")
parser.add_argument("data_path", type=str, help="Path to the data file")
parser.add_argument("output_path", type=str, help="Path to the stored data file")
parser.add_argument("model_output", type=str, help="Path to the model description")
parser.add_argument("--nepoch", type=int, default=30, help="Number of epoch")
args = parser.parse_args()
DATA_DIR = args.data_path
OUTPUT_DIR = args.output_path
MODEL_DIR = args.model_output
nepoch = args.nepoch
print(f"Data path provided: {DATA_DIR}")
print(f"Data were stored: {OUTPUT_DIR}")
print(f"Model was stored: {OUTPUT_DIR}")
print(f"The number of epoch: {nepoch}")
with open(DATA_DIR, "rb") as file:
X_train, X_val, X_test, Y_train, Y_val, Y_test = pkl.load(file)
model = keras.models.Sequential(
[
keras.layers.Flatten(input_shape=[28, 28, 1]),
keras.layers.Dense(20, activation="relu"),
keras.layers.Dense(10, activation="relu"),
keras.layers.Dense(10, activation="softmax"),
]
)
print(model.summary())
print("Total number of the parameters is: ", model.count_params(), "\n")
model.compile(
loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"]
)
history = model.fit(X_train, Y_train, epochs=nepoch, validation_data=(X_val, Y_val))
print("Evaluating the test set:\n")
model.evaluate(X_test, Y_test)
with open(OUTPUT_DIR, "wb") as file_obj:
pkl.dump(history, file_obj)
model.save(MODEL_DIR)
import numpy as np
import argparse
from tensorflow import keras
import pickle as pkl
if __name__ == "__main__":
# python3 fit_cnn_model.py "tmp_data/mnist_data.pkl" "models/mlp_mnist_history.pkl" "models/mlp_mnist_model.keras" --nepoch 30
# python3 fit_cnn_model.py "tmp_data/fashion_mnist_data.pkl" "models/mlp_fashion_mnist_history.pkl" "models/mlp_fashion_mnist_model.keras" --nepoch 30
parser = argparse.ArgumentParser(description="Process MNIST data")
parser.add_argument("data_path", type=str, help="Path to the data file")
parser.add_argument("output_path", type=str, help="Path to the stored data file")
parser.add_argument("model_output", type=str, help="Path to the model description")
parser.add_argument("--nepoch", type=int, default=30, help="Number of epoch")
args = parser.parse_args()
DATA_DIR = args.data_path
OUTPUT_DIR = args.output_path
MODEL_DIR = args.model_output
nepoch = args.nepoch
print(f"Data path provided: {DATA_DIR}")
print(f"Data were stored: {OUTPUT_DIR}")
print(f"Model was stored: {OUTPUT_DIR}")
print(f"The number of epoch: {nepoch}")
with open(DATA_DIR, "rb") as file:
X_train, X_val, X_test, Y_train, Y_val, Y_test = pkl.load(file)
model = keras.models.Sequential(
[
keras.layers.Conv2D(
filters=3,
kernel_size=7,
activation="relu",
padding="SAME",
input_shape=[28, 28, 1],
),
keras.layers.MaxPooling2D(2),
keras.layers.Flatten(),
keras.layers.Dense(units=10, activation="softmax"),
]
)
print(model.summary())
print("Total number of the parameters is: ", model.count_params())
model.compile(
loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"]
)
history = model.fit(X_train, Y_train, epochs=nepoch, validation_data=(X_val, Y_val))
print("Evaluating the test set:\n")
model.evaluate(X_test, Y_test)
with open(OUTPUT_DIR, "wb") as file_obj:
pkl.dump(history, file_obj)
model.save(MODEL_DIR)
二、評估模型表現的繪圖碼
import pandas as pd
import matplotlib.pyplot as plt
cnn_df = pd.DataFrame(cnn_history.history)
mlp_df = pd.DataFrame(mlp_history.history)
# Shift the training metrics half an epoch to the left
cnn_df['accuracy'] = cnn_df['accuracy'].shift(-1)
cnn_df['loss'] = cnn_df['loss'].shift(-1)
mlp_df['accuracy'] = mlp_df['accuracy'].shift(-1)
mlp_df['loss'] = mlp_df['loss'].shift(-1)
# Drop the last row as it will have NaN values after the shift
cnn_df = cnn_df.dropna()
mlp_df = mlp_df.dropna()
cnn_df['epoch'] = list(sequence)
mlp_df['epoch'] = list(sequence)
cnn_df['source'] = 'CNN'
mlp_df['source'] = 'MLP'
# Concatenate the DataFrames
combined_df = pd.concat([cnn_df, mlp_df], ignore_index=True)
# Reset index
combined_df.reset_index(drop=True, inplace=True)
import matplotlib.pyplot as plt
# Plot accuracy
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
for source in combined_df['source'].unique():
subset = combined_df[combined_df['source'] == source]
plt.plot(subset['epoch'], subset['accuracy'], '-o', label=f'{source} Training set')
plt.plot(subset['epoch'], subset['val_accuracy'], '-o', label=f'{source} Validation set')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Model Accuracy')
plt.legend()
# Plot loss
plt.subplot(1, 2, 2)
for source in combined_df['source'].unique():
subset = combined_df[combined_df['source'] == source]
plt.plot(subset['epoch'], subset['loss'], '-*', label=f'{source} Training set')
plt.plot(subset['epoch'], subset['val_loss'], '-*', label=f'{source} Validation set')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.show()
三、評估誤判的繪圖碼
import numpy as np
cnn_predicted_y_proba = cnn_model.predict(X_test)
mlp_predicted_y_proba = mlp_model.predict(X_test)
cnn_predicted_classes = np.argmax(cnn_predicted_y_proba, axis=1)
mlp_predicted_classes = np.argmax(mlp_predicted_y_proba, axis=1)
# Find the indices of incorrect predictions
cnn_incorrect_indices = np.where(cnn_predicted_classes != Y_test)[0]
mlp_incorrect_indices = np.where(mlp_predicted_classes != Y_test)[0]
common_values = np.intersect1d(mlp_incorrect_indices, cnn_incorrect_indices)
difference_values = np.setdiff1d(mlp_incorrect_indices, cnn_incorrect_indices)
print("Common values:", common_values)
n_rows = 5
n_cols = 10
plt.figure(figsize=(n_cols * 2, n_rows * 2))
for row in range(n_rows):
for col in range(n_cols):
index = n_cols * row + col
incorrect_index = common_values[index]
image = convert2image_format(X_test[incorrect_index,:])
label_data = Y_test[incorrect_index]
clabel = mlp_predicted_classes[incorrect_index]
mlabel = cnn_predicted_classes[incorrect_index]
combined_label = f"{label_data} c-{clabel} m-{mlabel}"
plt.subplot(n_rows, n_cols, index + 1)
plt.imshow(image, cmap="binary", interpolation="nearest")
plt.axis('off')
plt.title(combined_label, fontsize=20)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.tight_layout()
plt.show()
四、同樣模型辨識Fashion MNIST影像的結果
MLP及CNN對測試組的正確率分別是87%和88%,在不改變模型複雜度的情況下兩模型表現相當,即使使用更為複雜的CNN模型正確率也僅來到91%。從另一角度來看,CNN使用了少於MLP一半的參數總量即達到了相近的正確率。