Big Data Analysis and Machine Learning (4): Neural Networks - Binary Classification (1)

qianbo_insist, published 2022-04-25 17:25:20

Data preparation

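This article uses the IMDb dataset. IMDb stands for the Internet Movie Database, a website created on October 17, 1990 and owned by Amazon. The data is binary: a review is either positive (pos) or negative (neg), so the training data is split into a neg folder and a pos folder, as shown in the figure below:

[Figure: the neg and pos folders of the training data]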

Open the pos folder and you will find many files, as shown below, each containing the text of one review.

[Figure: files in the pos folder]

For example, this is the content of 0_9.txt:

Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as “Teachers”. My 35 years in the teaching profession lead me to believe that Bromwell High’s satire is much closer to reality than is “Teachers”. The scramble to survive financially, the insightful students who can see right through their pathetic teachers’ pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled … at … High. A classic line: INSPECTOR: I’m here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn’t!

This is a positive review, so its label is 1; the reviews in the neg folder have label 0. There is also a file named imdb.vocab that collects all the words, as shown below:

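[Figure: contents of imdb.vocab]

Every word of a positive or negative review is looked up in this file to get its index. In effect, this file defines the dimensions of the data: each word is one dimension.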
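To make the "each word is one dimension" idea concrete, here is a minimal sketch. The six-word vocabulary is made up for illustration (the real imdb.vocab is far larger), and `build_index` and `multi_hot` are hypothetical helper names, not part of any library.

```python
import numpy as np

# Hypothetical miniature vocabulary; assumed to mimic a one-word-per-line file.
vocab_lines = ["the", "and", "a", "movie", "great", "boring"]

def build_index(lines):
    """Map each word to its line position, which serves as its dimension."""
    return {word: i for i, word in enumerate(lines)}

def multi_hot(text, index):
    """Return a vector with 1.0 at the dimension of every known word."""
    vec = np.zeros(len(index))
    for word in text.lower().split():
        if word in index:  # words outside the vocabulary are skipped
            vec[index[word]] = 1.0
    return vec

index = build_index(vocab_lines)
vec = multi_hot("A great movie", index)
print(vec)  # [0. 0. 1. 1. 1. 0.]
```

A review of any length becomes a vector of the same fixed size, which is what a dense network needs as input.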

As the data above shows, building a dataset is really not easy; the code, by contrast, is quite simple. This time we use TensorFlow.

Neural network basics

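Using a neural network in TensorFlow is fairly simple. The whole process is:
1 Define the training data
2 Define the network
3 Configure the learning process
4 Call the model's fit to iterate over the training data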

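from tensorflow import keras
from tensorflow.keras import layers

# x_train, y_train, x_test and y_test are prepared in the full code further down.
# Define the network: two hidden layers and one sigmoid output.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
# Configure the learning process.
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
# Train on the training data, then evaluate on the test data.
model.fit(x_train, y_train, epochs=4, batch_size=512)
results = model.evaluate(x_test, y_test)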

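The first layer has 16 output units with relu activation, the second layer again has 16 output units with relu, and the final layer has a single output unit with sigmoid activation. Look up anything that is unclear. Why a single output at the end? Because the problem is binary: the label is either 1 or 0, that is, either positive or negative. The sigmoid output is a value between 0 and 1 that can be read as the probability that the review is positive.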
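As a rough illustration of what the sigmoid output and the binary cross-entropy loss compute, here is a NumPy sketch. The raw outputs and labels are made-up numbers, and the two functions are simplified stand-ins written for this example; Keras's own implementation may differ in details such as the clipping constant.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; eps avoids log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

logits = np.array([2.0, -1.0, 0.0])  # made-up raw outputs of the last layer
labels = np.array([1.0, 0.0, 1.0])   # made-up true labels (1 = positive)

probs = sigmoid(logits)              # probability of "positive" per review
preds = (probs > 0.5).astype(int)    # threshold at 0.5 to get a 0/1 decision
loss = binary_crossentropy(labels, probs)

print(probs)
print(preds)
print(loss)
```

The loss is small when the predicted probability is close to the true label and grows quickly when the model is confidently wrong, which is why it pairs naturally with a single sigmoid unit.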

Code

import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import imdb

# Keep only the 10000 most frequent words.
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
# Each review is a list of word indices.
print(train_data[0])
#print([max(sequence) for sequence in train_data])

# Decode the first review back to text. Indices are offset by 3
# because 0, 1 and 2 are reserved for padding, start of sequence and unknown.
word_index = imdb.get_word_index()
#print(word_index)
reverse_word_index = dict(
    [(value, key) for (key, value) in word_index.items()])
decoded_review = " ".join(
    [reverse_word_index.get(i - 3, "?") for i in train_data[0]])
print(decoded_review)

def vectorize_sequences(sequences, dimension=10000):
    # Multi-hot encode: one row per review, one column per word index.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        for j in sequence:
            results[i, j] = 1.
    return results


x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
print(x_train[0])
# Labels as float32 vectors of 0s and 1s.
y_train = np.asarray(train_labels).astype("float32")
y_test = np.asarray(test_labels).astype("float32")

# Define the network: two hidden layers and one sigmoid output.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Hold out the first 10000 training samples for validation.
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]

# Train for 20 epochs and record the loss after each one.
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))

# Plot training loss (dots) against validation loss (line).
history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, "bo", label="Training loss")
plt.plot(epochs, val_loss_values, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()
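The `i - 3` offset used when decoding a review in the code above can be sketched in isolation. This assumes the default Keras convention for this dataset, where 0 is used for padding, 1 marks the start of a review and 2 stands for an unknown word, so real words start at index 4. The tiny `word_index` below and its ranks are made up, and `encode_review` and `decode_review` are hypothetical helpers, not Keras functions.

```python
# Hypothetical tiny word index in the style returned by imdb.get_word_index();
# the ranks are invented for this example.
word_index = {"the": 1, "and": 2, "a": 3, "movie": 17, "great": 84}

INDEX_FROM = 3          # assumed offset: indices 0, 1 and 2 are reserved
START, UNKNOWN = 1, 2   # assumed start-of-review and unknown-word markers

def encode_review(text, word_index, num_words=10000):
    """Turn text into a list of indices, mirroring the dataset's layout."""
    ids = [START]
    for word in text.lower().split():
        rank = word_index.get(word)
        idx = rank + INDEX_FROM if rank is not None else UNKNOWN
        # Words rarer than the num_words cutoff are treated as unknown.
        ids.append(idx if idx < num_words else UNKNOWN)
    return ids

def decode_review(ids, word_index):
    """Inverse mapping; reserved indices come back as '?'."""
    reverse = {rank: word for word, rank in word_index.items()}
    return " ".join(reverse.get(i - INDEX_FROM, "?") for i in ids)

ids = encode_review("A great movie zzz", word_index)
print(ids)                             # [1, 6, 87, 20, 2]
print(decode_review(ids, word_index))  # ? a great movie ?
```

The same encoding step would be needed before feeding a new, unseen review to the trained model.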

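Below is the training process:

[Figure: training log]

Below is the plot of the loss over the epochs. One epoch is one complete pass of the whole dataset forward through the network and back again: both forward propagation and backpropagation are carried out, so every sample is processed once and contributes feedback once.

[Figure: training and validation loss per epoch]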
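To put numbers on one epoch: with 25000 training reviews, 10000 of them held out for validation, and a batch size of 512, one epoch consists of 30 weight updates. The sketch below also shows one common way to read such a loss curve, picking the epoch with the lowest validation loss. The `val_loss` values are invented for illustration and are not the results of the run above.

```python
import math

num_samples = 25000 - 10000   # training reviews left after the validation split
batch_size = 512

# One epoch processes every sample once, one batch per weight update.
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 30

# Made-up validation losses, one per epoch, shaped like a typical curve:
# falling at first, then rising again once the model starts to overfit.
val_loss = [0.40, 0.31, 0.28, 0.29, 0.33, 0.38]

# Epochs are numbered from 1, list positions from 0.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch)  # 3
```

Training past that point keeps lowering the training loss but usually does not help on unseen data, so the chosen epoch count is a reasonable value to pass to fit when retraining.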

Data recognition

Get the concepts above clear first; this part will be covered next time.
