Reposted from: https://blog.csdn.net/u012609509/article/details/78554709
StandardScaler
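Purpose: remove the mean and scale to unit variance. This is done per feature dimension, not per sample. StandardScaler standardizes each column separately, because the data has shape [n_samples, n_features].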
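The column-wise behaviour described above can be checked with a small sketch. The toy array `X` is made up for illustration and is not from the original post. After scaling, each column has mean 0 and standard deviation 1, while the rows do not. The standard deviation here is the population one (`np.std` with its default `ddof=0`), which is what StandardScaler uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 3 samples (rows), 2 features (columns)
# whose scales differ by a factor of 100.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

X_scaled = StandardScaler().fit_transform(X)

# Statistics per feature (column): these are what get normalized.
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)   # ddof=0, the population std

# Statistics per sample (row): these are NOT normalized.
row_means = X_scaled.mean(axis=1)

print('column means:', col_means)
print('column stds: ', col_stds)
print('row means:   ', row_means)
```

Both columns end up with identical scaled values even though their original scales differ, which is the point of standardizing each feature on its own.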
Note: standardization does not always benefit an estimator. From the scikit-learn documentation: "Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance)."
Example code:
# coding=utf-8
# Compute the mean and std of the training set
from sklearn.preprocessing import StandardScaler
import numpy as np


def test_algorithm():
    np.random.seed(123)
    print('use sklearn')
    # Note: shape of data: [n_samples, n_features]
    data = np.random.randn(10, 4)
    scaler = StandardScaler()
    scaler.fit(data)
    trans_data = scaler.transform(data)
    print('original data: ')
    print(data)
    print('transformed data: ')
    print(trans_data)
    print('scaler info: scaler.mean_: {}, scaler.var_: {}'.format(scaler.mean_, scaler.var_))
    print('\n')

    print('use numpy by self')
    # axis=0 computes the statistics per column (per feature)
    mean = np.mean(data, axis=0)
    std = np.std(data, axis=0)
    var = std * std
    print('mean: {}, std: {}, var: {}'.format(mean, std, var))
    # NumPy broadcasting subtracts the per-column mean from every row
    another_trans_data = data - mean
    # Note: divide by the standard deviation, not the variance
    another_trans_data = another_trans_data / std
    print('another_trans_data: ')
    print(another_trans_data)


if __name__ == '__main__':
    test_algorithm()
The program output is as follows:
use sklearn
original data:
[[-1.0856306 0.99734545 0.2829785 -1.50629471]
 [-0.57860025 1.65143654 -2.42667924 -0.42891263]
 [1.26593626 -0.8667404 -0.67888615 -0.09470897]
 [1.49138963 -0.638902 -0.44398196 -0.43435128]
 [2.20593008 2.18678609 1.0040539 0.3861864]
 [0.73736858 1.49073203 -0.93583387 1.17582904]
 [-1.25388067 -0.6377515 0.9071052 -1.4286807]
 [-0.14006872 -0.8617549 -0.25561937 -2.79858911]
 [-1.7715331 -0.69987723 0.92746243 -0.17363568]
 [0.00284592 0.68822271 -0.87953634 0.28362732]]
transformed data:
[[-0.94511643 0.58665507 0.5223171 -0.93064483]
 [-0.53659117 1.16247784 -2.13366794 0.06768082]
 [0.9495916 -1.05437488 -0.42049501 0.3773612]
 [1.13124423 -0.85379954 -0.19024378 0.06264126]
 [1.70696485 1.63376764 1.22910949 0.8229693]
 [0.52371324 1.02100318 -0.67235312 1.55466934]
 [-1.08067913 -0.85278672 1.13408114 -0.858726]
 [-0.18325687 -1.04998594 -0.00561227 -2.1281129]
 [-1.49776284 -0.9074785 1.15403514 0.30422599]
 [-0.06810748 0.31452186 -0.61717074 0.72793583]]
scaler info: scaler.mean_: [0.08737571 0.33094968 -0.24989369 -0.50195303], scaler.var_: [1.54038781 1.29032409 1.04082479 1.16464894]

use numpy by self
mean: [0.08737571 0.33094968 -0.24989369 -0.50195303], std: [1.24112361 1.13592433 1.02020821 1.07918902], var: [1.54038781 1.29032409 1.04082479 1.16464894]
another_trans_data:
[[-0.94511643 0.58665507 0.5223171 -0.93064483]
 [-0.53659117 1.16247784 -2.13366794 0.06768082]
 [0.9495916 -1.05437488 -0.42049501 0.3773612]
 [1.13124423 -0.85379954 -0.19024378 0.06264126]
 [1.70696485 1.63376764 1.22910949 0.8229693]
 [0.52371324 1.02100318 -0.67235312 1.55466934]
 [-1.08067913 -0.85278672 1.13408114 -0.858726]
 [-0.18325687 -1.04998594 -0.00561227 -2.1281129]
 [-1.49776284 -0.9074785 1.15403514 0.30422599]
 [-0.06810748 0.31452186 -0.61717074 0.72793583]]
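The code comment above mentions computing the mean and std of the training set. As a minimal sketch of how those statistics are typically reused, the following fits the scaler on one array and applies it to another. The `train` and `test` arrays are random placeholders invented for this illustration, not data from the original post. The test data is scaled with the training statistics, and `inverse_transform` undoes the scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: both arrays have shape [n_samples, n_features].
rng = np.random.RandomState(0)
train = rng.randn(100, 4) * 5.0 + 10.0
test = rng.randn(20, 4) * 5.0 + 10.0

# Learn the per-column mean and variance from the training set only.
scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)

# Reuse the training statistics on the test set (no refitting).
test_scaled = scaler.transform(test)

# The same computation by hand, using the training mean and std.
manual_test = (test - train.mean(axis=0)) / train.std(axis=0)

# inverse_transform maps scaled values back to the original units.
restored = scaler.inverse_transform(test_scaled)

print('matches manual computation:', np.allclose(test_scaled, manual_test))
print('round trip recovers test:  ', np.allclose(restored, test))
```

Because the test set is scaled with the training statistics, its columns will generally not have exactly zero mean and unit variance.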