sklearn Data Preprocessing 1: StandardScaler
Published: 2019-06-13


Reposted from: https://blog.csdn.net/u012609509/article/details/78554709

StandardScaler

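Purpose: remove the mean and scale to unit variance. This is done separately for each feature dimension, not for each sample. Because the data has shape [n_samples, n_features], StandardScaler standardizes each column independently.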
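To make "per feature, not per sample" concrete, here is a minimal NumPy sketch. The 3x2 array is a made-up toy input, not data from this post; the point is only that the statistics are taken along axis=0, one pair per column.

```python
import numpy as np

# Toy data (hypothetical): 3 samples (rows) x 2 features (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# axis=0 collapses the sample axis, so we get one mean and one std per column.
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)

# Broadcasting applies each column's statistics to every row of that column.
z = (data - col_mean) / col_std

print(z.mean(axis=0))  # per-column mean after scaling
print(z.std(axis=0))   # per-column std after scaling
```

After scaling, each column has mean 0 and standard deviation 1 up to floating-point error, even though the two columns started out on scales that differ by a factor of ten.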

Note:
Standardization does not benefit every estimator. The scikit-learn documentation puts it this way:
“Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).”

Example code

# Compute the mean and std statistics of the training set
from sklearn.preprocessing import StandardScaler
import numpy as np


def test_algorithm():
    np.random.seed(123)
    print('use sklearn')
    # Note: shape of data is [n_samples, n_features]
    data = np.random.randn(10, 4)
    scaler = StandardScaler()
    scaler.fit(data)
    trans_data = scaler.transform(data)
    print('original data: ')
    print(data)
    print('transformed data: ')
    print(trans_data)
    print('scaler info: scaler.mean_: {}, scaler.var_: {}'.format(scaler.mean_, scaler.var_))
    print('\n')

    print('use numpy by self')
    mean = np.mean(data, axis=0)
    std = np.std(data, axis=0)
    var = std * std
    print('mean: {}, std: {}, var: {}'.format(mean, std, var))
    # NumPy broadcasting: mean has shape (4,), data has shape (10, 4)
    another_trans_data = data - mean
    # Note: divide by the standard deviation, not the variance
    another_trans_data = another_trans_data / std
    print('another_trans_data: ')
    print(another_trans_data)


if __name__ == '__main__':
    test_algorithm()
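The NumPy half of the example reproduces the scaler's numbers because np.std defaults to ddof=0 (divide by n), which is the population variance that StandardScaler stores in var_. The following sketch checks this on the same seeded data; the variable names var_population and var_sample are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(123)
data = np.random.randn(10, 4)  # same seeded data as the example

scaler = StandardScaler().fit(data)

var_population = np.var(data, axis=0, ddof=0)  # divide by n
var_sample = np.var(data, axis=0, ddof=1)      # divide by n - 1

print(np.allclose(scaler.var_, var_population))
print(np.allclose(scaler.var_, var_sample))
```

The first check should print True and the second False: with only 10 samples the two estimates differ by a factor of 10/9, so a hand-rolled version that uses ddof=1 will not match the scaler.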

The program prints the following:

use sklearn
original data: 
[[-1.0856306   0.99734545  0.2829785  -1.50629471]
 [-0.57860025  1.65143654 -2.42667924 -0.42891263]
 [ 1.26593626 -0.8667404  -0.67888615 -0.09470897]
 [ 1.49138963 -0.638902   -0.44398196 -0.43435128]
 [ 2.20593008  2.18678609  1.0040539   0.3861864 ]
 [ 0.73736858  1.49073203 -0.93583387  1.17582904]
 [-1.25388067 -0.6377515   0.9071052  -1.4286807 ]
 [-0.14006872 -0.8617549  -0.25561937 -2.79858911]
 [-1.7715331  -0.69987723  0.92746243 -0.17363568]
 [ 0.00284592  0.68822271 -0.87953634  0.28362732]]
transformed data: 
[[-0.94511643  0.58665507  0.5223171  -0.93064483]
 [-0.53659117  1.16247784 -2.13366794  0.06768082]
 [ 0.9495916  -1.05437488 -0.42049501  0.3773612 ]
 [ 1.13124423 -0.85379954 -0.19024378  0.06264126]
 [ 1.70696485  1.63376764  1.22910949  0.8229693 ]
 [ 0.52371324  1.02100318 -0.67235312  1.55466934]
 [-1.08067913 -0.85278672  1.13408114 -0.858726  ]
 [-0.18325687 -1.04998594 -0.00561227 -2.1281129 ]
 [-1.49776284 -0.9074785   1.15403514  0.30422599]
 [-0.06810748  0.31452186 -0.61717074  0.72793583]]
scaler info: scaler.mean_: [ 0.08737571  0.33094968 -0.24989369 -0.50195303], scaler.var_: [1.54038781  1.29032409  1.04082479  1.16464894]


use numpy by self
mean: [ 0.08737571  0.33094968 -0.24989369 -0.50195303], std: [1.24112361  1.13592433  1.02020821  1.07918902], var: [1.54038781  1.29032409  1.04082479  1.16464894]
another_trans_data: 
[[-0.94511643  0.58665507  0.5223171  -0.93064483]
 [-0.53659117  1.16247784 -2.13366794  0.06768082]
 [ 0.9495916  -1.05437488 -0.42049501  0.3773612 ]
 [ 1.13124423 -0.85379954 -0.19024378  0.06264126]
 [ 1.70696485  1.63376764  1.22910949  0.8229693 ]
 [ 0.52371324  1.02100318 -0.67235312  1.55466934]
 [-1.08067913 -0.85278672  1.13408114 -0.858726  ]
 [-0.18325687 -1.04998594 -0.00561227 -2.1281129 ]
 [-1.49776284 -0.9074785   1.15403514  0.30422599]
 [-0.06810748  0.31452186 -0.61717074  0.72793583]]
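The first comment in the example says the statistics come from the training set. In practice that means fitting the scaler once on training data and reusing the stored mean_ and var_ on any later data. The sketch below illustrates this under assumed inputs: the train and test arrays and the seed are hypothetical and are not part of the original example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)        # arbitrary seed, for repeatability only
train = rng.randn(8, 3) * 5.0 + 2.0   # hypothetical training split
test = rng.randn(4, 3) * 5.0 + 2.0    # hypothetical held-out split

scaler = StandardScaler()
scaler.fit(train)                     # statistics are computed from train only
test_scaled = scaler.transform(test)  # the same statistics are applied to test

# The same result by hand, using the stored training statistics.
manual = (test - scaler.mean_) / np.sqrt(scaler.var_)
print(np.allclose(test_scaled, manual))

# inverse_transform maps scaled values back to the original units.
print(np.allclose(scaler.inverse_transform(test_scaled), test))
```

Both checks should print True. Note that test_scaled will generally not have exactly zero mean and unit variance per column, because it is scaled with the training statistics rather than its own.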

Reference links

Reposted from: https://www.cnblogs.com/super-saiyan-blue/p/9330833.html
