NumPy#
NumPy is a fundamental library for numerical computation in Python.
import numpy as np
Python list#
Let's compare regular Python lists and NumPy arrays.
# A list is created with [.., ..]
myvals = [1.0, 2.0, 1.5]
myvals
[1.0, 2.0, 1.5]
type(myvals)
list
NumPy 1D array (vector)#
myvals_np = np.array([1.2, 3.0, 4.0])
myvals_np
array([1.2, 3. , 4. ])
type(myvals_np)
numpy.ndarray
myvals_np.dtype
dtype('float64')
myvals_np.sum()
np.float64(8.2)
Indexing#
x = np.array([1.0,1.5, 2.0, 5.3])
x
array([1. , 1.5, 2. , 5.3])
x[1]
np.float64(1.5)
x[-1]
np.float64(5.3)
x[1] = 2.0 # modify the second value in the array
x
array([1. , 2. , 2. , 5.3])
Slicing#
x[:2]
array([1., 2.])
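Slices accept an optional start, stop, and step, and they return a view on the original data rather than a copy. The sketch below illustrates both points; the array `a` and the other variable names are made up for this example.

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0, 5.3])

first_two = a[:2]      # positions 0 and 1
last_two = a[-2:]      # the final two positions
every_other = a[::2]   # positions 0 and 2 (step of 2)

# A slice is a view on the original data, not a copy:
view = a[1:3]
view[0] = -1.0         # this also changes a[1]

# Use .copy() when an independent array is needed.
independent = a[1:3].copy()
independent[0] = 100.0 # a is left untouched
```

Because `first_two` is also a view, it reflects the change made through `view`.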
Inline exercise
Create an array x with three values: 1, 2, 3. What is the data type of x?
Create a new array: y = x/2. What is the data type of y?
Math operations#
Python is a general-purpose language that was not designed with numerical computing in mind.
NumPy, however, is designed for numerical computing!
[1.2, 4.5] + [2.3, 4.3] # is this the result you expected??
[1.2, 4.5, 2.3, 4.3]
np.array([1.2, 4.5]) + np.array([2.3, 4.3])
array([3.5, 8.8])
np.array([1.2, 4.5]) * np.array([2.3, 4.3])
array([ 2.76, 19.35])
Note for MATLAB users: all arithmetic operators, such as *, are element-wise.
np.array([1.2, 4.5]) @ np.array([2.3, 4.3]) # in case you actually wanted to do a dot product
np.float64(22.11)
x = np.arange(5, 100, 5)
x
array([ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85,
90, 95])
x.dtype # Integers!
dtype('int64')
x + 1 # add 1!
array([ 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76, 81, 86,
91, 96])
x = x + 3.0 # add a float to some integers, can we do that?
x
array([ 8., 13., 18., 23., 28., 33., 38., 43., 48., 53., 58., 63., 68.,
73., 78., 83., 88., 93., 98.])
x.dtype # but now it became floats!
dtype('float64')
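The conversion to float above happens because `x + 3.0` builds a new array. The sketch below contrasts that with in-place addition, which would have to store floats in the existing integer array. The variable names are illustrative only.

```python
import numpy as np

counts = np.arange(5, 30, 5)        # integer array: 5, 10, 15, 20, 25

# Out-of-place addition creates a new float64 array.
shifted = counts + 0.5

# In-place addition would have to store floats in an integer array,
# which NumPy refuses under its default casting rule.
try:
    counts += 0.5
    inplace_ok = True
except TypeError:
    inplace_ok = False

# Convert explicitly when a float array is wanted from the start.
counts_float = counts.astype(np.float64)
```

Here `inplace_ok` ends up `False`, and `counts` keeps its integer values.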
xr = np.random.random(10)
xr
array([0.31035755, 0.45839361, 0.80595107, 0.00978142, 0.47022436,
0.20772032, 0.74551387, 0.69377802, 0.70482485, 0.38023257])
xr.mean()
np.float64(0.4786777645232451)
xr.std()
np.float64(0.24647671774390265)
xr.max()
np.float64(0.805951068531288)
xr - xr.mean()
array([-0.16832022, -0.02028415, 0.3272733 , -0.46889634, -0.00845341,
-0.27095744, 0.26683611, 0.21510026, 0.22614709, -0.09844519])
xn = np.random.normal(loc=5.0, scale=2.0, size=100)
xn[30] = 99.0
mu = xn.mean()
sigma = xn.std()
xn[xn < mu - 3*sigma]
array([], dtype=float64)
xn[xn > mu + 3*sigma]
array([99.])
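The two one-sided checks above can be folded into a single test on the absolute z-score. This is a minimal sketch, assuming a seeded `np.random.default_rng` generator so the run is repeatable (the cells above use the unseeded `np.random` functions); `samples`, `z_scores`, and `is_outlier` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded so the run is repeatable
samples = rng.normal(loc=5.0, scale=2.0, size=100)
samples[30] = 99.0                  # planted outlier

# z-score: distance from the mean in units of standard deviation
z_scores = (samples - samples.mean()) / samples.std()

# One mask covers both the low and the high tail.
is_outlier = np.abs(z_scores) > 3
outliers = samples[is_outlier]
```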
Missing values (delete values)#
NumPy can represent missing values in floating-point arrays with np.nan, and it provides NaN-aware functions such as np.nanmean.
y = np.random.random(10)
y
array([0.01428201, 0.98171471, 0.42694689, 0.77372733, 0.4008848 ,
0.74154504, 0.23949779, 0.31040726, 0.1831805 , 0.20518675])
y[5:] = np.nan
y
array([0.01428201, 0.98171471, 0.42694689, 0.77372733, 0.4008848 ,
nan, nan, nan, nan, nan])
y.mean()
np.float64(nan)
np.nanmean(y)
np.float64(0.5195111487289672)
y * np.pi
array([0.04486827, 3.08414771, 1.34129322, 2.43073611, 1.25941673,
nan, nan, nan, nan, nan])
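To find or drop the missing entries themselves, `np.isnan` gives a boolean mask. The sketch below uses a small hand-written array (not the random `y` above) so the result is easy to check.

```python
import numpy as np

values = np.array([0.2, 0.9, np.nan, 0.4, np.nan])

missing = np.isnan(values)      # boolean mask, True where a value is missing
n_missing = missing.sum()       # True counts as 1
observed = values[~missing]     # keep only the non-missing entries

# values == np.nan is False everywhere because NaN never equals itself,
# so np.isnan is the reliable test.
naive = values == np.nan
```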
Boolean indexing#
z = np.random.normal(loc=0.0, scale=3.0, size=10)
z_sorted = np.sort(z)
z_sorted
array([-5.86072832, -3.27283618, -3.02898487, -2.86986327, -1.93996782,
-1.83239882, 0.61938381, 0.88391902, 2.37416647, 3.69812726])
z<0.0
array([False, True, True, True, False, False, False, True, True,
True])
z_sorted<0.0
array([ True, True, True, True, True, True, False, False, False,
False])
z_sorted[z_sorted<0.0]
array([-5.86072832, -3.27283618, -3.02898487, -2.86986327, -1.93996782,
-1.83239882])
z_sorted[z_sorted<0.0] = 0.0
z_sorted
array([0. , 0. , 0. , 0. , 0. ,
0. , 0.61938381, 0.88391902, 2.37416647, 3.69812726])
np.where(z<0.0)
(array([1, 2, 3, 7, 8, 9]),)
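Boolean masks can be combined, and `np.where` also has a three-argument form that selects between two alternatives. A short sketch with a hand-written array `v` (an illustrative name, not one of the arrays above):

```python
import numpy as np

v = np.array([-5.0, -1.5, 0.5, 2.0, 4.0])

# Combine conditions with & (and), | (or), ~ (not).
# The parentheses are required because & binds tighter than < and >.
inside = (v > -2.0) & (v < 3.0)
selected = v[inside]

# Three-argument np.where picks from the second argument where the
# condition is True and from the third where it is False.
clipped = np.where(v < 0.0, 0.0, v)
```

Unlike the in-place assignment shown earlier, the three-argument form returns a new array and leaves `v` unchanged.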
xn = np.random.normal(loc=5.0, scale=2.0, size=100)
xn[30] = 99.0 # outlier
median = np.median(xn)
sigma = xn.std()
sigma # std of the sample, inflated by the outlier
np.float64(9.54656989459206)
xn[xn > median + 3*sigma] # but 1 abnormally high value
array([99.])
xn[xn > median + 3*sigma] = np.nan
np.nanstd(xn) # much closer to the true std==2.0
np.float64(1.8754785963392921)
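One detail worth knowing when comparing against a true standard deviation: `std` and `np.nanstd` divide by N by default, while the usual sample estimate divides by N - 1, which is selected with `ddof=1`. A small sketch on a hand-written array (the `data` values are chosen only to make the arithmetic easy):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = data.std()        # ddof=0 (default): divide by N
sample_std = data.std(ddof=1)      # ddof=1: divide by N - 1
```

For large arrays the two values are nearly identical; the difference matters mainly for small samples.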
2D arrays#
X = np.array([
[0.0, 1.0, 2.0],
[3.0, 4.0, 5.0]
])
X
array([[0., 1., 2.],
[3., 4., 5.]])
X.shape
(2, 3)
nrows = X.shape[0]
nrows
2
ncols = X.shape[1]
ncols
3
X[0,0]
np.float64(0.0)
X[1,1]
np.float64(4.0)
X[-1,-1]
np.float64(5.0)
X[0,:]
array([0., 1., 2.])
X[0]
array([0., 1., 2.])
X.mean()
np.float64(2.5)
colmeans = X.mean(axis=0)
colmeans
array([1.5, 2.5, 3.5])
colmeans.shape
(3,)
rowmeans = X.mean(axis=1)
rowmeans
array([1., 4.])
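A way to remember the `axis` argument: the axis that is named is the one that gets collapsed. The sketch below shows this with `sum` on an array with the same values as `X`; `M` and the other names are illustrative.

```python
import numpy as np

M = np.array([
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
])                              # shape (2, 3)

col_sums = M.sum(axis=0)        # axis 0 (rows) is collapsed, shape (3,)
row_sums = M.sum(axis=1)        # axis 1 (columns) is collapsed, shape (2,)
total = M.sum()                 # no axis: every element is reduced to one number
```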
X - colmeans
array([[-1.5, -1.5, -1.5],
[ 1.5, 1.5, 1.5]])
# X - rowmeans # this fails: shapes (2, 3) and (2,) cannot be broadcast together
See the NumPy broadcasting documentation for a detailed explanation of how arrays of different shapes can be used in expressions.
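The broadcasting rule can be checked on shapes alone, without building any arrays. The sketch below uses `np.broadcast_shapes` (available in NumPy 1.20 and later) on the shapes that appear in this section.

```python
import numpy as np

# Shapes are compared from the last axis backwards; two axis lengths are
# compatible when they are equal or when one of them is 1.
ok_cols = np.broadcast_shapes((2, 3), (3,))     # (3,) lines up with the columns
ok_rows = np.broadcast_shapes((2, 3), (2, 1))   # the length-1 axis is stretched

try:
    np.broadcast_shapes((2, 3), (2,))           # last axes are 3 and 2: mismatch
    compatible = True
except ValueError:
    compatible = False
```

This is why subtracting the column means works directly, while the row means first need a second axis of length 1.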
R = rowmeans[:, np.newaxis] # add a new dimension to create a 2D array
R
array([[1.],
[4.]])
np.expand_dims(rowmeans, 1) # same result
array([[1.],
[4.]])
X.shape
(2, 3)
R.shape
(2, 1)
X - R
array([[-1., 0., 1.],
[-1., 0., 1.]])
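As an alternative to adding the axis afterwards, reductions such as `mean` accept `keepdims=True`, which keeps the reduced axis with length 1. A minimal sketch with the same values as `X` (the names `M`, `row_means`, and `centered` are illustrative):

```python
import numpy as np

M = np.array([
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
])

# keepdims=True keeps the reduced axis with length 1, giving shape (2, 1)
row_means = M.mean(axis=1, keepdims=True)

# (2, 3) - (2, 1): the length-1 axis is stretched across the 3 columns
centered = M - row_means
```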
Reshaping#
x = X.flatten()
x
array([0., 1., 2., 3., 4., 5.])
x.reshape(2,3)
array([[0., 1., 2.],
[3., 4., 5.]])
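Two further reshaping details may be useful: `-1` lets NumPy infer one axis length, and `flatten` always copies while `ravel` returns a view when the memory layout allows it. The sketch below assumes a freshly created, contiguous array `M` (an illustrative name).

```python
import numpy as np

M = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# -1 asks NumPy to infer that axis length from the total size.
tall = M.reshape(-1, 2)            # shape (3, 2)

flat_copy = M.flatten()            # always a copy
flat_view = M.ravel()              # a view when the memory layout allows it

flat_copy[0] = 100.0               # M is unaffected
flat_view[1] = -1.0                # M[0, 1] changes too
```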