NumPy#

NumPy is a fundamental library for numerical computing in Python.

import numpy as np

Python list#

Let's compare regular Python lists and NumPy arrays.

# A list is created with [.., ..]
myvals = [1.0, 2.0, 1.5]
myvals
[1.0, 2.0, 1.5]
type(myvals)
list

NumPy 1D array (vector)#

myvals_np = np.array([1.2, 3.0, 4.0])
myvals_np
array([1.2, 3. , 4. ])
type(myvals_np)
numpy.ndarray
myvals_np.dtype
dtype('float64')
myvals_np.sum()
8.2
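
One practical difference between the two containers can be sketched as follows, using made-up values: a list may hold items of mixed types, while an array stores every element with one shared dtype, so NumPy converts mixed input to a common type. The names `mixed_list` and `mixed_np` are only illustrative.

```python
import numpy as np

# A list keeps each item as its own Python object, so types can be mixed.
mixed_list = [1, 2.5, 3]
item_types = [type(v).__name__ for v in mixed_list]

# An array has a single dtype: the integers are converted to float64.
mixed_np = np.array(mixed_list)

print(item_types)
print(mixed_np, mixed_np.dtype)
```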

Indexing#

x = np.array([1.0,1.5, 2.0, 5.3]) 
x
array([1. , 1.5, 2. , 5.3])
x[1]
1.5
x[-1]
5.3
x[1] = 2.0 # modify the second value in the array
x
array([1. , 2. , 2. , 5.3])

Slicing#

x[:2]
array([1., 2.])
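
A few more slicing patterns, as a minimal sketch on an array with the same values as `x` above. It also illustrates a point that is easy to miss: a basic slice is a view of the original data, not a copy. The variable names are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0, 5.3])

first_two = x[:2]      # elements 0 and 1 (the stop index is excluded)
last_two = x[-2:]      # the last two elements
every_other = x[::2]   # start:stop:step, here every second element

# A basic slice is a view: writing through it changes the original array.
first_two[0] = -1.0
print(x)               # x[0] is now -1.0

# Use .copy() when an independent array is needed.
independent = x[:2].copy()
independent[0] = 100.0
print(x)               # unchanged by the edit to the copy
```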

Inline exercise

  1. Create an array x with three values: 1, 2, 3

  2. What is the data type of x?

  3. Create a new array: y = x/2

  4. What is the data type of y?

Math operations#

Python is a general-purpose language that was not designed with numerical computing in mind.

NumPy, however, is designed specifically for numerical computing!

[1.2, 4.5] + [2.3, 4.3] # is this the result you expected??
[1.2, 4.5, 2.3, 4.3]
np.array([1.2, 4.5]) + np.array([2.3, 4.3]) 
array([3.5, 8.8])
np.array([1.2, 4.5]) * np.array([2.3, 4.3]) 
array([ 2.76, 19.35])

Note for MATLAB users: arithmetic operators such as * act element-wise on arrays.

np.array([1.2, 4.5]) @ np.array([2.3, 4.3]) # in case you actually wanted to do a dot product
22.11
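
The difference between `*` and `@` is easiest to see with 2D arrays. The sketch below uses two small made-up matrices, `A` and `B`, chosen only for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

elementwise = A * B   # multiplies matching entries
matmul = A @ B        # matrix product, same as np.matmul(A, B)

print(elementwise)
print(matmul)
```

Here `A * B` keeps only the off-diagonal entries of `A`, while `A @ B` swaps its columns.
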
x = np.arange(5, 100, 5)
x
array([ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85,
       90, 95])
x.dtype # Integers!
dtype('int64')
x + 1 # add 1!
array([ 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76, 81, 86,
       91, 96])
x = x + 3.0 # add a float to some integers, can we do that?
x
array([ 8., 13., 18., 23., 28., 33., 38., 43., 48., 53., 58., 63., 68.,
       73., 78., 83., 88., 93., 98.])
x.dtype # but now it became floats!
dtype('float64')
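
The conversion above happens because `x + 3.0` builds a new array whose dtype can hold both operands. An existing array cannot change its dtype in place, which the following sketch illustrates. It uses a shorter array, and the names `xi`, `xf` and `xi_as_float` are illustrative.

```python
import numpy as np

xi = np.arange(5, 30, 5)      # integer array: 5, 10, 15, 20, 25
xf = xi + 3.0                 # a new float64 array; xi is unchanged

# In-place addition cannot change the dtype of an existing integer array,
# so NumPy raises an error instead of silently truncating.
try:
    xi += 3.5
    inplace_failed = False
except TypeError:
    inplace_failed = True

# An explicit conversion makes the intent clear.
xi_as_float = xi.astype(np.float64)
print(xi.dtype, xf.dtype, xi_as_float.dtype, inplace_failed)
```
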
xr = np.random.random(10)
xr
array([0.35768268, 0.6889755 , 0.1592107 , 0.63488875, 0.32039466,
       0.03918149, 0.16808186, 0.06856908, 0.765837  , 0.60742217])
xr.mean()
0.38102438833192964
xr.std()
0.2592044030162411
xr.max()
0.7658370025427886
xr - xr.mean()
array([-0.02334171,  0.30795111, -0.22181369,  0.25386436, -0.06062973,
       -0.3418429 , -0.21294253, -0.31245531,  0.38481261,  0.22639778])
xn = np.random.normal(loc=5.0, scale=2.0, size=100)
xn[30] = 99.0
mu = xn.mean()
sigma = xn.std()

xn[xn < mu - 3*sigma]
array([], dtype=float64)
xn[xn > mu + 3*sigma]
array([99.])

Missing values (delete values)#

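NumPy represents missing values in floating-point arrays with the special value np.nan ("not a number"). Ordinary reductions such as mean() propagate NaN, while NaN-aware variants such as np.nanmean() skip it.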

y = np.random.random(10)
y
array([0.77554858, 0.95861717, 0.48472919, 0.06205427, 0.27357326,
       0.9851281 , 0.63129154, 0.89431399, 0.683834  , 0.5429035 ])
y[5:] = np.nan
y
array([0.77554858, 0.95861717, 0.48472919, 0.06205427, 0.27357326,
              nan,        nan,        nan,        nan,        nan])
y.mean()
nan
np.nanmean(y)
0.5109044940187875
y * np.pi
array([2.43645773, 3.01158467, 1.52282166, 0.19494923, 0.85945573,
              nan,        nan,        nan,        nan,        nan])
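
To locate or drop the missing entries, one option is `np.isnan` combined with a boolean mask. This is a minimal sketch on a small made-up array rather than the random `y` above.

```python
import numpy as np

y = np.array([0.5, np.nan, 1.5, np.nan, 4.0])

# NaN never compares equal to anything, including itself,
# so use np.isnan to locate missing entries.
missing = np.isnan(y)
n_missing = missing.sum()

# "Deleting" missing values: keep only the entries that are not NaN.
y_valid = y[~missing]

# NaN-aware reductions skip the missing entries.
total = np.nansum(y)
average = np.nanmean(y)
print(n_missing, y_valid, total, average)
```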

Boolean indexing#

z = np.random.normal(loc=0.0, scale=3.0, size=10)

z_sorted = np.sort(z)
z_sorted
array([-5.15264602, -4.00480408, -3.2023718 ,  0.25670486,  0.6500434 ,
        1.00857388,  2.66170335,  4.42462713,  5.44320422,  5.5993012 ])
z<0.0
array([False, False,  True, False, False, False, False,  True, False,
        True])
z_sorted<0.0
array([ True,  True,  True, False, False, False, False, False, False,
       False])
z_sorted[z_sorted<0.0]
array([-5.15264602, -4.00480408, -3.2023718 ])
z_sorted[z_sorted<0.0] = 0.0
z_sorted
array([0.        , 0.        , 0.        , 0.25670486, 0.6500434 ,
       1.00857388, 2.66170335, 4.42462713, 5.44320422, 5.5993012 ])
np.where(z<0.0)
(array([2, 7, 9]),)
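
Boolean masks can also be combined, and `np.where` has a three-argument form that selects values without modifying the input. The sketch below uses a small made-up array so the results are easy to check by hand.

```python
import numpy as np

z = np.array([-2.0, 0.5, 3.0, -1.0, 4.5])

# Combine conditions with & (and), | (or), ~ (not).
# The parentheses are required because & binds tighter than < and >.
in_range = (z > 0.0) & (z < 4.0)
selected = z[in_range]

# The three-argument form of np.where picks values element by element
# and returns a new array, leaving z untouched.
clipped = np.where(z < 0.0, 0.0, z)
print(selected, clipped)
```
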
xn = np.random.normal(loc=5.0, scale=2.0, size=100)

xn[30] = 99.0 # outlier

median = np.median(xn)
sigma = xn.std()
sigma # sample std affected by outlier
9.567980749793008
xn[xn > median + 3*sigma] # only the one abnormally high value is flagged
array([99.])
xn[xn > median + 3*sigma] = np.nan
np.nanstd(xn) # much closer to the true std==2.0
2.1388685746394196
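
The outlier example can be made reproducible with a seeded generator. This is a sketch of the same median-plus-three-sigma rule. The seed value and the names `rng`, `is_outlier` and `xn_clean` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily
xn = rng.normal(loc=5.0, scale=2.0, size=100)
xn[30] = 99.0                         # inject one outlier

threshold = np.median(xn) + 3 * xn.std()
is_outlier = xn > threshold

# np.where with three arguments returns a masked copy; xn itself is kept.
xn_clean = np.where(is_outlier, np.nan, xn)
print(is_outlier.sum(), np.nanstd(xn_clean))
```

Keeping the original `xn` intact and writing the masked data to a new array makes it possible to try a different threshold later.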

2D arrays#

X = np.array([
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
])
X
array([[0., 1., 2.],
       [3., 4., 5.]])
X.shape
(2, 3)
nrows = X.shape[0]
nrows
2
ncols = X.shape[1]
ncols
3
X[0,0]
0.0
X[1,1]
4.0
X[-1,-1]
5.0
X[0,:]
array([0., 1., 2.])
X[0]
array([0., 1., 2.])
X.mean()
2.5
colmeans = X.mean(axis=0)
colmeans
array([1.5, 2.5, 3.5])
colmeans.shape
(3,)
rowmeans = X.mean(axis=1)
rowmeans
array([1., 4.])
X - colmeans
array([[-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5]])
X - rowmeans    # this will fail
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[63], line 1
----> 1 X - rowmeans    # this will fail

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

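The subtraction fails because broadcasting aligns shapes starting from the last axis: the trailing 3 of (2, 3) does not match the 2 of (2,). Giving rowmeans the shape (2, 1) makes the two shapes compatible. See the NumPy broadcasting documentation for a detailed explanation of how arrays can be used in expressions.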

R = rowmeans[:, np.newaxis] # add a new dimension to create a 2D array
R
np.expand_dims(rowmeans, 1) # same result
X.shape
R.shape
X - R
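
An alternative that avoids the extra reshaping step is to ask the reduction to keep the reduced axis. This is a minimal sketch using the `keepdims` argument of `mean`, with `X` redefined so the block runs on its own. The names `rowmeans_2d` and `centered` are illustrative.

```python
import numpy as np

X = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0]])

# keepdims=True keeps the reduced axis with length 1,
# so the result has shape (2, 1) instead of (2,).
rowmeans_2d = X.mean(axis=1, keepdims=True)

# Shapes (2, 3) and (2, 1) are compatible: the length-1 axis is stretched.
centered = X - rowmeans_2d
print(rowmeans_2d.shape)
print(centered)
```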

Reshaping#

x = X.flatten()
x
x.reshape(2,3)
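
Two related details, sketched below with the same `X` redefined so the block is self-contained: `flatten` always returns a copy, whereas `ravel` returns a view when it can, and passing -1 to `reshape` lets NumPy infer one dimension. The variable names are illustrative.

```python
import numpy as np

X = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0]])

flat_copy = X.flatten()   # always a new, independent 1D array
flat_view = X.ravel()     # a view when possible, as for this contiguous array

# -1 asks NumPy to infer that dimension from the total number of elements.
three_rows = flat_copy.reshape(3, -1)

print(three_rows.shape)
print(np.shares_memory(X, flat_copy), np.shares_memory(X, flat_view))
```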