# NumPy

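[NumPy](https://numpy.org/) is the fundamental library for numerical computing in Python. It provides a fast, multidimensional array type (`ndarray`) and functions that operate on whole arrays at once.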

Additional resources: 

* [NumPy Quickstart](https://numpy.org/doc/stable/user/quickstart.html) 
* [NumPy absolute basics](https://numpy.org/doc/stable/user/absolute_beginners.html)
* [NumPy for MATLAB users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html)


In [1]:
import numpy as np

## Python list

Let's compare regular Python lists with NumPy arrays.

In [2]:
# A list is created with [.., ..]
myvals = [1.0, 2.0, 1.5]
myvals

[1.0, 2.0, 1.5]

In [3]:
type(myvals)

list

## NumPy 1D array (vector)

In [4]:
myvals_np = np.array([1.2, 3.0, 4.0])

In [5]:
myvals_np

array([1.2, 3. , 4. ])

In [6]:
type(myvals_np)

numpy.ndarray

In [7]:
myvals_np.dtype

dtype('float64')
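NumPy infers the `dtype` from the values you pass in. You can also request one explicitly with the `dtype` argument, or convert an existing array with `.astype()`. A minimal sketch (the variable names are illustrative; the exact integer type depends on the platform, usually `int64`):

```python
import numpy as np

# Integers only: NumPy infers an integer dtype (int64 on most platforms)
ints = np.array([1, 2, 3])

# Request a specific dtype explicitly at creation time
singles = np.array([1, 2, 3], dtype=np.float32)

# Convert an existing array; astype returns a new array
doubles = ints.astype(np.float64)

print(ints.dtype, singles.dtype, doubles.dtype)
```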

In [8]:
myvals_np.sum()

8.2

## Indexing

In [9]:
x = np.array([1.0, 1.5, 2.0, 5.3])
x

array([1. , 1.5, 2. , 5.3])

In [10]:
x[1]

1.5

In [11]:
x[-1]

5.3

In [12]:
x[1] = 2.0 # modify the second value in the array
x

array([1. , 2. , 2. , 5.3])
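Assignment converts the new value to the array's existing `dtype`. In the cell above `x` already holds floats, so nothing is lost, but writing a float into an *integer* array silently truncates it, a common source of bugs. A short sketch with an illustrative array `counts`:

```python
import numpy as np

counts = np.array([1, 2, 3])    # integer array
counts[0] = 2.7                 # the float is cast to the integer dtype
print(counts)                   # [2 2 3]: the fractional part is dropped

# Reading past the end raises an error instead of returning a value
out_of_range = False
try:
    counts[3]
except IndexError:
    out_of_range = True
print(out_of_range)             # True
```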

## Slicing

In [13]:
x[:2]

array([1., 2.])
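Slices also accept a step (`start:stop:step`). Note that basic slices are *views*: they share memory with the original array, so writing through a slice changes the original. Use `.copy()` when you need an independent array. A sketch using a separate array `v` so that `x` above is not modified:

```python
import numpy as np

v = np.array([1.0, 2.0, 2.0, 5.3])

every_other = v[::2]    # every second element
reversed_v = v[::-1]    # a negative step walks backwards

# A basic slice is a view: writing to it writes to v
first_two = v[:2]
first_two[0] = 100.0
print(v[0])             # 100.0, the original changed too

# .copy() gives an independent array
safe = v[:2].copy()
safe[0] = -1.0
print(v[0])             # still 100.0
```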

**Inline exercise**

1. Create an array `x` with three values: 1, 2, 3
2. What is the data type of `x`?
3. Create a new array: `y = x/2`
4. What is the data type of `y`?


## Math operations

Python is a general-purpose language that was *not* designed with numerical computing in mind.

NumPy, however, is: arithmetic operators on NumPy arrays act element by element.

In [14]:
[1.2, 4.5] + [2.3, 4.3] # is this the result you expected? For lists, + concatenates

[1.2, 4.5, 2.3, 4.3]

In [15]:
np.array([1.2, 4.5]) + np.array([2.3, 4.3]) 

array([3.5, 8.8])

In [16]:
np.array([1.2, 4.5]) * np.array([2.3, 4.3]) 

array([ 2.76, 19.35])

*Note for MATLAB users: arithmetic operators such as `*` act element-wise on NumPy arrays; matrix multiplication uses `@`.*

In [17]:
np.array([1.2, 4.5]) @ np.array([2.3, 4.3]) # in case you actually wanted to do a dot product

22.11
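The same distinction applies to 2D arrays: `*` multiplies matching entries (MATLAB's `.*`), while `@` is the matrix product (MATLAB's `*`). A small sketch with two illustrative 2x2 matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

elementwise = A * B   # entry by entry: rows [0, 2] and [3, 0]
matrix = A @ B        # matrix product: rows [2, 1] and [4, 3]

print(elementwise)
print(matrix)
```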

In [18]:
x = np.arange(5, 100, 5)
x

array([ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85,
       90, 95])

In [19]:
x.dtype # Integers!

dtype('int64')

In [20]:
x + 1 # add 1!

array([ 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76, 81, 86,
       91, 96])

In [21]:
x = x + 3.0 # add a float to some integers, can we do that?
x

array([ 8., 13., 18., 23., 28., 33., 38., 43., 48., 53., 58., 63., 68.,
       73., 78., 83., 88., 93., 98.])

In [22]:
x.dtype # but now it became floats!

dtype('float64')
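`x = x + 3.0` succeeds because it builds a *new* array, whose dtype NumPy promotes to float64. An in-place update such as `+=` on an integer array would have to store floats in the existing integer storage, and NumPy's default casting rule refuses that. A sketch (the names `a` and `b` are illustrative):

```python
import numpy as np

a = np.arange(5, 100, 5)        # integer array, like x above
b = a + 3.0                     # a new array, promoted to float64
print(b.dtype)                  # float64

# In-place addition must write the float result back into integer storage,
# which NumPy's default "same_kind" casting rule refuses.
inplace_failed = False
try:
    a += 3.0
except TypeError:               # NumPy raises a UFuncTypeError, a TypeError subclass
    inplace_failed = True
print(inplace_failed)           # True

# np.result_type reports the dtype a mixed operation will produce
print(np.result_type(a, 3.0))   # float64
```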

In [23]:
xr = np.random.random(10)
xr

array([0.09419073, 0.84322101, 0.57117251, 0.57363163, 0.67681287,
       0.11278987, 0.13718663, 0.79112221, 0.77402251, 0.65204329])
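`np.random.random` draws different numbers on every run, so your output will not match the one shown here. For reproducible results, one common approach is a seeded generator from `np.random.default_rng` (the seed `42` below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
first = rng.random(10)          # 10 floats drawn uniformly from [0, 1)

rng_again = np.random.default_rng(seed=42)
second = rng_again.random(10)   # same seed, same sequence

print(np.array_equal(first, second))  # True
```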

In [24]:
xr.mean()

0.5226193260197219

In [25]:
xr.std()

0.27993540072223144
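By default `.std()` computes the population standard deviation (it divides by `n`, i.e. `ddof=0`). Pass `ddof=1` to divide by `n - 1` instead, the usual sample estimate. The sketch below checks both against the formula on illustrative seeded data:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(10)
n = data.size
squared_dev = (data - data.mean()) ** 2

pop = np.sqrt(squared_dev.sum() / n)          # divide by n
samp = np.sqrt(squared_dev.sum() / (n - 1))   # divide by n - 1 (Bessel's correction)

print(np.isclose(data.std(), pop))            # True
print(np.isclose(data.std(ddof=1), samp))     # True
```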

In [26]:
xr.max()

0.8432210116882018

In [27]:
xr - xr.mean()

array([-0.4284286 ,  0.32060169,  0.04855318,  0.05101231,  0.15419354,
       -0.40982945, -0.3854327 ,  0.26850288,  0.25140319,  0.12942397])

In [28]:
xn = np.random.normal(loc=5.0, scale=2.0, size=100)
xn[30] = 99.0

In [29]:
mu = xn.mean()
sigma = xn.std()

xn[xn < mu - 3*sigma]

array([], dtype=float64)

In [30]:
xn[xn > mu + 3*sigma]

array([99.])
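The two comparisons above can be combined into a single mask with `np.abs`, flagging values more than three standard deviations from the mean in either direction. A minimal sketch of the same synthetic setup, seeded here (seed `1` is arbitrary) so the result repeats:

```python
import numpy as np

rng = np.random.default_rng(1)
xn = rng.normal(loc=5.0, scale=2.0, size=100)
xn[30] = 99.0                      # inject one outlier, as above

mu = xn.mean()
sigma = xn.std()

# True where a value lies more than 3 standard deviations from the mean
is_outlier = np.abs(xn - mu) > 3 * sigma

print(np.flatnonzero(is_outlier))  # positions of flagged values
print(xn[is_outlier])              # the flagged values themselves
```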

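## Missing values (NaN)

NumPy represents missing floating-point values with `np.nan` ("Not a Number"). Most arithmetic involving `nan` produces `nan`, so use the `nan`-aware functions, such as `np.nanmean`, to ignore missing entries.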

In [31]:
y = np.random.random(10)
y

array([0.40245995, 0.08616727, 0.31967252, 0.12899401, 0.55626325,
       0.69120526, 0.07342892, 0.63192764, 0.59825283, 0.88321703])

In [32]:
y[5:] = np.nan
y

array([0.40245995, 0.08616727, 0.31967252, 0.12899401, 0.55626325,
              nan,        nan,        nan,        nan,        nan])

In [33]:
y.mean()

nan

In [34]:
np.nanmean(y)

0.29871140025253223

In [35]:
y * np.pi

array([1.26436523, 0.27070245, 1.00428083, 0.40524664, 1.74755255,
              nan,        nan,        nan,        nan,        nan])
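Because `nan` never compares equal to anything, not even itself, test for it with `np.isnan`. The resulting boolean mask can count the missing entries or drop them. A short sketch with an illustrative array:

```python
import numpy as np

y = np.array([0.4, 0.1, np.nan, 0.3, np.nan])

print(np.nan == np.nan)        # False: == cannot detect NaN
missing = np.isnan(y)          # True where a value is NaN
print(missing.sum())           # 2 missing values

clean = y[~missing]            # keep only the valid entries
print(clean)                   # [0.4 0.1 0.3]
print(np.isclose(clean.mean(), np.nanmean(y)))  # True
```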

## Boolean indexing

In [36]:
z = np.random.normal(loc=0.0, scale=3.0, size=10)

z_sorted = np.sort(z)
z_sorted

array([-5.025737  , -4.51460712, -4.03576839, -3.18565794, -1.88822797,
        0.23355421,  2.23941407,  2.53201344,  3.33607068,  4.86455053])

In [37]:
z<0.0

array([ True, False,  True,  True, False, False, False,  True,  True,
       False])

In [38]:
z_sorted<0.0

array([ True,  True,  True,  True,  True, False, False, False, False,
       False])

In [39]:
z_sorted[z_sorted<0.0]

array([-5.025737  , -4.51460712, -4.03576839, -3.18565794, -1.88822797])

In [40]:
z_sorted[z_sorted<0.0] = 0.0

In [41]:
z_sorted

array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.23355421, 2.23941407, 2.53201344, 3.33607068, 4.86455053])

In [42]:
np.where(z<0.0)

(array([0, 2, 3, 7, 8]),)
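`np.where` also has a three-argument form, `np.where(condition, a, b)`, which takes values from `a` where the condition is `True` and from `b` elsewhere. Unlike the masked assignment above, it returns a new array and leaves the input unchanged. A minimal sketch with made-up values:

```python
import numpy as np

z = np.array([-2.0, 1.5, -0.5, 3.0])

# Replace negatives with 0.0 and keep the rest; z itself is not modified
z_clipped = np.where(z < 0.0, 0.0, z)
print(z_clipped)   # values: 0.0, 1.5, 0.0, 3.0

# np.clip with only a lower bound gives the same result here
print(np.array_equal(z_clipped, np.clip(z, 0.0, None)))  # True
```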

In [43]:
xn = np.random.normal(loc=5.0, scale=2.0, size=100)

xn[30] = 99.0 # outlier

median = np.median(xn)
sigma = xn.std()

In [44]:
sigma # std inflated by the outlier

9.558148096349761

In [45]:
xn[xn > median + 3*sigma] # only one abnormally high value

array([99.])

In [46]:
xn[xn > median + 3*sigma] = np.nan

In [47]:
np.nanstd(xn) # much closer to the true std of 2.0

1.9609121963552831
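Overwriting outliers with `np.nan` changes the data. An alternative that keeps the original array intact is to build a boolean mask and compute the statistic on the selected values only. A sketch assuming the same synthetic setup, seeded (seed `2` is arbitrary) for repeatability:

```python
import numpy as np

rng = np.random.default_rng(2)
xn = rng.normal(loc=5.0, scale=2.0, size=100)
xn[30] = 99.0                       # outlier

median = np.median(xn)
sigma = xn.std()

keep = xn <= median + 3 * sigma     # True for values treated as valid
print(xn[keep].std())               # std without the outlier; xn is unchanged

# Same number as the NaN-replacement approach, applied to a copy
xn_nan = xn.copy()
xn_nan[~keep] = np.nan
print(np.isclose(xn[keep].std(), np.nanstd(xn_nan)))  # True
```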

## 2D arrays

In [48]:
X = np.array([
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
])

In [49]:
X

array([[0., 1., 2.],
       [3., 4., 5.]])

In [50]:
X.shape

(2, 3)

In [51]:
nrows = X.shape[0]
nrows

2

In [52]:
ncols = X.shape[1]
ncols

3

In [53]:
X[0,0]

0.0

In [54]:
X[1,1]

4.0

In [55]:
X[-1,-1]

5.0

In [56]:
X[0,:]

array([0., 1., 2.])

In [57]:
X[0]

array([0., 1., 2.])
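A `:` in the row position selects every row, which is how you take a column or a sub-block. A short sketch on the same `X`:

```python
import numpy as np

X = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0]])

second_col = X[:, 1]      # every row, column 1
block = X[:, 1:]          # every row, columns 1 and 2
last_row = X[-1]          # same as X[-1, :]

print(second_col)         # [1. 4.]
print(block.shape)        # (2, 2)
```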

In [58]:
X.mean()

2.5

In [59]:
colmeans = X.mean(axis=0)
colmeans

array([1.5, 2.5, 3.5])

In [60]:
colmeans.shape

(3,)

In [61]:
rowmeans = X.mean(axis=1)
rowmeans

array([1., 4.])
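The `axis` argument names the axis that is collapsed: `axis=0` averages down the rows and gives one value per column, while `axis=1` averages across the columns and gives one value per row. A quick check against explicit Python loops:

```python
import numpy as np

X = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0]])
nrows, ncols = X.shape

# axis=0: average down each column -> one value per column
col_loop = [sum(X[i, j] for i in range(nrows)) / nrows for j in range(ncols)]
# axis=1: average across each row -> one value per row
row_loop = [sum(X[i, j] for j in range(ncols)) / ncols for i in range(nrows)]

print(np.allclose(X.mean(axis=0), col_loop))  # True
print(np.allclose(X.mean(axis=1), row_loop))  # True
```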

In [62]:
X - colmeans

array([[-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5]])

In [63]:
# X - rowmeans    # raises ValueError: shapes (2,3) and (2,) cannot be broadcast together

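Subtracting `colmeans` works but subtracting `rowmeans` fails because of the [NumPy broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html) rules: shapes are compared starting from the last dimension, and each pair of sizes must be equal or one of them must be 1. `colmeans` has shape `(3,)`, which matches the last dimension of `X` (shape `(2, 3)`); `rowmeans` has shape `(2,)`, which does not. Giving `rowmeans` a second dimension of length 1 fixes this.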

In [64]:
R = rowmeans[:, np.newaxis] # add a trailing axis: shape (2,) becomes (2, 1)
R

array([[1.],
       [4.]])

In [65]:
np.expand_dims(rowmeans, 1) # same result

array([[1.],
       [4.]])

In [66]:
X.shape

(2, 3)

In [67]:
R.shape

(2, 1)

In [68]:
X - R

array([[-1.,  0.,  1.],
       [-1.,  0.,  1.]])
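Alternatively, passing `keepdims=True` to the reduction keeps the collapsed axis with length 1, so the row means already have shape `(2, 1)` and broadcast against `X` without `np.newaxis`. A sketch:

```python
import numpy as np

X = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0]])

row_means = X.mean(axis=1, keepdims=True)   # shape (2, 1) instead of (2,)
print(row_means.shape)                      # (2, 1)

centered = X - row_means                    # broadcasts across the columns
print(centered)                             # each row becomes [-1, 0, 1]
```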

## Reshaping

In [69]:
x = X.flatten()

In [70]:
x

array([0., 1., 2., 3., 4., 5.])

In [71]:
x.reshape(2,3)

array([[0., 1., 2.],
       [3., 4., 5.]])
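`reshape` accepts `-1` for one dimension and infers its size from the total number of elements. Note also that `flatten` always returns a copy, while `ravel` returns a view when it can (as it does for this contiguous `X`). A sketch:

```python
import numpy as np

X = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0]])
x = X.flatten()             # always a new 1D copy

cols3 = x.reshape(-1, 3)    # -1 is inferred as 2: shape (2, 3)
rows3 = x.reshape(3, -1)    # -1 is inferred as 2: shape (3, 2)

flat_view = X.ravel()       # a view here, because X is contiguous
print(np.shares_memory(flat_view, X))  # True
print(np.shares_memory(x, X))          # False
```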