Array reorganizing#

asarray()#

Use the asarray() function to convert an input to an array. The input can be a list, a tuple, an ndarray, and so on.

Syntax:

numpy.asarray(a, dtype=None, order=None)
  • a: Data that you want to convert to an array

  • dtype: This is an optional argument. If not specified, the data type is inferred from the input data

  • order: Optional memory layout. C is row-major (C-style); F is column-major (Fortran-style)

# Consider the following 2-D matrix with four rows and four columns filled by 1

import numpy as np

a = np.matrix(np.ones((4,4)))

Suppose you try to change the matrix through np.array(a). The matrix stays the same, because np.array() makes a copy by default and the assignment only modifies that copy.

np.array(a)[2]=3
print(a)  # value won't change in result
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

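The matrix itself is not immutable: the assignment above changed a copy made by np.array(), not the original. Use asarray() if you want the modification to reach the original data, because it avoids copying whenever the input can be viewed as an array. Let’s check by setting the third row to the value 2.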

np.asarray(a)[2]=2 # np.asarray(a): views the matrix a as an array without copying
print(a)
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [2. 2. 2. 2.]
 [1. 1. 1. 1.]]
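
The copy-versus-view difference described above can be checked directly. The following is a minimal sketch using a plain ndarray instead of np.matrix; the variable names are illustrative only.

```python
import numpy as np

base = np.ones((4, 4))
copied = np.array(base)      # np.array() copies its input by default
aliased = np.asarray(base)   # input is already an ndarray, so no copy is made

copied[2] = 3                # changes only the copy
aliased[2] = 2               # changes base as well

print(np.shares_memory(base, copied))
print(np.shares_memory(base, aliased))
print(base)
```

np.shares_memory() reports whether two arrays overlap in memory, which is a quick way to find out whether a write to one can show up in the other.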

arange()#

arange() is a built-in NumPy function that returns an ndarray of evenly spaced values within a given interval. The interval includes start but excludes stop, so to create the values from 1 to 10 you call arange(1, 11).

Syntax:

numpy.arange(start, stop, step)
  • start: Start of interval (included)

  • stop: End of interval (excluded)

  • step: Spacing between values. Default step is 1

# Example 1:

import numpy as np
np.arange(1, 11)
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

If you want to change the step, pass it as a third argument.

# Example 2:

import numpy as np
np.arange(1, 14, 4)
array([ 1,  5,  9, 13])
np.arange(0,11,2)   # even numbers, using a step size of 2
array([ 0,  2,  4,  6,  8, 10])
np.arange(1,11,2)   # odd numbers
array([1, 3, 5, 7, 9])
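
A few more calls help pin down how start, stop and step interact. This is a small sketch; the variable names are chosen for illustration.

```python
import numpy as np

one_to_ten = np.arange(1, 11)      # stop=11 is excluded, so the last value is 10
countdown = np.arange(10, 0, -2)   # a negative step counts down; 0 is excluded
from_zero = np.arange(5)           # a single argument is the stop; start defaults to 0

print(one_to_ten)
print(countdown)
print(from_zero)
```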

Reshape Data#

Sometimes you need to change the shape of the data, for example from wide to long. You can use the reshape function for this, as long as the total number of elements stays the same.

Syntax:

numpy.reshape(a, newshape, order='C')
  • a: Array that you want to reshape

  • newshape: The desired new shape

  • order: Default is C, which reads and writes the elements in row-major order.

import numpy as np

e  = np.array([(1,2,3), (4,5,6)])
print(e)
e.reshape(3,2)
[[1 2 3]
 [4 5 6]]
array([[1, 2],
       [3, 4],
       [5, 6]])
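
Two reshape features are worth trying alongside the example above: passing -1 so that NumPy infers one dimension, and the order argument. The sketch below reuses the array e; the other names are illustrative.

```python
import numpy as np

e = np.array([(1, 2, 3), (4, 5, 6)])

inferred = e.reshape(-1, 2)                # -1 lets NumPy infer the row count (6 / 2 = 3)
col_major = e.reshape((3, 2), order='F')   # read and fill the elements column by column

print(inferred)
print(col_major)

try:
    e.reshape(4, 2)                        # 6 elements cannot fill a 4 x 2 array
except ValueError as err:
    print('reshape failed:', err)
```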

Broadcasting with array reorganizing#

It’s super cool and super useful. The one-line explanation is that when doing elementwise operations, the smaller array is automatically expanded to match the shape of the larger one.

# add a scalar to a 1-d array
x = np.arange(5)
print('x:  ', x)
print('x+1:', x + 1, end='\n\n')

y = np.random.uniform(size=(2, 5))
print('y:  ', y,  sep='\n')
print('y+1:', y + 1, sep='\n')
x:   [0 1 2 3 4]
x+1: [1 2 3 4 5]

y:  
[[0.20040757 0.96248829 0.29534113 0.14204329 0.00786338]
 [0.68572271 0.22357762 0.88618684 0.91174698 0.87380356]]
y+1:
[[1.20040757 1.96248829 1.29534113 1.14204329 1.00786338]
 [1.68572271 1.22357762 1.88618684 1.91174698 1.87380356]]

Since x is shaped (5,) and y is shaped (2,5), their trailing dimensions match, so x is broadcast across both rows of y and we can do operations between them.

x * y
array([[0.        , 0.96248829, 0.59068226, 0.42612988, 0.03145352],
       [0.        , 0.22357762, 1.77237368, 2.73524094, 3.49521424]])

Without broadcasting we’d have to manually reshape our arrays, which quickly gets annoying.

x.reshape(1, -1).repeat(2, axis=0) * y
array([[0.        , 0.96248829, 0.59068226, 0.42612988, 0.03145352],
       [0.        , 0.22357762, 1.77237368, 2.73524094, 3.49521424]])
before = np.array([[1,2,3,4],[5,6,7,8]])
print(before)

after = before.reshape((2,4))
print(after)
[[1 2 3 4]
 [5 6 7 8]]
[[1 2 3 4]
 [5 6 7 8]]
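
The broadcasting rule used in this section compares shapes from the last dimension backwards: two sizes are compatible when they are equal or when one of them is 1. The sketch below checks this with np.broadcast_shapes(), which is assumed to be available (it was added in NumPy 1.20); the array names are illustrative.

```python
import numpy as np

x = np.arange(5)                  # shape (5,)
col = np.array([[10], [20]])      # shape (2, 1)

result_shape = np.broadcast_shapes((5,), (2, 5))
grid = x + col                    # (5,) combined with (2, 1) broadcasts to (2, 5)

print(result_shape)
print(grid)

try:
    np.arange(5) + np.arange(3)   # sizes 5 and 3 differ and neither is 1
except ValueError as err:
    print('cannot broadcast:', err)
```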

Flatten Data#

When you work with neural networks such as convnets, you often need to flatten a multi-dimensional array into one dimension. You can use flatten().

Syntax:

ndarray.flatten(order='C')
  • order: Default is C, which flattens the array in row-major order.

e.flatten()
array([1, 2, 3, 4, 5, 6])
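
flatten() always returns a copy, while the related ravel() method returns a view when the memory layout allows it. The following sketch, with illustrative names, shows the difference and the effect of order='F'.

```python
import numpy as np

e = np.array([(1, 2, 3), (4, 5, 6)])

flat = e.flatten()            # always a new, independent copy
rav = e.ravel()               # a view of e here, because e is contiguous

flat[0] = 100                 # does not touch e
print(e[0, 0])
print(np.shares_memory(e, flat), np.shares_memory(e, rav))

by_columns = e.flatten(order='F')   # column-major order
print(by_columns)
```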

What is hstack?#

With hstack you can append data horizontally. This is a very convenient function in NumPy. Let's study it with an example:

## Horizontal Stack

import numpy as np
f = np.array([1,2,3])
g = np.array([4,5,6])

print('Horizontal Append:', np.hstack((f, g)))
Horizontal Append: [1 2 3 4 5 6]
# Horizontal  stack

h1 = np.ones((2,4))
h2 = np.zeros((2,2))

np.hstack((h1,h2))
array([[1., 1., 1., 1., 0., 0.],
       [1., 1., 1., 1., 0., 0.]])
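
For 2-D inputs, hstack gives the same result as concatenating along axis 1, so the arrays must have the same number of rows. A short sketch of both points, reusing h1 and h2:

```python
import numpy as np

h1 = np.ones((2, 4))
h2 = np.zeros((2, 2))

stacked = np.hstack((h1, h2))
same = np.concatenate((h1, h2), axis=1)   # joins the columns, like hstack on 2-D input
print(stacked.shape, np.array_equal(stacked, same))

try:
    np.hstack((h1, np.zeros((3, 2))))     # 2 rows versus 3 rows
except ValueError as err:
    print('hstack failed:', err)
```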

What is vstack?#

With vstack you can append data vertically. Let's study it with an example:

## Vertical Stack

import numpy as np
f = np.array([1,2,3])
g = np.array([4,5,6])

print('Vertical Append:', np.vstack((f, g)))
Vertical Append: [[1 2 3]
 [4 5 6]]
# Vertically stacking vectors

v1 = np.array([1,2,3,4])
v2 = np.array([5,6,7,8])

np.vstack([v1,v2,v1,v2])
array([[1, 2, 3, 4],
       [5, 6, 7, 8],
       [1, 2, 3, 4],
       [5, 6, 7, 8]])
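
With vstack, each 1-D input becomes one row of the result, so all inputs need the same length. The sketch below also appends a row to an existing 2-D array; the names rows and block are illustrative.

```python
import numpy as np

f = np.array([1, 2, 3])
g = np.array([4, 5, 6])

rows = np.vstack((f, g))                         # each 1-D input becomes one row
print(rows.shape)
print(np.array_equal(rows, np.stack((f, g), axis=0)))

block = np.vstack((rows, np.array([7, 8, 9])))   # append one more row
print(block)
```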

Generate Random Numbers#

To generate random numbers from a Gaussian (normal) distribution, use:

Syntax:

numpy.random.normal(loc, scale, size)
  • loc: The mean, that is, the center of the distribution

  • scale: The standard deviation, that is, the spread of the distribution

  • size: The output shape. An integer gives that many samples

## Generate random number from normal distribution

normal_array = np.random.normal(5, 0.5, 10)
print(normal_array)
[4.73370838 5.02319487 5.03247675 4.63238643 5.20491465 4.23379671
 4.65235094 4.03461004 4.81788758 5.40603744]
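
The numbers above change on every run. If you need reproducible draws, one option is a seeded generator created with np.random.default_rng(), which offers a normal() method with the same loc, scale and size parameters. This is a sketch only: the seed 42 and the variable names are arbitrary choices, not part of the original example.

```python
import numpy as np

rng_a = np.random.default_rng(42)   # arbitrary seed, chosen for illustration
rng_b = np.random.default_rng(42)   # same seed, so the same sequence of draws

sample_a = rng_a.normal(loc=5, scale=0.5, size=10)
sample_b = rng_b.normal(loc=5, scale=0.5, size=10)
print(np.array_equal(sample_a, sample_b))

big = rng_a.normal(loc=5, scale=0.5, size=(1000, 3))   # size can also be a shape tuple
print(big.shape, big.mean(), big.std())
```

With a larger sample, the mean and standard deviation of the draws land close to loc and scale.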

Linspace#

linspace returns evenly spaced samples over an interval.

Syntax:

numpy.linspace(start, stop, num, endpoint)
  • start: Start of sequence

  • stop: End of sequence

  • num: Number of samples to generate. Default is 50

  • endpoint: If True (default), stop is the last value. If False, stop value is not included.

# Example:

import numpy as np
np.linspace(0,10,6) 
array([ 0.,  2.,  4.,  6.,  8., 10.])
# Example: For instance, it can be used to create 10 values from 1 to 5 evenly spaced.

import numpy as np
np.linspace(1.0, 5.0, num=10)
array([1.        , 1.44444444, 1.88888889, 2.33333333, 2.77777778,
       3.22222222, 3.66666667, 4.11111111, 4.55555556, 5.        ])

If you do not want to include the stop value in the interval, you can set endpoint to False.

np.linspace(1.0, 5.0, num=5, endpoint=False)
array([1. , 1.8, 2.6, 3.4, 4.2])
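
The spacing in the two examples above follows from the endpoint setting: with endpoint=True the step is (stop - start) / (num - 1), and with endpoint=False it is (stop - start) / num. A small sketch using the retstep option, which returns the step along with the samples:

```python
import numpy as np

values, step = np.linspace(1.0, 5.0, num=10, retstep=True)
print(step)        # (5 - 1) / (10 - 1)

open_values, open_step = np.linspace(1.0, 5.0, num=5, endpoint=False, retstep=True)
print(open_step)   # (5 - 1) / 5
```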

LogSpace#

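logspace returns numbers spaced evenly on a log scale. It takes the same parameters as np.linspace, but start and stop are exponents: the sequence runs from base**start to base**stop, and the base is 10 by default.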

Syntax:

numpy.logspace(start, stop, num, endpoint)
  • start: Exponent of the first value (the sequence starts at 10**start)

  • stop: Exponent of the last value (the sequence ends at 10**stop)

  • num: Number of samples to generate. Default is 50

  • endpoint: If True (default), stop is the last value. If False, stop value is not included.

# Example:

np.logspace(3.0, 4.0, num=4)
array([ 1000.        ,  2154.43469003,  4641.58883361, 10000.        ])
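
One way to read the result above is that logspace applies linspace to the exponents and then raises the base to them. The following sketch checks this and also uses the base parameter; the variable names are illustrative.

```python
import numpy as np

decades = np.logspace(3.0, 4.0, num=4)
by_hand = 10 ** np.linspace(3.0, 4.0, num=4)   # same values, computed from the exponents
print(np.allclose(decades, by_hand))

powers_of_two = np.logspace(0, 3, num=4, base=2)   # 2**0, 2**1, 2**2, 2**3
print(powers_of_two)
```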

Finally, if you want to check the memory size in bytes of one element of an array, you can use .itemsize

x = np.array([1,2,3], dtype=np.complex128)
x.itemsize
16
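
The size of the whole array follows from itemsize: the nbytes attribute equals itemsize times the number of elements. A short sketch, where the float32 conversion is added only for comparison:

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.complex128)
print(x.itemsize, x.size, x.nbytes)   # nbytes == itemsize * size

small = x.real.astype(np.float32)     # float32 needs 4 bytes per element
print(small.itemsize, small.nbytes)
```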

Statistics#

NumPy has quite a few useful statistical functions for finding the minimum, maximum, percentiles, standard deviation, variance and so on of the elements in an array. The most common ones are listed below:

Function              NumPy
Min                   np.min()
Max                   np.max()
Mean                  np.mean()
Median                np.median()
Standard deviation    np.std()

# Consider the following Array

import numpy as np

normal_array = np.random.normal(5, 0.5, 10)
print(normal_array)	
[4.57057605 5.76249894 4.09935016 5.38336447 4.96652011 5.29402596
 4.79025614 6.00232788 5.49574452 5.60059908]
# Example: Statistical functions

### Min 
print(np.min(normal_array))

### Max 
print(np.max(normal_array))

### Mean 
print(np.mean(normal_array))

### Median
print(np.median(normal_array))

### Standard deviation
print(np.std(normal_array))
4.099350157876355
6.002327879593869
5.196526332130161
5.338695217273813
0.5550161930871431
stats = np.array([[1,2,3],[4,5,6]])
stats
array([[1, 2, 3],
       [4, 5, 6]])
np.min(stats)
1
np.max(stats, axis=1)
array([3, 6])
np.sum(stats, axis=0)
array([5, 7, 9])
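
Two details of these functions are easy to miss: np.std() computes the population standard deviation by default (ddof=0), and the axis argument picks the direction of the reduction. The sketch below uses a small made-up data array; pass ddof=1 if you want the sample standard deviation.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # made-up values for illustration

pop_std = np.std(data)                # ddof=0: divide by N
sample_std = np.std(data, ddof=1)     # ddof=1: divide by N - 1
print(pop_std, sample_std)
print(np.var(data), np.percentile(data, 50), np.median(data))

stats = np.array([[1, 2, 3], [4, 5, 6]])
col_means = np.mean(stats, axis=0)    # one mean per column
row_means = np.mean(stats, axis=1)    # one mean per row
print(col_means, row_means)
```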

Miscellaneous#

Load Data from File#

You can download “data.txt” from here. The file holds comma-separated integers, which np.genfromtxt reads as floats by default.

filedata = np.genfromtxt('data.txt', delimiter=',')
filedata = filedata.astype('int32') # you can also change type to 'int64'
print(filedata)
[[  1  13  21  11 196  75   4   3  34   6   7   8   0   1   2   3   4   5]
 [  3  42  12  33 766  75   4  55   6   4   3   4   5   6   7   0  11  12]
 [  1  22  33  11 999  11   2   1  78   0   1   2   9   8   7   1  76  88]]
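
If you do not have the file at hand, you can try the same call on an in-memory text buffer, since np.genfromtxt also accepts file-like objects. In the sketch below, the text string is a hypothetical stand-in for data.txt, and passing dtype directly is an alternative to calling astype() afterwards.

```python
import io
import numpy as np

# Hypothetical stand-in for the contents of data.txt
text = "1,13,21\n3,42,12\n1,22,33"

as_float = np.genfromtxt(io.StringIO(text), delimiter=',')                 # floats by default
as_int = np.genfromtxt(io.StringIO(text), delimiter=',', dtype='int32')    # request integers
print(as_float.dtype, as_int.dtype)
print(as_int)
```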

Boolean Masking and Advanced Indexing#

filedata >50
array([[False, False, False, False,  True,  True, False, False, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False,  True,  True, False,  True, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False,  True, False, False, False,  True,
        False, False, False, False, False, False, False,  True,  True]])
print(filedata)
filedata[filedata >50] # indexing with a boolean mask returns the matching values as a 1-D array
[[  1  13  21  11 196  75   4   3  34   6   7   8   0   1   2   3   4   5]
 [  3  42  12  33 766  75   4  55   6   4   3   4   5   6   7   0  11  12]
 [  1  22  33  11 999  11   2   1  78   0   1   2   9   8   7   1  76  88]]
array([196,  75, 766,  75,  55, 999,  78,  76,  88])
print(filedata)
np.any(filedata > 50, axis = 0) # axis=0 checks down each column: True if any value in that column is > 50
[[  1  13  21  11 196  75   4   3  34   6   7   8   0   1   2   3   4   5]
 [  3  42  12  33 766  75   4  55   6   4   3   4   5   6   7   0  11  12]
 [  1  22  33  11 999  11   2   1  78   0   1   2   9   8   7   1  76  88]]
array([False, False, False, False,  True,  True, False,  True,  True,
       False, False, False, False, False, False, False,  True,  True])
print(filedata)
np.all(filedata > 50, axis = 0) # True only for columns where every value is > 50 (axis=1 would check each row instead)
[[  1  13  21  11 196  75   4   3  34   6   7   8   0   1   2   3   4   5]
 [  3  42  12  33 766  75   4  55   6   4   3   4   5   6   7   0  11  12]
 [  1  22  33  11 999  11   2   1  78   0   1   2   9   8   7   1  76  88]]
array([False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False])
print(filedata)
(((filedata > 50) & (filedata < 100)))
[[  1  13  21  11 196  75   4   3  34   6   7   8   0   1   2   3   4   5]
 [  3  42  12  33 766  75   4  55   6   4   3   4   5   6   7   0  11  12]
 [  1  22  33  11 999  11   2   1  78   0   1   2   9   8   7   1  76  88]]
array([[False, False, False, False, False,  True, False, False, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False, False,  True, False,  True, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False,  True,
        False, False, False, False, False, False, False,  True,  True]])
print(filedata)
(~((filedata > 50) & (filedata < 100)))  # '~' means not
[[  1  13  21  11 196  75   4   3  34   6   7   8   0   1   2   3   4   5]
 [  3  42  12  33 766  75   4  55   6   4   3   4   5   6   7   0  11  12]
 [  1  22  33  11 999  11   2   1  78   0   1   2   9   8   7   1  76  88]]
array([[ True,  True,  True,  True,  True, False,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True, False,  True, False,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True, False,
         True,  True,  True,  True,  True,  True,  True, False, False]])
### You can index with a list in NumPy
a = np.array([1,2,3,4,5,6,7,8,9])
a[[1,2,8]] # select the elements at indices 1, 2 and 8
array([2, 3, 9])
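
Boolean masks and index lists can also be used for counting, for replacing values, and for picking individual cells of a 2-D array. The following sketch uses a small made-up array instead of filedata; all names are illustrative.

```python
import numpy as np

data = np.array([[1, 13, 196, 75],
                 [3, 42, 766, 55]])   # small made-up array

mask = data > 50
n_large = np.count_nonzero(mask)              # how many values exceed 50
in_range = data[(data > 50) & (data < 100)]   # combine conditions with & and parentheses
capped = np.where(mask, 50, data)             # replace masked values, keep the rest
picked = data[[0, 1], [2, 3]]                 # pairs the lists up: cells (0, 2) and (1, 3)

print(n_large)
print(in_range)
print(capped)
print(picked)
```

Note that & binds more tightly than the comparison operators, so each condition needs its own parentheses.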

Numpy Documentation#

This brief overview has touched on many of the important things that you need to know about NumPy, but it is far from complete. Check out the NumPy reference to find out much more.