Learn NumPy in One Article
If you're using Python to do Machine Learning, Deep Learning, engineering, or anything else that has to do with math, then you need to start by learning NumPy. It is a MUST!
Why? Because NumPy is THE package that lets you create matrices and do mathematics in an ultra-efficient way (its core is written in C)! If you are an engineer, a scientist or a mathematician, you probably know it: matrices are the basis of everything.
What's on the program:
- ndarray: the NumPy array in N dimensions
  - ndarray: ndim, shape, size
  - generators: np.ones(), np.zeros(), np.random.randn(), ...
  - methods: ravel(), reshape(), concatenate(), ...
- Indexing, Slicing in NumPy
  - Boolean indexing
  - Image processing example
- Statistics and Linear Algebra with NumPy
  - argmax(), argsort(), sum()
  - Statistics + np.unique(), np.corrcoef()
  - Linear algebra
- Broadcasting in NumPy
  - Warning! Machine learning program
- Review
1. NumPy ndarray: The N-Dimensional Array
At the base of NumPy there is a very powerful object: The N-Dimensional array (ndarray).
I call it a powerful object because it supports a lot of advanced mathematical operations, it can hold very large amounts of data, and it is very fast in execution.
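To make this concrete, here is a minimal sketch (the data and variable names are illustrative, not from the original article). It shows that arithmetic on an ndarray applies to every element at once, while the same operator on a plain Python list does something quite different.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]      # plain Python list (illustrative data)
prices_arr = np.array(prices)    # the same data as an ndarray

doubled_list = prices * 2        # list repetition: the list now has 6 elements
doubled_arr = prices_arr * 2     # element-wise multiplication: still 3 elements

print(doubled_list)
print(doubled_arr)
print(prices_arr.ndim, prices_arr.shape, prices_arr.size)
```

No explicit loop is needed for the array version, which is where most of the speed comes from.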
NumPy Array Generators
    import numpy as np

    A = np.zeros((2, 3))                  # a 2x3 array full of 0
    B = np.ones((2, 3))                   # a 2x3 array full of 1
    C = np.random.randn(2, 3)             # a 2x3 random array (standard normal distribution)
    D = np.random.rand(2, 3)              # a 2x3 random array (uniform distribution on [0, 1))
    E = np.random.randint(0, 10, [2, 3])  # a 2x3 array of random ints from 0 to 9 (10 is excluded)
It is also possible to choose the data type of the array with the dtype parameter. This can be very important for writing efficient code.
    import numpy as np

    A = np.ones((2, 3), dtype=np.float16)  # a 2x3 array of 1 stored as 16-bit floats
    B = np.eye(4, dtype=bool)              # a 4x4 identity matrix whose elements are of type bool
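Why does dtype matter for efficiency? As a rough sketch (the array size is an arbitrary choice for illustration), you can compare the memory used by the same array stored with two different types:

```python
import numpy as np

A64 = np.ones((1000, 1000))                    # default dtype is float64
A16 = np.ones((1000, 1000), dtype=np.float16)  # same shape, smaller elements

print(A64.dtype, A64.itemsize, A64.nbytes)     # 8 bytes per element
print(A16.dtype, A16.itemsize, A16.nbytes)     # 2 bytes per element
```

The float16 array takes a quarter of the memory, at the price of reduced precision, so the right choice depends on your data.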
The N-Dimensional array class (ndarray) offers several attributes and methods. Here are the most useful ones, which you should absolutely know!
Important Attributes of ndarray
    import numpy as np

    A = np.zeros((2, 3))   # creates an array of shape (2, 3)

    print(A.ndim)          # the number of dimensions of A
    print(A.size)          # the number of elements in A
    print(A.shape)         # the dimensions of A (as a tuple)
    print(type(A.shape))   # proof that the shape is a tuple
    print(A.shape[0])      # the number of elements along the 1st dimension of A
Important Methods of ndarray
    import numpy as np

    A = np.zeros((2, 3))    # creates an array of shape (2, 3)
    A = A.reshape((3, 2))   # reshapes A (3 rows, 2 columns)
    A.ravel()               # returns a flattened version of A (one dimension only)
    A.squeeze()             # returns A without its dimensions of length 1
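Here is a small sketch that also covers np.concatenate(), which is listed in the program above. The arrays are illustrative; the point is to follow how the shape changes at each step.

```python
import numpy as np

A = np.zeros((2, 3))
B = np.ones((2, 3))

C = np.concatenate((A, B), axis=0)   # stack along axis 0 (rows): shape (4, 3)
D = np.concatenate((A, B), axis=1)   # stack along axis 1 (columns): shape (2, 6)

flat = C.ravel()                     # one dimension, 12 elements
col = flat.reshape((12, 1))          # column of shape (12, 1)
back = col.squeeze()                 # drops the length-1 axis: shape (12,)

print(C.shape, D.shape, flat.shape, col.shape, back.shape)
```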
2. Indexing and Slicing in a NumPy Array
When working on a NumPy array (usually 2D), it is important to be able to navigate it easily in order to manipulate the data. To do this, we move along the axes of the array: in 2D, axis 0 runs down the rows and axis 1 runs across the columns.
Indexing
As with lists, we access a particular element of the array by giving its index, here one index per axis.
    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])
    print(A[0, 1])  # row 0, column 1
Slicing
With slicing, we access several elements along an axis of the array at once. The result is often called a subset. We must therefore give a start and an end index for each dimension of our array (the end index is excluded).
    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])
    print(A[0:2, 0:2])  # rows 0 and 1, columns 0 and 1
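A few more slicing patterns, as a sketch on an illustrative array. One thing to be aware of: a slice is a view on the original array, not a copy, so writing into it also changes the original.

```python
import numpy as np

A = np.arange(12).reshape((3, 4))   # values 0..11 in 3 rows, 4 columns

last_col = A[:, -1]                 # every row, last column
every_other = A[:, ::2]             # every row, columns 0 and 2 (step of 2)

block = A[0:2, 0:2]                 # a slice is a view on A, not a copy
block[:] = 0                        # so this also zeroes the top-left corner of A

safe = A[1:3, 2:4].copy()           # .copy() gives independent data
safe[:] = -1                        # A is left unchanged here

print(A)
```

If you want to modify a subset without touching the original, call .copy() first.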
On the web, it is common to see "implicit" slicing operations, in which only one index is given. This extracts the whole row corresponding to this index (see the example below). For better clarity, I recommend always using the explicit syntax in your code.
    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])
    print(A[1, :])  # prints the whole row 1, explicitly
    print(A[1])     # prints the whole row 1, but this syntax is not ideal
Now, let's take a look at a very common technique used in Data Science: Boolean Indexing.
Boolean Indexing
    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])

    print(A < 5)     # boolean mask
    print(A[A < 5])  # a subset filtered by the boolean mask

    A[A < 5] = 4     # replaces the selected values with 4
    print(A)
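Boolean indexing is handy for image processing. The sketch below assumes a grayscale image stored as a 2D array of intensities between 0 and 255; the 4x4 "image" is made up for illustration, where a real one would be loaded from a file.

```python
import numpy as np

# Hypothetical 4x4 grayscale "image" with intensities between 0 and 255
image = np.array([[ 10, 200,  30, 250],
                  [120,  90, 255,   0],
                  [ 60, 180,  45, 210],
                  [  5, 130, 240,  75]], dtype=np.uint8)

bright = image > 128                  # boolean mask of the bright pixels
mid = (image > 50) & (image < 200)    # masks combine with & (and), | (or)

thresholded = image.copy()            # work on a copy to keep the original
thresholded[bright] = 255             # saturate the bright pixels
thresholded[~bright] = 0              # ~ inverts the mask

print(bright.sum())                   # number of bright pixels
print(thresholded)
```

Note the parentheses around each comparison when combining masks: they are required because & binds more tightly than < and >.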
3. Mathematics Using NumPy
Basic Mathematics
    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])

    print(A.sum())           # sums all the elements of the array
    print(A.sum(axis=0))     # sums along axis 0: one sum per column
    print(A.sum(axis=1))     # sums along axis 1: one sum per row
    print(A.cumsum(axis=0))  # cumulative sum down each column

    print(A.prod())          # product of all the elements
    print(A.cumprod())       # cumulative product over the flattened array

    print(A.min())           # minimum of the array
    print(A.max())           # maximum of the array

    print(A.mean())          # mean (average)
    print(A.std())           # standard deviation
    print(A.var())           # variance
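The program above also mentions argmax(). As a short sketch on an illustrative array: where max() returns the largest value, argmax() returns its position (and argmin() does the same for the minimum).

```python
import numpy as np

A = np.array([[1, 9, 3],
              [7, 5, 6]])

print(A.argmax())          # position of the maximum in the flattened array
print(A.argmax(axis=0))    # row index of the maximum in each column
print(A.argmin(axis=1))    # column index of the minimum in each row

# Convert the flat position back to (row, column) coordinates
row, col = np.unravel_index(A.argmax(), A.shape)
print(row, col, A[row, col])
```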
NumPy argsort()
    import numpy as np

    A = np.random.randint(0, 10, [5, 5])  # random array
    print(A)

    print(A.argsort())        # returns the indexes that would sort each row of the array
    print(A[:, 0].argsort())  # returns the indexes that would sort column 0 of A

    A = A[A[:, 0].argsort(), :]  # reorders the rows of A so that column 0 is sorted
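Since the array above is random, the result changes on every run. Here is the same idea as a sketch on a small fixed array (illustrative values), so you can follow what happens to each row:

```python
import numpy as np

A = np.array([[3, 40, 5],
              [1, 20, 9],
              [2, 30, 7]])

order = A[:, 0].argsort()        # row order that sorts column 0 ascending
sorted_rows = A[order, :]        # reorder whole rows; each row stays intact
descending = A[order[::-1], :]   # reverse the order for a descending sort

print(order)
print(sorted_rows)
print(descending)
```

Each row moves as a block, which is what you want when the columns of one row belong together (for example, one sample in a dataset).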
NumPy Statistics
    import numpy as np

    B = np.random.randn(3, 3)  # 3x3 random numbers

    # returns the correlation matrix of B (each row is treated as a variable)
    print(np.corrcoef(B))

    # returns the correlation matrix between columns 0 and 1 of B
    print(np.corrcoef(B[:, 0], B[:, 1]))

    # selects the correlation coefficient between column 0 and column 1
    print(np.corrcoef(B[:, 0], B[:, 1])[0, 1])
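The program above also lists np.unique(). A minimal sketch on illustrative data: it returns the sorted distinct values of an array and, with return_counts=True, how often each one occurs.

```python
import numpy as np

A = np.array([[1, 2, 2],
              [3, 1, 2]])

values, counts = np.unique(A, return_counts=True)
print(values)   # sorted distinct values
print(counts)   # how many times each value occurs

# Most frequent value, found by combining unique() with argmax()
most_common = values[counts.argmax()]
print(most_common)
```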
Statistics with NaN
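A minimal sketch, assuming this heading refers to NumPy's NaN-aware functions such as np.nanmean() and np.nanstd(); the data is illustrative. A single NaN (missing value) is enough to turn an ordinary mean into NaN, so these variants ignore the missing values instead.

```python
import numpy as np

A = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])

plain_mean = A.mean()               # nan: a single NaN contaminates the result
n_missing = np.isnan(A).sum()       # number of NaN values in A
clean_mean = np.nanmean(A)          # mean of the non-NaN values
clean_std = np.nanstd(A)            # standard deviation of the non-NaN values
col_means = np.nanmean(A, axis=0)   # per-column means, ignoring NaN

print(plain_mean, n_missing, clean_mean, clean_std, col_means)

A[np.isnan(A)] = 0                  # boolean indexing can also replace the NaN values
print(A)
```

Replacing NaN with 0 is only one option and it changes the statistics, so use it with care.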
Linear Algebra using NumPy
    import numpy as np

    A = np.ones((2, 3))
    B = np.ones((3, 3))

    print('transpose', A.T)         # transpose of the matrix A
    print('A.B product', A.dot(B))  # matrix product A.B

    A = np.random.randint(0, 10, [3, 3])

    print('det', np.linalg.det(A))    # calculates the determinant of A
    print('A inv', np.linalg.inv(A))  # calculates the inverse of A (fails if A is singular)

    val, vec = np.linalg.eig(A)
    print('eigenvalue', val)   # eigenvalues
    print('eigenvector', vec)  # eigenvectors (one per column)
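A common use of these tools is solving a linear system A.x = b. As a sketch on an illustrative 2x2 system, np.linalg.solve() does this directly, and we can check both the solution and the inverse with np.allclose().

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])      # illustrative invertible matrix (det = 5)
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)       # solves A @ x = b without forming the inverse
print(x)

A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # A times its inverse is the identity
print(np.allclose(A @ x, b))               # the solution satisfies the system
```

For 2D arrays, the @ operator used here gives the same matrix product as A.dot(B).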
4. Broadcasting Using NumPy
NumPy Broadcasting: Rules and Examples
    import numpy as np

    A = np.ones((2, 3))
    B = 3
    print(A + B)  # result 1: the scalar B is added to every element of A

    A = np.ones((2, 3))
    B = np.ones((2, 1))  # B has one column, it will be spread over the three columns of A
    print(A + B)  # result 2

    A = np.ones((2, 3))
    B = np.ones((2, 2))
    print(A + B)  # ERROR! Shapes (2, 3) and (2, 2) cannot be broadcast together
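The rule behind these three results, plus a pitfall that often shows up in machine learning code, can be sketched as follows. The names y and pred are hypothetical (think of targets and model predictions), and np.broadcast_shapes() requires NumPy 1.20 or later.

```python
import numpy as np

# Rule of thumb: shapes are compared from the last dimension backwards;
# two dimensions are compatible when they are equal or when one of them is 1.
print(np.broadcast_shapes((2, 3), (2, 1)))   # (2, 3)
print(np.broadcast_shapes((2, 3), (3,)))     # (2, 3)

# Pitfall: a (3,) vector combined with a (3, 1) column gives a (3, 3) array
y = np.array([1.0, 2.0, 3.0])                # shape (3,)
pred = np.array([[1.0], [2.0], [3.0]])       # shape (3, 1)
wrong = y - pred                             # shape (3, 3), and no error is raised
right = y.reshape(pred.shape) - pred         # shape (3, 1), as intended

print(wrong.shape, right.shape)
```

Because no error is raised in the (3,) versus (3, 1) case, it is worth printing the shapes of your arrays and reshaping them explicitly before combining them.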
