The numpy.cov() function estimates a covariance matrix, given data and, optionally, weights.
Example
import numpy as np
# create an array
array1 = np.array([[0, 3, 7],
[1, 4, 6],
[2, 5, 8]])
# calculate the covariance of the array
covariance = np.cov(array1)
print(covariance)
'''
Output:
[[12.33333333 8.66666667 10.5 ]
[ 8.66666667 6.33333333 7.5 ]
[10.5 7.5 9. ]]
'''
cov() Syntax
The syntax of the numpy.cov() function is:
numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, *, dtype=None)
cov() Arguments
The numpy.cov() function takes the following arguments:
m - array containing the variables and observations whose covariance is desired, 1-D or 2-D (array_like)
y (optional) - an additional set of variables and observations, in the same form as m (array_like)
rowvar (optional) - if True (the default), each row represents a variable and each column an observation; if False, each column represents a variable (bool)
bias (optional) - if False (the default), the result is normalized by N - 1, where N is the number of observations; if True, it is normalized by N (bool)
ddof (optional) - delta degrees of freedom; if not None, the result is normalized by N - ddof, overriding the value implied by bias (int)
fweights (optional) - integer frequency weights; the number of times each observation vector is repeated (1-D array_like of int)
aweights (optional) - observation vector weights; the relative importance of each observation (1-D array_like, non-integer values allowed)
dtype (optional) - data type of the result; by default at least numpy.float64 (keyword-only)
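The y argument is not used in the examples below, so here is a minimal sketch of it (the arrays x and y are illustrative, not taken from the article). Passing y appends its variables to those of the first argument, which should be the same as stacking the rows yourself:

```python
import numpy as np

# two 1-D sets of observations (illustrative values)
x = np.array([0, 1, 2])
y = np.array([2, 1, 0])

# y's variables are appended to x's, giving a 2 x 2 matrix
cov_xy = np.cov(x, y)

# equivalent: stack the rows into one 2-D array first
cov_stacked = np.cov(np.vstack((x, y)))

print(cov_xy)
print(np.allclose(cov_xy, cov_stacked))  # True
```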
cov() Return Value
The numpy.cov() function returns the covariance matrix of the variables as an ndarray.
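As a quick check of what comes back, the sketch below inspects the shape of the result. With three variables (rows) the matrix is 3 x 3 and symmetric, its diagonal holds each row's sample variance, and a single 1-D variable is returned as a 0-d array holding its variance:

```python
import numpy as np

array1 = np.array([[0, 3, 7],
                   [1, 4, 6],
                   [2, 5, 8]])

# 3 variables (rows) -> 3 x 3 covariance matrix
c = np.cov(array1)
print(c.shape)  # (3, 3)

# symmetric, with each row's sample variance (ddof=1) on the diagonal
print(np.allclose(c, c.T))  # True
print(np.allclose(np.diag(c), np.var(array1, axis=1, ddof=1)))  # True

# a single 1-D variable gives a 0-d array holding its variance
v = np.cov(np.array([0, 3, 7]))
print(v.shape)  # ()
```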
Covariance
Covariance is a statistical measure that describes the relationship between two random variables. It measures how changes in one variable are associated with changes in another variable.
Positive covariance means the variables tend to increase or decrease together, while negative covariance means they move in opposite directions.
A covariance of zero implies no linear relationship.
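By default, np.cov() computes the sample covariance of each pair of variables: cov(x, y) = sum((x_i - mean(x)) * (y_i - mean(y))) / (N - 1), where N is the number of observations. The sketch below rebuilds the matrix from the first example with a small helper; sample_cov is a hypothetical name used only for illustration:

```python
import numpy as np

def sample_cov(x, y):
    """Hypothetical helper: sample covariance of two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # sum of products of deviations from each mean, divided by N - 1
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

rows = np.array([[0, 3, 7],
                 [1, 4, 6],
                 [2, 5, 8]])

# compute every pair of rows and compare with np.cov
manual = np.array([[sample_cov(a, b) for b in rows] for a in rows])
print(np.allclose(manual, np.cov(rows)))  # True
```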
Example 1: Find the Covariance of an ndArray
import numpy as np
# create arrays
array1 = np.array([[0, 1, 2],
[0, 1, 2]])
array2 = np.array([[0, 1, 2],
[2, 1, 0]])
# calculate the covariance of the arrays
covariance1 = np.cov(array1)
covariance2 = np.cov(array2)
print(covariance1, '\n')
print(covariance2)
Output
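[[1. 1.]
 [1. 1.]]

[[ 1. -1.]
 [-1. 1.]]
Here, the two rows of array1 are identical, so they increase together and their covariance is positive. In array2, one row increases while the other decreases, so their covariance is negative.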
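For contrast, a covariance of zero only rules out a linear relationship. In this sketch (x and y are illustrative arrays, not from the examples above), y = x**2 is completely determined by x, yet np.cov() reports zero covariance between them because the dependence is not linear:

```python
import numpy as np

x = np.array([-1, 0, 1])
y = x ** 2  # fully determined by x, but not linearly

c = np.cov(x, y)
print(c)
# the off-diagonal entries are zero: no linear relationship is detected
print(np.isclose(c[0, 1], 0.0))  # True
```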
Example 2: Specifying the Data Type of the Covariance Matrix
The dtype parameter can be used to control the data type of the covariance matrix.
import numpy as np
# create an array
array1 = np.array([[0, 3, 7],
[1, 4, 6],
[2, 5, 8]])
# calculate the covariance of the array
covariance1 = np.cov(array1)
# calculate the covariance of the array as float16
covariance2 = np.cov(array1, dtype = np.float16)
print(covariance1 ,'\n')
print(covariance2)
Output
[[12.33333333 8.66666667 10.5 ]
 [ 8.66666667 6.33333333 7.5 ]
 [10.5 7.5 9. ]]

[[12.336 8.664 10.5 ]
 [ 8.664 6.332 7.5 ]
 [10.5 7.5 9. ]]
Note: Using a lower precision dtype, such as float16, can lead to a loss of accuracy.
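To see the effect of dtype directly, the sketch below checks the result's data type and measures how far the float16 result drifts from the default. Integer input is promoted to float64 by default, while dtype=np.float16 keeps the whole result in half precision:

```python
import numpy as np

array1 = np.array([[0, 3, 7],
                   [1, 4, 6],
                   [2, 5, 8]])

c64 = np.cov(array1)                    # integer input is promoted to float64
c16 = np.cov(array1, dtype=np.float16)  # request half precision

print(c64.dtype, c16.dtype)  # float64 float16

# largest absolute rounding error introduced by float16
print(np.max(np.abs(c64 - c16.astype(np.float64))))
```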
Example 3: Using Optional rowvar Argument
If rowvar is set to True (the default), each row represents a variable, with observations in the columns.
If rowvar is set to False, the relationship is transposed: each column represents a variable, while the rows contain observations.
import numpy as np
# create an array
array1 = np.array([[0, 3, 7],
[1, 4, 6],
[2, 5, 8]])
# calculate the covariance of the array
covariance1 = np.cov(array1)
# calculate the covariance with columns as variables
covariance2 = np.cov(array1, rowvar = False)
print('With rows as variables\n', covariance1 ,'\n')
print('With columns as variables\n', covariance2)
Output
With rows as variables
 [[12.33333333 8.66666667 10.5 ]
 [ 8.66666667 6.33333333 7.5 ]
 [10.5 7.5 9. ]]

With columns as variables
 [[1. 1. 0.5]
 [1. 1. 0.5]
 [0.5 0.5 1. ]]
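A minimal sketch of what rowvar=False means in practice: it gives the same result as transposing the data first. The array named table below holds made-up values for illustration; it shows why the orientation matters for typical tabular data, where observations are rows and variables are columns:

```python
import numpy as np

array1 = np.array([[0, 3, 7],
                   [1, 4, 6],
                   [2, 5, 8]])

# rowvar=False treats columns as variables, same as transposing first
by_columns = np.cov(array1, rowvar=False)
print(np.allclose(by_columns, np.cov(array1.T)))  # True

# illustrative table: 4 observations (rows) of 2 variables (columns)
table = np.array([[1.0, 2.0],
                  [2.0, 4.1],
                  [3.0, 5.9],
                  [4.0, 8.2]])
print(np.cov(table, rowvar=False).shape)  # (2, 2): one entry per pair of columns
print(np.cov(table).shape)                # (4, 4): rows treated as variables
```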
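Example 4: Using the bias and ddof Arguments
The optional argument bias chooses the normalization: by default (bias = False), the sum of products of deviations is divided by N - 1, where N is the number of observations, and with bias = True it is divided by N. The argument ddof (delta degrees of freedom) sets the divisor to N - ddof directly and, when given, overrides bias.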
import numpy as np
# create an array
array1 = np.array([[0, 3, 7],
[1, 4, 6],
[2, 5, 8]])
# calculate the covariance of the array
covariance1 = np.cov(array1)
# divide by N instead of N - 1 (biased estimate)
covariance2 = np.cov(array1, bias = True)
# ddof overrides bias: divide by N - 2
covariance3 = np.cov(array1, bias = True, ddof = 2)
print('Default Covariance Matrix (divided by N - 1)\n', covariance1, '\n')
print('Biased Covariance Matrix (divided by N)\n', covariance2, '\n')
print('Covariance Matrix With ddof = 2 (divided by N - 2)\n', covariance3, '\n')
Output
Default Covariance Matrix (divided by N - 1)
 [[12.33333333 8.66666667 10.5 ]
 [ 8.66666667 6.33333333 7.5 ]
 [10.5 7.5 9. ]]

Biased Covariance Matrix (divided by N)
 [[8.22222222 5.77777778 7. ]
 [5.77777778 4.22222222 5. ]
 [7. 5. 6. ]]

Covariance Matrix With ddof = 2 (divided by N - 2)
 [[24.66666667 17.33333333 21. ]
 [17.33333333 12.66666667 15. ]
 [21. 15. 18. ]]
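Note: The default is ddof = None, in which case the divisor is N - 1 (or N when bias = True). Setting ddof = 1 gives the same result as the default, and ddof = 0 gives the same result as bias = True.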
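The relationships in the note above can be checked directly. This sketch also shows that any ddof simply rescales the default result by (N - 1) / (N - ddof), since only the divisor changes:

```python
import numpy as np

array1 = np.array([[0, 3, 7],
                   [1, 4, 6],
                   [2, 5, 8]])
n = array1.shape[1]  # number of observations (columns)

default = np.cov(array1)  # divides by n - 1

# ddof=1 reproduces the default; ddof=0 matches bias=True (divide by n)
print(np.allclose(np.cov(array1, ddof=1), default))                    # True
print(np.allclose(np.cov(array1, ddof=0), np.cov(array1, bias=True)))  # True

# any ddof rescales the default result by (n - 1) / (n - ddof)
ddof = 2
scaled = default * (n - 1) / (n - ddof)
print(np.allclose(np.cov(array1, ddof=ddof), scaled))  # True
```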
Example 5: Using Weights
The aweights and fweights parameters let us assign weights to the observations used in the covariance estimate.
import numpy as np
# create an array
array1 = np.array([[0, 3, 7],
[1, 4, 6],
[2, 5, 8]])
# specify weights
a = np.array([3, 1, 2])
f = np.array([2, 1, 3])
# calculate the covariance of the array
covariance1 = np.cov(array1)
# calculate the covariance of the array with aweights provided
covariance2 = np.cov(array1, aweights = a)
# calculate the covariance of the array with fweights provided
covariance3 = np.cov(array1, fweights = f)
# calculate the covariance of the array with both aweights and fweights
covariance4 = np.cov(array1, aweights = a, fweights = f)
print('Unweighted Covariance Matrix\n', covariance1, '\n')
print('Covariance Matrix with Observation Vector weight\n', covariance2, '\n')
print('Covariance Matrix with frequency weight\n', covariance3, '\n')
print('Covariance Matrix with both weights\n', covariance4, '\n')
Output
Unweighted Covariance Matrix
 [[12.33333333 8.66666667 10.5 ]
 [ 8.66666667 6.33333333 7.5 ]
 [10.5 7.5 9. ]]

Covariance Matrix with Observation Vector weight
 [[16.04545455 11.5 13.77272727]
 [11.5 8.40909091 9.95454545]
 [13.77272727 9.95454545 11.86363636]]

Covariance Matrix with frequency weight
 [[12. 8.4 10.2]
 [ 8.4 6. 7.2]
 [10.2 7.2 8.7]]

Covariance Matrix with both weights
 [[13.86956522 9.86956522 11.86956522]
 [ 9.86956522 7.08695652 8.47826087]
 [11.86956522 8.47826087 10.17391304]]
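Here,
aweights are observation vector weights, i.e. they set the relative importance of each observation in the covariance estimate (non-integer values are allowed).
fweights are frequency weights, i.e. they give the number of times each observation is repeated (they must be integers).
When both are given, each observation's effective weight is the product of its two weights.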
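To make the two kinds of weights concrete, the sketch below reproduces both weighted results from Example 5. The fweights case should match literally repeating each column; the aweights case follows the weighted formula given in the Notes of the NumPy reference for the default ddof = 1 (weighted means, weighted sum of products, divisor v1 - v2 / v1). The names v1, v2, mean, dev and manual_a are illustrative, not NumPy API:

```python
import numpy as np

array1 = np.array([[0, 3, 7],
                   [1, 4, 6],
                   [2, 5, 8]])
a = np.array([3, 1, 2])  # observation (importance) weights
f = np.array([2, 1, 3])  # integer frequency weights

# fweights: identical to repeating column i of the data f[i] times
repeated = np.repeat(array1, f, axis=1)
cov_f = np.cov(array1, fweights=f)
print(np.allclose(cov_f, np.cov(repeated)))  # True

# aweights (default ddof = 1): weighted means, weighted sum of products,
# and a divisor of v1 - v2 / v1, with v1 = sum(a) and v2 = sum(a * a)
x = array1.astype(float)
v1 = a.sum()
v2 = (a * a).sum()
mean = (x * a).sum(axis=1) / v1   # weighted mean of each row
dev = x - mean[:, None]           # deviations from the weighted means
manual_a = (dev * a) @ dev.T / (v1 - v2 / v1)
print(np.allclose(manual_a, np.cov(array1, aweights=a)))  # True
```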