I have a numpy array that contains 100 rows and 16026 columns. I have to find the median of each column. Thus, the median for each column will be calculated from 100 observations (in this case, 100 rows). For this, I use the following code:
for category in categories:
    indices = np.random.randint(0, len(os.listdir(filepath + category)) - 1, 100)
    tempArray = X_train[indices, ]
    medArray = np.median(tempArray, axis=0)
    print(medArray.shape)
And here is the result that I get:
(100, 16026)
(100, 16026)
(100, 16026)
(100, 16026)
My question is: why is the shape of medArray 100 x 16026 and not 1 x 16026? Since I calculate the median of each column, I expect only one row with 16026 columns. What am I missing here?
Note that X_train is a sparse matrix.
X_train.shape
output:
(2034, 16026)
Any help in this regard is greatly appreciated.
Edit:
The above problem was resolved by converting the sparse slice to a dense array with toarray():
tempArray = X_train[indices, ].toarray()
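For anyone who wants to reproduce the fix, here is a minimal sketch on a small scale. It assumes X_train is a SciPy sparse matrix in CSR format; the 50 x 8 random matrix and the 10 sampled rows are made-up stand-ins for the real data, not the original dataset.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for X_train: a small random sparse matrix (CSR format assumed).
X_train = sparse.random(50, 8, density=0.3, format="csr", random_state=0)

# Pick 10 random row indices (with replacement), analogous to the 100 rows in the question.
rng = np.random.default_rng(0)
indices = rng.integers(0, X_train.shape[0], size=10)

# Slice the sparse matrix, then convert the slice to a dense ndarray before taking the median.
tempArray = X_train[indices, :].toarray()
medArray = np.median(tempArray, axis=0)

print(tempArray.shape)
print(medArray.shape)
```

With a dense array, np.median with axis=0 returns a one-dimensional result with one entry per column, so on the real data the shape should come out as (16026,) rather than (1, 16026). Converting only the sampled rows keeps the dense copy small compared with calling toarray() on all of X_train.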