Why does np.median() return multiple rows?

I have a numpy array that contains 100 rows and 16026 columns. I have to find the median of each column. Thus, the median for each column will be calculated from 100 observations (in this case, 100 rows). For this, I use the following code:

for category in categories:
    indices = np.random.randint(0, len(os.listdir(filepath + category)) - 1, 100)
    tempArray = X_train[indices, ]
    medArray = np.median(tempArray, axis=0)
    print(medArray.shape)

And here is the result that I get:

(100, 16026)
(100, 16026)
(100, 16026)
(100, 16026)

My question is: why is the shape of medArray 100 * 16026 and not 1 * 16026? Since I calculate the median of each column, I expect only one row with 16026 columns. What am I missing here?

Note that X_train is a sparse matrix.

X_train.shape

output:

(2034, 16026)

Any help in this regard is greatly appreciated.

Edit:

The above problem has been resolved using the function toarray().

tempArray = X_train[indices, ].toarray()
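
As a minimal sketch of that fix, here is the same pattern on a small made-up sparse matrix. The name X_demo, the 20 x 8 size, and the random data are assumptions for illustration only, not the data from the question:

```python
import numpy as np
from scipy import sparse

# Stand-in for X_train: a random 20 x 8 CSR matrix (made-up data).
X_demo = sparse.random(20, 8, density=0.5, format="csr")

# Pick 5 random row indices, as the question does with 100.
rng = np.random.default_rng(0)
indices = rng.integers(0, X_demo.shape[0], size=5)

# Indexing a sparse matrix returns another sparse matrix, so convert
# the selected rows to a dense ndarray before calling np.median.
temp_array = X_demo[indices].toarray()
med_array = np.median(temp_array, axis=0)

print(temp_array.shape)  # one row per sampled index
print(med_array.shape)   # one median per column
```

Only the sampled rows are converted, so the dense copy stays small even when the full matrix is large.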


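import numpy as np

sample_size = 50
# i is the category index; the enclosing loop is assumed from the original snippet.
for i in range(len(newsgroups_train.target_names)):
    # Row indices of the training documents that belong to category i.
    idx = np.flatnonzero(newsgroups_train.target == i)
    # Draw sample_size of those rows at random (with replacement).
    sample = np.random.choice(idx, sample_size)
    dense = X_train[sample].toarray()
    # Mask the zeros so that the median is taken over non-zero entries only.
    y = np.ma.masked_where(dense == 0, dense)
    medArray = np.ma.median(y, axis=0).filled(0)
    print('============median ' + newsgroups_train.target_names[i] + '=============')
    # Print the ten words with the largest medians.
    top = np.argsort(medArray)[::-1][:10]
    for word, value in zip(np.array(vectorizer.get_feature_names())[top], medArray[top]):
        print(word + ':' + str(value))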
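
To make the effect of the masking concrete, here is a small sketch comparing the plain median with the masked one. The 3 x 3 array is made up for illustration:

```python
import numpy as np

# Made-up dense data; zeros stand for entries missing from a sparse matrix.
dense = np.array([[0, 2, 0],
                  [1, 4, 0],
                  [3, 0, 0]])

# Plain median: zeros count as ordinary observations.
plain = np.median(dense, axis=0)

# Masked median: zeros are hidden, so each column uses only non-zero values.
masked = np.ma.masked_where(dense == 0, dense)
nonzero_median = np.ma.median(masked, axis=0).filled(0)

print(plain)
print(nonzero_median)
```

A column that contains only zeros is fully masked, and filled(0) turns its median back into 0.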


With a regular NumPy array, the result has shape (16026,), as expected:

In [241]:

X_train=np.random.random((1000,16026)) #1000 can be any int.
indices = np.random.randint(0, 60, 100) #60 can be any int.
tempArray = X_train[indices, ]
medArray = np.median(tempArray, axis=0)
print(medArray.shape)

(16026,)

The result is a 2d array only in a case like this:

In [243]:

X_train=np.random.random((100,2,16026))
indices = np.random.randint(0, 60, 100)
tempArray = X_train[indices, ]
medArray = np.median(tempArray, axis=0)
print(medArray.shape)


(2, 16026)

That happens when the input is a 3d array.

For a sparse array, convert it to a dense one first:

In [319]:

X_train = sparse.rand(112, 16026, 0.5, 'csr') #just make up a random sparse array
indices = np.random.randint(0, 60, 100)
tempArray = X_train[indices, ]
medArray = np.median(tempArray.toarray(), axis=0)
print(medArray.shape)
(16026,)

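.toarray() converts the sparse matrix to a dense array. Note that the zeros are then included in the median, as @zhangxaochen pointed out.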

The problem is that NumPy doesn't recognize sparse matrices as arrays or array-like objects. For example, calling asanyarray on a sparse matrix returns a 0D array whose single element is the original sparse matrix:

In [8]: numpy.asanyarray(scipy.sparse.csc_matrix([[1,2,3],[4,5,6]]))
Out[8]:
array(<2x3 sparse matrix of type '<type 'numpy.int64'>'
        with 6 stored elements in Compressed Sparse Column format>, dtype=object)

Like most of NumPy, numpy.median relies on being given an array or an array-like object. The routines it depends on, especially sort, won't understand what they are looking at if you give them a sparse matrix.
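
A short sketch of one way to guard against this: check for a sparse input and convert it before taking the median. The helper name column_median is hypothetical, introduced here only for illustration:

```python
import numpy as np
from scipy import sparse

m = sparse.csc_matrix([[1, 2, 3], [4, 5, 6]])

# NumPy wraps the whole sparse matrix in a 0D object array.
wrapped = np.asanyarray(m)
print(wrapped.shape, wrapped.dtype)

def column_median(a):
    """Hypothetical helper: per-column median for dense or sparse input."""
    if sparse.issparse(a):
        # Convert to a real ndarray so NumPy's routines can work on it.
        a = a.toarray()
    return np.median(a, axis=0)

result = column_median(m)
print(result.shape)
```

Converting to dense costs memory proportional to the full shape, so for large matrices it is better to slice out the needed rows first and convert only those.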
