Simple pandas / numpy 'indexing' in vectorized calculations

Question

Sorry for the basic question. I'm sure the answer is pretty simple, but I've been banging my head against the wall for a while trying to figure it out. I am new to Python but understand the concept of vectorized computing. For example, take the following (rather trivial) code snippet:

import pandas as pd

ndx = ['a', 'b', 'c', 'd', 'e', 'f']
first = [3, 7, 2, 5, 9, 4]
second = [8, 9, 7, 3, 3, 7]

first = pd.DataFrame(first, index = ndx)
second = pd.DataFrame(second, index = ndx)

I know that first > second will return a Boolean array that is True wherever an element of first is greater than the corresponding element of second, matched by index. I understand that this tight index alignment is one of the benefits of using pandas, but ...

Question: how can I efficiently refer to "offset" indexes in a vectorized operation? For example, what if I want to compare each value in first with the next value in second (first['a'] > second['b'], first['b'] > second['c'], ...)? Along the same lines, what if I want to return True only if first['a'] is larger than both second['a'] and second['b']?

I wrote code that does things like this, iterating through an array by index. Here is an example:

        if next.at[curr.index[i], 'OI'] > curr.OI[i] and \
        next.at[curr.index[i+1], 'OI'] > curr.OI[i+1] and \
        next.at[curr.index[i], 'Vol'] > curr.Vol[i] and \
        next.at[curr.index[i+1], 'Vol'] > curr.Vol[i+1]:

(next and curr are DataFrames, OI and Vol are columns in those DataFrames, and i is my counter.) I know that this is not pythonic, and it is also too slow (which ... hmm ... may be why it's not pythonic? lol)

Thanks in advance.

Summary: The general question is how to refer to offset elements in pandas (and numpy).

EDIT: Following the np and pd suggestions from Jaime and TomAugspurger, a follow-up about pandas. When I try this in pandas with shift(), I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-35914edbe0ff> in <module>()
----> 1 aa = q['OI'] > r['OI']

C:\Python27\lib\site-packages\pandas\core\ops.pyc in wrapper(self, other)
    540             name = _maybe_match_name(self, other)
    541             if len(self) != len(other):
--> 542                 raise ValueError('Series lengths must match to compare')
    543             return self._constructor(na_op(self.values, other.values),
    544                                      index=self.index, name=name)

ValueError: Series lengths must match to compare

python arrays numpy pandas

user3241893 04 . '14 23:51

Jaime · Answer 1 · 2014-02-05T00:44:34+0000

Leaving pandas aside, in numpy you can do this with slicing. For the "next value" comparison:

>>> import numpy as np
>>> first = np.array([3, 7, 2, 5, 9, 4])
>>> second = np.array([8, 9, 7, 3, 3, 7])
>>> first[:-1] > second[1:]
array([False, False, False,  True,  True], dtype=bool)
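
For the second case in the question (True only where a value of first exceeds both the current and the next value of second), the same slicing idea can probably be extended as follows. This is a minimal sketch, assuming the same two arrays; the name both is only illustrative.

```python
import numpy as np

first = np.array([3, 7, 2, 5, 9, 4])
second = np.array([8, 9, 7, 3, 3, 7])

# For every i that has a successor:
# first[i] > second[i] and first[i] > second[i + 1]
both = (first[:-1] > second[:-1]) & (first[:-1] > second[1:])
print(both)
```

The element-wise & replaces the Python `and` used in the loop version, since `and` does not work on whole arrays.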

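Note that the slices drop the last element of first and the first element of second, so the result is one element shorter than the inputs.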
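
On the pandas side, one way to get the same offset comparison while keeping the index is Series.shift. The sketch below is an assumption about how it could be applied to the question's data: it uses Series rather than the single-column DataFrames from the question, and the names next_second, offset_gt and both_gt are illustrative.

```python
import pandas as pd

ndx = ['a', 'b', 'c', 'd', 'e', 'f']
first = pd.Series([3, 7, 2, 5, 9, 4], index=ndx)
second = pd.Series([8, 9, 7, 3, 3, 7], index=ndx)

# shift(-1) moves each value up one row, so next_second['a'] holds second['b'].
# The last row has no successor and becomes NaN.
next_second = second.shift(-1)

# first['a'] > second['b'], first['b'] > second['c'], ...
# Comparisons against NaN are False, so row 'f' comes out False.
offset_gt = first > next_second

# True only where first beats both the current and the next value of second.
both_gt = (first > second) & (first > next_second)

print(offset_gt)
print(both_gt)
```

Because shift keeps the original length and index, both sides of each comparison have matching labels, which avoids the length mismatch that slicing a Series can cause.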
