Most frequent in the range

Question

Most frequent in the range

I have a string of slength n. What is the most efficient data structure / algorithm for finding the most frequent character in a range i..j?

The string does not change with time, I just need to repeat the request, that request is the most frequent among the char s[i], s[i + 1]..., s[j].

+5

c ++ complexity-theory character frequency mode

Momonga Jan 24 '13 at 13:50

source share

8 answers

Luchian grigore · Answer 1 · 2013-01-24T13:53:01+0000

An array in which you hold the number of occurrences of each character. You increase the corresponding value when repeating a row. In this case, you can remember the current max in the array; alternatively, look for the highest value in the array at the end.

Pseudo code

arr = [0]
for ( char in string )
   arr[char]++
mostFrequent = highest(arr)

ssegvic · Answer 2 · 2013-01-24T13:57:55+0000

, . , j + 1 i, s [i], s [i + 1],..., s [j].

Python. , - , , 256 .

def buildIntegralDistributions(s):
    IDs=[]        # integral distribution
    D=[0]*256
    IDs.append(D[:])
    for x in s:
        D[ord(x)]+=1
        IDs.append(D[:])
    return IDs

def getIntervalDistribution(IDs, i,j):
    D=[0]*256        
    for k in range(256):
        D[k]=IDs[j][k]-IDs[i][k]
    return D

s='abababbbb'
IDs=buildIntegralDistributions(s)
Dij=getIntervalDistribution(IDs, 2,4)

>>> s[2:4]
'ab'
>>> Dij[ord('a')]  # how many 'a'-s in s[2:4]?
1
>>> Dij[ord('b')]  # how many 'b'-s in s[2:4]?
1

Ivaylo Strandjev · Answer 3 · 2013-01-24T14:01:16+0000

, , . - :

"abcdabc"

0:

count['a'] = 1
count['b'] = 0
etc...

1:

....
count['a'] = 1
count['b'] = 1
count['c'] = 0
etc...

2:

....
count['a'] = 1
count['b'] = 1
count['c'] = 1
....

. 6:

....
count['a'] = 2
count['b'] = 2
count['c'] = 2
count['d'] = 1
... all others are 0

(i, j) - count[j] - count[i-1] ( i = 0!).

, , , 10 ^ 6 128 ( , ASCII).

- , .

TemplateRex · Answer 4 · 2013-01-24T16:00:43+0000

.

O(1), (, , ), O(N log N) .

O(N), @Luchian Grigore, O(N) (, O(K) K -).

Arpit · Answer 5 · 2013-01-24T14:07:34+0000

string="something"
arrCount[string.length()];

freq()

freq(char accessedChar){
arrCount[string.indexOf(x)]+=1
}

char string.charAt(arrCount.max())

Neel Basu · Answer 6 · 2013-01-24T14:08:51+0000

, , i j .

,

struct occurences{
    char c;
    std::list<int> positions;
};

std::list<occurences> . positions.

, i.. j

Stuart Olsen · Answer 7 · 2013-01-24T17:32:51+0000

, , . , , , undefined. , , 0x00-0x7F, , UTF-8, :

char frequncies [256] = {};
frequencies ['á'] = 9; // Oops. If our implementation represents char using a
                       // signed eight-bit integer, we just referenced memory
                       // outside of our array bounds!

, , :

template <typename charT>
charT most_frequent (const basic_string <charT>& str)
{
    constexpr auto charT_max = numeric_limits <charT>::max ();
    constexpr auto charT_min = numeric_limits <charT>::lowest ();
    size_t frequencies [charT_max - charT_min + 1] = {};

    for (auto c : str)
        ++frequencies [c - charT_min];

    charT most_frequent;
    size_t count = 0;
    for (charT c = charT_min; c < charT_max; ++c)
        if (frequencies [c - charT_min] > count)
        {
            most_frequent = c;
            count = frequencies [c - charT_min];
        }

    // We have to check charT_max outside of the loop,
    // as otherwise it will probably never terminate
    if (frequencies [charT_max - charT_min] > count)
        return charT_max;

    return most_frequent;
}

, ( construct_array), std::array <size_t, numeric_limits <charT>::max () - numeric_limits <charT>::lowest () + 1>. max for , . std::map <std::string, std::array <...>> . :

char most_frequent (string s)
{
    static map <string, array <...>> cache;

    if (cache.count (s) == 0)
        map [s] = construct_array (s);
    // find the most frequent character, as above, replacing `frequencies`
    // with map [s], then return it
}

. , . , - , , - , ; , , , .

Jonathan mee · Answer 8 · 2018-12-06T16:06:34+0000

unordered_map :

pair<char, int> fast(const string& s) {
    unordered_map<char, int> result;

    for(const auto i : s) ++result[i];
    return *max_element(cbegin(result), cend(result), [](const auto& lhs, const auto& rhs) { return lhs.second < rhs.second; });
}

, , , , find_first_not_of :

pair<char, int> light(string& s) {
    pair<char, int> result;
    int start = 0;

    sort(begin(s), end(s));
    for(auto finish = s.find_first_not_of(s.front()); finish != string::npos; start = finish, finish = s.find_first_not_of(s[start], start)) if(const int second = finish - start; second > result.second) result = make_pair(s[start], second);
    if(const int second = size(s) - start; second > result.second) result = make_pair(s[start], second);
    return result;
}

It should be noted that both of these functions have a non-empty string precondition. In addition, if there is a relationship for most characters in a string, both functions return the character that is initially lexenographic as having the highest value.

Live Example

Most frequent in the range

More articles: