Opened 3 years ago

Closed 22 months ago

Last modified 21 months ago

#626 closed enhancement (fixed)

intbitset: harmonise pop() behaviour

Reported by: simko Owned by: Samuele Kaplun <samuele.kaplun@…>
Priority: minor Milestone:
Component: MiscUtil Version:
Keywords: Cc:

Description

As mentioned in ticket:621, we may want to improve a little bit the
description and/or behaviour of intbitset's pop(). Notably, people
may be using search engine's API functions returning intbitsets that
look like lists. Here, pop() has ordered meaning for lists, while
for sets it pops any random element:

In [2]: from invenio.intbitset import intbitset

In [3]: l = [1, 10, 2, 20]

In [4]: s = set(l)

In [5]: i = intbitset(l)

In [6]: l, s, i
Out[6]: ([1, 10, 2, 20], set([1, 10, 20, 2]), intbitset([1, 2, 10, 20]))

In [7]: xl, xs, xi = l.pop(), s.pop(), i.pop()

In [8]: l, s, i
Out[8]: ([1, 10, 2], set([10, 20, 2]), intbitset([2, 10, 20]))

In [9]: xl, xs, xi
Out[9]: (20, 1, 1)

The difference between lists and intbitsets is strictly taken OK,
because intbitsets emulate the API of sets, so pop() removes an
arbitrary set element. However, behind the scenes intbitset's pop()
calls intBitSetGetNext() that does an ordered removal, not an
"arbitrary" removal; so we can document this better for end users.

More to the point, intbitset has a native notion of element order,
being a set of increasing integers; it does resemble ordered lists of
integers in this respect. intbitset can be considered as a kind of
ordered set of increasing integers that emulates set API, so having
some facets of lists and some facets of sets, as it were.

Therefore we may want to improve the docstring of intbitset's pop()
in order to reflect this mixed nature of intbitsets: (i) at least by
documenting the non-arbitrary parts, but (ii) more to the point, we
may want to alter perhaps the meaning of what pop() returns, so that
intbitset would resemble lists more (i.e. returning last, not first
element). It will still be an "arbitrary" removal from the set API
point of view, but it will resemble more to what people may be used to
from the list API point of view, if they think of intbitsets as of
lists of increasing integers.

P.S. Not thinking here about list-specific calls like pop(n).

Change History (7)

comment:1 Changed 22 months ago by skaplun

I agree with you on the non arbitrary implementation of pop() in intbitset. However, for performance reasons, it might be nicer to still return the smaller bit, because in order to find the bit to return, the implementation still has to scroll the whole bitset.

  • If it has to return the smallest bit it can stop the scrolling as soon as possible
  • If it has to return the largest bit it has to stop as late as possible, hence with a performance impact.

So what if we fully document this behavior? Alternatively one can imagine to add a flag to the .pop() function saying:

def pop(last=False):
    pass

In this way the faster implementation is used by default, but one can still use the slower and more stack-friendly one.

comment:2 Changed 22 months ago by Samuele Kaplun <samuele.kaplun@…>

  • Owner set to Samuele Kaplun <samuele.kaplun@…>
  • Resolution set to fixed
  • Status changed from new to closed

In [6ad995694e33b9d5d83fe5b09bf988f85050e2b3]:

intbitset: pop last element

  • When pop is called on a list, the last element is returned. When it is called on a set, an arbitrary element is returned. So far intbitset was returning the smallest element. In order to expose a behaviour more similar to the expectation of an ordered list, intbitset now returns the largest element. (closes #626)

comment:3 Changed 21 months ago by Nikola Yolov <nikola.yolov@…>

In 6511a023ef60bf53a289e13257e8053e939604c6:

intbitset: fixes pop() function

  • Fixes pop() function. (addresses #626)

comment:4 Changed 21 months ago by Samuele Kaplun <samuele.kaplun@…>

In 6ad995694e33b9d5d83fe5b09bf988f85050e2b3:

intbitset: pop last element

  • When pop is called on a list, the last element is returned. When it is called on a set, an arbitrary element is returned. So far intbitset was returning the smallest element. In order to expose a behaviour more similar to the expectation of an ordered list, intbitset now returns the largest element. (closes #626)

comment:5 Changed 21 months ago by Nikola Yolov <nikola.yolov@…>

In 6511a023ef60bf53a289e13257e8053e939604c6:

intbitset: fixes pop() function

  • Fixes pop() function. (addresses #626)

comment:6 Changed 21 months ago by Samuele Kaplun <samuele.kaplun@…>

In 6ad995694e33b9d5d83fe5b09bf988f85050e2b3:

intbitset: pop last element

  • When pop is called on a list, the last element is returned. When it is called on a set, an arbitrary element is returned. So far intbitset was returning the smallest element. In order to expose a behaviour more similar to the expectation of an ordered list, intbitset now returns the largest element. (closes #626)

comment:7 Changed 21 months ago by Nikola Yolov <nikola.yolov@…>

In 6511a023ef60bf53a289e13257e8053e939604c6:

intbitset: fixes pop() function

  • Fixes pop() function. (addresses #626)
Note: See TracTickets for help on using tickets.