Save This Page
Home » mahout-collections-1.0-src » org.apache.mahout.math.bitvector » [javadoc | source]
    1   /*
    2   Copyright 1999 CERN - European Organization for Nuclear Research.
    3   Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose 
    4   is hereby granted without fee, provided that the above copyright notice appear in all copies and 
    5   that both that copyright notice and this permission notice appear in supporting documentation. 
    6   CERN makes no representations about the suitability of this software for any purpose. 
    7   It is provided "as is" without expressed or implied warranty.
    8   */
    9   package org.apache.mahout.math.bitvector;
   10   
   11   import org.apache.mahout.math.PersistentObject;
   12   import org.apache.mahout.math.function.IntProcedure;
   13   
   14   /**
   15    * Fixed sized (non resizable) bitvector.
   16    * Upon instance construction a bitvector is told to hold a fixed number of bits - it's size.
   17    * The size can be any number (need not be a power of 2 or so).
   18    * The bits of a <tt>BitVector</tt> are indexed by nonnegative integers. 
   19    * Any attempt to access a bit at an <tt>index&lt;0 || index&gt;=size()</tt> will throw an <tt>IndexOutOfBoundsException</tt>.
   20    * <p>
   21    * Individual indexed bits can be examined, set, or cleared.
   22    * Subranges can quickly be extracted, copied and replaced.
   23    * Quick iteration over subranges is provided by optimized internal iterators (<tt>forEach()</tt> methods).
   24    * One <code>BitVector</code> may be used to modify the contents of another 
   25    * <code>BitVector</code> through logical AND, OR, XOR and other similar operations.
   26    * <p>
   27    * All operations consider the bits <tt>0..size()-1</tt> and nothing else.
   28    * Operations involving two bitvectors (like AND, OR, XOR, etc.) will throw an <tt>IllegalArgumentException</tt> if the secondary bit vector has a size smaller than the receiver.
   29    * <p>
   30    * A <tt>BitVector</tt> is never automatically resized,
   31    * but it can manually be grown or shrinked via <tt>setSize(...)</tt>.
   32    * <p>
   33    * For use cases that need to store several bits per information entity, quick accessors are provided that interpret subranges as 64 bit <tt>long</tt> integers.
   34    * <p>
   35    * Why this class? Fist, <tt>boolean[]</tt> take one byte per stored bit. This class takes one bit per stored bit.
   36    * Second, many applications find the semantics of <tt>java.util.BitSet</tt> not particularly helpful for their needs.
   37    * Third, operations working on all bits of a bitvector are extremely quick.
   38    * For example, on NT, Pentium Pro 200 Mhz, SunJDK1.2.2, java -classic, for two bitvectors A,B (both much larger than processor cache), the following results are obtained.
   39    * <ul>
   40    * <li><tt>A.and(B)</tt> i.e. A = A & B --> runs at about 35 MB/sec
   41    * <li><tt>A.cardinality()</tt>, i.e. determining the selectivity, the number of bits in state "true" --> runs at about 80 MB/sec
   42    * <li>Similar performance for <tt>or, xor, andNot, not, copy, replace, partFromTo, indexOf, clear</tt> etc.
   43    * </ul>
   44    * If you need extremely quick access to individual bits: Although getting and setting individual bits with methods <tt>get(...)</tt>, <tt>set(...)</tt> and <tt>put(...)</tt>is quick, it is even quicker (<b>but not safe</b>) to use <tt>getQuick(...)</tt> and <tt>putQuick(...)</tt> or even <tt>QuickBitVector</tt>.
   45    * <p>
   46    * <b>Note</b> that this implementation is not synchronized.
   47    *
   48    * @see     QuickBitVector
   49    * @see     BitMatrix
   50    * @see     java.util.BitSet
   51    */
   52   
   53   /** @deprecated until unit tests are in place.  Until this time, this class/interface is unsupported. */
   54   @Deprecated
   55   public class BitVector extends PersistentObject {
   56     /*
   57      * Bits are packed into arrays of "units."  Currently a unit is a long,
   58      * which consists of 64 bits, requiring 6 address bits.  The choice of unit
   59      * is determined purely by performance concerns.
   60      */
   61   
   62     /**
   63      * The bits of this object.  The ith bit is stored in bits[i/64] at bit position i % 64 (where bit position 0 refers
   64      * to the least significant bit and 63 refers to the most significant bit).
   65      */
   66     private long[] bits;
   67   
   68     private int nbits; //the size
   69   
   70     // IntProcedure for method indexOfFromTo(...)
   71   
   72     private static class IndexProcedure implements IntProcedure {
   73   
   74       private int foundPos = -1;
   75   
   76       @Override
   77       public boolean apply(int index) {
   78         foundPos = index;
   79         return false;
   80       }
   81     }
   82   
   83     /**
   84      * You normally need not use this method. Use this method only if performance is critical. Constructs a bit vector
   85      * with the given backing bits and size. <b>WARNING:</b> For efficiency reasons and to keep memory usage low, <b>the
   86      * array is not copied</b>. So if subsequently you modify the specified array directly via the [] operator, be sure
   87      * you know what you're doing.
   88      *
   89      * <p>A bitvector is modelled as a long array, i.e. <tt>long[] bits</tt> holds bits of a bitvector. Each long value
   90      * holds 64 bits. The i-th bit is stored in bits[i/64] at bit position i % 64 (where bit position 0 refers to the
   91      * least significant bit and 63 refers to the most significant bit).
   92      *
   93      * @throws IllegalArgumentException if <tt>size &lt; 0 || size &gt; bits.length*64</tt>.
   94      */
   95     public BitVector(long[] bits, int size) {
   96       elements(bits, size);
   97     }
   98   
   99     /**
  100      * Constructs a bit vector that holds <tt>size</tt> bits. All bits are initially <tt>false</tt>.
  101      *
  102      * @param size the number of bits the bit vector shall have.
  103      * @throws IllegalArgumentException if <tt>size &lt; 0</tt>.
  104      */
  105     public BitVector(int size) {
  106       this(QuickBitVector.makeBitVector(size, 1), size);
  107     }
  108   
  109     /**
  110      * Performs a logical <b>AND</b> of the receiver with another bit vector (A = A & B). The receiver is modified so that
  111      * a bit in it has the value <code>true</code> if and only if it already had the value <code>true</code> and the
  112      * corresponding bit in the other bit vector argument has the value <code>true</code>.
  113      *
  114      * @param other a bit vector.
  115      * @throws IllegalArgumentException if <tt>size() &gt; other.size()</tt>.
  116      */
  117     public void and(BitVector other) {
  118       if (this == other) {
  119         return;
  120       }
  121       checkSize(other);
  122       long[] theBits = this.bits; // cached for speed.
  123       long[] otherBits = other.bits; //cached for speed.
  124       for (int i = theBits.length; --i >= 0;) {
  125         theBits[i] &= otherBits[i];
  126       }
  127     }
  128   
  129     /**
  130      * Clears all of the bits in receiver whose corresponding bit is set in the other bitvector (A = A \ B). In other
  131      * words, determines the difference (A=A\B) between two bitvectors.
  132      *
  133      * @param other a bitvector with which to mask the receiver.
  134      * @throws IllegalArgumentException if <tt>size() &gt; other.size()</tt>.
  135      */
  136     public void andNot(BitVector other) {
  137       checkSize(other);
  138       long[] theBits = this.bits; // cached for speed.
  139       long[] otherBits = other.bits; //cached for speed.
  140       for (int i = theBits.length; --i >= 0;) {
  141         theBits[i] &= ~otherBits[i];
  142       }
  143     }
  144   
  145     /**
  146      * Returns the number of bits currently in the <tt>true</tt> state. Optimized for speed. Particularly quick if the
  147      * receiver is either sparse or dense.
  148      */
  149     public int cardinality() {
  150       int cardinality = 0;
  151       int fullUnits = numberOfFullUnits();
  152       int bitsPerUnit = QuickBitVector.BITS_PER_UNIT;
  153   
  154       // determine cardinality on full units
  155       long[] theBits = bits;
  156       for (int i = fullUnits; --i >= 0;) {
  157         long val = theBits[i];
  158         if (val == -1L) { // all bits set?
  159           cardinality += bitsPerUnit;
  160         } else if (val != 0L) { // more than one bit set?
  161           for (int j = bitsPerUnit; --j >= 0;) {
  162             if ((val & (1L << j)) != 0) {
  163               cardinality++;
  164             }
  165           }
  166         }
  167       }
  168   
  169       // determine cardinality on remaining partial unit, if any.
  170       for (int j = numberOfBitsInPartialUnit(); --j >= 0;) {
  171         if ((theBits[fullUnits] & (1L << j)) != 0) {
  172           cardinality++;
  173         }
  174       }
  175   
  176       return cardinality;
  177     }
  178   
  179     /** Checks if the given range is within the contained array's bounds. */
  180     private static void checkRangeFromTo(int from, int to, int theSize) {
  181       if (from < 0 || from > to || to >= theSize) {
  182         throw new IndexOutOfBoundsException("from: " + from + ", to: " + to + ", size=" + theSize);
  183       }
  184     }
  185   
  186     /** Sanity check for operations requiring another bitvector with at least the same size. */
  187     protected void checkSize(BitVector other) {
  188       if (nbits > other.size()) {
  189         throw new IllegalArgumentException("Incompatible sizes: size=" + nbits + ", other.size()=" + other.size());
  190       }
  191     }
  192   
  193     /** Clears all bits of the receiver. */
  194     public void clear() {
  195       long[] theBits = this.bits;
  196       for (int i = theBits.length; --i >= 0;) {
  197         theBits[i] = 0L;
  198       }
  199   
  200       //new LongArrayList(bits).fillFromToWith(0,size()-1,0L);
  201     }
  202   
  203     /**
  204      * Changes the bit with index <tt>bitIndex</tt> to the "clear" (<tt>false</tt>) state.
  205      *
  206      * @param bitIndex the index of the bit to be cleared.
  207      * @throws IndexOutOfBoundsException if <tt>bitIndex&lt;0 || bitIndex&gt;=size()</tt>
  208      */
  209     public void clear(int bitIndex) {
  210       if (bitIndex < 0 || bitIndex >= nbits) {
  211         throw new IndexOutOfBoundsException(String.valueOf(bitIndex));
  212       }
  213       QuickBitVector.clear(bits, bitIndex);
  214     }
  215   
  216     /**
  217      * Cloning this <code>BitVector</code> produces a new <code>BitVector</code> that is equal to it. The clone of the bit
  218      * vector is another bit vector that has exactly the same bits set to <code>true</code> as this bit vector and the
  219      * same current size, but independent state.
  220      *
  221      * @return a deep copy of this bit vector.
  222      */
  223     @Override
  224     public Object clone() {
  225       BitVector clone = (BitVector) super.clone();
  226       if (this.bits != null) {
  227         clone.bits = this.bits.clone();
  228       }
  229       return clone;
  230     }
  231   
  232     /**
  233      * Returns a deep copy of the receiver; calls <code>clone()</code> and casts the result.
  234      *
  235      * @return a deep copy of the receiver.
  236      */
  237     public BitVector copy() {
  238       return (BitVector) clone();
  239     }
  240   
  241     /**
  242      * You normally need not use this method. Use this method only if performance is critical. Returns the bit vector's
  243      * backing bits. <b>WARNING:</b> For efficiency reasons and to keep memory usage low, <b>the array is not copied</b>.
  244      * So if subsequently you modify the returned array directly via the [] operator, be sure you know what you're doing.
  245      *
  246      * <p>A bitvector is modelled as a long array, i.e. <tt>long[] bits</tt> holds bits of a bitvector. Each long value
  247      * holds 64 bits. The i-th bit is stored in bits[i/64] at bit position i % 64 (where bit position 0 refers to the
  248      * least significant bit and 63 refers to the most significant bit).
  249      */
  250     public long[] elements() {
  251       return bits;
  252     }
  253   
  254     /**
  255      * You normally need not use this method. Use this method only if performance is critical. Sets the bit vector's
  256      * backing bits and size. <b>WARNING:</b> For efficiency reasons and to keep memory usage low, <b>the array is not
  257      * copied</b>. So if subsequently you modify the specified array directly via the [] operator, be sure you know what
  258      * you're doing.
  259      *
  260      * <p>A bitvector is modelled as a long array, i.e. <tt>long[] bits</tt> holds bits of a bitvector. Each long value
  261      * holds 64 bits. The i-th bit is stored in bits[i/64] at bit position i % 64 (where bit position 0 refers to the
  262      * least significant bit and 63 refers to the most significant bit).
  263      *
  264      * @param bits the backing bits of the bit vector.
  265      * @param size the number of bits the bit vector shall hold.
  266      * @throws IllegalArgumentException if <tt>size &lt; 0 || size &gt; bits.length*64</tt>.
  267      */
  268     public void elements(long[] bits, int size) {
  269       if (size < 0 || size > bits.length * QuickBitVector.BITS_PER_UNIT) {
  270         throw new IllegalArgumentException();
  271       }
  272       this.bits = bits;
  273       this.nbits = size;
  274     }
  275   
  276     /**
  277      * Compares this object against the specified object. The result is <code>true</code> if and only if the argument is
  278      * not <code>null</code> and is a <code>BitVector</code> object that has the same size as the receiver and the same
  279      * bits set to <code>true</code> as the receiver. That is, for every nonnegative <code>int</code> index
  280      * <code>k</code>,
  281      * <pre>((BitVector)obj).get(k) == this.get(k)</pre>
  282      * must be true.
  283      *
  284      * @param obj the object to compare with.
  285      * @return <code>true</code> if the objects are the same; <code>false</code> otherwise.
  286      */
  287     public boolean equals(Object obj) {
  288       if (obj == null || !(obj instanceof BitVector)) {
  289         return false;
  290       }
  291       if (this == obj) {
  292         return true;
  293       }
  294   
  295       BitVector other = (BitVector) obj;
  296       if (size() != other.size()) {
  297         return false;
  298       }
  299   
  300       int fullUnits = numberOfFullUnits();
  301       // perform logical comparison on full units
  302       for (int i = fullUnits; --i >= 0;) {
  303         if (bits[i] != other.bits[i]) {
  304           return false;
  305         }
  306       }
  307   
  308       // perform logical comparison on remaining bits
  309       int i = fullUnits * QuickBitVector.BITS_PER_UNIT;
  310       for (int times = numberOfBitsInPartialUnit(); --times >= 0;) {
  311         if (get(i) != other.get(i)) {
  312           return false;
  313         }
  314         i++;
  315       }
  316   
  317       return true;
  318     }
  319   
  320     /**
  321      * Applies a procedure to each bit index within the specified range that holds a bit in the given state. Starts at
  322      * index <tt>from</tt>, moves rightwards to <tt>to</tt>. Useful, for example, if you want to copy bits into an image
  323      * or somewhere else. <p> Optimized for speed. Particularly quick if one of the following conditions holds <ul>
  324      * <li><tt>state==true</tt> and the receiver is sparse (<tt>cardinality()</tt> is small compared to <tt>size()</tt>).
  325      * <li><tt>state==false</tt> and the receiver is dense (<tt>cardinality()</tt> is large compared to <tt>size()</tt>).
  326      * </ul>
  327      *
  328      * @param from      the leftmost search position, inclusive.
  329      * @param to        the rightmost search position, inclusive.
  330      * @param state     element to search for.
  331      * @param procedure a procedure object taking as argument the current bit index. Stops iteration if the procedure
  332      *                  returns <tt>false</tt>, otherwise continues.
  333      * @return <tt>false</tt> if the procedure stopped before all elements where iterated over, <tt>true</tt> otherwise.
  334      * @throws IndexOutOfBoundsException if (<tt>size()&gt;0 && (from&lt;0 || from&gt;to || to&gt;=size())</tt>).
  335      */
  336     public boolean forEachIndexFromToInState(int from, int to, boolean state,
  337                                              IntProcedure procedure) {
  338       /*
  339       // this version is equivalent to the low level version below, but about 100 times slower for large ranges.
  340       if (nbits==0) return true;
  341       checkRangeFromTo(from, to, nbits);
  342       final long[] theBits = this.bits; // cached for speed
  343       int length=to-from+1;
  344       for (int i=from; --length >= 0; i++) {
  345         if (QuickBitVector.get(theBits,i)==state) {
  346           if (!function.apply(i)) return false;
  347         }
  348       }
  349       return true;
  350       */
  351   
  352   
  353       /*
  354       * This low level implementation exploits the fact that for any full unit one can determine in O(1)
  355       * whether it contains at least one true bit,
  356       * and whether it contains at least one false bit.
  357       * Thus, 64 bits can often be skipped with one simple comparison if the vector is either sparse or dense.
  358       *
  359       * However, careful coding must be done for leading and/or trailing units which are not entirely contained in the query range.
  360       */
  361   
  362       if (nbits == 0) {
  363         return true;
  364       }
  365       checkRangeFromTo(from, to, nbits);
  366   
  367       // Cache some vars for speed.
  368       long[] theBits = this.bits;
  369       int bitsPerUnit = QuickBitVector.BITS_PER_UNIT;
  370   
  371       // Prepare
  372       int fromUnit = QuickBitVector.unit(from);
  373       int toUnit = QuickBitVector.unit(to);
  374       int i = from; // current bitvector index
  375   
  376       // Iterate over the leading partial unit, if any.
  377       int bitIndex = QuickBitVector.offset(from);
  378       int partialWidth;
  379       if (bitIndex > 0) { // There exists a leading partial unit.
  380         partialWidth = Math.min(to - from + 1, bitsPerUnit - bitIndex);
  381         for (; --partialWidth >= 0; i++) {
  382           if (QuickBitVector.get(theBits, i) == state) {
  383             if (!procedure.apply(i)) {
  384               return false;
  385             }
  386           }
  387         }
  388         fromUnit++; // leading partial unit is done.
  389       }
  390   
  391       if (i > to) {
  392         return true;
  393       } // done
  394   
  395       // If there is a trailing partial unit, then there is one full unit less to test.
  396       bitIndex = QuickBitVector.offset(to);
  397       if (bitIndex < bitsPerUnit - 1) {
  398         toUnit--; // trailing partial unit needs to be tested extra.
  399         partialWidth = bitIndex + 1;
  400       } else {
  401         partialWidth = 0;
  402       }
  403   
  404       // Iterate over all full units, if any.
  405       // (It does not matter that iterating over partial units is a little bit slow,
  406       // the only thing that matters is that iterating over full units is quick.)
  407       long comparator;
  408       if (state) {
  409         comparator = 0L;
  410       } else {
  411         comparator = ~0L;
  412       } // all 64 bits set
  413   
  414       for (int unit = fromUnit; unit <= toUnit; unit++) {
  415         long val = theBits[unit];
  416         if (val != comparator) {
  417           // at least one element within current unit matches.
  418           // iterate over all bits within current unit.
  419           if (state) {
  420             for (int j = 0, k = bitsPerUnit; --k >= 0; i++) {
  421               if ((val & (1L << j++)) != 0L) { // is bit set?
  422                 if (!procedure.apply(i)) {
  423                   return false;
  424                 }
  425               }
  426             }
  427           } else {
  428             for (int j = 0, k = bitsPerUnit; --k >= 0; i++) {
  429               if ((val & (1L << j++)) == 0L) { // is bit cleared?
  430                 if (!procedure.apply(i)) {
  431                   return false;
  432                 }
  433               }
  434             }
  435           }
  436         } else {
  437           i += bitsPerUnit;
  438         }
  439       }
  440   
  441       // Iterate over trailing partial unit, if any.
  442       for (; --partialWidth >= 0; i++) {
  443         if (QuickBitVector.get(theBits, i) == state) {
  444           if (!procedure.apply(i)) {
  445             return false;
  446           }
  447         }
  448       }
  449   
  450       return true;
  451     }
  452   
  453     /**
  454      * Returns from the bitvector the value of the bit with the specified index. The value is <tt>true</tt> if the bit
  455      * with the index <tt>bitIndex</tt> is currently set; otherwise, returns <tt>false</tt>.
  456      *
  457      * @param bitIndex the bit index.
  458      * @return the value of the bit with the specified index.
  459      * @throws IndexOutOfBoundsException if <tt>bitIndex&lt;0 || bitIndex&gt;=size()</tt>
  460      */
  461     public boolean get(int bitIndex) {
  462       if (bitIndex < 0 || bitIndex >= nbits) {
  463         throw new IndexOutOfBoundsException(String.valueOf(bitIndex));
  464       }
  465       return QuickBitVector.get(bits, bitIndex);
  466     }
  467   
  468     /**
  469      * Returns a long value representing bits of the receiver from index <tt>from</tt> to index <tt>to</tt>. Bits are
  470      * returned as a long value with the return value having bit 0 set to bit <code>from</code>, ..., bit
  471      * <code>to-from</code> set to bit <code>to</code>. All other bits of the return value are set to 0. If
  472      * <tt>to-from+1==0</tt> then returns zero (<tt>0L</tt>).
  473      *
  474      * @param from index of start bit (inclusive).
  475      * @param to   index of end bit (inclusive).
  476      * @return the specified bits as long value.
  477      * @throws IndexOutOfBoundsException if <tt>from&lt;0 || from&gt;=size() || to&lt;0 || to&gt;=size() || to-from+1<0 ||
  478      *                                   to-from+1>64</tt>
  479      */
  480     public long getLongFromTo(int from, int to) {
  481       int width = to - from + 1;
  482       if (width == 0) {
  483         return 0L;
  484       }
  485       if (from < 0 || from >= nbits || to < 0 || to >= nbits || width < 0 || width > QuickBitVector.BITS_PER_UNIT) {
  486         throw new IndexOutOfBoundsException("from:" + from + ", to:" + to);
  487       }
  488       return QuickBitVector.getLongFromTo(bits, from, to);
  489     }
  490   
  491     /**
  492      * Returns from the bitvector the value of the bit with the specified index; <b>WARNING:</b> Does not check
  493      * preconditions. The value is <tt>true</tt> if the bit with the index <tt>bitIndex</tt> is currently set; otherwise,
  494      * returns <tt>false</tt>.
  495      *
  496      * <p>Provided with invalid parameters this method may return invalid values without throwing any exception. <b>You
  497      * should only use this method when you are absolutely sure that the index is within bounds.</b> Precondition
  498      * (unchecked): <tt>bitIndex &gt;= 0 && bitIndex &lt; size()</tt>.
  499      *
  500      * @param bitIndex the bit index.
  501      * @return the value of the bit with the specified index.
  502      */
  503     public boolean getQuick(int bitIndex) {
  504       return QuickBitVector.get(bits, bitIndex);
  505     }
  506   
  507     /**
  508      * Returns a hash code value for the receiver. The hash code depends only on which bits have been set within the
  509      * receiver. The algorithm used to compute it may be described as follows.<p> Suppose the bits in the receiver were to
  510      * be stored in an array of <code>long</code> integers called, say, <code>bits</code>, in such a manner that bit
  511      * <code>k</code> is set in the receiver (for nonnegative values of <code>k</code>) if and only if the expression
  512      * <pre>((k&gt;&gt;6) &lt; bits.length) && ((bits[k&gt;&gt;6] & (1L &lt;&lt; (bit & 0x3F))) != 0)</pre>
  513      * is true. Then the following definition of the <code>hashCode</code> method would be a correct implementation of the
  514      * actual algorithm:
  515      * <pre>
  516      * public int hashCode() {
  517      *      long h = 1234;
  518      *      for (int i = bits.length; --i &gt;= 0; ) {
  519      *           h ^= bits[i] * (i + 1);
  520      *      }
  521      *      return (int)((h &gt;&gt; 32) ^ h);
  522      * }</pre>
  523      * Note that the hash code values change if the set of bits is altered.
  524      *
  525      * @return a hash code value for the receiver.
  526      */
  527     public int hashCode() {
  528       long h = 1234;
  529       for (int i = bits.length; --i >= 0;) {
  530         h ^= bits[i] * (i + 1);
  531       }
  532   
  533       return (int) ((h >> 32) ^ h);
  534     }
  535   
  536     /**
  537      * Returns the index of the first occurrence of the specified state. Returns <code>-1</code> if the receiver does not
  538      * contain this state. Searches between <code>from</code>, inclusive and <code>to</code>, inclusive. <p> Optimized for
  539      * speed. Preliminary performance (200Mhz Pentium Pro, JDK 1.2, NT): size=10^6, from=0, to=size-1, receiver contains
  540      * matching state in the very end --> 0.002 seconds elapsed time.
  541      *
  542      * @param state state to search for.
  543      * @param from  the leftmost search position, inclusive.
  544      * @param to    the rightmost search position, inclusive.
  545      * @return the index of the first occurrence of the element in the receiver; returns <code>-1</code> if the element is
  546      *         not found.
  547      * @throws IndexOutOfBoundsException if (<tt>size()&gt;0 && (from&lt;0 || from&gt;to || to&gt;=size())</tt>).
  548      */
  549     public int indexOfFromTo(int from, int to, boolean state) {
  550       IndexProcedure indexProcedure = new IndexProcedure();
  551       forEachIndexFromToInState(from, to, state, indexProcedure);
  552       return indexProcedure.foundPos;
  553     }
  554   
  555     /** Performs a logical <b>NOT</b> on the bits of the receiver (A = ~A). */
  556     public void not() {
  557       long[] theBits = this.bits;
  558       for (int i = theBits.length; --i >= 0;) {
  559         theBits[i] = ~theBits[i];
  560       }
  561     }
  562   
  563     /**
  564      * Returns the number of bits used in the trailing PARTIAL unit. Returns zero if there is no such trailing partial
  565      * unit.
  566      */
  567     protected int numberOfBitsInPartialUnit() {
  568       return QuickBitVector.offset(nbits);
  569     }
  570   
  571     /** Returns the number of units that are FULL (not PARTIAL). */
  572     protected int numberOfFullUnits() {
  573       return QuickBitVector.unit(nbits);
  574     }
  575   
  576     /**
  577      * Performs a logical <b>OR</b> of the receiver with another bit vector (A = A | B). The receiver is modified so that
  578      * a bit in it has the value <code>true</code> if and only if it either already had the value <code>true</code> or the
  579      * corresponding bit in the other bit vector argument has the value <code>true</code>.
  580      *
  581      * @param other a bit vector.
  582      * @throws IllegalArgumentException if <tt>size() &gt; other.size()</tt>.
  583      */
  584     public void or(BitVector other) {
  585       if (this == other) {
  586         return;
  587       }
  588       checkSize(other);
  589       long[] theBits = this.bits; // cached for speed.
  590       long[] otherBits = other.bits; //cached for speed.
  591       for (int i = theBits.length; --i >= 0;) {
  592         theBits[i] |= otherBits[i];
  593       }
  594     }
  595   
  596     /**
  597      * Constructs and returns a new bit vector which is a copy of the given range. The new bitvector has
  598      * <tt>size()==to-from+1</tt>.
  599      *
  600      * @param from the start index within the receiver, inclusive.
  601      * @param to   the end index within the receiver, inclusive.
  602      * @throws IndexOutOfBoundsException if <tt>size()&gt;0 && (from&lt;0 || from&gt;to || to&gt;=size()))</tt>.
  603      */
  604     public BitVector partFromTo(int from, int to) {
  605       if (nbits == 0 || to == from - 1) {
  606         return new BitVector(0);
  607       }
  608       checkRangeFromTo(from, to, nbits);
  609   
  610       int width = to - from + 1;
  611       BitVector part = new BitVector(width);
  612       part.replaceFromToWith(0, width - 1, this, from);
  613       return part;
  614     }
  615   
  616     /**
  617      * Sets the bit with index <tt>bitIndex</tt> to the state specified by <tt>value</tt>.
  618      *
  619      * @param bitIndex the index of the bit to be changed.
  620      * @param value    the value to be stored in the bit.
  621      * @throws IndexOutOfBoundsException if <tt>bitIndex&lt;0 || bitIndex&gt;=size()</tt>
  622      */
  623     public void put(int bitIndex, boolean value) {
  624       if (bitIndex < 0 || bitIndex >= nbits) {
  625         throw new IndexOutOfBoundsException(String.valueOf(bitIndex));
  626       }
  627       if (value) {
  628         QuickBitVector.set(bits, bitIndex);
  629       } else {
  630         QuickBitVector.clear(bits, bitIndex);
  631       }
  632     }
  633   
  634     /**
  635      * Sets bits of the receiver from index <code>from</code> to index <code>to</code> to the bits of <code>value</code>.
  636      * Bit <code>from</code> is set to bit 0 of <code>value</code>, ..., bit <code>to</code> is set to bit
  637      * <code>to-from</code> of <code>value</code>. All other bits stay unaffected. If <tt>to-from+1==0</tt> then does
  638      * nothing.
  639      *
  640      * @param value the value to be copied into the receiver.
  641      * @param from  index of start bit (inclusive).
  642      * @param to    index of end bit (inclusive).
  643      * @throws IndexOutOfBoundsException if <tt>from&lt;0 || from&gt;=size() || to&lt;0 || to&gt;=size() || to-from+1<0 ||
  644      *                                   to-from+1>64</tt>.
  645      */
  646     public void putLongFromTo(long value, int from, int to) {
  647       int width = to - from + 1;
  648       if (width == 0) {
  649         return;
  650       }
  651       if (from < 0 || from >= nbits || to < 0 || to >= nbits || width < 0 || width > QuickBitVector.BITS_PER_UNIT) {
  652         throw new IndexOutOfBoundsException("from:" + from + ", to:" + to);
  653       }
  654       QuickBitVector.putLongFromTo(bits, value, from, to);
  655     }
  656   
  657     /**
  658      * Sets the bit with index <tt>bitIndex</tt> to the state specified by <tt>value</tt>; <b>WARNING:</b> Does not check
  659      * preconditions.
  660      *
  661      * <p>Provided with invalid parameters this method may set invalid values without throwing any exception. <b>You
  662      * should only use this method when you are absolutely sure that the index is within bounds.</b> Precondition
  663      * (unchecked): <tt>bitIndex &gt;= 0 && bitIndex &lt; size()</tt>.
  664      *
  665      * @param bitIndex the index of the bit to be changed.
  666      * @param value    the value to be stored in the bit.
  667      */
  668     public void putQuick(int bitIndex, boolean value) {
  669       if (value) {
  670         QuickBitVector.set(bits, bitIndex);
  671       } else {
  672         QuickBitVector.clear(bits, bitIndex);
  673       }
  674     }
  675   
  676     /**
  677      * Replaces the bits of the receiver in the given range with the bits of another bit vector. Replaces the range
  678      * <tt>[from,to]</tt> with the contents of the range <tt>[sourceFrom,sourceFrom+to-from]</tt>, all inclusive. If
  679      * <tt>source==this</tt> and the source and destination range intersect in an ambiguous way, then replaces as if using
  680      * an intermediate auxiliary copy of the receiver. <p> Optimized for speed. Preliminary performance (200Mhz Pentium
  681      * Pro, JDK 1.2, NT): replace 10^6 ill aligned bits --> 0.02 seconds elapsed time.
  682      *
  683      * @param from       the start index within the receiver, inclusive.
  684      * @param to         the end index within the receiver, inclusive.
  685      * @param source     the source bitvector to copy from.
  686      * @param sourceFrom the start index within <tt>source</tt>, inclusive.
  687      * @throws IndexOutOfBoundsException if <tt>size()&gt;0 && (from&lt;0 || from&gt;to || to&gt;=size() || sourceFrom<0
  688      *                                   || sourceFrom+to-from+1>source.size()))</tt>.
  689      */
  690     public void replaceFromToWith(int from, int to, BitVector source, int sourceFrom) {
  691       if (nbits == 0 || to == from - 1) {
  692         return;
  693       }
  694       checkRangeFromTo(from, to, nbits);
  695       int length = to - from + 1;
  696       if (sourceFrom < 0 || sourceFrom + length > source.size()) {
  697         throw new IndexOutOfBoundsException();
  698       }
  699   
  700       if (source.bits == this.bits && from <= sourceFrom && sourceFrom <= to) { // dangerous intersection
  701         source = source.copy();
  702       }
  703   
  704       long[] theBits = this.bits; // cached for speed.
  705       long[] sourceBits = source.bits; // cached for speed.
  706   
  707       /*
  708       This version is equivalent to the version below but 20 times slower...
  709       for (int i=from; --length >= 0; i++, sourceFrom++) {
  710         QuickBitVector.put(theBits,i,QuickBitVector.get(sourceBits,sourceFrom));
  711       }
  712       */
  713   
  714       // Low level implementation for speed.
  715       // This could be done even faster by implementing on even lower levels. But then the code would probably become a "don't touch" piece.
  716       int width = to - from + 1;
  717       int blocks = QuickBitVector.unit(width); // width/64
  718       int bitsPerUnit = QuickBitVector.BITS_PER_UNIT;
  719       int bitsPerUnitMinusOne = bitsPerUnit - 1;
  720   
  721       // copy entire 64 bit blocks, if any.
  722       for (int i = blocks; --i >= 0;) {
  723         long val = QuickBitVector.getLongFromTo(sourceBits, sourceFrom, sourceFrom + bitsPerUnitMinusOne);
  724         QuickBitVector.putLongFromTo(theBits, val, from, from + bitsPerUnitMinusOne);
  725         sourceFrom += bitsPerUnit;
  726         from += bitsPerUnit;
  727       }
  728   
  729       // copy trailing bits, if any.
  730       int offset = QuickBitVector.offset(width); //width%64
  731       long val = QuickBitVector.getLongFromTo(sourceBits, sourceFrom, sourceFrom + offset - 1);
  732       QuickBitVector.putLongFromTo(theBits, val, from, from + offset - 1);
  733     }
  734   
  735     /**
  736      * Sets the bits in the given range to the state specified by <tt>value</tt>. <p> Optimized for speed. Preliminary
  737      * performance (200Mhz Pentium Pro, JDK 1.2, NT): replace 10^6 ill aligned bits --> 0.002 seconds elapsed time.
  738      *
  739      * @param from  the start index, inclusive.
  740      * @param to    the end index, inclusive.
  741      * @param value the value to be stored in the bits of the range.
  742      * @throws IndexOutOfBoundsException if <tt>size()&gt;0 && (from&lt;0 || from&gt;to || to&gt;=size())</tt>.
  743      */
  744     public void replaceFromToWith(int from, int to, boolean value) {
  745       if (nbits == 0 || to == from - 1) {
  746         return;
  747       }
  748       checkRangeFromTo(from, to, nbits);
  749       long[] theBits = this.bits; // cached for speed
  750   
  751       int fromUnit = QuickBitVector.unit(from);
  752       int fromOffset = QuickBitVector.offset(from);
  753       int toUnit = QuickBitVector.unit(to);
  754       int toOffset = QuickBitVector.offset(to);
  755   
  756       long filler;
  757       if (value) {
  758         filler = ~0L;
  759       } else {
  760         filler = 0L;
  761       }
  762   
  763       int bitIndex = from;
  764       if (fromUnit == toUnit) { // only one unit to do
  765         QuickBitVector.putLongFromTo(theBits, filler, bitIndex, bitIndex + to - from);
  766         //slower: for (; bitIndex<=to; ) QuickBitVector.put(theBits,bitIndex++,value);
  767         return;
  768       }
  769   
  770       // treat leading partial unit, if any.
  771       int bitsPerUnit = QuickBitVector.BITS_PER_UNIT;
  772       if (fromOffset > 0) { // fix by Olivier Janssens
  773         QuickBitVector.putLongFromTo(theBits, filler, bitIndex, bitIndex + bitsPerUnit - fromOffset);
  774         bitIndex += bitsPerUnit - fromOffset + 1;
  775         /* slower:
  776         for (int i=bitsPerUnit-fromOffset; --i >= 0; ) {
  777           QuickBitVector.put(theBits,bitIndex++,value);
  778         }*/
  779         fromUnit++;
  780       }
  781       if (toOffset < bitsPerUnit - 1) {
  782         toUnit--;
  783       } // there is a trailing partial unit
  784   
  785       // treat full units, if any.
  786       for (int i = fromUnit; i <= toUnit;) {
  787         theBits[i++] = filler;
  788       }
  789       if (fromUnit <= toUnit) {
  790         bitIndex += (toUnit - fromUnit + 1) * bitsPerUnit;
  791       }
  792   
  793       // treat trailing partial unit, if any.
  794       if (toOffset < bitsPerUnit - 1) {
  795         QuickBitVector.putLongFromTo(theBits, filler, bitIndex, to);
  796         /* slower:
  797         for (int i=toOffset+1; --i >= 0; ) {
  798           QuickBitVector.put(theBits,bitIndex++,value);
  799         }*/
  800       }
  801     }
  802   
  803     /**
  804      * Changes the bit with index <tt>bitIndex</tt> to the "set" (<tt>true</tt>) state.
  805      *
  806      * @param bitIndex the index of the bit to be set.
  807      * @throws IndexOutOfBoundsException if <tt>bitIndex&lt;0 || bitIndex&gt;=size()</tt>
  808      */
  809     public void set(int bitIndex) {
  810       if (bitIndex < 0 || bitIndex >= nbits) {
  811         throw new IndexOutOfBoundsException(String.valueOf(bitIndex));
  812       }
  813       QuickBitVector.set(bits, bitIndex);
  814     }
  815   
  816     /**
  817      * Shrinks or expands the receiver so that it holds <tt>newSize</tt> bits. If the receiver is expanded, additional
  818      * <tt>false</tt> bits are added to the end. If the receiver is shrinked, all bits between the old size and the new
  819      * size are lost; their memory is subject to garbage collection. (This method introduces a new backing array of
  820      * elements. WARNING: if you have more than one BitVector or BitMatrix sharing identical backing elements, be sure you
  821      * know what you are doing.)
  822      *
  823      * @param newSize the number of bits the bit vector shall have.
  824      * @throws IllegalArgumentException if <tt>size &lt; 0</tt>.
  825      */
  826     public void setSize(int newSize) {
  827       if (newSize != size()) {
  828         BitVector newVector = new BitVector(newSize);
  829         newVector.replaceFromToWith(0, Math.min(size(), newSize) - 1, this, 0);
  830         elements(newVector.elements(), newSize);
  831       }
  832     }
  833   
  834     /** Returns the size of the receiver. */
  835     public int size() {
  836       return nbits;
  837     }
  838   
  839     /**
  840      * Returns a string representation of the receiver. For every index for which the receiver contains a bit in the "set"
  841      * (<tt>true</tt>) state, the decimal representation of that index is included in the result. Such indeces are listed
  842      * in order from lowest to highest, separated by ",&nbsp;" (a comma and a space) and surrounded by braces.
  843      *
  844      * @return a string representation of this bit vector.
  845      */
  846     public String toString() {
  847       StringBuilder buffer = new StringBuilder(nbits);
  848       buffer.append('{');
  849   
  850       String separator = "";
  851       for (int i = 0; i < nbits; i++) {
  852         if (get(i)) {
  853           buffer.append(separator);
  854           separator = ", ";
  855           buffer.append(i);
  856         }
  857       }
  858   
  859       buffer.append('}');
  860       return buffer.toString();
  861     }
  862   
  863     /**
  864      * Performs a logical <b>XOR</b> of the receiver with another bit vector (A = A ^ B). The receiver is modified so that
  865      * a bit in it has the value <code>true</code> if and only if one of the following statements holds: <ul> <li>The bit
  866      * initially has the value <code>true</code>, and the corresponding bit in the argument has the value
  867      * <code>false</code>. <li>The bit initially has the value <code>false</code>, and the corresponding bit in the
  868      * argument has the value <code>true</code>. </ul>
  869      *
  870      * @param other a bit vector.
  871      * @throws IllegalArgumentException if <tt>size() &gt; other.size()</tt>.
  872      */
  873     public void xor(BitVector other) {
  874       checkSize(other);
  875       long[] theBits = this.bits; // cached for speed.
  876       long[] otherBits = other.bits; //cached for speed.
  877       for (int i = theBits.length; --i >= 0;) {
  878         theBits[i] ^= otherBits[i];
  879       }
  880     }
  881   }

Save This Page
Home » mahout-collections-1.0-src » org.apache.mahout.math.bitvector » [javadoc | source]