$Header: /usr/home/web681a3/repository/documentation/ranges.htm,v 1.2 2006/04/29 00:24:44 joerg Exp $
A range is a list of things beginning from a start, being in a sequence up to an end. For example the numbers from 13 to 42, the days from 19.12.1919 to 12.12.1942, objects in an array from the 13th up to the 42nd, objects in a list ...
A range can be specified by giving its begin and end point. So for the numbers from 13 to 42, we can write [13, 42 ). For a range definition like [13, 42) there is the question, do the 13 and the 42 belong to the range or not. We use [] to include the begin and end and we use () to exclude them. So [13, 42) means the 13 is included and the 42 is excluded. In programming science these kind of ranges are called "half-open". (For the mathematicians: in the topology of real numbers [13, 42] is indeed closed whereas (13, 42) is open. But for a discrete topology like the set of natural numbers N each subset is both closed and open. So as a subset of N [13, 42) is open and closed at the same time. Because in programming science all sets are finite and thus discrete the word "half-open" makes no sense. Thus the programming science notion of half-open is inspired by but not identical to the topological one.)
In daily life we normally use closed ranges. So when you say "i'm in vacation from the 18th to the 20th of August" you're 3 day of. And i know: your grandmother always used closed ranges and has lived for 87 years.
One of the greatest results of programming science in the 1990th is C++'s STL, which is a library for containers, iterators and algorithms. The STL uses half-open ranges ( "Generic programming and the STL" by Matthew H. Austern, chapter 2.1.2) And one the best programmers of the known parts of our galaxy also votes for this asymmetric approach ( "C traps and pitfalls" Andrew Koenig, chapter 3.6).
ySo why? Matthew Austern: "What it really comes down to is convenience. The notation for half-open intervals may look unfamiliar at first, but it turns out to be easier, both for formulation definitions and for the more mundane purposes of writing and using algorithms, than the alternatives."
0. handle differnt 'end of month dates' automatically. to find if something is in february say < 1.3. not <28.2. which is not always the complete truth.
1. Reduce the number of "off-by-one" errors.
2. The number of elements in the range [n, m) is just m-n (and not m-n+1).
3. The empty range is [n, n) (and not [n, n-1], which can be a problem if n is an iterator already pointing the first element of a list, or if n == 0).
4. For floats you can write [13, 42) (instead of [13, 41.999999999999] ).
5. If the precision of the floats change you can keep [13, 42) (and not change it to 41.999999999999999999999999] ).
6. The same arguments 4 and 5 for a range of strings ["a", "f") (and not ["a", "ezzzzzzzzzzzzzzzzz"] ).
7. And for strings: is 'z' really the largest char? In the ANSI ASCII coding "e~" > "ezzzzzzzzzzzzzzzzz". And if you say ["a", "e~~~~~~~~~~~~~~~~~~~~~"] the behavior depends on the character set and locales.
8. And for variable length strings (like the C++ string type)? It's not
possible to write "e
9. And for dates we have reason 5. Additionally begin and end is equal for the hours if you don't need them: [2000.12.31.000000, 2001.01.13.000000) (instead of [2000.12.31.000000, 2001.01.12.235959] ).
10. Its easy to check for the completeness of a number of ranges: ["a". "f"), ["f". "k"), ["k". "o"), ["o". "q"), ["q". "u"), ["u". "w"), ["w". "z") (instead of ["a". "e"], ["f". "j"], ["k". "n"], ["o". "p"], ["q". "t"], ["u". "v"], ["w". "z"] ).
11. For dates there is a maximum (infinity), which is for example 9999.12.31. In some range algorithms (for example "split range") there is created a new range just after another. So you have [b1, e1) and you want to create [e1, e2). In such algorithms, [e1, e2) is discarded, if e1 >= e2. The case that e1 = 9999.12.31 works perfect in these algorithms. (In case of []-ranges 9999.12.31 needs a special treatment, because 9999.12.31 + 1 is not possible.)
12. The +1 and -1 are almost never used, when handling ranges. This is an advantage if they are expensive (as it is for dates).
13. If you write a find in a range, the fact that there was nothing found can easily indicated by returning the end as the found position: if( find( [begin, end) ) == end ) nothing found.
14. If there is an insert_at function, appending at the end is just an insert_at( [begin, end), end) like appending at the beginning is a insert_at( [begin, end), begin).
15. In languages, which start the array subscripts with 0 (like C, C++, JAVA, NCL) the upper bound is equal to the size.
0. unexperienced programmers get confused. if bad programmers are confused, they will work slower and hence will produce less bad code.
1. you have to translate ranges for end-users. in any case the range [A, KZZZZZZZZZZZZZZZZZZZZZZZ] and [A, L) have to be translated to "A, K".
this web page was designed by joerg kunze.
copyright; 2000, 2006 joerg kunze
this web page is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the license or (at your opinion) any later version.
this web page is distributed in the hope that it is useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. see the GNU General Public License for details.
you should have received a copy of the GNU General Public License along with these web page; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
$Log: ranges.htm,v $
Revision 1.2 2006/04/29 00:24:44 joerg
add Header and Log cvs keywords