
TWO-VARIABLE DATA

Data sets are limited to arrays of at most twenty columns and 300 rows. The Rows and Cols values in the main menu serve to show how much data is currently on display. These indices can be reset to define the size of a new array, or to limit the display to the first two or three columns of a large array. (Even if only two columns are on display, the other columns are not forgotten, however.)

Any data set can be Edited; in particular, a new data set can be created by editing an array of Zeros. When in editing mode, move the cursor with the arrow keys. To provide input, just start typing the numerical data; the program will go into input mode. Press RETURN after each entry; the cursor automatically jumps to the next entry. The default mode is row-by-row entry; pressing Ctrl-I will switch to column-by-column entry. In addition to data entry, you can also Swap columns, and Insert and Delete rows when in editing mode. In the event that data has been input in transposed form, requesting Transpose will right it. Press Escape when the required editing is done; or press U to Undo all the edits and recover the original data set. An edited data set is NOT automatically saved to disk, not even if the source file resides there; you must use Ctrl-S. If an array does not fit on the screen, the arrow keys will bring parts into view; PgUp and PgDn move a screen at a time. The display always warns of hidden entries.

When a data set is saved, it is given a file name; you can also Label the columns of each array. Each heading is limited to at most twelve characters. When the column index is changed, the current heading is put on display. The default headings are "Column #". The headings are stored with the data. Column indices are not changed by the usual input-return process. Instead, the C keypress simply advances the display (cyclically) until the desired value is reached.

You can obtain scatter Plots for any data set.
Each of the coordinate axes has a variable assigned to it; these assignments are designated as Hor and Ver, and can be changed at any time. Each column has an associated icon (a box for column 2, a plus sign for column 3, and so on); these are not adjustable, and they repeat after column 11. The program uses the icon associated with the Ver variable to draw the scatter plot. To obtain simultaneous scatter plots, see the discussion for Ctrl-A and Ctrl-W below. Press Data to refresh the numerical display.

There are two other 2-variable menus: Fit Equations and Transform Data, described next.

The Fit Equations menu is where you try to discover algebraic relationships between the variables. You can superimpose the graph of any equation of the form Y=f(X) on the current scatter plot; request Other and enter the desired formula for f(X). Most likely you want to fit a straight line to the data. This can be done in two automatic ways: the Median-Median line and the Least-Squares (regression) line. The resulting equation is put on display at the top of the screen after the line is plotted. In the least-squares case, the correlation coefficient is also displayed (along with the slope and the Y-intercept). In the median-median case, you can request Summary to see the three summary points and their coordinates.

There are five other automatic curve-fitting modes, besides the linear Type. They are semilog, log-log, power, quadratic, and exponential. In the first case, the Y data is transformed (by applying the logarithm function Ln to it) before the linear method (median-median or least-squares) is applied; in the second and third cases, logarithms of both X and Y are calculated. The power fit is really a special case of the log-log fit, in which the parameters of y = k*x^m are displayed directly. The exponential fit is a redundant version of the semilog fit.
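The least-squares fits described above can be illustrated with a short sketch. This is not the program's own code (the programs are written in Pascal); it is a minimal Python illustration using the standard textbook formulas, and the function names least_squares_line, semilog_fit, and power_fit are invented for this example. The semilog and power fits assume strictly positive data, since logarithms are taken.

```python
import math

def least_squares_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return slope, intercept, r

def semilog_fit(xs, ys):
    """Fit ln(y) = a*x + b by applying the linear method to (x, ln y)."""
    return least_squares_line(xs, [math.log(y) for y in ys])

def power_fit(xs, ys):
    """Fit y = k * x**m by applying the linear method to (ln x, ln y)."""
    m, ln_k, r = least_squares_line([math.log(x) for x in xs],
                                    [math.log(y) for y in ys])
    return math.exp(ln_k), m, r

# Data that follows y = 3 * x**2 exactly, so the power fit should recover k and m.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x ** 2 for x in xs]
k, m, r = power_fit(xs, ys)
print(k, m, r)
```

The power fit returns k and m directly, which mirrors the remark that it is a special case of the log-log fit with the parameters of y = k*x^m put on display.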
The quadratic least-squares fit finds the quadratic function that minimizes the sum of the squares of the residuals. The median-median quadratic fit just passes a quadratic function through the three summary points. Whatever the fitted equation, you will want to see the errors of approximation (the differences between the Y data and the Y fit). These Residuals may be added to the data set, by assigning them to a particular Column. When R is pressed, the new data set appears on the screen; the column heading is automatically set to "Residuals". (You can always refresh either the Plot or the Data, by the way, when one or the other disappears from view.) You can also calculate a predicted Y-value based on an arbitrary X-value, or vice-versa. The second problem is subject to difficulty, for you are searching for X-values that might not exist, or that might exist outside the displayed domain, or that might exist in profusion. In any event, it may be necessary to Escape the search process if it does not seem to be getting anywhere. Pressing either Plot or Data brings the requested item into view. (For instance, you might want to erase an undesirable fitted curve, or you might want to take a look at the data without returning to the main menu.) In the former case, it is helpful that the Horizontal and Vertical variables are visible in the menu, and are changeable (in case you want to display a different plot, for example). The Transform Data menu allows you to apply arbitrary functions to data columns. Press F to input a function. The variables X, Y, and Z can be assigned to any of the columns. For instance, you can find the difference between columns 2 and 3 by setting F = X-Y and assigning X and Y to columns 2 and 3. You may also assign an Offset to each variable, which internally assigns a vertical shift to the assigned column. 
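The offset mechanism can be made concrete with a small sketch. This is an illustration only, not the program's code: the helper names shifted and transform are invented here, and the sign convention (an offset of -1 reads the entry one row above) is an interpretation of the description in this file. Entries outside the column are taken to be zero, and the third argument passed to the function plays the role of the row-number variable I.

```python
def shifted(column, offset):
    """Row i of the result reads column[i + offset]; entries outside the column are 0."""
    n = len(column)
    return [column[i + offset] if 0 <= i + offset < n else 0.0 for i in range(n)]

def transform(f, x_col, y_col, x_offset=0, y_offset=0):
    """Apply f(x, y, i) row by row, where i is the 1-based row number."""
    xs = shifted(x_col, x_offset)
    ys = shifted(y_col, y_offset)
    return [f(x, y, i) for i, (x, y) in enumerate(zip(xs, ys), start=1)]

column2 = [3.0, 5.0, 9.0]

# F = X-Y with X and Y both assigned to column 2 and Y given an offset of -1.
differences = transform(lambda x, y, i: x - y, column2, column2, y_offset=-1)
print(differences)  # [3.0, 2.0, 4.0]

# F = I produces an indexed column.
index_column = transform(lambda x, y, i: i, column2, column2)
print(index_column)  # [1, 2, 3]
```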
For example, assign both X and Y to column 2, but assign Y an offset of -1; now the function X-Y tabulates the difference between each entry of column 2 and the entry just above it (entries above or below the matrix are assumed to be zeros). In any case, you must specify which column is to serve as the Destination. Finally, Go executes the transformation; the destination column is relabelled "Transf". An incidental feature is that the variable I is interpreted as the row number of any entry, so that you can create an indexed column by applying the function F = I, for example.

Before an unsaved data set can be discarded (by leaving the program or by requesting Old, for example), the program will ask for permission to discard. A response of "No" leaves things as they are, and the data can then be saved.

Each Plot request automatically sizes the window so that the data points just fit, which makes it difficult to produce simultaneous scatter plots. Two special keys are helpful here: When the Ctrl-Accumulator switch is on, every Plot request resizes the window outward only. In other words, all data sets plotted since the Ctrl-A switch was turned on will fit in the most recent window. Simultaneous scatter plots are still impossible, however, if the program continues to refresh the screen for each plot, so you can turn off the automatic refresh with Ctrl-W (once the window is large enough for all the data), and the program will only plot points.

The multiple-scatter-plot procedure: Turn Ctrl-A on, request all the individual scatter plots, turn the automatic refresh off with Ctrl-W after the last one, then replot the earlier ones on the fixed axes. This may involve retrieving some Old files, by the way. If different icons are desired for the final version, this must be planned before the Ctrl-W switch is turned off.

Note the following: The screen-refresh switch is active only in the main 2-variable menu, and only when there is a plot on the screen.
When it is off, many of the other menu items are also disabled (anything that involves putting data back on the screen: Editing, Transforming data, Help, etc).

Special Ctrl-Keys:

Ctrl-Accumulator is a switch. When it is on, each plot request can only enlarge the previous graphing window.

Ctrl-Dot Density alters the speed of curve-plotting, by determining how many points are calculated. A lower density speeds things up, of course.

Ctrl-Format is for specifying the appearance of decimal output. Here you choose the total width of decimal output and also the number of decimal places when the decimal point is fixed in position (Alt-F switches between floating and fixed mode). For data displays, you can Skip a line between rows.

Ctrl-Overlay allows you to superimpose text on scatter plots. Use the arrow keys to move the cursor to the desired location, press T to edit the text, then press W to center the text at the cursor position. This text is part of the data set, and is saved with the figure. When the scatter plot is refreshed, the text can be put back on the diagram by requesting All in the Overlay menu. To make the graphics text larger, press Alt-B.

Use Ctrl-Print for hard copy. When an array is on the screen, the program will print it directly; if a plot is displayed, however, you must look at the print menu, just in case the target printer (Device) needs to be identified. The default is the standard dot matrix, in which case simply pressing P should produce the desired result.

Ctrl-Save saves data sets to disk files. In addition to the two dimensions and the entries themselves, the program also stores the two format specifications (fieldwidth and decimal places). The column headings are saved to a file as well. N.B. The program only saves the data that is on display; if Rows and Columns have been set so as to display only PART of a larger data set, the remainder will probably be lost.
Ctrl-W is active only in the Main 2-Variable Menu, and only if there is a plot on the screen. It disables further screen refreshes until Ctrl-W is pressed again.

ONE-VARIABLE DATA

As above, data is arranged in rectangular arrays, but now the columns are of no significance. (Only a single column label is ever on display.) A data file can have at most 6000 = 300*20 items in it.

Given a data set, you usually want to see a Histogram. First the data must be organized into a specified number of Groups, each with a specified Width. Setting either value causes the other to be adjusted. The only values allowed for Groups are 2..100. You must also specify the Minimum start value for the lowest group. Instead of the graphical Histogram, you may wish to see the underlying Frequency table.

Another description of the data is by Quantiles, whose number is set by the Groups value. For instance, requesting Quantiles when Groups=4 produces quartiles, or centiles when Groups=100. The second quartile and the fiftieth centile are the same as the median, of course. The Box and Whisker plot is a graphical method for seeing the data divided by quartiles.

Requesting Statistics displays a list of standard data. In addition to the number of items, you will see the quartiles, min and max values, range, midrange, mean, standard deviation, and mean deviation. For purposes of comparison, you can Overlay a Normal distribution on the histogram; the overlay has the same mean, standard deviation and total weight (area).

Use Ctrl-P to print the screen graphics. Press P for a numerical printout of the frequency table and quantiles for the specified number of groups.

SIMULATION

Rolling Dice: You can change the Number of faces and the number of dice Rolled with each toss. You can also select a Statistic to be tallied - sum of dice, low value, high value, number of different values, and product of values (if the number of dice rolled is too high, this statistic will produce meaningless data, however).
As the trials proceed, the value of the statistic is kept up to date, as well as its range (low and high) and average. The tabulations are reset to zero if the dice data is changed (number of faces or number of dice). You can proceed One trial at a time, in which case the results are displayed in the right window, or else you can press M for many trials, in which case you must stop the trials with Escape. While the trials are running, pressing S will switch between an active display and a silent display; the latter runs more rapidly, of course. You can ask for a specific number of Trials. The Frequency table displays how many times each value of the statistic has occurred.

If you press W, you will put the program into one of two Waiting modes. When One turn is requested now, the program runs as many trials as it takes for the statistic (sum, product, etc) to achieve a displayed objective - either matching a given Value (mode 1) or attaining all possible values (mode 2). It is possible to set an improbable (or impossible) goal, in which case Escape will be needed to halt the search. The variable that is now displayed (low, average, high) is the Waiting Time - how many trials are needed to accomplish the goal. Press W to change modes or to return to the non-waiting mode.

The data produced by simulation can be put on file and examined using the data analysis subprograms. Press Alt-V to list the variables you wish to save; each is specified by a single letter: A for average, L for low, H for high, R for number of dice per Roll, N or F for the Number of Faces, T for the number of Trials, etc (other simulations augment this list). At any time, you can enter a data point (which is actually an n-tuple of values) by pressing Alt-New. Begin a new list of data points by pressing Alt-B. Save the entire list (as a matrix in which each column represents a variable) by pressing Alt-S and providing a file name.
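The two Waiting modes described above can be sketched as follows. This is only an illustration of the idea, not the program's implementation: the function names, the statistic labels, and the max_trials cap (which stands in for pressing Escape on an impossible goal) are all assumptions made for this example.

```python
import math
import random

def roll_statistic(rng, faces, dice, statistic):
    """Roll `dice` fair dice with `faces` faces each and return the chosen statistic."""
    values = [rng.randint(1, faces) for _ in range(dice)]
    if statistic == "sum":
        return sum(values)
    if statistic == "low":
        return min(values)
    if statistic == "high":
        return max(values)
    if statistic == "different":
        return len(set(values))
    if statistic == "product":
        return math.prod(values)
    raise ValueError("unknown statistic: " + statistic)

def wait_for_value(rng, faces, dice, statistic, target, max_trials=100000):
    """Mode 1: number of trials until the statistic matches target (None if never)."""
    for trial in range(1, max_trials + 1):
        if roll_statistic(rng, faces, dice, statistic) == target:
            return trial
    return None  # gave up, like pressing Escape

def wait_for_all(rng, faces, dice, statistic, possible, max_trials=100000):
    """Mode 2: number of trials until every value in `possible` has been seen."""
    seen = set()
    for trial in range(1, max_trials + 1):
        seen.add(roll_statistic(rng, faces, dice, statistic))
        if seen >= possible:
            return trial
    return None

rng = random.Random(1)
print(wait_for_value(rng, 6, 2, "sum", 7))               # wait for a sum of 7
print(wait_for_all(rng, 6, 2, "sum", set(range(2, 13))))  # wait for every sum 2..12
print(wait_for_value(rng, 6, 2, "sum", 13, max_trials=1000))  # impossible goal: None
```

Repeating either wait many times and recording the low, average, and high of the returned counts would correspond to the Waiting Time display.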
You can play with doctored dice: Press Ctrl-F to bring the Funny dice into play (or to switch them back out). You will also have to press Ctrl-A to Alter the pips on the faces; initially, they are standard - one pip on face number 1, two pips on face number 2, etc. Dealing Cards: This is essentially the same as the rolling dice simulation, EXCEPT that the cards are not independent of one another (are not replaced in the deck) the way dice are. In other words, each trial consists of removing a specified number of cards from the deck (before the next trial, they are of course put back). Dealing just 1 card per trial is equivalent to rolling one die. For data collection, variable D replaces R. You can play with doctored cards: Press Ctrl-F to bring the Funny deck into play (or to switch it back out). You will also have to press Ctrl-A to Alter the values on the cards; initially, they are standard - card number 1 is an ace, card number 2 is a deuce, etc. Throwing Darts: The principal parameters in this menu are the Radius of the bullseye and the Side length of the containing square (R and S in the data collection list). A single trial consists of a random throw at the square; a success (hit) occurs when the throw lies inside the circular target; the statistic has two values, 1 for hit and 0 for miss; the average over many trials is of course just the percentage of throws that are hits. The throws appear as dots on the screen. In the Waiting mode, you are waiting for a hit. Pitching Pennies: The principal parameters are the coin Radius and the Side length of a single square cell (R and S in the data collection list). A trial is classified as a hit if the coin lands entirely within a square, a miss otherwise; the statistic takes on only the values 0 and 1, as in throwing darts. When the trials are examined One at a time, they are displayed as circles on the screen. In the Waiting mode, you are waiting for a hit. 
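For the two geometric simulations, the long-run hit rate can be compared with a theoretical value. The sketch below is an illustration under stated assumptions, not a description of the program's internals: it assumes the circular target lies entirely inside the square and that dart positions (or coin centers) are uniformly distributed. Under those assumptions a dart hits with probability pi*R^2/S^2, and a coin of radius R lies entirely within a cell of side S with probability ((S-2R)/S)^2 when 2R is less than S. The function names are invented for this example.

```python
import math
import random

def dart_hit_probability(radius, side):
    """Area of the circular target divided by the area of the square (target inside square)."""
    return math.pi * radius ** 2 / side ** 2

def penny_hit_probability(radius, side):
    """Chance that a coin of the given radius lies entirely within one square cell."""
    if 2 * radius >= side:
        return 0.0
    return ((side - 2 * radius) / side) ** 2

def simulate_darts(rng, radius, side, trials):
    """Fraction of uniformly random throws at the square that land inside the circle."""
    half = side / 2
    hits = 0
    for _ in range(trials):
        x = rng.uniform(-half, half)
        y = rng.uniform(-half, half)
        if x * x + y * y <= radius * radius:
            hits += 1
    return hits / trials

def simulate_pennies(rng, radius, side, trials):
    """Fraction of tosses whose coin center is at least one radius from every cell edge."""
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, side)
        y = rng.uniform(0, side)
        if radius <= x <= side - radius and radius <= y <= side - radius:
            hits += 1
    return hits / trials

rng = random.Random(2)
print(dart_hit_probability(1, 2), simulate_darts(rng, 1, 2, 50000))
print(penny_hit_probability(1, 4), simulate_pennies(rng, 1, 4, 50000))
```

With many trials the simulated averages should settle near the theoretical values, which is one way to check a set of saved simulation data.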
PROBABILITY

When a frequency distribution is plotted, only X-values for which the probability is large enough to be seen are actually plotted. This is why the displayed MinX and MaxX values do not always correspond to the tabled Low and High values for X. The graphing scale is chosen so that the modal frequency will fill the available space.

Some of the routines have difficulty when large parameter values are input. This is usually because the program is being asked to deal with quantities that are very large and quantities that are very small in the same routine. If the total probability for a given distribution does not add up to 100%, that is a clear signal that the data is suspect.

Binomial Menu: The parameters are N = number of trials and p = probability of success on an individual trial. The variable X is the total number of successes.

Hypergeometric Menu: We are sampling from a two-stratum population, whose types are called Red and Blue. The parameters are R = number of reds, B = number of blues, and S = size of the sample. The variable X is the number of reds found in the sample.

Dice Sum Menu: We toss N dice, each with faces numbered 1..k (equally likely, of course). X is the sum of the N values. When N=1, the distribution of X is uniform; when N=2, it is triangular. As N increases, the distribution approaches normality.

Normal Menu: Given a normal variable X, the probability that X lies between Lo and Hi is calculated. The Mean and the Sigma (standard deviation) of the distribution are adjustable parameters.

Dice Match Menu: We toss N dice, each with faces numbered 1..k (equally likely). Let X be the number of different values among the N obtained. The probability of finding a repetition is displayed in the menu. This routine runs into accuracy problems when N and k get large. k=365 corresponds to the classical birthday problem.

Card Match Menu: A deck of N cards is shuffled. Let X be the number of cards that are found in their starting positions.
The expected value of X is 1, regardless of the deck size. The probability that X=0 is virtually independent of the deck size as well (about 1/e), once the deck size is ten or more.

First Ace Menu: A deck of N cards that contains A aces is shuffled and dealt; let X be the position of the first ace in the deal.

First Binomial Success Menu: A binomial experiment is repeated until the first success occurs; let X be the number of trials necessary. The probability of an individual success is of course p. The expected value of X is 1/p.

Complete Set Menu: There are k equally likely different prizes available, one in each box of cereal. Let X be the number of boxes that must be bought, in order to obtain at least one of each prize; the possible values of X are k, k+1, ... . In other words, we roll a k-sided die until every face has turned up at least once; X is the number of rolls required.

First Dice Match Menu: A k-sided die is rolled until some face has occurred for the second time. Let X be the number of rolls necessary for this to happen; the possible values of X are 2 ... k+1. If k=365, we have another birthday variable.

There are a couple of Miscellaneous menus, which do not display probability distributions, but do give answers to probability questions:

If a k-sided die is rolled X times (k <= X), what is the probability that every face will have appeared at least once? This is the cumulative distribution function for the Complete Set probability function described above.

If a k-sided die is rolled X times, what is the probability that some face will appear more than once (M times)? Answers are given for M=2 and M=3. Accuracy problems occur in extreme examples.

The following special Ctrl-Keys apply to the current probability distribution; if the example has not yet been calculated, there is a pause while this is done first:

Ctrl-D: displays a bar graph on the screen.
Ctrl-T: displays the table of values on the screen.
Ctrl-H: prints the table of values.
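A few of the quantities mentioned in the probability menus can be checked with short calculations. The sketch below is not the program's code; it uses standard formulas, and the function names are invented for this example. It computes the repetition probability of the Dice Match menu (the birthday problem when k=365), the probability that no card is found in its starting position in the Card Match menu, and the expected number of boxes in the Complete Set menu, which is k*(1 + 1/2 + ... + 1/k). That last formula is a standard result and is not stated in this file.

```python
import math

def repetition_probability(n, k):
    """Probability that n rolls of a fair k-sided die contain at least one repeated face."""
    if n > k:
        return 1.0
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (k - i) / k
    return 1.0 - p_all_different

def no_match_probability(n):
    """Probability that a shuffled deck of n cards leaves no card in its starting position."""
    return sum((-1) ** j / math.factorial(j) for j in range(n + 1))

def expected_boxes(k):
    """Expected number of boxes needed to collect all k equally likely prizes."""
    return k * sum(1 / j for j in range(1, k + 1))

print(repetition_probability(23, 365))     # just over one half
print(no_match_probability(10), 1 / math.e)  # nearly equal
print(expected_boxes(6))                    # about 14.7 rolls to see every face of a die
```

Multiplying the factors (k - i)/k one at a time avoids forming very large and very small numbers in the same calculation, which is the kind of difficulty the accuracy warnings above refer to.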
GENERAL INFORMATION

PEANUT software should run on all IBM compatibles. It is only necessary that the appropriate graphics interface file be present. If these programs are copied, it is therefore important that the appropriate file *.BGI be copied, too. It is advisable but not necessary that the *.BGI files be in the same directory as the program file *.EXE; if the program cannot find the desired *.BGI file there, it will search the root directory and the parent directory before giving up. The programs automatically try to select the finest graphics mode; to override the default selection, press Ctrl-G (this will be necessary with the ATT 6300, for example).

The programs are compiled with version 7.0 of Borland Pascal. If the host computer has a numeric coprocessor (in other words, an 8087 chip), these programs will try to take advantage of it.

All of the programs have associated documentation files *.DOC; you are reading part of one now! These files can be edited or printed with your word processor.

Interaction with the computer takes two forms: Either the user is making menu selections or else the user is providing buffered input (that is, numbers or names). In the former case, no ENTER is required - touching a single key (perhaps in combination with the Ctrl key) does the job. In the latter case, however, the computer has to be told when the input is complete, and this requires ENTER as a signal. When the computer is waiting for this type of input, a box will open up on the screen, into which the necessary information is to be typed. One may edit the data in the box, using the left and right arrow keys to move the cursor. If the first keypress of an editing session is not an editing keypress (an arrow, say), the input box is emptied.

There are a few standard two-key combinations. For example, Ctrl-E erases the graphics window, Ctrl-P is for printing, Ctrl-W gets the window reset menu, Ctrl-F gets function library menus, and Ctrl-END ends programs.
Other Ctrl-keys are described below. Alt-C allows the user to assign new values to the twenty-six variables A..Z. Pressing the desired letter displays the current value of that letter, and pressing = activates the input process. Alt-F toggles between fixed point and floating point display formats (see below). Alt-M calls up a list of memory data. In each program, Ctrl-K calls up a menu of active Ctrl-Key combinations, and Alt-K calls up a menu of active Alt-Key combinations. These keys are usually not mentioned elsewhere in the menus. Whenever the program is in a scrolling mode (the arrow keypad used to examine a text or a table), one can request a search by pressing ENTER. The program finds the first instance of the string you enter, and places it in the window, usually on the top display line. The search is not case-sensitive. For example, to scroll through THIS file, together with a program-specific help file, just press ?. The necessary *.DOC files must be found in the current directory. The function interpreter built into the programs has been taught to understand most elementary function names (sin, cos, tan, csc, sec, cot, ln, log, exp, sinh, cosh, tanh, arcsin, arccos, arctan, int, sqr = square root, abs, and !) as well as some unconventional ones: root(n,x) = nth root of x; pow(n,x) = nth power of x; iter(n,f(x)) = n-fold iteration of f(x); max(a,b,..); min(a,b,..); sgn(x) = x/abs(x); frac(x) = x-int(x); binom(n,r) = n!/r!/(n-r)!; join(f|c,g|d,...,h) = function defined by y=f(x) for x<=c, y=g(x) for c

