Poor Richard's Data Warehouse
Tom Packert
Senior Financial Systems Analyst
Avanti/Case-Hoyt
I have never seen actual demographics on the types of companies that run multi-value based systems, but I don't think I am wrong in believing that the Pick market has many customers that are very careful with their money and don't have large IT resources. I don't think there are too many people reading this newsletter that are able to expend the resources it takes to develop a full fledged data warehouse. Instead, I hope to introduce an alternative way to build a bare bones information manipulation system that can answer many of the questions that a large scale data warehouse might answer for larger customers.
Actually, the term 'data warehouse' is not 100% accurate. The term describing what is known as a data warehouse should include the emphasis on the 'fulfillment' side of the warehouse. Stored items in a warehouse are not very useful unless we can get to them quickly and easily. Data locked away in a warehouse is useless without the corresponding delivery infrastructure. Most data warehouse include the delivery mechanisms.
The main benefit of a data warehouse lies in the ability to store many forms of data in a central location that facilitates viewing the data in ways that we have not thought of yet or which we should not attempt by bending a transaction processing system. This article will describe how to implement some data warehouse functions in Microsoft Excel so that multi-value customers can see that augmenting their core systems with Excel can give them some of the same benefits of a data warehouse.
Starting with Excel 97, the maximum number of rows in a worksheet is 65,536. This is certainly not on the scale of a true data warehouse, but it is within the limits of many multi-value systems. A modern PC with 64 or more megs of ram and processors of 500, 600, 700 megahertz will not be challenged much by a spreadsheet with only 65,536 rows.
Downloading transaction data from a multi-value system is the first step in analyzing the data. There are a lot of ways to do this and it is dependent on the particular version of Pick you are using. Wintegrate offers some good functionality to process and download multi-value data directly into Excel format. You can also program the download functions directly into Wintegrate in basic and transfer the files to the user's PC. In example 1, there is a program that implements a download that is controlled by the Pick host computer. The only requirement is that the user, be running Wintegrate when the host program is executed. The upside of using Wintegrate is that the data can be sent directly into Excel format on the PC and the transfer is fairly simple to implement. The downside to using Wintegrate is the rather costly per user license fees and the downloads are slower than other options.
Example 1
PROCREAD
BUFFER ELSE BUFFER = ""
*
CONO=BUFFER<1>
CONAME=BUFFER<2>
START.DATE=BUFFER<3>
END.DATE=BUFFER<4>
DIV=BUFFER<5>
DEPT=BUFFER<6>
CCTR=BUFFER<7>
*
PC.FILE="C:\DATA\DOWNLOAD\JOBTIME.XLS"
FILE
= "JOB.TIME"
*
ITEMS
= \SSELECT JOB.TIME WITH CONO = "\:CONO:\"\
IF
START.DATE = END.DATE THEN
ITEMS := \ AND WITH DATE =
"\:OCONV(START.DATE,'D2/'):\"\
END
ELSE
ITEMS := \ AND WITH DATE >= "\:OCONV(START.DATE,'D2/'):\"\
ITEMS := \ AND WITH DATE <=
"\:OCONV(END.DATE,'D2/'):\"\
END
IF
DIV # "ALL" THEN
ITEMS := \ AND WITH DIV =
"\:DIV:\"\
END
IF
DEPT # "ALL" THEN
ITEMS := \ AND WITH DEPT =
"\:DEPT:\"\
END
IF
CCTR # "ALL" THEN
ITEMS := \ AND WITH CCTR =
"\:CCTR:\"\
END
*
FIELDS
= \JOB.NO CUST.NAME ENT.DATE SHIFT HRS DCOST CALC.RATE OPER\
FIELDS
:= \ OPER.DESC DEPT CCTR PROD_QTY\
FIELDS := \
EMP.NO EMP.NAME\
*
OPTS = "XLS"
OPTS<2>='yes' ;*
overwrite
OPTS<3> =
"reformat" ;* data mode
OPTS<4> = "true"
;* suppress id
OPTS<5>='true' ;*
include field ids
OPTS<7> =
"\255,\r\n,\254,\r\n" ;* translation
STATUS = ""
CALL
WIN.IMPORT(PC.FILE,FILE,ITEMS,FIELDS,OPTS,STATUS)
Other options for moving the data to the PC are more dependent on what database and operating system you are running. We happen to use Unidata, so sending a Unidata file directly to a comma separated format and sending it to the PC is pretty easy to automate. Unidata supports a LIST option TO a Unix file name. In example 2, there is a program we use to move multi-value data out of Unidata and into a PC.
There may be a couple of minor problems you might run into on your system. The FTP and ZMODEM commands in the example below will probably need to be setup differently on your system. They are readily implemented, but they will need to be setup by your System administrator. We use a Unix script that allows us to move Unix files over to the NT file servers. ZMODEM is widely available for free, but needs to be installed on your system by the system administrator.
Also, Unidata does not process the output conversions when the data is written to the file. Any dictionary items which have MR2 or D2/ conversions need to converted to an "I" with OCONV(SOME.DATE,'D2/'). In attribute two. This is done so that the data that ends up in the file is the externally formatted data, not the internal format. I usually create a new dictionary item with an "EXT." prefix for any dictionary item that has an M or D conversion. I also create EXT. prefixed versions of anything that may contain a comma in the data. If the attribute "DESC" may have a comma then attribute two of the EXT.DESC, "I" type dictionary item is CONVERT(",","",DESC)
Example 2
PROCREAD
BUFFER ELSE BUFFER=''
CONO=BUFFER<1>
*
FILE.NAME
= \JTIME\:CONO:\.TXT\
*
LIST.STMT
= ""
LIST.STMT
:= \LIST JOB.TIME \
LIST.STMT :=
\JOB.NO \
LIST.STMT :=
\EXT.CUST.NAME \
LIST.STMT :=
\EXT.DATE \
LIST.STMT :=
\SHIFT \
LIST.STMT :=
\HRS \
LIST.STMT :=
\DCOST \
LIST.STMT :=
\CALC.RATE \
LIST.STMT :=
\OPER \
LIST.STMT :=
\OPER.DESC \
LIST.STMT :=
\DEPT \
LIST.STMT :=
\CCTR \
LIST.STMT :=
\PROD_QTY \
LIST.STMT :=
\EMP.NO \
LIST.STMT :=
\EXT.EMP.NAME \
LIST.STMT :=
\ID-SUPP \
LIST.STMT :=
\TO DELIM "," \
LIST.STMT :=
\/data/prod/AVANTI-FILES/_HOLD_/JTIME\:CONO:\.TXT\
*
DATA
"Y"
UDTEXECUTE
LIST.STMT
*
DOWNLOAD.TYPE='ZMODEM'
BEGIN
CASE
CASE DOWNLOAD.TYPE='FTP'
STMT = \rsh localhost -l cron
./send_aplus.sh \
STMT :=
\/data/prod/AVANTI-FILES/_HOLD_/\:FILE.NAME:\ \
STMT := \/prod/\:@LOGNAME:\/\:FILE.NAME
PCPERFORM STMT
CASE DOWNLOAD.TYPE='ZMODEM'
STMT = \sz \
STMT :=
\/data/prod/AVANTI-FILES/_HOLD_/\:FILE.NAME:\ \
PCPERFORM STMT
END CASE
Assuming we have found a way to automate the movement of the data from your multi-value system to the PC environment, the next step is to setup Excel to allow easy analysis. The key to the analysis capability is the Excel feature called a Pivot Table. Pivot Tables are an amazing tool. When Bill Gates talks about 'information at your finger tips', pivot tables are one of the things he is talking about. The term 'pivot table' is probably poorly named because it does not create a natural motivation to figure out what it does. Pivot Tables can be found on the "DATA" menu in Excel, but very few people, venture to the DATA menu and the pivot table command is easy to skip over and not realize what you missed.
In order to continue, I will make up some sample data as show in example 3. I have created a couple hundred rows of random data. The first step in creating a pivot table is to select all of the data that you will want the pivot table to see. In this example, it will be cells A1 through E200.
Example 3
|
Employee |
Date |
Operation |
CostCenter |
Hours |
|
Donald Trump |
11/01/99 |
Make Ready |
300 |
3.00
|
|
Bill Bradley |
11/10/99 |
Run |
400 |
6.00
|
|
Al Gore |
11/14/99 |
Make Ready |
500 |
7.00
|
|
JW Bush |
11/02/99 |
Downtime |
600 |
6.00
|
|
Donald Trump |
11/11/99 |
Make Ready |
300 |
0.00
|


After selecting all the columns and rows, then click on "DATA" in the menu bar, then "Pivot Table Report" and then you will see the first dialog box of the Pivot Table Wizard (Figure 1). As long as you are only using Excel as we are in this example, you only need to click on the Excel (default) button and then click next.
The
next step is to select the range of cells.
Since we highlighted the cells already, the default is set for us as A1:
E200 . The dollar signs ("$")
are added my Excel to indicate that the
cell reference is an absolute references and that it will not change if dragged
and dropped anywhere. If the range is correct, then just click on the NEXT
button.
The next dialog box
is where the power of the pivot table begins to show. It actually starts
getting fun now. By simply dragging and
dropping the buttons on the right, we will be constructing the pivot
table. Notice that the column headings
from the spreadsheet appear on the right; Employee, Date, Operation, Cost
Center Hours. Excel read the data that
we had highlighted and found the headings.
Now, we can drag any field from the right into the PAGE, ROW, COLUMN or
DATA sections of the pivot table wizard.
The Page section will control the data that is displayed, I will drad
the Employee to this section. This will
tell Excel to create a drop down box for the Employee which will allow me to look
at any one Employee or all of them.
Then I drag the Operation to the Row section, then Date to the column
section and the hours to the data section.
Sometimes, when you drag a field into the data section, it will default to a COUNT function. You can change this to a SUM function by double clicking on the button in the data section and selecting the SUM function. When you double click on the Count or SUM button, you will notice there are more options, the NUMBER button will allow you to format the numbers in the pivot table as you would any other cell in Excel.
WORKSHEET,
at least getting started. This will
create a new worksheet within the existing workbook and insert the pivot table
in the worksheet. The pivot table will
look like the following. The Operations
will be down the left in the rows, the dates will be the column headers and the
hours will be summed in the middle. The
Employee field that was dragged to the page section will allow the user to
quickly filter the information to display any one user. The sections are all optional, you can
create a pivot table with only a field dragged to the Row Section and a field
in the Data Section.

There is now a pivot table that is capable of quickly displaying the information for any employee and what hours they worked on what day.. Clicking on the arrow in cell B2 will display a drop down dialog. This is what dragging a field into the Page section does, it creates a cell with a selection list, selecting one of the values in the list will instantly filter the pivot table and display the results. The result is a powerful tool to instantly view the summary information on any employee. Not everyone will know how to create a pivot table, but once created, pivot tables are tools people get used to using quickly.

To
look and the data in the different way, you can go back to the pivot table
wizard and drag the fields into different sections. For example, you can create a meaningful report by dragging the
Operation to the page section and the Employee to the row section. Or you can drag both Employee and Operation
to the Page section. If you highlight
the original data again, then you can repeat the whole process and create
another pivot table with different fields in different sections in another
worksheet.
Using the Pivot Tables in Formulas: The next useful thing that can be done with pivot tables is to lookup the information in the Pivot Table in functions. This allows you to create pivot tables that summarize numbers and then utilize the summarized number in other formulas. Often this is easier that having to develop a formula to perform conditional sum. The right most columns and the bottom row contain the totals for the columns and row.
The total of the DOWNTIME hours that appears in the pivot table is 238. The following function will return the value 238 from the pivot table, but this formula can be placed in any cell in the worksheet. =GETPIVOTDATA(A1:AF8,"Downtime "). This formula tells Excel to look in the Pivot Table in the range A1: AF8 and find a total for "Downtime". It returns 238. A slightly more advanced version will return the total for Downtime for 11/30/99. =GETPIVOTDATA(A1:AF8,"Downtime 11/30/99 "). The GetPivotData is an extremely powerful function for populating cells with aggregate values. The GetPivotData functions only requires two parameters, where the pivot table can be found and what to look for. The value to look for is specified as text. The power of the GetPivotData can be further enhanced by removing the hard coded text and make the look for value based on the value of another cell. The formula, =GETPIVOTDATA(A1:AF8,CONCATENATE(B14," ",C14," ")) will return the value 7.00 also. Placing the text "Downtime" in Cell B14 and "11/30/99" in C14 will return the same result as if we had hard coded the text to look for in the formula. By referencing the values to use to lookup the information in the pivot table, it is possible to layout additional worksheets that are very flexible.
With Excel 2000, Microsoft has introduced "Pivot Charts". In Excel 97, you can draw a chart from the Pivot Table Data. But changing the chart is a two step process. You have to change the pivot table and then go change/view the chart. With Excel 2000, viewing a pie chart for each employee is as easy as selecting a new PAGE value from a drop down list in the chart itself.
As you can probably see, a download from a Pick system and a Pivot Table in Excel creates a powerful tool for analyzing information. This information is easy to manipulate and does not consume CPU time on the transaction processing system. I have only touched on the surface of what can be done here. But you could implement a prototype a data warehouse in only a few hours of time and the cost of a PC. The creation of the pivot tables can easily be automated in a macro. The macros can be placed in buttons and the users will require very little training to view their data in ways that was impossible just a few years ago. While it certainly does not compare in functionality to a true data warehouse. It certainly should be enough for many small companies or for larger companies just to get started exploring data in new ways.
If you have any questions about this article, feel free to email me at tom@phase1systems.com. I will put this article and sample spreadsheets up on my web site at www.phase1systems.com.