Poor Richard's Data Warehouse

Tom Packert

Senior Financial Systems Analyst

Avanti/Case-Hoyt

 

I have never seen actual demographics on the types of companies that run multi-value based systems, but I don't think I am wrong in believing that the Pick market has many customers that are very careful with their money and don't have large IT resources.  I don't think there are too many people reading this newsletter that are able to expend the resources it takes to develop a full fledged data warehouse.  Instead, I hope to introduce an alternative way to build a bare bones information manipulation system that can answer many of the questions that a large scale data warehouse might answer for larger customers.

Actually, the term 'data warehouse' is not 100% accurate. The term describing what is known as a data warehouse should include the emphasis on the 'fulfillment' side of the warehouse.  Stored items in a warehouse are not very useful unless we can get to them quickly and easily.  Data locked away in a warehouse is useless without the corresponding delivery infrastructure.  Most data warehouse include the delivery mechanisms.

The main benefit of a data warehouse lies in the ability to store many forms of data in a central location that facilitates viewing the data in ways that we have not thought of yet or which we should not attempt by bending a transaction processing system.  This article will describe how to implement some data warehouse functions in Microsoft Excel so that multi-value customers can see that augmenting their core systems with Excel can give them some of the same benefits of a data warehouse.

Starting with Excel 97, the maximum number of rows in a worksheet is 65,536.  This is certainly not on the scale of a true data warehouse, but it is within the limits of many multi-value systems.  A modern PC with 64 or more megs of ram and processors of 500, 600, 700 megahertz will not be challenged much by a spreadsheet with only 65,536 rows.

Downloading transaction data from a multi-value system is the first step in analyzing the data.  There are a lot of ways to do this and it is dependent on the particular version of Pick you are using.  Wintegrate offers some good functionality to process and download multi-value data directly into Excel format.   You can also program the download functions directly into Wintegrate in basic and transfer the files to the user's PC.  In example 1, there is a program that implements a download that is controlled by the Pick host computer.  The only requirement is that the user, be running Wintegrate when the host program is executed.  The upside of using Wintegrate is that the data can be sent directly into Excel format on the PC and the transfer is fairly simple to implement.  The downside to using Wintegrate is the rather costly per user license fees and the downloads are slower than other options. 

Example 1

PROCREAD BUFFER ELSE BUFFER = ""

*

CONO=BUFFER<1>

CONAME=BUFFER<2>

START.DATE=BUFFER<3>

END.DATE=BUFFER<4>

DIV=BUFFER<5>

DEPT=BUFFER<6>

CCTR=BUFFER<7>

*

PC.FILE="C:\DATA\DOWNLOAD\JOBTIME.XLS"

FILE = "JOB.TIME"

*

ITEMS = \SSELECT JOB.TIME WITH CONO = "\:CONO:\"\

IF START.DATE = END.DATE THEN

  ITEMS := \ AND WITH DATE = "\:OCONV(START.DATE,'D2/'):\"\

END ELSE

  ITEMS := \ AND WITH DATE >= "\:OCONV(START.DATE,'D2/'):\"\

  ITEMS := \ AND WITH DATE <= "\:OCONV(END.DATE,'D2/'):\"\

END

IF DIV # "ALL" THEN

  ITEMS := \ AND WITH DIV = "\:DIV:\"\

END

IF DEPT # "ALL" THEN

  ITEMS := \ AND WITH DEPT = "\:DEPT:\"\

END

IF CCTR # "ALL" THEN

  ITEMS := \ AND WITH CCTR = "\:CCTR:\"\

END

*

FIELDS = \JOB.NO CUST.NAME ENT.DATE SHIFT HRS DCOST CALC.RATE OPER\

FIELDS := \ OPER.DESC DEPT CCTR PROD_QTY\

FIELDS := \ EMP.NO EMP.NAME\

*

OPTS = "XLS"

OPTS<2>='yes' ;* overwrite

OPTS<3> = "reformat" ;* data mode

OPTS<4> = "true" ;* suppress id

OPTS<5>='true' ;* include field ids

OPTS<7> = "\255,\r\n,\254,\r\n" ;* translation

STATUS = ""

CALL WIN.IMPORT(PC.FILE,FILE,ITEMS,FIELDS,OPTS,STATUS)

 

Other options for moving the data to the PC are more dependent on what database and operating system you are running.  We happen to use Unidata,  so sending a Unidata file directly to a comma separated format and sending it to the PC is pretty easy to automate.  Unidata supports a LIST option TO a Unix file name. In example 2, there is a program we use to move multi-value data out of Unidata and into a PC. 

There may be a couple of minor problems you might run into on your system.  The FTP and ZMODEM commands in the example below will probably need to be setup differently on your system.  They are readily implemented, but they will need to be setup by your System administrator.  We use a Unix script that allows us to move Unix files over to the NT file servers.  ZMODEM is widely available for free, but needs to be installed on your system by the system administrator.

Also, Unidata does not process the output conversions when the data is written to the file.  Any dictionary items which have MR2 or D2/ conversions need to converted to an "I" with OCONV(SOME.DATE,'D2/').  In attribute two. This is done so that the data that ends up in the file is the externally formatted data, not the internal format.  I usually create a new dictionary item with an "EXT." prefix for any dictionary item that has an M or D conversion.  I also create EXT. prefixed versions of anything that may contain a comma in the data.  If the attribute "DESC" may have a comma then attribute two of the EXT.DESC, "I" type dictionary item is CONVERT(",","",DESC)

 

Example 2

PROCREAD BUFFER ELSE BUFFER=''

CONO=BUFFER<1>

*

FILE.NAME = \JTIME\:CONO:\.TXT\

*

LIST.STMT = ""

LIST.STMT := \LIST JOB.TIME \

LIST.STMT := \JOB.NO \

LIST.STMT := \EXT.CUST.NAME \

LIST.STMT := \EXT.DATE \

LIST.STMT := \SHIFT \

LIST.STMT := \HRS \

LIST.STMT := \DCOST \

LIST.STMT := \CALC.RATE \

LIST.STMT := \OPER \

LIST.STMT := \OPER.DESC \

LIST.STMT := \DEPT \

LIST.STMT := \CCTR \

LIST.STMT := \PROD_QTY \

LIST.STMT := \EMP.NO \

LIST.STMT := \EXT.EMP.NAME \

LIST.STMT := \ID-SUPP \

LIST.STMT := \TO DELIM "," \

LIST.STMT := \/data/prod/AVANTI-FILES/_HOLD_/JTIME\:CONO:\.TXT\

*

DATA "Y"

UDTEXECUTE LIST.STMT

*

DOWNLOAD.TYPE='ZMODEM'

BEGIN CASE

  CASE DOWNLOAD.TYPE='FTP'

    STMT = \rsh localhost -l cron ./send_aplus.sh \

    STMT := \/data/prod/AVANTI-FILES/_HOLD_/\:FILE.NAME:\ \

    STMT := \/prod/\:@LOGNAME:\/\:FILE.NAME

    PCPERFORM STMT

  CASE DOWNLOAD.TYPE='ZMODEM'

    STMT = \sz \

    STMT := \/data/prod/AVANTI-FILES/_HOLD_/\:FILE.NAME:\ \

    PCPERFORM STMT

END CASE

 

Assuming we have found a way to automate the movement of the data from your multi-value system to the PC environment, the next step is to setup Excel to allow easy analysis.  The key to the analysis capability is the Excel feature called a Pivot Table.  Pivot Tables are an amazing tool.  When Bill Gates talks about 'information at your finger tips', pivot tables are one of the things he is talking about.  The term 'pivot table' is probably poorly named because it does not create a natural motivation to figure out what it does.  Pivot Tables can be found on the "DATA" menu in Excel, but very few people, venture to the DATA menu and the pivot table command is easy to skip over and not realize what you missed.

In order to continue, I will make up some sample data as show in example 3.  I have created a couple hundred rows of random data. The first step in creating a pivot table is to select all of the data that you will want the pivot table to see.  In this example, it will be cells A1 through E200. 

Example 3

Employee

Date

Operation

CostCenter

Hours

Donald Trump

11/01/99

Make Ready

300

3.00

Bill Bradley

11/10/99

Run

400

6.00

Al Gore

11/14/99

Make Ready

500

7.00

JW Bush

11/02/99

Downtime

600

6.00

Donald Trump

11/11/99

Make Ready

300

0.00

Text Box: Figure 1

 


After selecting all the columns and rows, then click on "DATA" in the menu bar, then "Pivot Table Report" and then you will see the first dialog box of the Pivot Table Wizard (Figure 1).  As long as you are only using Excel as we are in this example, you only need to click on the Excel (default) button and then click next.

The next step is to select the range of cells.  Since we highlighted the cells already, the default is set for us as A1: E200 .  The dollar signs ("$") are added my Excel  to indicate that the cell reference is an absolute references and that it will not change if dragged and dropped anywhere. If the range is correct, then just click on the NEXT button.

The next dialog box is where the power of the pivot table begins to show. It actually starts getting fun now.  By simply dragging and dropping the buttons on the right, we will be constructing the pivot table.  Notice that the column headings from the spreadsheet appear on the right; Employee, Date, Operation, Cost Center Hours.  Excel read the data that we had highlighted and found the headings.  Now, we can drag any field from the right into the PAGE, ROW, COLUMN or DATA sections of the pivot table wizard.  The Page section will control the data that is displayed, I will drad the Employee to this section.  This will tell Excel to create a drop down box for the Employee which will allow me to look at any one Employee or all of them.  Then I drag the Operation to the Row section, then Date to the column section and the hours to the data section. 

Sometimes, when you drag a field into the data section, it will default to a COUNT function.  You can change this to a SUM function by double clicking on the button in the data section and selecting the SUM function.  When you double click on the Count or SUM button, you will notice there are more options, the NUMBER button will allow you to  format the numbers in the pivot table as you would any other cell in Excel.

 

WORKSHEET, at least getting started.  This will create a new worksheet within the existing workbook and insert the pivot table in the worksheet.  The pivot table will look like the following.  The Operations will be down the left in the rows, the dates will be the column headers and the hours will be summed in the middle.  The Employee field that was dragged to the page section will allow the user to quickly filter the information to display any one user.  The sections are all optional, you can create a pivot table with only a field dragged to the Row Section and a field in the Data Section.

 

 

 

 

 

 

 

There is now a pivot table that is capable of quickly displaying the information for any employee and what hours they worked on what day..  Clicking on the arrow in cell B2 will display a drop down dialog.  This is what dragging a field into the Page section does, it creates a cell with a selection list, selecting one of the values in the list will instantly filter the pivot table and display the results.  The result is a powerful tool to instantly view the summary information on any employee.  Not everyone will know how to create a pivot table, but once created, pivot tables are tools people get used to using quickly.

 

 

 

 

 

To look and the data in the different way, you can go back to the pivot table wizard and drag the fields into different sections.  For example, you can create a meaningful report by dragging the Operation to the page section and the Employee to the row section.  Or you can drag both Employee and Operation to the Page section.  If you highlight the original data again, then you can repeat the whole process and create another pivot table with different fields in different sections in another worksheet.

 

Using the Pivot Tables in Formulas: The next useful thing that can be done with pivot tables is to lookup the information in the Pivot Table in functions.  This allows you to create pivot tables that summarize numbers and then utilize the summarized number in other formulas.  Often this is easier that having to develop a formula to perform conditional sum.  The right most columns and the bottom row contain the totals for the columns and row.

The total of the DOWNTIME hours that appears in the pivot table is 238.  The following function will return the value 238 from the pivot table, but this formula can be placed in any cell in the worksheet. =GETPIVOTDATA(A1:AF8,"Downtime "). This formula tells Excel to look in the Pivot Table in the range A1: AF8 and find a total for "Downtime".  It returns 238.  A slightly more advanced version will return the total for Downtime for 11/30/99. =GETPIVOTDATA(A1:AF8,"Downtime 11/30/99 ").  The GetPivotData is an extremely powerful function for populating cells with aggregate values.  The GetPivotData functions only requires two parameters, where the pivot table can be found and what to look for.  The value to look for is specified as text.  The power of the GetPivotData can be further enhanced by removing the hard coded text and make the look for value based on the value of another cell. The formula, =GETPIVOTDATA(A1:AF8,CONCATENATE(B14," ",C14," ")) will return the value 7.00 also.  Placing the text "Downtime" in Cell B14 and "11/30/99" in C14 will return the same result as if we had hard coded the text to look for in the formula.  By referencing the values to use to lookup the information in the pivot table, it is possible to layout additional worksheets that are very flexible. 

With Excel 2000, Microsoft has introduced "Pivot Charts". In Excel 97, you can draw a chart from the Pivot Table Data.  But changing the chart is a two step process.  You have to change the pivot table and then go change/view the chart.  With Excel 2000, viewing a pie chart for each employee is as easy as selecting a new PAGE value from a drop down list in the chart itself.

As you can probably see, a download from a Pick system and a Pivot Table in Excel creates a powerful tool for analyzing information.  This information is easy to manipulate and does not consume CPU time on the transaction processing system.  I have only touched on the surface of what can be done here.  But you could implement a prototype a data warehouse in only a few hours of time and the cost of a PC.  The creation of the pivot tables can easily be automated in a macro.  The macros can be placed in buttons  and the users will require very little training to view their data in ways that was impossible just a few years ago.  While it certainly does not compare in functionality to a true data warehouse.  It certainly should be enough for many small companies or  for larger companies just to get started exploring data in new ways. 

If you have any questions about this article, feel free to email me at tom@phase1systems.com. I will put this article and sample spreadsheets up on my web site at www.phase1systems.com.