Mon, 18 Aug 2008

Excel report with no duplicates on master file.

My tasklist:

  1. Do a query, master-detail. To make it short, let's say the two tables are named MASTER and DETAIL.
    SELECT M.CUSTNO, M.TOTAL, D.ARTICL, D.AMOUNT 
    FROM MASTER M INNER JOIN DETAIL D ON M.CUSTNO = D.CUSTNO
    CUSTNO is a primary key on the MASTER table, and in the DETAIL table, there could be 1 or more rows with the same CUSTNO value. CUSTNO is alphanumeric.
  2. Present the result in an excel file. The Excel file should have 4 columns: CUSTNO, TOTAL, ARTICL, AMOUNT.
  3. For the presentation, the master columns (CUSTNO and TOTAL) should not be repeated if there is more than 1 row for the value of CUSTNO in the table DETAIL. In other words, below are the good and the bad style of presentation, on excel.

Here is the bad style of presentation. The client would like the cells in blue to be empty. You get this with the above SQL query.

CUSTNO     TOTAL ARTICL     AMOUNT
   L35      1000 Papers        700
   L35      1000 Pens          300
   M77       300 Papers        300

Here is how the client wants the data to be presented:

CUSTNO     TOTAL ARTICL     AMOUNT
   L35      1000 Papers        700
                 Pens          300
   M77       300 Papers        300

The reason is that the client wants to be able to sum the values of the column AMOUNT in the excel file. The other benefit is that with empty cells, the change of the column CUSTNO is more visible.

To perform this, I added an extra column to my query, and changed it to the following:

SELECT M.CUSTNO, M.TOTAL, D.ARTICL, D.AMOUNT, RRN(M) AS CUSTNN
FROM MASTER M INNER JOIN DETAIL D ON M.CUSTNO = D.CUSTNO
ORDER BY CUSTNO

Let me first explain what does mean RRN(M) before continuing. RRN stands for Relative Record Number. In a DB2 SQL syntax, RRN takes one argument: the name of one table in your query. Here, the table MASTER used an alias M, so, instead of RRN(MASTER), I have to use RRN(M). RRN(M) gives the relative record number of the row selected in the table MASTER, or, in other words, the physical position of the row in the physical file.

Actually, the result of my query will look like the following.

CUSTNO     TOTAL ARTICL     AMOUNT      CUSTNN
   L35      1000 Papers        700          78
   L35      1000 Pens          300          78
   M77       300 Papers        300          79

I extract this in an excel file, and the rest, I'll manipulate in excel. So, I open this file in excel, I get actually 5 columns, I add 2 other columns, called ORDERING and DIFFRRN. ORDERING will be used to preserve the order of the rows, like I showed in a previous post, its values will be 1, 2, 3, ... and so on. In the first cell of the DIFFRRN column, I put the value 1. For each other row, the value of the DIFFRRN cell will be the difference between the value of the CUSTNN of that row and the value of CUSTNN of the row just above it. In other words, if column CUSTNN is the column E, and the column DIFFRRN is column G, value of cell G(n) = E(n) - E(n-1). To do this on excel, on the 3rd row, you put the formula =E3-E2 as value of the cell G3. After you press Enter, that value is computed. To expand the formula for all rows, just select that cell and drag down the bottom-right corner to the last row of the sheet.

1: Formula Gn=En - En-1 2: Drag down the formula cell to apply to entire column 3: and you get the result

After computing, the value of this column DIFFRRN is 0 if we're on the same MASTER record, because the value of CUSTNN (which was RRN(M)) will be the same. If it is other than 0, then it's a new MASTER record. RRN(M) is then just a trick to get a numeric value with wich I can perform arithmetic substraction. If CUSTNO was numeric and primary key, I wouldn't need the RRN at all. Next step will be to clear the column CUSTNO and TOTAL for each row that has the value 0 for DIFFRRN. We will sort the entire sheet by the value of DIFFRRN in order to get all rows having DIFFRRN=0 side by side. But before that, we will select and copy all values of this column, paste it into a text editor (like Notepad), re-select all this values and copy it to finally paste it back to the excel file. Why? Because the value of DIFFRRN is only mathmatical formulas. If we re-order the sheet, the formulas will be reordered, and the cells they refer to will be replaced in unpredictable ways. When pasting to a neutral text editor, we paste the computed values, not the formulas. And when we recopy this to paste it back to the excel, we put the computed values in each cell, and not the formulas.

After this weird copy/paste stage, we can re-order the sheet and select columns CUSTNO and TOTAL for all rows having value of DIFFRRN equal to 0; we can press Del to empty these cells. Then, we re-order again the sheet, now with the column ORDERING to get back the original ordering of rows. And finally, we can delete the columns CUSTNN, ORDERING and DIFFRRN and voilà.

posté le: 17:20 | path: /data/misc | permalien

Mon, 14 Jul 2008

Keep things simple: Order of an excel file.

One of my co-workers was asked to colorize an excel file. The task is simple but fastidious: for each row that has the value of the column `status' equal to 0, the background should be in red. If the value is 1, it should be with a pink background. Other values should be left untouched. The spreadsheet file contained about 2000 rows, and the order of the rows should not be modified.

I first thought about a macro to perform the task, but if I show her how this tedious task could be done that way, she would come to me everytime she think a task could be achieved by using macro, and she's not geek enough to learn how to write macros. Two more thinkings and I come to a simpler solution.

The first thing to do is to add an additional column to the table. It is called rank and have the value 1, 2, ... and so on incremented by 1 to the number of rows. It's very simple with Microsoft Excel to create a column with values autoincremented by 1: simply by writing 1 on the first row, 2 on the second, selecting these 2 cells and enlarging the selection to the entire column by dragging down the bottom-right corner of the selection. This additional column will help us preserve the original ordering of rows. Next thing is to select the entire sheet, then order it by the column status. All rows with the value 0 or 1 for the column status are now grouped. They can be selected respectively only once by dragging the mouse on the line number column. After having applied the color attributes, I just have to reorder the entire sheet by the column rank I created to get the initial ordering of rows, and to finish, I have just to remove this column and voilà.

The most pleasant thing is that the co-worker understood perfectly all the manipulations I've showed her. And I understood that writing cool and smart macro codes is not always cool and smart; many times, I just need to keep my thoughts simple and stupid.

posté le: 11:43 | path: /data/misc | permalien
Mars 2010
Dim Lun Mar Mer Jeu Ven Sam
 
     

Malagasy miray
About

DotMG's joblog

Work hard at whatever you do. (Ecc. 9. 10a. CEV)
Valid XHTML 1.0 Transitional   Valid CSS!   Powered by blosxom 2.0