2017年7月17日星期一

Reporting notes 01


Working for over 10 years in IT technology, I realized that lots of experiences were lost with my memory when time passed by. So I wrote this note for keep my memory. If anyone is also interested in this matter, please contact me.


Problem: business requires an report in excel format to be automated in web application.


Final web format:


Original data format:



Key points:

  • Data must be summarized by date, week, month, year, and job name dimensions.
  • Fact table contains dozen millions of records for several years;
  • The reporting requires to rotate the data by 90 degrees to display summarized data
  • Multiple data presentations in a single column, witch includes integer number, decimal, and percentage.

Solution:

  1. Get data from data source

    OpenQuery for linked server does not accept variables, therefore we have to use dynamic query statement.
    set @openquery = '
    SELECT  [Date],
            [Day],
            JobName,
            Volume_to_Work,
            Volume_Worked,
            Volume_RPC
    FROM OPENQUERY([S1],''
       SELECT cast(a.Date as Date) [Date],
            datename(dw,date) as [Day],
            a.JobName,
            a.Volume_to_Work,
            a.Volume_Worked,
            isnull(b.Volume_RPC,0) as Volume_RPC
       FROM
       (
            --VOLUME
            select BACKUPDATE date,JobName,sum(VOLUME_SENT) as Volume_to_Work,sum(VOLUME_WORKED) as Volume_Worked
            from TBL_METRICS_RAPPORTVOLUME
            where JobName in ('''''+
            replace(
            case @JOB_NAME_LIST when 'All|' then 'Ucm_Arics_ON_Preview|Ucm_Caprice_QC_Preview|UCM_OB_E_PREVIEW|UCM_OB_F_PREVIEW'
            else @JOB_NAME_LIST end
            ,'|',''''',''''')
            +'''''
            )
            and year(BACKUPDATE) = '+cast(@REPORT_YEAR as varchar)+'
            group by JobName,BACKUPDATE
       ) a
       left join
       (
            --RPC
            select Backupdate,session,count(session) as Volume_RPC
            from TBL_ACCOUNTS_WORKED a
            left join ( select RPC,Result from TBL_REFERENCE_DIALERCODES where site = ''''Internal'''' ) b on a.RELEASECODE= b.RESULT
            where session in ('''''+
            replace(
            case @JOB_NAME_LIST when 'All|' then 'Ucm_Arics_ON_Preview|Ucm_Caprice_QC_Preview|UCM_OB_E_PREVIEW|UCM_OB_F_PREVIEW'
            else @JOB_NAME_LIST end
            ,'|',''''',''''')
            +'''''
            )
            and year(Backupdate) = '+cast(@REPORT_YEAR as varchar)+'
            and b.RPC = 1
            group by Backupdate,session
       ) b on a.Date = b.Backupdate and a.JobName = b.session
       where datename(dw,date) <> ''''Sunday''''''
    )'
      
    create table #campaign ([Date] date,[Day] varchar(20),JobName varchar(255),Volume_to_work int,Volume_worked int,Volume_RPC int)
    insert #campaign exec(@openquery)

    generated openquery look like:
    SELECT  [Date],
            [Day],
            JobName,
            Volume_to_Work,
            Volume_Worked,
            Volume_RPC
    FROM OPENQUERY([BREPP1],'
            SELECT cast(a.Date as Date) [Date],
                   datename(dw,date) as [Day],
                   a.JobName,
                   a.Volume_to_Work,
                   a.Volume_Worked,
                   isnull(b.Volume_RPC,0) as Volume_RPC
            FROM
            (
                    --VOLUME
                    select BACKUPDATE date,JobName,sum(VOLUME_SENT) as Volume_to_Work,sum(VOLUME_WORKED) as Volume_Worked
                    from [Source_Dialer].[dbo].[TBL_METRICS_RAPPORTVOLUME]
                    where JobName in (''All'',''Ucm_Arics_ON_Preview'',''Ucm_Caprice_QC_Preview'',''UCM_OB_E_PREVIEW'',''UCM_OB_F_PREVIEW''
                    )
                    and year(BACKUPDATE) = 2017
                    group by JobName,BACKUPDATE
            ) a
            left join
            (
                    --RPC
                    select Backupdate,session,count(session) as Volume_RPC
                    from Source_Dialer.dbo.TBL_METRICS_ACCOUNTS_WORKED a
                    left join ( select RPC,Result from Source_Dialer.dbo.TBL_METRICS_REFERENCE_DIALERCODES where site = ''Internal'') b on cast(a.RELEASECODE as varchar)= b.RESULT
                    where session in (''All'',''Ucm_Arics_ON_Preview'',''Ucm_Caprice_QC_Preview'',''UCM_OB_E_PREVIEW'',''UCM_OB_F_PREVIEW''
                    )
                    and year(Backupdate) = 2017
                    and b.RPC = 1
                    group by Backupdate,session
            ) b on a.Date = b.Backupdate and a.JobName = b.session
            where datename(dw,date) <> ''Sunday'''
    )

  2. Get summarized data

    select
       row_number() over(order by d.[JobName],d.date) as Id
       ,d.[JobName]
       ,year(d.[Date]) [Year]
       ,right(cast(d.DateISO as varchar),6) + cast(d.DayOfWeek as varchar) [Weekday]
       ,d.[Month]
       ,datepart(wk,d.[Date]) Week
       ,d.[Date]
       ,sum(Volume_to_work) Inventory
       ,sum(Volume_worked) Worked
       ,sum(Volume_RPC) RPC
    into #view_campaign
    from #campaign c
    right join (
       select [JobName],cd.[Date],cd.DateISO,cd.DayOfWeek,cd.month
       from (select distinct [JobName] from #campaign) c
       full outer join dbo.tbl_dim_date cd on 1=1
       where cd.year=@PRM_REPORT_YEAR
    ) d on d.[JobName]=c.[JobName] and c.[Date]=d.[Date]
    group by
         d.[JobName]
         ,year(d.[Date])
         ,d.[Month]
         ,datepart(Wk,d.[Date])
         ,d.[Date]
         ,right(cast(d.DateISO as varchar),6) + cast(d.DayOfWeek as varchar)



  3. Summarized by week, month, year for each jobname, use rollup to get weekly, monthly total columns in final report


  4. Remove duplicated week that cross months

    --Sum the value of the week that cross months (Line38 + Line 44)
    update #tmp set Inventory=t.Inventory, Worked=t.Worked, rpc=t.rpc
    from #tmp v join
    (
       select JobName,Agg
             ,sum([Inventory])[Inventory]
             ,sum([Worked])[Worked]
             ,sum([RPC])[RPC]
       from #tmp
       where Agg like 'w%'
       group by Agg,JobName
    ) t on t.Agg=v.[Agg]
    where v.agg=t.agg and v.JobName=t.JobName


    Delete (Line38) the value of the first week of the duplicated week
    delete from #tmp where agg+cast([Month] as varchar) in
    (select agg+cast(min([Month]) as varchar)
     from #tmp
     where agg like 'w%'
     group by JobName,agg
     having count(agg)>1
    )



  5. Add calculated columns: CompleteRate, ReachRate
    They are the result of division, so dot not use avg to get wrong result
       ALTER TABLE #tmp add CompleteRate decimal(14,4),ReachRate decimal(14,4)
       update #tmp set
    CompleteRate = case [Inventory] when 0 then 0 else [Worked]*1.0/[Inventory] end,
       ReachRate = case [Inventory] when 0 then 0 else [RPC]*1.0/[Inventory] end
       select * from #tmp



  6. Generate dynamic columns definition and header columns

    --@colDesc is for table header, # followed by datatype is for display formatting
    select @col += ',' + quotename(agg)+' varchar(15)'
    from #tmp
    where [year]=@REPORT_YEAR and JobName='All'

    select @colDesc += ',[' + dbo.[ufn_GetCampaignHeader](agg)+'#string] varchar(15)'
    from #tmp
    where [year]=@REPORT_YEAR and JobName='All'

    select @col = stuff(@col,1,1,''), @colDesc = stuff(@colDesc,1,1,'')
    The value of the variable @col =

    [YTD<br>2017] varchar (15),
    [Month1]     varchar (15),
    [2017-01-01] varchar (15),
    [2017-01-02] varchar (15),
    [2017-01-03] varchar (15),
    [2017-01-04] varchar (15),
    [2017-01-05] varchar (15),
    [2017-01-06] varchar (15),
    [2017-01-07] varchar (15),
    [Wk1]       varchar (15),
    [2017-01-08] varchar (15),
    [2017-01-09] varchar (15),
    ......
    [2017-12-28] varchar (15),
    [2017-12-29] varchar (15),
    [2017-12-30] varchar (15),
    [Wk52]       varchar (15),
    [2017-12-31] varchar (15),
    [Wk53]       varchar (15)

    The value of the variable 
    @colDesc =

    [YTD<br>2017#string] varchar(15),
    [Jan#string] varchar(15),
    [Sun<br>170101#string] varchar(15),
    [Mon<br>170102#string] varchar(15),
    [Tue<br>170103#string] varchar(15),
    [Wed<br>170104#string] varchar(15),
    [Thu<br>170105#string] varchar(15),
    [Fri<br>170106#string] varchar(15),
    ......
    [Fri<br>171229#string] varchar(15),
    [Sat<br>171230#string] varchar(15),
    [Wk52#string]          varchar(15),
    [Sun<br>171231#string] varchar(15),
    [Wk53#string]          varchar(15)


  7. Create temp summary table to store pivoted final data
    --table tmp_summary, add spid value in the global temp table name to isolate concurrent users
    For using dynamic sql query we have to use global temp table, or we are unable to access the temp table.
    Seq (sequence) column is for displaying data in order for each job name
       declare @tmptable varchar(1000)
       select @tmptable = '##tmp_summary_' + cast(@@SPID as varchar)
       exec('if object_id(''tempdb..' + @tmptable + ''') is not null drop table ' + @tmptable)
       exec('create table ' + @tmptable + ' ([!Year] int,[JobName#string] varchar(max),[!Seq] int,[Campaign#string] varchar(max),'+ @colDesc + ')')

    --result:
    create table tmp_summary_51 ([Year] int,JobId int,Seq int,JobName varchar(max),[Year2017 Total] varchar(20),[Month1 Total] varchar(20),[2017-01-01] varchar(20),[2017-01-02] varchar(20),[2017-01-03] varchar(20),[2017-01-04] varchar(20),[2017-01-05] varchar(20),[2017-01-06] varchar(20),[2017-01-07] varchar(20),[Week1 Total] varchar(20),[2017-01-08] varchar(20),[2017-01-09]
    ......
    varchar(20),[2017-06-23] varchar(20),[2017-06-24] varchar(20),[Week25 Total] varchar(20),[2017-06-25] varchar(20),[2017-06-26] varchar(20),[2017-06-27] varchar(20),[2017-06-28] varchar(20),[2017-06-29] varchar(20),[2017-06-30] varchar(20),[Week26 Total] varchar(20))

  8. Turning table in 90 degree with pivot

    declare @job varchar(255)
    declare csr cursor FAST_FORWARD for select Name from @tbl_job

    open csr
    fetch next from csr into @job
    while @@fetch_status = 0
    begin
       --get column list
       if @columns is null
              select @columns = replace (@col,' varchar (15)','')
      
       --rotate
       --inventory
       set @sql ='insert into ' + @tmptable + '
       select Year, JobName, 1 seq, Campaign,' + @columns +'
       from (
              select JobName,JobName+'' Inventory'' Campaign,[year],[agg],Inventory
              from #tmp
              where JobName = ''' + @job + '''
              ) x
       pivot (
              sum(Inventory) for [agg] in ('+@columns+')
       ) as pvt
       --where [year] is not null
       order by [year], JobName, seq
       '
       exec(@sql)   
             
      
       --worked
       set @sql ='insert into ' + @tmptable + '
       select Year, JobName, 2 seq, Campaign,' + @columns +'
       from (
              select JobName, JobName + '' Worked'' Campaign, [year], [agg], Worked
              from #tmp
              where JobName = ''' + @job + '''
              ) x
       pivot (
              sum(worked) for [agg] in ('+@columns+')
       ) as pvt'
       exec(@sql)

      
       --% of completion
       set @sql ='insert into ' + @tmptable + '
       select Year, JobName,3 seq, Campaign,' + @columns +'
       from (
              select JobName,''% of Completion'' Campaign, [year], [agg], CompleteRate
              from #tmp
              where JobName = ''' + @job + '''
              ) x
       pivot (
              avg(CompleteRate) for [agg] in ('+@columns+')
       ) as pvt'
       exec(@sql)


       --RPC
       set @sql ='insert into ' + @tmptable + '
       select Year, JobName, 4 seq, Campaign,' + @columns +'
       from (
              select JobName,''RPC'' Campaign, [year], [agg], RPC
              from #tmp
              where JobName = ''' + @job + '''
              ) x
       pivot (
              sum(RPC) for [agg] in ('+@columns+')
       ) as pvt'
       exec(@sql)
      
       --ReachRate
       set @sql ='insert into ' + @tmptable + '
       select Year,JobName,5 seq, Campaign,' + @columns +'
       from (
              select JobName,''Reach Rate'' Campaign, [year], [agg], ReachRate
              from #tmp
              where JobName = ''' + @job + '''
              ) x
       pivot (
              avg(ReachRate) for [agg] in ('+@columns+')
       ) as pvt'
       exec(@sql)
      
    fetch next from csr into @job
    end
    close csr
    deallocate csr

    pivoted data look like this:


    dynamic pivot statement:


  9. Rendered on web page



    Report filter:



  10. 5 temp tables are used

    #campaign              (Date, Day, JobName, Volume_to_work, Volume_worked, Volume_RPC) –stores original data
    #view_campaign  (JobName, Year, Weekday, Month, Week, Date, Inventory, Worked, RPC)  –summarized data
    #tmp                       (JobName, Year, Month, Week, Agg, Inventory, Worked, RPC)                --target aggregation data
    ##tmp_summary_spid(Year, JobName, Seq, Campaign, YTD, Jan, Sun, Mon, Tue, … )      --final output data

  11. Further action:

    Stores pivoted data in a table with ETL process to improve the performance.

Note:
  • the data used in this note is test date, not real.