Working for over 10 years in IT technology, I realized that lots of experiences were lost with my memory when time passed by. So I wrote this note for keep my memory. If anyone is also interested in this matter, please contact me.
Problem: business requires an report in excel format to be
automated in web application.
Final web format:
Original data format:
Key points:
- Data must be summarized by date, week, month, year, and job name dimensions.
- Fact table contains dozen millions of records for several years;
- The reporting requires to rotate the data by 90 degrees to display summarized data
- Multiple data presentations in a single column, witch includes integer number, decimal, and percentage.
Solution:
- Get data from data source
OpenQuery for linked server does not accept variables, therefore we have to use dynamic query statement.set @openquery = 'SELECT [Date],[Day],JobName,Volume_to_Work,Volume_Worked,Volume_RPCFROM OPENQUERY([S1],''SELECT cast(a.Date as Date) [Date],datename(dw,date) as [Day],a.JobName,a.Volume_to_Work,a.Volume_Worked,isnull(b.Volume_RPC,0) as Volume_RPCFROM(--VOLUMEselect BACKUPDATE date,JobName,sum(VOLUME_SENT) as Volume_to_Work,sum(VOLUME_WORKED) as Volume_Workedfrom TBL_METRICS_RAPPORTVOLUMEwhere JobName in ('''''+replace(case @JOB_NAME_LIST when 'All|' then 'Ucm_Arics_ON_Preview|Ucm_Caprice_QC_Preview|UCM_OB_E_PREVIEW|UCM_OB_F_PREVIEW'else @JOB_NAME_LIST end,'|',''''',''''')+''''')and year(BACKUPDATE) = '+cast(@REPORT_YEAR as varchar)+'group by JobName,BACKUPDATE) aleft join(--RPCselect Backupdate,session,count(session) as Volume_RPCfrom TBL_ACCOUNTS_WORKED aleft join ( select RPC,Result from TBL_REFERENCE_DIALERCODES where site = ''''Internal'''' ) b on a.RELEASECODE= b.RESULTwhere session in ('''''+replace(case @JOB_NAME_LIST when 'All|' then 'Ucm_Arics_ON_Preview|Ucm_Caprice_QC_Preview|UCM_OB_E_PREVIEW|UCM_OB_F_PREVIEW'else @JOB_NAME_LIST end,'|',''''',''''')+''''')and year(Backupdate) = '+cast(@REPORT_YEAR as varchar)+'and b.RPC = 1group by Backupdate,session) b on a.Date = b.Backupdate and a.JobName = b.sessionwhere datename(dw,date) <> ''''Sunday'''''')'create table #campaign ([Date] date,[Day] varchar(20),JobName varchar(255),Volume_to_work int,Volume_worked int,Volume_RPC int)insert #campaign exec(@openquery)generated openquery look like:SELECT [Date],[Day],JobName,Volume_to_Work,Volume_Worked,Volume_RPCFROM OPENQUERY([BREPP1],'SELECT cast(a.Date as Date) [Date],datename(dw,date) as [Day],a.JobName,a.Volume_to_Work,a.Volume_Worked,isnull(b.Volume_RPC,0) as Volume_RPCFROM(--VOLUMEselect BACKUPDATE date,JobName,sum(VOLUME_SENT) as Volume_to_Work,sum(VOLUME_WORKED) as Volume_Workedfrom [Source_Dialer].[dbo].[TBL_METRICS_RAPPORTVOLUME]where JobName in (''All'',''Ucm_Arics_ON_Preview'',''Ucm_Caprice_QC_Preview'',''UCM_OB_E_PREVIEW'',''UCM_OB_F_PREVIEW'')and year(BACKUPDATE) = 2017group by JobName,BACKUPDATE) aleft join(--RPCselect Backupdate,session,count(session) as Volume_RPCfrom Source_Dialer.dbo.TBL_METRICS_ACCOUNTS_WORKED aleft join ( select RPC,Result from Source_Dialer.dbo.TBL_METRICS_REFERENCE_DIALERCODES where site = ''Internal'') b on cast(a.RELEASECODE as varchar)= b.RESULTwhere session in (''All'',''Ucm_Arics_ON_Preview'',''Ucm_Caprice_QC_Preview'',''UCM_OB_E_PREVIEW'',''UCM_OB_F_PREVIEW'')and year(Backupdate) = 2017and b.RPC = 1group by Backupdate,session) b on a.Date = b.Backupdate and a.JobName = b.sessionwhere datename(dw,date) <> ''Sunday''') - Get summarized data
selectrow_number() over(order by d.[JobName],d.date) as Id,d.[JobName],year(d.[Date]) [Year],right(cast(d.DateISO as varchar),6) + cast(d.DayOfWeek as varchar) [Weekday],d.[Month],datepart(wk,d.[Date]) Week,d.[Date],sum(Volume_to_work) Inventory,sum(Volume_worked) Worked,sum(Volume_RPC) RPCinto #view_campaignfrom #campaign cright join (select [JobName],cd.[Date],cd.DateISO,cd.DayOfWeek,cd.monthfrom (select distinct [JobName] from #campaign) cfull outer join dbo.tbl_dim_date cd on 1=1where cd.year=@PRM_REPORT_YEAR) d on d.[JobName]=c.[JobName] and c.[Date]=d.[Date]group byd.[JobName],year(d.[Date]),d.[Month],datepart(Wk,d.[Date]),d.[Date] - Summarized by week, month, year for each jobname, use rollup to get weekly, monthly total columns in final report
- Remove duplicated week that cross months--Sum the value of the week that cross months (Line38 + Line 44)update #tmp set Inventory=t.Inventory, Worked=t.Worked, rpc=t.rpcfrom #tmp v join(select JobName,Agg,sum([Inventory])[Inventory],sum([Worked])[Worked],sum([RPC])[RPC]from #tmpwhere Agg like 'w%'group by Agg,JobName) t on t.Agg=v.[Agg]where v.agg=t.agg and v.JobName=t.JobNameDelete (Line38) the value of the first week of the duplicated weekdelete from #tmp where agg+cast([Month] as varchar) in(select agg+cast(min([Month]) as varchar)from #tmpwhere agg like 'w%'group by JobName,agghaving count(agg)>1
- Add calculated columns: CompleteRate, ReachRateThey are the result of division, so dot not use avg to get wrong resultALTER TABLE #tmp add CompleteRate decimal(14,4),ReachRate decimal(14,4)update #tmp setCompleteRate = case [Inventory] when 0 then 0 else [Worked]*1.0/[Inventory] end,ReachRate = case [Inventory] when 0 then 0 else [RPC]*1.0/[Inventory] end
- Generate dynamic columns definition and header columns--@colDesc is for table header, # followed by datatype is for display formattingselect @col += ',' + quotename(agg)+' varchar(15)'from #tmpwhere [year]=@REPORT_YEAR and JobName='All'select @colDesc += ',[' + dbo.[ufn_GetCampaignHeader](agg)+'#string] varchar(15)'from #tmpwhere [year]=@REPORT_YEAR and JobName='All'select @col = stuff(@col,1,1,''), @colDesc = stuff(@colDesc,1,1,'')The value of the variable @col =
[YTD<br>2017] varchar (15),[Month1] varchar (15),[2017-01-01] varchar (15),[2017-01-02] varchar (15),[2017-01-03] varchar (15),[2017-01-04] varchar (15),[2017-01-05] varchar (15),[2017-01-06] varchar (15),[2017-01-07] varchar (15),[Wk1] varchar (15),[2017-01-08] varchar (15),[2017-01-09] varchar (15),
......[2017-12-28] varchar (15),[2017-12-29] varchar (15),[2017-12-30] varchar (15),[Wk52] varchar (15),[2017-12-31] varchar (15),[Wk53] varchar (15)
The value of the variable @colDesc =
[YTD<br>2017#string] varchar(15),[Jan#string] varchar(15),[Sun<br>170101#string] varchar(15),[Mon<br>170102#string] varchar(15),[Tue<br>170103#string] varchar(15),[Wed<br>170104#string] varchar(15),[Thu<br>170105#string] varchar(15),[Fri<br>170106#string] varchar(15),......[Fri<br>171229#string] varchar(15),[Sat<br>171230#string] varchar(15),[Wk52#string] varchar(15),[Sun<br>171231#string] varchar(15),[Wk53#string] varchar(15) - Create temp summary table to store pivoted final data--table tmp_summary, add spid value in the global temp table name to isolate concurrent usersFor using dynamic sql query we have to use global temp table, or we are unable to access the temp table.Seq (sequence) column is for displaying data in order for each job namedeclare @tmptable varchar(1000)select @tmptable = '##tmp_summary_' + cast(@@SPID as varchar)exec('if object_id(''tempdb..' + @tmptable + ''') is not null drop table ' + @tmptable)exec('create table ' + @tmptable + ' ([!Year] int,[JobName#string] varchar(max),[!Seq] int,[Campaign#string] varchar(max),'+ @colDesc + ')')--result:create table tmp_summary_51 ([Year] int,JobId int,Seq int,JobName varchar(max),[Year2017 Total] varchar(20),[Month1 Total] varchar(20),[2017-01-01] varchar(20),[2017-01-02] varchar(20),[2017-01-03] varchar(20),[2017-01-04] varchar(20),[2017-01-05] varchar(20),[2017-01-06] varchar(20),[2017-01-07] varchar(20),[Week1 Total] varchar(20),[2017-01-08] varchar(20),[2017-01-09]
......varchar(20),[2017-06-23] varchar(20),[2017-06-24] varchar(20),[Week25 Total] varchar(20),[2017-06-25] varchar(20),[2017-06-26] varchar(20),[2017-06-27] varchar(20),[2017-06-28] varchar(20),[2017-06-29] varchar(20),[2017-06-30] varchar(20),[Week26 Total] varchar(20)) - Turning table in 90 degree with pivotdeclare @job varchar(255)declare csr cursor FAST_FORWARD for select Name from @tbl_jobopen csrfetch next from csr into @jobwhile @@fetch_status = 0begin--get column listif @columns is nullselect @columns = replace (@col,' varchar (15)','')--rotate--inventoryset @sql ='insert into ' + @tmptable + 'select Year, JobName, 1 seq, Campaign,' + @columns +'from (select JobName,JobName+'' Inventory'' Campaign,[year],[agg],Inventoryfrom #tmpwhere JobName = ''' + @job + ''') xpivot (sum(Inventory) for [agg] in ('+@columns+')) as pvt--where [year] is not nullorder by [year], JobName, seq'exec(@sql)--workedset @sql ='insert into ' + @tmptable + 'select Year, JobName, 2 seq, Campaign,' + @columns +'from (select JobName, JobName + '' Worked'' Campaign, [year], [agg], Workedfrom #tmpwhere JobName = ''' + @job + ''') xpivot (sum(worked) for [agg] in ('+@columns+')) as pvt'exec(@sql)--% of completionset @sql ='insert into ' + @tmptable + 'select Year, JobName,3 seq, Campaign,' + @columns +'from (select JobName,''% of Completion'' Campaign, [year], [agg], CompleteRatefrom #tmpwhere JobName = ''' + @job + ''') xpivot (avg(CompleteRate) for [agg] in ('+@columns+')) as pvt'exec(@sql)--RPCset @sql ='insert into ' + @tmptable + 'select Year, JobName, 4 seq, Campaign,' + @columns +'from (select JobName,''RPC'' Campaign, [year], [agg], RPCfrom #tmpwhere JobName = ''' + @job + ''') xpivot (sum(RPC) for [agg] in ('+@columns+')) as pvt'exec(@sql)--ReachRateset @sql ='insert into ' + @tmptable + 'select Year,JobName,5 seq, Campaign,' + @columns +'from (select JobName,''Reach Rate'' Campaign, [year], [agg], ReachRatefrom #tmpwhere JobName = ''' + @job + ''') xpivot (avg(ReachRate) for [agg] in ('+@columns+')) as pvt'exec(@sql)fetch next from csr into @jobendclose csr
dynamic pivot statement: - Rendered on web page
Report filter: - 5 temp tables are used
#campaign (Date, Day, JobName, Volume_to_work, Volume_worked, Volume_RPC) –stores original data
#view_campaign (JobName, Year, Weekday, Month, Week, Date, Inventory, Worked, RPC) –summarized data
#tmp (JobName, Year, Month, Week, Agg, Inventory, Worked, RPC) --target aggregation data
##tmp_summary_spid(Year, JobName, Seq, Campaign, YTD, Jan, Sun, Mon, Tue, … ) --final output data - Further action:
Stores pivoted data in a table with ETL process to improve the performance.
Note:
- the data used in this note is test date, not real.
没有评论:
发表评论