Using Engrafo SAS Analyzer
Log files created with the PROC SCAPROC procedure in SAS can serve as metadata files for Engrafo, automatically giving you a data catalog, data lineage, procedure statistics, input and output files, and much more for the SAS programs that generate the logs.
In the following you can learn how to create the logs, automate the log creation, add extra metadata, get insight into automating the load of metadata, and related topics.
This video shows the standard output from analyzing the logs:
- 1 Create proc scaproc logs - SCP-logs
- 2 How to automate creation of log files
- 3 Adding extra metadata to your logfiles
- 4 Adding an extra metadata file to your logfiles
- 5 Keeping work-lineage between programs/logs
- 6 Performance considerations
- 7 Upload logs to Engrafo
- 8 Logical vs. physical documentation model
- 9 SAS Analyzing options
Create proc scaproc logs - SCP-logs
Logs are created by adding two PROC SCAPROC procedure calls.
The logs must have the extension .scp
proc scaproc; record "&logbase/CreatePayrollReport.scp" attr opentimes expandmacros; run;
[your SAS code goes here];
proc scaproc; write; run;
A good approach is to structure the scaproc files so that each file covers a defined flow in the programs. See https://engrafo.atlassian.net/wiki/spaces/EDV/pages/303529995/Using+Engrafo+SAS+Analyzer#Documentation-structure
How to automate creation of log files
If you have many SAS programs, it can be a big job to add the PROC SCAPROC code lines to each program.
This section describes different methods to handle the process in an automated way.
Using a PowerShell script to inject code
A PowerShell script can be used to add code to your existing SAS files.
Make sure to have a backup of the files before altering them automatically. The script below recursively scans through folders and adds the proc scaproc code to your SAS programs, naming each log the same as the program file.
You can also use the script as inspiration for adding the code in other ways, e.g. if you need to exclude run-programs that only hold %include("SAS-programs") calls that are captured in another way.
param(
    [Parameter(Mandatory=$true)]
    [string]$StartFolder
)

# Validate the starting folder
if (-not (Test-Path -Path $StartFolder -PathType Container)) {
    Write-Error "Error: '$StartFolder' is not a valid folder path."
    exit 1
}

# Get all SAS files and iterate through them
Get-ChildItem -Path $StartFolder -Filter "*.sas" -Recurse | ForEach-Object {
    $file = $_
    $filePath = $file.FullName
    $fileName = $file.BaseName           # File name without extension
    $folderPath = $file.DirectoryName    # The .scp log is written next to the program

    # Construct the statements to add
    $recordStatement = "proc scaproc; record `"$folderPath\$fileName.scp`" attr opentimes expandmacros; run;"
    $writeStatement = "proc scaproc; write; run;"

    # Read the original content as a single string so line breaks are preserved
    $originalContent = Get-Content -Path $filePath -Raw

    # Wrap the original program in the two statements
    $newContent = "$recordStatement`n$originalContent`n$writeStatement"

    # Write the new content to the file, overwriting the original
    Set-Content -Path $filePath -Value $newContent -Encoding UTF8
    Write-Host "Modified: $filePath"
}
Write-Host "Script finished."
Open PowerShell, navigate to the directory where you saved the script, and execute it by providing the StartFolder parameter, e.g.

.\ModifySASFiles.ps1 -StartFolder "C:\Your\Path\To\SASFiles"

Using INITSTMT SAS batch command
If you are invoking SAS programs from the command line (batch), you can add the options:

-initstmt 'proc scaproc; record "&logbase/CreatePayrollReport.scp" attr opentimes expandmacros; run;'
-termstmt 'proc scaproc; write; run;'

e.g.

sas -initstmt 'proc scaproc...' -sysin "SAS-program-to-run.sas"

(You actually don't need the TERMSTMT, in that SAS automatically writes the recorded information to the defined log file when it terminates after execution.)

INITSTMT: Specifies a SAS statement to execute after any statements in the autoexec file and before any statements from the SYSIN= file.
TERMSTMT: Specifies the SAS statements to execute when SAS terminates.
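As a rough sketch, the batch invocation above can be assembled programmatically. This Python snippet only builds the command line; the `sas` executable being on the PATH and the `/logs/` target path are assumptions:

```python
def build_sas_command(program: str, log_path: str) -> list:
    """Build a SAS batch command line that records a PROC SCAPROC log."""
    # The record statement starts recording before the program runs;
    # SAS writes the recorded information to log_path when the session ends.
    initstmt = f'proc scaproc; record "{log_path}" attr opentimes expandmacros; run;'
    return ["sas", "-initstmt", initstmt, "-sysin", program]

cmd = build_sas_command("CreatePayrollReport.sas", "/logs/CreatePayrollReport.scp")
print(cmd)
# To actually run it (requires SAS on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the statement as a separate argument avoids the nested-quoting problems that arise when the record path itself contains double quotes.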
Using a central scheduling strategy
A third way is to let your centralized schedule routine handle the creation of logs.
It is common to have control over your execution of SAS jobs. Sometimes the jobs depend on each other, and you therefore need to evaluate return codes and more.
If you have a scheduling mechanism, it should be easy to add the needed log code.
In this example the programs to run in batch (using crontab) are defined in a table:

| CRONTAB      | JobName  | Log       |
| ------------ | -------- | --------- |
| 10;00;*;*;*; | pgm1.sas | pgm#1.scp |
| 10;15;*;*;*; | pgm2.sas | pgm#2.scp |
| 10;30;*;*;*; | pgm3.sas | pgm#3.scp |
For all programs in the table, run the SAS batch command with the program name and the right target for the log files:
sas -initstmt "proc scaproc..." -sysin "SAS-program-to-run.sas"
(Iterate over the entries and substitute the values into the command line.)
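The "iterate and substitute" step can be sketched in Python. The rows below are the example entries from the table above, and the `/logs/` target path is a placeholder:

```python
# Each tuple mirrors a row in the scheduling table: (crontab, job name, log)
jobs = [
    ("10;00;*;*;*;", "pgm1.sas", "pgm#1.scp"),
    ("10;15;*;*;*;", "pgm2.sas", "pgm#2.scp"),
    ("10;30;*;*;*;", "pgm3.sas", "pgm#3.scp"),
]

commands = []
for crontab, program, log in jobs:
    # Substitute the program name and log target into the batch command line
    initstmt = f'proc scaproc; record "/logs/{log}" attr opentimes expandmacros; run;'
    commands.append(f"sas -initstmt '{initstmt}' -sysin \"{program}\"")

for command in commands:
    print(command)
```

In a real scheduler the loop body would launch the job (and check its return code) instead of just printing the command.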
Adding extra metadata to your logfiles
It is possible to inject extra metadata into the log file produced by the procedure scaproc.
You can add the following lines to the log:
/* INIT: Current Node Name (CURRENTNODENAME)...........: [SERVERNAME|-NONE-] */
/* INIT: PROC UPLOAD Node Name (UPLOADNODENAME)........: [SERVERNAME|-NONE-] */
/* INIT: PROC DOWNLOAD Node Name (DOWNLOADNODENAME)....: [SERVERNAME|-NONE-] */
CURRENTNODENAME is the server where the SAS program is executed from
UPLOADNODENAME is the server where data is uploaded to
DOWNLOADNODENAME is the server where data is downloaded to
If you replace ATTR with ATTRUPLOAD or ATTRDOWNLOAD, Engrafo can handle lineage between multiple environments when using rsubmit, proc upload and proc download.
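One hedged way to inject these lines is to append them to the finished .scp log after the SAS run. This Python sketch copies the comment layout shown above; the server name SASPROD01 is a placeholder:

```python
import os, tempfile

def add_node_metadata(scp_path, current="-NONE-", upload="-NONE-", download="-NONE-"):
    """Append the Engrafo node-name metadata lines to a PROC SCAPROC log."""
    lines = [
        f"/* INIT: Current Node Name (CURRENTNODENAME)...........: {current} */",
        f"/* INIT: PROC UPLOAD Node Name (UPLOADNODENAME)........: {upload} */",
        f"/* INIT: PROC DOWNLOAD Node Name (DOWNLOADNODENAME)....: {download} */",
    ]
    with open(scp_path, "a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

# Demonstrate on a throwaway file; a real .scp log would be used instead,
# and SASPROD01 is a placeholder server name.
demo = os.path.join(tempfile.mkdtemp(), "CreatePayrollReport.scp")
open(demo, "w").close()
add_node_metadata(demo, current="SASPROD01")
content = open(demo, encoding="utf-8").read()
print(content)
```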
Adding an extra metadata file to your logfiles
It is possible to add a metadata file to your log files. Metadata is read as value pairs:

| parameter | value     |
| --------- | --------- |
| owner     | Ownername |

The metadata file must have the same name as the log file, with the extension .meta
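As an illustration only, a companion .meta file could be generated like this. The pipe-separated "parameter | value" layout is an assumption inferred from the table above; verify the exact syntax Engrafo expects before relying on it:

```python
import os, tempfile

def write_meta_file(scp_path: str, pairs: dict) -> str:
    """Write a .meta companion file next to a .scp log."""
    # ASSUMPTION: one "parameter | value" pair per line, mirroring the
    # table above; check the format Engrafo actually reads.
    meta_path = scp_path.rsplit(".", 1)[0] + ".meta"
    with open(meta_path, "w", encoding="utf-8") as f:
        for parameter, value in pairs.items():
            f.write(f"{parameter} | {value}\n")
    return meta_path

# Hypothetical log path and owner value
meta = write_meta_file(os.path.join(tempfile.mkdtemp(), "CreatePayrollReport.scp"),
                       {"owner": "Ownername"})
print(meta)
```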
Keeping work-lineage between programs/logs
When creating a log file, Engrafo will make sure that the work data is only known within the program's environment.
So, if you have a program called CreatePayrollReport, the work data associated with it will be named WORK(CreatePayrollReport).
If you have 3 programs creating 3 logs:
ExtractPayrollData.scp
ModelPayrollData.scp
CreatePayrollReport.scp
they will NOT share work data.
If you need the SAS programs to share work data and the work data is used to create lineage between the programs, you can name the logs like this:
Payroll_#1ExtractPayrollData.scp
Payroll_#2ModelPayrollData.scp
Payroll_#3CreatePayrollReport.scp
This will create the work data reference
WORK(Payroll)
which is then shared by the 3 programs.
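The naming convention can be illustrated with a small helper that derives the WORK reference from a log file name. The parsing rule (the text before "_#" is the shared prefix) is inferred from the examples above, not taken from Engrafo's documentation:

```python
def work_reference(log_name: str) -> str:
    """Derive the WORK reference from a .scp log file name.

    Logs named '<prefix>_#<n><ProgramName>.scp' share WORK(<prefix>);
    any other log gets its own WORK(<ProgramName>).
    """
    stem = log_name.rsplit(".", 1)[0]          # drop the .scp extension
    if "_#" in stem:
        return f"WORK({stem.split('_#', 1)[0]})"
    return f"WORK({stem})"

print(work_reference("Payroll_#1ExtractPayrollData.scp"))  # WORK(Payroll)
print(work_reference("CreatePayrollReport.scp"))           # WORK(CreatePayrollReport)
```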
Performance considerations
Adding the extra logging has no noticeable influence on job performance.
Upload logs to Engrafo
Engrafo will analyze logs that are uploaded to it. That can be done in two ways:
- Manual upload
- Automatic upload
Upload Manually
Navigate to Load Metadata → Metadata load (API)
Upload one or more .scp-files
Upload Automatically
Place the .scp-files in the application folder wwwroot/uploads_CSVAuto/
Engrafo will automatically load the files from that folder every 10 seconds.
A way to automate your SAS program documentation is to make wwwroot/uploads_CSVAuto/ a shared folder and direct the proc scaproc output files to the shared folder.
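A hedged sketch of directing the output files: copy finished .scp logs from a source folder into the shared upload folder. Both paths are placeholders, and the demo below uses throwaway temp folders instead of real ones:

```python
import os, shutil, tempfile

def sync_scp_logs(source_folder: str, upload_folder: str) -> list:
    """Copy all .scp logs from source_folder into the Engrafo auto-upload folder."""
    copied = []
    for root, _dirs, files in os.walk(source_folder):
        for name in files:
            if name.lower().endswith(".scp"):
                shutil.copy2(os.path.join(root, name), upload_folder)
                copied.append(name)
    return copied

# Demonstrate with throwaway temp folders; in practice source_folder would be
# where the SAS jobs write their logs and upload_folder the shared
# wwwroot/uploads_CSVAuto/ folder.
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
open(os.path.join(src, "CreatePayrollReport.scp"), "w").close()
copied = sync_scp_logs(src, dst)
print(copied)
```

Note that this copies the files flat; if you rely on the documentation structure feature, preserve the subfolder layout when copying.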
Documentation structure
If you need Engrafo to create a structure for your documented SAS jobs, you can place the .scp-files in folders, and the folder structure will be reflected in Engrafo.
In that way, placing the folder DeployedSASPrograms and its subfolders in wwwroot/uploads_CSVAuto/ will preserve the structure of the programs documented in Engrafo.
Logical vs. physical documentation model
Engrafo can create two models: one model based on the logical libnames and one model based on the physical paths.
SAS Analyzing options
There are several configuration options for the analyzer. Each option is explained by clicking its info-tag.
For example, the SAS mappings are used to standardize libnames by physical paths or by using a regex syntax.