Using Engrafo SAS Analyzer

Log files created with the PROC SCAPROC procedure in SAS can serve as metadata files for Engrafo, automatically giving you a data catalog, data lineage, procedure statistics, input and output files, and much more for the SAS programs that generated the logs.
In the following you can learn how to create the logs and automate log creation, add extra metadata, gain insight into automating the load of metadata, and related topics.

This video shows the standard output from analyzing the logs.

Create PROC SCAPROC logs (SCP logs)

Logs are created by adding two PROC SCAPROC procedure calls.

The log files must have the extension .scp

proc scaproc; record "&logbase/CreatePayrollReport.scp" attr opentimes expandmacros; run;

[your SAS code goes here]

proc scaproc; write; run;

A good approach to structuring SCAPROC files is to create files whose content reflects a defined flow in the programs. See https://engrafo.atlassian.net/wiki/spaces/EDV/pages/303529995/Using+Engrafo+SAS+Analyzer#Documentation-structure

How to automate creation of log files

If you have many SAS programs, it can be a big job to add the PROC SCAPROC code lines to each program.
In this section we describe different methods to handle the process in an automated way.

Using a PowerShell script to inject code

A PowerShell script can be used to add the code to your existing SAS files.
Make sure to have a backup of the files before altering them automatically. This script recursively scans through folders and adds the PROC SCAPROC code to your SAS programs, naming each log the same as the program file.
You can also use the script as inspiration when adding code to other programs, e.g. if you need to exclude run-programs that only hold %include("SAS-programs") statements that are captured in another way.

param(
    [Parameter(Mandatory=$true)]
    [string]$StartFolder
)

# Validate the starting folder
if (-not (Test-Path -Path $StartFolder -PathType Container)) {
    Write-Error "Error: '$StartFolder' is not a valid folder path."
    exit
}

# Get all SAS files and iterate through them
Get-ChildItem -Path $StartFolder -Filter "*.sas" -Recurse | ForEach-Object {
    $file = $_
    $filePath = $file.FullName
    $fileName = $file.BaseName  # File name without extension
    # Get the path relative to the StartFolder
    $folderPath = $file.DirectoryName.Substring($StartFolder.Length).TrimStart("\")

    # Construct the statements to add
    $recordStatement = "proc scaproc; record `"$($StartFolder)\$($folderPath)\$($fileName).scp`" attr opentimes expandmacros; run;"
    $writeStatement = "proc scaproc; write; run;"

    # Read the original content as a single string (-Raw preserves line breaks)
    $originalContent = Get-Content -Path $filePath -Raw

    # Construct the new content
    $newContent = "$recordStatement`n$originalContent`n$writeStatement"

    # Write the new content to the file, overwriting the original
    Set-Content -Path $filePath -Value $newContent -Encoding UTF8
    Write-Host "Modified: $filePath"
}
Write-Host "Script finished."


Open PowerShell, navigate to the directory where you saved the script, and execute it by providing the StartFolder parameter.

e.g.

.\ModifySASFiles.ps1 -StartFolder "C:\Your\Path\To\SASFiles"

Using the INITSTMT SAS batch option

If you are invoking SAS programs from the command line (batch), you can add the options:

-initstmt 'proc scaproc; record "&logbase/CreatePayrollReport.scp" attr opentimes expandmacros; run;' -termstmt 'proc scaproc; write; run;'

e.g.

sas -initstmt 'proc scaproc...' -sysin "SAS-program-to-run.sas"

(You actually don't need the TERMSTMT, in that SAS automatically writes to the defined log file when it terminates after execution.)

TERMSTMT: Specifies the SAS statements to execute when SAS terminates.
INITSTMT: Specifies a SAS statement to execute after any statements in the autoexec file and before any statements from the SYSIN file.
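Putting the two options together, a full batch invocation could look like the sketch below. The log path and program name are illustrative placeholders taken from the earlier examples, and the command is only assembled and echoed here rather than executed, since a local SAS installation is assumed.

```shell
#!/bin/sh
# Sketch only: assemble the full batch command with both options.
# "&logbase" and the program name are placeholders from the examples above.
initstmt='proc scaproc; record "&logbase/CreatePayrollReport.scp" attr opentimes expandmacros; run;'
termstmt='proc scaproc; write; run;'
# Echoed instead of executed; drop the echo on a machine where sas is on PATH.
echo sas -initstmt "$initstmt" -termstmt "$termstmt" -sysin "CreatePayrollReport.sas"
```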

Using a central scheduling strategy

A third way is to let your centralized scheduling routine handle the creation of logs.
It is common to have control over your execution of SAS jobs. Sometimes they depend on each other, and you therefore need to evaluate return codes and more.
If you have a scheduling mechanism, it should be easy to add the needed log code.

In this example the programs to run (using crontab) in batch are defined in a table:

CRONTAB          JobName     Log
10;00;*;*;*;     pgm1.sas    pgm#1.scp
10;15;*;*;*;     pgm2.sas    pgm#2.scp
10;30;*;*;*;     pgm3.sas    pgm#3.scp

For all programs in the table, run the SAS batch command with the program name and the right target for the log files:

sas -initstmt "proc scaproc..." -sysin "SAS-program-to-run.sas"
(Iterate over the entries and substitute the values in the command line.)
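The iteration above can be sketched in a small script. It assumes the job table has been exported to plain rows, and the /logs and /programs paths are placeholders invented here for illustration; only the option names come from the text.

```shell
#!/bin/sh
# Sketch: build the SAS batch command for one row of the job table.
# /logs and /programs are placeholder paths for this illustration.
build_sas_cmd() {
  jobname=$1
  log=$2
  printf '%s\n' "sas -initstmt 'proc scaproc; record \"/logs/$log\" attr opentimes expandmacros; run;' -termstmt 'proc scaproc; write; run;' -sysin \"/programs/$jobname\""
}

# One call per table row; pipe each line to sh (or eval it) to actually run.
build_sas_cmd pgm1.sas 'pgm#1.scp'
build_sas_cmd pgm2.sas 'pgm#2.scp'
build_sas_cmd pgm3.sas 'pgm#3.scp'
```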

Adding extra metadata to your logfiles

It is possible to inject extra metadata into the log file produced by PROC SCAPROC.

In the log, it is possible to add the following lines:

/* INIT: Current Node Name (CURRENTNODENAME)...........: [SERVERNAME|-NONE-] */
/* INIT: PROC UPLOAD Node Name (UPLOADNODENAME)........: [SERVERNAME|-NONE-] */
/* INIT: PROC DOWNLOAD Node Name (DOWNLOADNODENAME)....: [SERVERNAME|-NONE-] */
  • CURRENTNODENAME is the server where the SAS program is executed from

  • UPLOADNODENAME is the server where data is uploaded to

  • DOWNLOADNODENAME is the server where data is downloaded to

And if you replace ATTR with ATTRUPLOAD or ATTRDOWNLOAD, Engrafo can handle lineage between multiple environments when using RSUBMIT, PROC UPLOAD, and PROC DOWNLOAD.
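Since these are plain comment lines in the log, they can be appended after the run. A minimal sketch, reusing the log name from earlier examples; the server names in brackets (SASAPP01, SASPROD01) are made up for illustration:

```shell
#!/bin/sh
# Sketch: append the node-name metadata lines to a SCAPROC log.
# The bracketed server names are placeholders for this illustration.
log="CreatePayrollReport.scp"
: > "$log"  # stand-in for a log produced by a real SAS run
cat >> "$log" <<'EOF'
/* INIT: Current Node Name (CURRENTNODENAME)...........: [SASAPP01] */
/* INIT: PROC UPLOAD Node Name (UPLOADNODENAME)........: [SASPROD01] */
/* INIT: PROC DOWNLOAD Node Name (DOWNLOADNODENAME)....: [-NONE-] */
EOF
```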

Adding an extra metadata file to your logfiles

It is possible to add a metadata file to your log files. Metadata is read as value pairs.

parameter    value
owner        Ownername

The metadata file should have the same name as the log-file with the extension .meta
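A small sketch of generating the companion file. Only the .meta naming rule comes from the text above; the tab-separated parameter/value layout is an assumption made for this illustration.

```shell
#!/bin/sh
# Sketch: create a .meta companion file next to a SCAPROC log.
# The tab-separated "parameter<TAB>value" layout is an assumption.
log="CreatePayrollReport.scp"
meta="${log%.scp}.meta"          # same name as the log, .meta extension
printf 'owner\tOwnername\n' > "$meta"
```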

Keeping work-lineage between programs/logs

When creating a log file, Engrafo will make sure that the work data is only known within the program's environment.
So, if you have a program called CreatePayrollReport, the work data associated with it will be named WORK(CreatePayrollReport).

If you have 3 programs, creating 3 logs:

  1. ExtractPayrollData.scp

  2. ModelPayrollData.scp

  3. CreatePayrollReport.scp

they will NOT share work data.

If you need the SAS programs to share work data and the work data is used to create lineage between the programs, you can name the logs like this:

  1. Payroll_#1ExtractPayrollData.scp

  2. Payroll_#2ModelPayrollData.scp

  3. Payroll_#3CreatePayrollReport.scp

This will create the work data reference

WORK(Payroll)

The work data reference is then shared by the 3 programs.
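If the three logs already exist under their plain names, the shared prefix can be applied afterwards with a rename, as in this sketch (the touch line stands in for logs produced by real SAS runs):

```shell
#!/bin/sh
# Sketch: rename a group of SCAPROC logs to share the Payroll_#<n> prefix
# so that they share the WORK(Payroll) reference in Engrafo.
touch ExtractPayrollData.scp ModelPayrollData.scp CreatePayrollReport.scp  # stand-ins
i=1
for name in ExtractPayrollData ModelPayrollData CreatePayrollReport; do
  mv "${name}.scp" "Payroll_#${i}${name}.scp"
  i=$((i + 1))
done
```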

Performance considerations

Adding the extra log has no remarkable influence on job performance.

Upload logs to Engrafo

Engrafo will analyze logs that are uploaded to it. That can be done in two ways:

  1. Manual upload

  2. Automatic upload

Upload Manually

  1. Navigate to Load Metadata → Metadata load (API)

  2. Upload one or more .scp-files

image-20250410-150709.png

Upload Automatically

  1. Place the .scp-files in the application folder wwwroot/uploads_CSVAUTO/

  2. Engrafo will automatically load the files from that folder every 10 seconds

A way to automate your SAS program documentation is to make wwwroot/uploads_CSVAuto/ a shared folder and direct the PROC SCAPROC output files to that folder.
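A sketch of that hand-off: copy freshly produced logs into the polled folder. The ./logs source directory and the log name are placeholders for this illustration; only the upload folder name comes from the text.

```shell
#!/bin/sh
# Sketch: push fresh .scp logs into the folder Engrafo polls every 10 seconds.
# ./logs is a placeholder source directory for this illustration.
src="./logs"
dst="./wwwroot/uploads_CSVAuto"
mkdir -p "$src" "$dst"
touch "$src/CreatePayrollReport.scp"   # stand-in for a real log
cp "$src"/*.scp "$dst"/
```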

Documentation structure

If you need Engrafo to create a structure for your documented SAS jobs, you can place the .scp-files in folders, and the structure will be reflected in Engrafo.

image-20250410-151655.png

In that way, placing the folder DeployedSASPrograms and its subfolders in /wwwroot/uploads_CSVAuto/ will keep the structure for the programs documented in Engrafo.

 

image-20250410-151230.png

Logical vs. physical documentation model

Engrafo can create 2 models: one model based on the logical libnames and one model based on the physical paths.

 

SAS Analyzing options

There are several configuration options for the analyzer. Each option is explained by clicking the info-tag.

image-20250410-152521.png

For example, the SAS-Mappings are used to standardize libnames by physical paths or by using a regex syntax.

image-20250410-153246.png