Functions, Procedures, and Programming Notes

SymbEL supports encapsulated, scoped blocks that can return a value. These blocks are referred to as functions and procedures for notational brevity. To complete our picture, we need to make some points about these constructs.

Function Return Types

So far, the only value that functions have returned in the examples has been a scalar type, but functions can return double or string as well. More complex types, covered in later sections, can also be returned. Functions cannot return arrays because there is no syntactic accommodation for doing so. However, there is a way to get around this limitation for arrays of nonstructured types. See “Returning an Array of Nonstructured Type from a Function” on page 552.

Scope

Although variables may be declared local to a function, the default semantics for local variables are for them to be the C equivalent of static. Therefore, even though a local variable has an initialization part in a local scope, this initialization is not performed on each entry to the function. It is done once before the first call and never done again.

Initializing Variables

Variables can be initialized to values that are compatible with their declared type. This is the case for both simple and structured types. The only exceptional condition in initializing variables is the ability to initialize a global variable with a function call. This capability is supported, but use it with great care. In general, avoid it as bad practice.

Arrays can be given initial values through an aggregate initialization. The syntax is identical to that in C. For example:

int array[ARRAY_SIZE] = {

The size of the array must be large enough to accommodate the aggregate initialization or the parser will flag it as an error.

Notes About Arrays and Strings

The type string is native to SymbEL and has no equivalent in C. Since string is an atomic type, it is incorrect to use the subscript operator to access individual characters in the string. For this reason, we need to be able to interchange values between variables that are of type string and char[].

Array Assignment

Although pointer types are not allowed in SymbEL, assignment of arrays is allowed provided that the size of the target variable is equal to or greater than the size of the source variable. This is not a pointer assignment, though. Consider it a value assignment where the values of the source array are copied to the target array.

String Type and Character Arrays

Variables declared as type string and arrays of type char have interchangeable values to allow access to individual characters contained in a string. For example:

char tmp[8];

Accessing the individual characters in the string cannot be done with the s variable by itself. If a subscript were used, it would mean that the variable s was an array of strings, not an array of characters. After any modification to the variable tmp in this example is done, the value can be assigned back to s.

Assignment to String Variables

When a variable of type string is assigned a new value, the existing value of the variable is freed and a new copy of the source string is allocated and assigned to the variable. This is also the case when string variables are assigned the return value of a function that returns type string. See “Using Return Values from Attached Functions” on page 553.

Empty Arrays

When a function accepts an array as a parameter, it is not always convenient to send an array of the same size as the parameter. For this reason, the empty array declaration was added for use in parameter declarations. This is a notation where no subscript size is included in the declaration, just the [] suffix to the variable name. Here is an example.

print_it(int array[])

Upon entry to the function containing the empty array parameter, the parameter variable obtains a size. In this example, the array parameter is given a size of 24 (6 * 4) upon entry to the function print_it. This size will change for every array passed as an actual parameter.

Recursion

Recursion is not supported. Direct recursion is flagged as an error by the parser. Indirect recursion is silently ignored. The disallowance of recursion is due to a problem in the run time that could not be overcome in the short term and may be fixed in a future release. Examine the following program.
one()

It seems that the output of this program would be

Here I am

but, in fact, only the first line will be printed out. The second call to one is detected by the run time, and a return from the function is performed before anything is done. SymbEL is not the place to do recursion. Again, if you feel like being tricky, don't be. It probably won't work.

Built-in Functions

SymbEL currently supports a limited set of built-in functions. As the need arises, more built-ins will be added. Many of these built-ins work the same as or similarly to the C library version. For a complete description of those functions, see the manual page for the C function. The current built-in functions are described below.
Dynamic Constants

The SymbEL interpreter deals with a few physical resources whose quantities vary from one computer to the next. These are the disks, network interfaces, CPUs, and devices that have an interrupt-counters structure associated with the device. It is often necessary to declare arrays that are bounded by the quantity of such a resource. When this is the case, a value is required that is sufficiently large to prevent subscripting errors when the script is running. This requirement is dealt with by means of dynamic constants. These constants can be used as integer values, and the interpreter views them as such. These dynamic constants are:
main()

Attachable Functions

To keep “creeping featurism” from inflating the size and complexity of the interpreter, a mechanism had to be devised so that many procedures and functions could be “built in” without being a built-in. The solution was to provide a syntactic remedy that defined a shared object that could be attached to the interpreter at run time. This declaration would include the names of functions contained in that shared object. Here is an example.

attach "libc.so" {

The attach statements are contained in se include files that correspond to the C include files in /usr/include. The man page for fopen, for instance, specifies that the file stdio.h should be included to obtain its declaration. In SymbEL, the include file stdio.se is included to obtain the declaration inside an attach block. Here are some rules governing the use of attached functions:
Ellipsis Parameter

For attached functions only, you can use the ellipsis parameter (...) to specify that an indeterminate number and type of parameters follow. Values passed up until the ellipsis argument are type checked, but everything after that is not type checked. The ellipsis parameter allows functions like sscanf to work and therefore makes the language more flexible. For instance, the program

attach "libc.so" {

yields the output

"Found 4 values: hello 1:15.16 f"

Attached Variables

Global variables contained in shared objects can be declared within an attach block with the keyword extern before the declaration. This declaration causes the value of the internal SymbEL variable to be read from and written to the external variable as it is used in the execution of the program. Here is an example of the declaration of getopt, with its global variables optind, opterr, and optarg from the include file stdlib.se.

attach "libc.so" {

This code works for all types, including structures.

Built-in Variables

Although extern variables can be attached with the extern notation, there are three very special cases of variables that cannot be attached this way. These variables are stdin, stdout, and stderr. These “variables” in C are actually #define directives in the stdio.h include file; they reference the addresses of structure members. Since the address of structures cannot be taken in SymbEL, there is no way to represent these so-called variables. They are, therefore, provided by the interpreter as built-in variables. They can be used without any declaration or include file usage.

Parameters to main and Its Return Value

In C programs, the programmer can declare main as accepting three parameters:

Similarly, the SymbEL main function can be declared as accepting two of these parameters, argc and argv. Here is an example that uses these variables.

main(int argc, string argv[])

This example also demonstrates the use of an empty array declaration. When this program is run with the command

se test.se one two three four five six

the output starts with

argv[0] = test.se

It is not necessary to declare these parameters to main. If they are not declared, then the interpreter does not send any values for them. It is also possible to declare main as being an integer function. Although the exit function can be used to exit the application with a specific code, the value can also be returned from main. In this case, the previous example would be:

int main(int argc, string argv[])

The value returned by the return statement is the code that the interpreter exits with.

Structures

SymbEL supports the aggregate type struct, which is similar to the C variety, with some exceptions. An aggregate is a collection of potentially dissimilar objects collected into a single group. As it turns out, most of the SymbEL code developed will contain structures. As an example, here is what a SymbEL password file entry might look like.

struct passwd {

The declaration of structure variables differs from C in that the word struct is left out of the variable declaration. So, to declare a variable of type struct passwd, only passwd would be used.

Accessing Structure Members

You access a structure member with dot notation. The first part of the variable is the variable name itself, followed by a dot and then the structure member in question. To access the pw_name member of the passwd structure above, the code could look like this.

main()

Structure members can be any type, including other structures. A structure may not contain a member of its own type. If it does, the parser posts an error.
Arrays of Structures

Declaring an array of structures is the same as for any other type, with the provision stated in the previous paragraph. The notation for accessing members of an array of structures is name[expression].member.

Structure Assignment

The assignment operation is available to variables of the same structure type.

Structure Comparison

Comparison of variables of structure type is not supported.

Structures as Parameters

Variables of structure type can be passed as parameters. As with other parameters, they are passed by value so the target function can access its structure parameter as a local variable. Arrays of structures passed to other SymbEL functions are also passed by value. This is not the case with passing arrays of structures to attached functions (see “Attachable Functions” on page 532).

Structures as Return Values of Functions

Functions can return structure values. Assigning a variable the value of the result of a function call that returns a structure is the same as a structure assignment between two variables.

Language Classes

The preceding sections have discussed the basic structure of SymbEL. The remainder of this chapter discusses the features that make SymbEL powerful as a language for extracting, analyzing, and manipulating data from the kernel. When generalizing a capability, the next step after creation of a library is the development of a syntactic notation that represents the capability that the library provided. The capability in question here is the retrieval of data from the sources within the kernel that provide performance tuning data. SymbEL provides a solution to this problem through the use of predefined language classes that can be used to declare the type of a variable and to designate it as being a special variable. When a variable with this special designation is accessed, the data from the source that the variable represents is extracted and placed into the variable before it is evaluated.
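The refresh-on-access behavior just described can be modeled outside SymbEL. The following is a minimal Python sketch, not SymbEL code and not a description of how the interpreter is implemented. The names ActiveVariable, fake_source, and counter are hypothetical and exist only for illustration. Each read of value first pulls a fresh value from the data source, while a plain copy of that value is never refreshed.

```python
# Illustrative model only: hypothetical names, not part of SymbEL or se.

class ActiveVariable:
    """A variable whose value is re-read from a data source on every access."""

    def __init__(self, source):
        self._source = source   # callable standing in for the kernel data source
        self.reads = 0          # how many times the source has been consulted

    @property
    def value(self):
        # The refresh happens before the value is handed to the caller.
        self.reads += 1
        return self._source()


counter = iter(range(100, 200))


def fake_source():
    """Stand-in for kernel data that changes between reads."""
    return next(counter)


active = ActiveVariable(fake_source)
first = active.value      # triggers one read of the source
second = active.value     # triggers another read; the data has moved on
snapshot = second         # an ordinary copy is never refreshed
```

In this model, two accesses cost two reads of the source, which is the property that makes copying into an ordinary variable worthwhile when a value is used repeatedly.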
There are four predefined language classes in SymbEL:
Variables of these language classes have the same structure as any other variable. They can be a simple type or a structured type. What needs clarification in the declaration of the variable is
The syntax selected for this capability gives the variable a name that begins with the language class name followed by a dollar sign ($). This convention allows these prefixes for variables to denote their special status.
Examples of variables declared with a special attribute are:

ks_system_misc kstat$misc; // structured type, kstat language class

When any of these variables appear in a statement, the values that the variables represent are retrieved from the respective source before the variable is evaluated. Variables declared of the same type but not possessing the special prefix are not evaluated in the same manner. For instance, the variable

ks_system_misc tmp_misc; // structured type, no language class specified

can be accessed without any data being read from the kstat framework. Variables that use a language class prefix in their name are called active variables. Those that do not are called inactive variables.

The kvm Language Class

Let’s look at an example of the use of a kvm variable.

main()

In this example, there is a local variable of type int. The fact that it is an int is not exceptional. The fact that the name of the variable begins with kvm$ is exceptional. It is the kvm$ prefix that flags the interpreter to look up this value in the kernel via the kvm library. The actual name of the kernel variable is whatever follows the kvm$ prefix. The program need not take special action to read the value from the kernel. Simply accessing the variable by using it as a parameter to the printf() statement (in this example) causes the interpreter to read the value from the kernel and place it in the variable before sending the value to printf(). Use of kvm variables is somewhat limiting since the effective uid of se must be superuser or the effective gid must be sys in order to successfully use the kvm library.

In this example, the variable maxusers is a valid variable in the kernel and when accessed is read from the kernel address space. It is possible and legal to declare a kvm$ active variable with the name of a variable that is not in the kernel address space. The variable will hold its original initialized value, and refreshing this type of variable is futile because there is no actual value in the kernel. This technique is useful when dealing with pointers, though, and an example is included in “Using kvm Variables and Functions” on page 553.

The kstat Language Class

The use of kstat variables differs from the use of kvm variables in that all of the kstat types are defined in the header file kstat.se. All kstat variables must be structures because this is how they are defined in the header file. Declaration of an active kstat variable that is not a structure results in a semantic error. Declaration of an active kstat variable that is not of a type declared in the kstat.se header file results in the variable always containing zeros unless the program manually places something else in the variable. Here is an example of using kstat variables.

#include <kstat.se>

Just as in the kvm example, no explicit access need be done to retrieve the data from the kstat framework. The access to the member of the active ks_system_misc variable in the parameter list of printf() causes the member to be updated by the run time.

Multiple Instances

The kstat.se header file contains many structures that have information that is unique in nature. The ks_system_misc structure is an example. The number of CPUs on the system is unique and does not change depending on something else. However, the activity of each of the individual CPUs does change, depending on which CPU is in question. This is also the case for network interfaces and disks. This situation is handled by adding two members to the structures that contain data for devices that have multiple instances. These members are name$ and number$. The name$ member contains the name of the device as supplied by kstat. The number$ member is a linear number representing the nth device of this type encountered. It is not the device instance number. This representation allows a for loop to be written such that all of the devices of a particular type can be traversed without the need to skip over instances that are not in the system. It is not unusual, for instance, for a multiprocessor machine to contain CPUs that do not have linear instance numbers. When traversing through all the devices, the program will encounter the end of the list when the number$ member contains a -1. Here is an example of searching through multiple disk instances.
#include <kstat.se>

In this program, kstat$disk.number$ is set initially to zero. The “while part” of the loop is then run, checking the value of kstat$disk.number$ to see if it’s -1. That comparison causes the run time to verify that there is an nth disk. If there is, then the number$ member is left with its value and the body of the loop runs. When the run time evaluates the kstat$disk.name$ value in the printf() statement, it reads the name of the nth disk and places it in the name$ member, which is then sent to printf().

Other Points About kstat

Here are some points about how to best use kstat variables in a program.

Some of the values contained in the kstat structures are not immediately useful by themselves. For instance, the cpu member of the ks_cpu_sysinfo structure is an array of four unsigned longs representing the number of clock ticks that have occurred since system boot in each of the four CPU states: idle, user, kernel, and wait. This data must be processed to be useful.

If a program needs to access many members of a kstat variable, then it is in the best interest of the performance of the program and the system to copy the values into an inactive kstat variable by using a structure assignment. The single structure assignment causes all of the members of the structure to be read from the kstat framework with one read and then copied to the inactive variable. When these values are accessed through the inactive variable, no more reads from the kstat framework are initiated. The net result is a reduction in the number of system calls being performed by the run time, and therefore se does not have a significant impact on the performance of the system. Here is an example.

Example kstat Program
#include <unistd.se>

About the Program

ks_cpu_sysinfo kstat$cpusys; // active kstat variable

This code is the declaration of the active and inactive variables. Use of the active variable causes the run time to read the values from the kstat framework for the ks_cpu_sysinfo structure. Later accesses to the inactive variable do not cause the reads to occur.

ks_system_misc kstat$misc; // active kstat variable

Since the ncpus variable will be used extensively, it is best to put the value into a variable that does not cause continual updates.

int old_ints[MAX_CPU];

Since the program computes the rate at which interrupts and context switches are occurring, the values from the previous iteration need to be saved so they can be subtracted from the values of the current iteration. They are arrays bounded by the maximum number of CPUs available on a system.

// initialize the old values

This code grabs the initial values that will be subtracted from the current values after the first sleep() is completed. For simplicity, no timers are kept, and it is assumed that only one second has elapsed between updates. In practice, the elapsed time would be computed.

for(i=0; i<ncpus; i++) {

Here, the number$ member is set to the CPU in question, and then the contents of the entire active structure variable are copied into the inactive structure variable. This coding causes only one system call to update the kstat variable.

ints = tmp_cpusys.intr - old_ints[i];

This code computes the number of interrupts and context switches for the previous second and prints it out. The current values are then saved as the old values, and the loop continues.

Runtime Declaration of kstat Structures

The kstat framework is dynamic and contains information regarding devices attached to the system. These devices are built by Sun and by third-party manufacturers. The interpreter contains static definitions of many devices, and these definitions are mirrored by the kstat.se include file. However, it is unreasonable to assume that the interpreter will always contain all of the possible definitions for devices. To accommodate this situation, a syntactic element was needed. This is the kstat structure. A kstat structure can define only KSTAT_TYPE_NAMED structures, which are the structures that define devices such as network interfaces. As an example, the following script prints out the values of a kstat structure that is not declared in the kstat.se file but has been part of the kstat framework since the very beginning.

kstat struct "kstat_types" ks_types {

The kstat structure introduces a few new concepts:
Note that when a dynamic kstat structure declaration replaces a static declaration inside of the interpreter, the old declaration is discarded and replaced with the new one. Therefore, if a kmem_magazine declaration were used to replace the "ks_cache" declaration from kstat.se, the only kstat links seen would be the kmem_magazine members and all of the other cache links (and there are a lot of them) would no longer be seen.

Adding New Disk Names

You can use an internal function, se_add_disk_name(string name), to add new disk names to the interpreter's existing list. For example, to have the tape drives and NFS mounts that are recorded in the KSTAT_TYPE_IO section of the kstat framework displayed by any script that shows disk statistics, add these lines at the beginning of the script.

se_add_disk_name("st");

This function is declared in the se.se include file.

The mib Language Class

A lot of data regarding the network resides in the mib variables of the kernel. Unfortunately, these mib variables are not part of the kstat framework. Therefore, a new language class was created to facilitate access to this information.

Variables of the mib class have a unique feature in that they can be read, but assigning values generates a warning from the interpreter. This warning is to remind you that assigning values to the members of the mib2_* structures will not result in the information being placed back into the kernel. The mib variables are read-only.

mib variables do not have the permissions limitation of kvm variables. Any user can view mib variable values without special access permissions.

To view the mib information available from within SymbEL, run the command netstat -s from the command line. All but the IGMP information is available.

Since all mib variables are structures, the rules regarding structure assignment being used to cut down on the overhead of the interpreter are the same as for the kstat and kvm classes.
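The saving that structure assignment provides can be illustrated with a small model. The sketch below is Python, not SymbEL, and makes no claim about the interpreter's internals. ActiveStruct, fake_mib, and the three field names are hypothetical. The model assumes that each member access on the active structure costs one read of the source, while a snapshot costs one read in total.

```python
# Illustrative model only; class, function, and field names are hypothetical.
from types import SimpleNamespace


class ActiveStruct:
    """Every member access re-reads the whole record from the source."""

    def __init__(self, source):
        self._source = source   # callable returning the current record as a dict
        self.reads = 0          # number of reads performed against the source

    def _refresh(self):
        self.reads += 1
        return self._source()

    def __getattr__(self, name):
        # Called only for names not found normally, that is, the record's members.
        record = self._refresh()
        try:
            return record[name]
        except KeyError:
            raise AttributeError(name) from None

    def snapshot(self):
        """Copy all members with a single read (models structure assignment)."""
        return SimpleNamespace(**self._refresh())


def fake_mib():
    """Stand-in for a record read from the kernel."""
    return {"in_receives": 10, "in_delivers": 9, "out_requests": 7}


active = ActiveStruct(fake_mib)

# Three member accesses on the active structure cost three reads.
total_direct = active.in_receives + active.in_delivers + active.out_requests
reads_direct = active.reads

# One snapshot costs one read; later accesses to the copy cost none.
inactive = active.snapshot()
total_copy = inactive.in_receives + inactive.in_delivers + inactive.out_requests
reads_copy = active.reads - reads_direct
```

The copy also holds values from a single moment, whereas the three direct accesses could each observe a different state of the source.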
Here is an example of using mib class variables.

#include <mib.se>

The ndd Language Class

SunOS 5.x provides access to variables that define the operation of the network stack through a command called ndd (see ndd(1M)). The ndd language class within SymbEL provides access to the variables within the IP, ICMP, TCP, UDP, and ARP modules. The definitions of the available variables are in the ndd.se include file. For each module, there is a structure that contains all of the variables available for that module. Some of these variables are read-write and others are read-only. If you try to modify a variable that is read-only, the interpreter posts a warning message.

Some of the read-only variables are tables that can be quite large. Note that the largest table size that can be handled is 64 kilobytes (65,536 bytes). If an ndd variable is larger than 64 kilobytes, it is truncated.

Like kstat and mib variables, all ndd variables are structures. The following program displays the tcp_status variable of the TCP module. This variable is type string and when printed looks like a large table.

#include <stdio.se>

User-Defined Classes

The four language classes provide a significant amount of data to a program for analysis. But the analysis of this data can become convoluted and make the program difficult to deal with. This is one of the problems that SymbEL hoped to clear up. Adding more language classes is a potential solution to this problem.

An example of an additional language class that would be useful is a vmstat class. This would be a structure that provided all of the information that the vmstat program provides. The problem is that such an addition would make se larger and provide functionality that does not really require the internals of the interpreter. All of what vmstat does can be done by a SymbEL program.

In addition to the vmstat class, it would be useful to have classes for iostat, mpstat, nfsstat, netstat, and any other “stat” program that provides this type of statistical information. What was needed to accomplish this task correctly was a language feature that allowed programmers to create their own language classes in SymbEL. This “user-defined class” would be a structure and an associated block of code that was called whenever one of the members of the structure was accessed. This idea led to the development of the aggregate type class.

A class type is a structure with a block of code inside it. The block is first called when the block that contains the declaration of the class variable is entered. Thereafter, whenever a member of the class variable is accessed, the block is called.

To illustrate the class construct, here is a program that continually displays how long a system has been up. The first example is without the use of a class.
#include <stdio.se>

This program continues in an infinite for loop, computing the uptime based on the number of clock ticks the system has received since boot. The computation is contained completely within the main program. This code can be distilled into a user-defined class, as the following code shows.
#include <unistd.se>

The start of the class looks like a structure, but the final “member” of the structure is a block of code called the “class block.” The name used after the class keyword is the type name that will be used in the declaration of the variable. The name of the class block is the prefix used in variable names to denote that the variable is active. Variables declared in a user-defined class type that do not use the prefix in the variable name are inactive.

The main() function of the uptime program would now be written to use the uptime class as shown in this example.

#include <stdio.se>

The previous section discussed how the assignment of entire structures cuts down on the overhead of the system because only one copy is required. Not only is this true here as well, but the structure copy also ensures that the data printed out represents the calculations of one snapshot in time, instead of printing different values for each time that the class block was called to update each member of the class that was used as a parameter to printf().

Pitfalls

Here are some of the idiosyncrasies of the language that will catch programmers by surprise if they’re accustomed to using a particular feature in C and assume that it will be supported in SymbEL.
#include <stdio.se>

Tricks

As the creator of a programming language and the developer of the interpreter, it is much easier for me to see through the intricacies of the features to the underlying functionality of the interpreter itself. This knowledge manifests itself in programming “tricks” that allow certain operations to be done that may not be obvious. Here are some that I’ve used. If there’s something you need done and it doesn’t seem to fit into any language feature, try to work around it. You may find a loophole that you didn’t know existed.

Returning an Array of Nonstructured Type from a Function

Although it is not allowed to declare a function as int [] it is still possible to return an array. Granted, this code is unattractive, but most of the tricks in this section involve something that is not very appealing from the programming standpoint. SymbEL is, after all, just a scripting language. And if it can be done at all, it’s worth doing. So, here’s how to return an array of nonstructured type from a function.

#define ARRAY_SIZE 128

Using Return Values from Attached Functions

It is common to read input lines by using fgets, then locate the newline character with strchr and change it to a null character. This approach has unexpected results in SymbEL. For instance, the code segment

while(fgets(buf, sizeof(buf), stdin)!=nil){

would be expected to null the newline character and print the line (yes, I know this code segment will cause se to exit with a null pointer exception if a line is read with no newline character). But this is not the case because the strchr function returns a string that is assigned to the variable p. When this happens, a new copy of the string returned by strchr is allocated and assigned to p. When the p[0] = '\0'; line is executed, the newline character in the copy is made null. The original buf from the fgets call remains intact.
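The distinction between writing to a copy and writing to the original buffer can be shown in isolation. The sketch below uses Python rather than SymbEL and only models the effect. The helpers copy_from_newline and view_from_newline are hypothetical: the first stands in for a strchr result that has been copied on assignment, and the second for a reference into the original buffer, as a C pointer would be.

```python
# Illustrative model only; the helper names are hypothetical.

def copy_from_newline(buf: bytearray) -> bytearray:
    """Return a fresh copy of buf starting at the newline (models copy on assignment)."""
    return bytearray(buf[buf.index(b"\n"):])


def view_from_newline(buf: bytearray) -> memoryview:
    """Return a view into buf starting at the newline (models a C pointer)."""
    return memoryview(buf)[buf.index(b"\n"):]


line = bytearray(b"hello\n")

p = copy_from_newline(line)
p[0] = 0                          # only the copy changes
after_copy_write = bytes(line)    # the original still ends with the newline

v = view_from_newline(line)
v[0] = 0                          # writes through to the original buffer
after_view_write = bytes(line)    # the newline is now a null byte
```

The first write corresponds to the surprising SymbEL result described above. The second corresponds to what a C programmer expects from strchr.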
The way around this result (and this workaround should be done only when it is certain that the input lines contain the newline character) is:

while(fgets(buf, sizeof(buf), stdin)!=nil){

In this case, the result of the strchr call is never assigned to a variable, and its return value remains uncopied before being sent to the strcpy function. strcpy then copies the string "" onto the newline and in doing so, changes it to the null character.

Using kvm Variables and Functions

Using the kvm functions and dealing with kvm variables in general is quite confusing because there are so many levels of indirection of pointers. This simple script performs the equivalent of /bin/uname -m.
#include <stdio.se>

Another example of extracting kvm values is with the kvm_declare function. This function allows kernel variables to be declared while the program is running. Instead of declaring a kvm variable for maxusers, for instance, you could do it this way:

main()

A more general way to peruse integer variables entered at the user’s leisure is shown in this example.
#include <stdio.se>

Using an attach Block to Call Interpreter Functions

The attach feature of SymbEL is built on the dynamic linking feature of Solaris. The dl functions allow an external library to be attached to a running process, thus making the symbols within that binary available to the program.

One of the features of dynamic linking is the ability to access symbols within the binary that is running. That is, a process can look into itself for symbols. This can also be accomplished in SymbEL by using an attach block with no name. With this trick, a script can call functions contained within the interpreter, but the author of the script has to know what functions are available. Currently, the only functions available to the user are listed in the se.se include file.

The most useful of these functions is the se_function_call function, which allows the script to call a SymbEL function indirectly. This function can be used for a callback mechanism. It’s the equivalent of a pointer to a function. For example, this script calls the function "callback" indirectly.

#include <se.se>

The se_function_call function is declared with an ellipsis argument so any number of parameters can be passed (up to the internal limit) to the function being called. Be careful to pass the correct type and number of arguments.

An extreme example of this functionality is demonstrated below. The script calls on the interpreter to parse a function from an external file and then run the function. It’s an absurd example, but it demonstrates the tangled web that can be woven with attached functions and variables.
// this is the file "other_file"