Executive Function Evaluation System Based on Wiping Desk Behavior in Virtual Space

Abstract: In this study, we report the development of a virtual reality (VR) system that evaluates executive function in real time based on cleaning behavior. Patients with acquired brain injuries are known to present symptoms such as attention, memory, and functional disorders, as well as aphasia. Current methods to evaluate acquired brain injuries include the behavioral assessment of the dysexecutive syndrome (BADS) and the digital clinical assessment for attention (D-CAT); however, these tests require dedicated toolkits and are difficult to apply to real-time, dynamic evaluation. Moreover, the paper-based tests required by these methods are often a burden on patients. In this context, we propose and verify the efficacy of a method that offers real-time, dynamic evaluation of acquired brain injury based on daily-living activities such as cooking, cleaning, and shopping. We focus on the executive function affected by acquired brain injury and propose a real-time, dynamic evaluation method using VR that automatically evaluates subjects' table-cleaning behavior. Our results indicate that the system can automatically assess subjects' table-cleaning behavior based on the BADS test, with a maximum average accuracy of 75.5% in recognizing cleaning patterns.


Introduction
Individuals with acquired brain injuries present adverse symptoms such as attention, memory, and functional disorders, which prevent them from effectively executing the activities of daily living [1], [2]. In Japan, approximately half a million people live with brain injuries, and this number is increasing every year [3]. Meanwhile, worldwide, dementia prevalence has been found to roughly double every 4-5 years from age 65 onward; as a result, more than one-third of individuals over 85 likely have dementia [4], [5]. In particular, the World Health Organization (WHO) estimated that 35.6 million people suffered from dementia in 2010, and it expects this number to rise to 65.7 million in 2030 and 115.4 million in 2050 [6].
In order for a doctor to determine the state of a patient's cognitive function, it is often necessary to use a dedicated test kit for each separate evaluation. Doctors normally use the Hasegawa dementia rating scale-revised (HDS-R) [7] for dementia evaluation in Japan; the clinical assessment for attention (CAT) [8], clinical assessment for spontaneity (CAS) [8], and digital CAT (D-CAT) [9] tests for attention function evaluation; and the trail making test (TMT) [10] and behavioral assessment of the dysexecutive syndrome (BADS) [11] for executive function evaluation. However, these tests need special toolkits and are difficult to apply to real-time, dynamic evaluation. Moreover, the paper-based tests required by these methods are often a burden on patients. Meanwhile, researchers have recently developed systems for evaluating cognitive functions using virtual reality (VR). Felipe et al. evaluated physical therapy in stroke patients using VR [12]. Cho et al. verified the effects of cognitive training using VR, and their findings suggest the practicality of such training [13]. Atkins et al. developed an effective VR functional capacity assessment tool (VRFCAT) for cognitive evaluation [14]. Further, Namisato et al. created a "messy" room in VR and evaluated cognitive function from the subject's cleanup strategy [15].
Against this backdrop, in this study, we propose and verify the efficacy of a real-time method to dynamically evaluate acquired brain injuries based on daily living activities such as cooking, cleaning, and shopping. In particular, we focus on executive function evaluation relating to acquired brain injury, and we propose a real-time, dynamic evaluation method for executive function using VR, which enables the automatic evaluation of the subjects' cleaning behavior in VR. Fig. 1 shows a screenshot of the proposed method, which dynamically evaluates executive function in real time through the cleaning of a desk in a virtual space, based on the "key search test" of the BADS test. In this paper, we describe how to automatically evaluate cleaning behavior for VR executive function training.

Conventional Evaluation Method of Executive Function
To evaluate executive function, doctors normally use the BADS test proposed by Wilson et al. [11]. In Japan, Kashima et al. proposed a Japanese version based on Wilson's BADS test [16], which consists of the following six subtests.

• Rule shift card test
This test uses playing cards to evaluate the subject's ability to follow the change from one rule to another or keep a new rule in memory.

• Action program test
This test uses items such as water, wire, cork, and beakers, instead of paper and pencil, to test the subject's ability to plan actions and solve problems.

• Key search test
This test evaluates the subject's ability to plan an efficient path to find lost keys. In addition, because losing things is an event that can occur daily for patients with brain damage, this test is in line with daily behavior.

• Temporal judgement test
This test evaluates the subject's ability to estimate durations related to daily living, for instance, the time needed to boil water or a dog's lifespan.

• Zoo map test
This test assesses the subject's ability to plan a path according to certain rules on the map of a zoo. In addition, this test evaluates the subject's ability to consider feedback information and minimize mistakes when a rule is violated.

• Modified six elements test
This test evaluates the ability of the subject to plan, systematize, and adjust for six different tasks within the time limit according to the rules indicated.

The Japanese BADS test uses special toolkits for each subtest and cannot provide real-time, dynamic evaluation. In addition, the six subtests listed above do not fall under daily-living behavior. Therefore, in this study, we focus on cleaning behavior associated with daily living, and we propose the real-time, dynamic evaluation of the executive function involved in such behavior.

Details of Key Search Test
We first explain the details of the key search test, which is the basis of our VR system. This test uses A4-sized paper with a 100-mm square drawn in the center and a small black spot drawn 50 mm below the square. Participants start from the small black spot and draw a line with a pen, tracing the path they would take within the square to find a key. Fig. 2 shows an image of the key search test [11]. This test considers the following eight evaluation items: i) how to enter the field, ii) where to find the key, iii) continuity of line trace, iv) parallelism, v) horizontality or verticality of trace, vi) default pattern usage, vii) coverage of desk wiping space, and viii) finding of the key.

In our VR executive function evaluation based on cleaning behavior, the user uses a cloth to wipe a desk on which "daily items" are placed. The user is instructed to "place the daily items in a place other than on the desk" and is told that "wiping the desk with the daily items" is not allowed. These instructions ensure that the system evaluates only wiping with a cloth, so that executive function is assessed correctly. We explain the details of the evaluation in section 3.3.1.

Fig. 3 shows the configuration of our system, which is composed of a PC with a GPU, an Oculus Rift DK2 as the head-mounted display, a Leap Motion sensor for hand behavior recognition, and a development environment based on Unity 5.3.1. This VR system consists of an "examination scene" and a "reflection scene".

Examination Scene
In this system, a user first places daily-use items such as a book, pen, remote controller, and plastic bottle on a desk at the start of training. Fig. 4 shows a layout example of the daily-use items. Here, we note that there is a limit to the number of items that can be placed on the desk. Further, the difficulty level of the VR executive function test can be adjusted by limiting the number of items arranged on the desk.

Evaluation items
Next, we describe the evaluation of desk wiping. There are seven evaluation items, excluding "viii) finding of the key," based on the key search subtest of the BADS test (section 3.1). In addition, we add four items to evaluate desk wiping: viii) distance of daily item movement, ix) number of daily items moved, x) length of path covered by the cleaning cloth, and xi) number of times the cloth is held and dropped. Table 1 lists the scores corresponding to some of the evaluation items involved in the cleaning test.

• How to enter the field
In desk wiping, the point on the desk at which the user begins cleaning is important. Therefore, as highlighted in Fig. 5, we score 3 points for starting from a corner of the desk, 2 points for starting from a side, and 1 point for starting from any other place.

• Where to find the key
As with item i), the point at which wiping is completed is an important factor in the assessment. Therefore, we score 3 points for finishing at a corner of the desk, 2 points for finishing at a side, and 1 point for finishing at any other place.

• Continuity of line
In wiping the desk, it is efficient to use a single stroke. Therefore, as shown in Fig. 6, 1 point is given when the wipe locus is not interrupted, and 0 points are given when it is interrupted. To account for misrecognition of the hand, we assume that the trajectory breaks when the cloth is separated from the desk for more than 2 s.

Fig. 6. Classification/scoring of wipe locus continuity.

• Parallelism
When wiping the desk, it is efficient for the user to perform repetitive, parallel wiping motions. Therefore, we determine the parallelism and verticality of the wiping motion based on the slope of the trajectory. As shown in Fig. 7, the angle of each coordinate change is calculated to obtain the inclination of the locus, and a movement-amount histogram over 8 or 16 directions is generated. Subsequently, we find the direction with the largest moving distance and its opposite direction, and we calculate the difference between these two moving distances. If the difference is small, the motion is judged to be parallel and scores 1 point; otherwise, the score is 0.

Fig. 7. Part of movement-amount histogram along 8 or 16 directions.

• Horizontal or vertical traverse
When wiping a desk, it is not efficient to wipe it diagonally. That is, the direction along which the user wipes the desk needs to be vertical or horizontal. Here, we note that two consecutive points (X1, Y1) and (X2, Y2) on the table are horizontal when they satisfy X1≈X2 and vertical when they satisfy Y1≈Y2. We score 1 point if these points satisfy the horizontal or vertical traverse condition, and 0 points otherwise.

Volume 9, Number 4, October 2020

• Default patterns
In this system, we defined four patterns as prescribed in the BADS test, as shown in Fig. 8, which also shows the score for each pattern. Our system classifies the wipe patterns using the k-nearest neighbors (k-NN) method.

Fig. 8. Classification of wipe continuity based on four patterns.

• Coverage of desk space
It is important that the user wipes the desk thoroughly. In this study, we define the coverage rate in the same manner as in the BADS test, and we evaluate the amount of desk-space coverage during wiping. In this system, we score 1 point if >80% of the desk is covered by the cloth, and 0 points otherwise.

• Distance of daily item movement
An important point in efficient desk cleaning is how objects on the desk are moved. This system evaluates the number of times and the distance by which the user moves the daily-use items on the desk. However, if objects overlap when moved, they are considered one object; for instance, a cup on a book or a pen on a book counts as one object. The movement amount M of a daily-use item is calculated using Equation (1) when the item is at (x_{t-1}, y_{t-1}) at instant t-1 and at (x_t, y_t) at instant t.

M = \sqrt{(x_t - x_{t-1})^2 + (y_t - y_{t-1})^2}    (1)

• Number of daily-use item movements
As with the amount of movement, the number of times the daily-use items are moved is also important. This system counts and evaluates the number of times a user moves a daily-use item while wiping the desk. As with item viii), when objects overlap, they are considered one object.
• Distance covered by cleaning cloth
From the viewpoint of evaluating the executive function, we consider it inefficient to wipe the same place multiple times. Therefore, we calculate and evaluate the distance D traveled by the cloth while wiping the desk as per Equation (2), where (x_t, y_t) is the position of the cloth at instant t.

D = \sum_{t} \sqrt{(x_t - x_{t-1})^2 + (y_t - y_{t-1})^2}    (2)

• Number of times cloth is picked up and dropped
When the user cleans the desk, it is necessary to first move the objects and then wipe the table. At this time, the user needs to drop the cloth once. If the objects are moved many times, the number of times the cloth is placed or dropped increases, which is not efficient. Therefore, in this system, the number of times the cloth is picked up or dropped is used as an evaluation item.
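The trajectory-based items above (parallelism, coverage of desk space, and distance covered by the cloth) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the relative tolerance of 0.2 used to decide that the difference between opposite directions "is small", the grid-cell approximation of the cloth footprint, and the sample trajectory are all assumptions introduced for illustration. Only the 8-direction histogram, the comparison of opposite directions, the 80% coverage criterion, and the summed path length come from the text.

```python
import math

def direction_histogram(points, bins=8):
    """Accumulate the distance moved in each of `bins` directions."""
    hist = [0.0] * bins
    width = 2 * math.pi / bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        if dist == 0.0:
            continue  # no movement between these two samples
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # Bin 0 is centred on the +x direction (assumed convention).
        idx = int((angle + width / 2) // width) % bins
        hist[idx] += dist
    return hist

def parallelism_score(hist, tolerance=0.2):
    """1 point if the dominant direction and its opposite are balanced."""
    bins = len(hist)
    main = max(range(bins), key=lambda i: hist[i])
    if hist[main] == 0.0:
        return 0  # empty trajectory
    opposite = (main + bins // 2) % bins
    # Relative difference; the 0.2 tolerance is an assumed threshold.
    diff = (hist[main] - hist[opposite]) / hist[main]
    return 1 if diff <= tolerance else 0

def path_length(points):
    """Summed Euclidean distance between consecutive cloth positions."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def coverage_score(points, desk_w, desk_h, cell, threshold=0.8):
    """1 point if more than `threshold` of the desk grid cells were visited.

    Each sample is assumed to wipe the grid cell it falls in, so `cell`
    stands in for the size of the cloth (an assumption).
    """
    cols = max(1, int(desk_w // cell))
    rows = max(1, int(desk_h // cell))
    visited = set()
    for x, y in points:
        if 0 <= x < desk_w and 0 <= y < desk_h:
            col = min(int(x // cell), cols - 1)
            row = min(int(y // cell), rows - 1)
            visited.add((col, row))
    return 1 if len(visited) / (cols * rows) > threshold else 0

# Hypothetical back-and-forth wipe over a 4 x 2 desk (arbitrary units).
trajectory = [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (3.5, 0.5),
              (3.5, 1.5), (2.5, 1.5), (1.5, 1.5), (0.5, 1.5)]
hist = direction_histogram(trajectory)
print(hist, parallelism_score(hist), path_length(trajectory),
      coverage_score(trajectory, 4.0, 2.0, 1.0))
```

Passing `bins=16` yields the 16-direction variant. In a real system the tolerance and cell size would have to be calibrated against the sensor sampling rate and the cloth size.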

Calculation of optimal solution
We evaluate the optimal cleaning behavior based on the evaluation items listed in Section 3.3.1. We note that higher scores for items I to VII and lower scores for items VIII to XI indicate greater efficiency. Here, the scores for items I to VII are x = (x_1, x_2, …, x_7), in that order; those of VIII and IX are d_1 and d_2, respectively; and those of X and XI are n_1 and n_2, respectively. Next, we consider the optimal solution of the function f(x, d, n), as per Equations (3) to (5). In this research, participants were asked to wipe the desk in the VR space, and their scores were compared with the optimal score.

Reflection Scene
Feeding back the results to the user is another important aspect of the test system [17]. Therefore, the present system compares the optimal solution calculated by the system with the user solution and offers feedback to the user. At this time, in addition to the comparison, a moving image of the correct wiping method is presented for user awareness.

Experiments
We conducted experiments on pattern recognition accuracy in order to automatically apply the evaluation method described for item VI in Section 3.2. In particular, we evaluated the recognition accuracy of the default patterns corresponding to item VI. The participants were 10 students (9 males and 1 female), each of whom traced the 4 default patterns 10 times. The 8-direction and 16-direction histograms of these traces were used as learning data. Subsequently, we generated verification data by the same method as the learning data. We used the k-NN classification method; k was set to 1, 3, 5, and 9, and we calculated the accuracy, average accuracy, and precision for the default wiping patterns. Fig. 9 shows an image of the experimental scenario.

Fig. 9. Image of experimental scenario.

Table 2 lists the accuracy of each pattern and the average accuracy of pattern recognition as per the 8- and 16-direction histograms for the 4 patterns, whereas Table 3 lists the precision of the patterns as per the 8- and 16-direction histograms.
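The classification step described above can be sketched in Python as a plain k-NN vote over direction histograms. This is a minimal illustration under stated assumptions, not the authors' code. The Euclidean distance, the normalization of each histogram to unit sum, the function names, and the labels "horizontal" and "vertical" with their sample histograms are hypothetical; the paper's patterns A to D and its learning data are not reproduced here.

```python
import math
from collections import Counter

def normalize(hist):
    """Scale a histogram to unit sum so that stroke length does not dominate."""
    total = sum(hist)
    return [v / total for v in hist] if total else list(hist)

def knn_classify(train, query, k=1):
    """Return the majority label among the k nearest training histograms.

    `train` is a list of (histogram, label) pairs. Euclidean distance on
    normalized histograms is an assumed choice of metric.
    """
    q = normalize(query)
    nearest = sorted(train, key=lambda item: math.dist(normalize(item[0]), q))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 8-direction learning data (not taken from the experiment).
learning_data = [
    ([3, 0, 1, 0, 3, 0, 0, 0], "horizontal"),
    ([6, 0, 2, 0, 6, 0, 0, 0], "horizontal"),
    ([0, 0, 3, 0, 1, 0, 3, 0], "vertical"),
    ([1, 0, 6, 0, 0, 0, 6, 0], "vertical"),
]

for k in (1, 3):
    print(k, knn_classify(learning_data, [2.8, 0, 1.2, 0, 3.1, 0, 0, 0], k=k))
```

Larger values of k pull in neighbours from other classes when each class has few samples, which is one way a small learning set can lower accuracy as k grows.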

The recognition rate for each pattern can be confirmed from Table 2. The maximum recognition rates for patterns A to D are 77%, 76%, 86%, and 98%, respectively. We note that for patterns A, B, and C, the recognition rate decreases as the value of k increases. On the other hand, the recognition rate of pattern D increases as the value of k increases, whereas its accuracy rate decreases. We speculate that pattern D is misrecognized because it is similar to the other patterns.
From the results of this experiment, we note that the maximum recognition rate of patterns A and B is less than 80%. For accurate pattern recognition, a recognition rate close to 100% is required for all patterns; thus, the recognition rate must be improved. We expect that it can be improved by adding "non-default" patterns to the learning data in addition to the four default patterns. The experimental results also show that the recognition rate decreases as the value of k increases; for instance, with the 16-direction histograms, the average accuracy is 75.5% for k = 1 but 52.75% for k = 9. We speculate that this is due to the lack of learning data.
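The per-pattern figures discussed here can be derived from true and predicted labels as in the following Python sketch. It assumes that "recognition rate" means the fraction of a pattern's samples that are classified correctly (per-class recall), that "precision" means the fraction of predictions of a pattern that are correct, and that "average accuracy" is the mean of the per-pattern recognition rates. These definitions are not stated explicitly in the text, and the labels and data below are hypothetical.

```python
from collections import Counter

def per_class_metrics(true_labels, predicted_labels):
    """Return {label: (recognition_rate, precision)} for every label seen."""
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    n_true = Counter(true_labels)
    n_pred = Counter(predicted_labels)
    metrics = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        rate = correct[label] / n_true[label] if n_true[label] else 0.0
        precision = correct[label] / n_pred[label] if n_pred[label] else 0.0
        metrics[label] = (rate, precision)
    return metrics

def average_accuracy(metrics):
    """Mean of the per-pattern recognition rates (assumed definition)."""
    return sum(rate for rate, _ in metrics.values()) / len(metrics)

# Hypothetical labels: one "A" sample is misclassified as "B".
true_labels = ["A", "A", "B", "B"]
predicted = ["A", "B", "B", "B"]
metrics = per_class_metrics(true_labels, predicted)
print(metrics, average_accuracy(metrics))
```

In this toy case, the pattern that absorbs a misclassified sample keeps a high recognition rate while its precision drops, which is the kind of behavior the text describes for pattern D.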

Conclusion
In this study, we proposed a VR-based real-time executive function evaluation method, and we verified the recognition rate of the pattern recognition technology used in this method and the validity of the evaluation results. As per our results, the maximum recognition rate was 77% for pattern A, 76% for pattern B, 86% for pattern C, and 98% for pattern D. In order to achieve accurate recognition, a recognition rate close to 100% is required for all patterns; thus, we plan to improve the recognition rate in the future. To this end, we plan to add a "non-default pattern" to the four patterns and subsequently verify the recognition rate. In addition, we plan to evaluate and verify the test performance of people with greater brain dysfunction.
Finally, we plan to verify the effectiveness of our VR performance test by calculating the correlation between the proposed VR performance test and the conventional performance test.