Logging
This section gives a detailed overview of the actions and events that are logged by the Bibster system. This information will be used to analyze and evaluate our research theses. Periodically, the log files are sent to a central location. To demonstrate our logging system, an example log file is available here.

StartUp and Exit
To measure user satisfaction, we log the startup and exit timestamps. We assume that the time a user spends on the system, and the increase or decrease of use over time, indicates how the user values the system.

Bibliographic items
The extraction tool puts the scraped bibliographic items into the local node repository. Ideally, the user classifies the extracted bibliographic items into ACM topics. Together, these topics form the expertise of the peer. Storing the classified bibliographic items is useful for future research.

JXTA Topology
To log the network (JXTA) topology, we log for each peer the status it has with the other peers. Each time the status of a peer changes, the entire topology for that peer is logged. In JXTA there are three possibilities: a peer can be active, inactive, or disconnected. Recent network tests showed that peers often switch from active to inactive without any clear reason. This produces a lot of information to log, because we want to log each change. We therefore decided that, as long as this problem is not solved, we do not distinguish between active and inactive. We assume that peers that are inactive at a certain point in time will be active again after a few seconds. When a peer sends a query to another peer that is inactive, JXTA remembers the query and still sends it once that peer is active again.

Advertisements and Semantic Topology
A receiver knows the expertise of a sender if the sender sends it an advertisement message.
The process of learning each other's expertise through advertisements results in a semantic topology. This semantic topology is one of the things we want to measure. If every advertisement received by a peer were accepted, the semantic topology could be reconstructed from the received advertisements alone. However, a peer may, for whatever reason, choose not to include an advertisement. Therefore we also need to log which advertisements are stored in the peer's repository.

Local and Global Queries
To calculate the performance (recall/precision) of the peer selection process, it is important to distinguish between local queries on the local repository and global queries sent to other peers. Results from the local repository cannot be included in the calculation; however, it is scientifically interesting how many answers are already on the user's system and how that changes over time. For example, if a peer always includes the results it receives from other peers, more and more queries can be answered locally. Although we are able to log everything, we log only the query itself and the number of answers, not the answers themselves, because otherwise the log file would become too large. This also holds for global queries. We do not log the SeRQL query itself but the attributes extracted from it. We have to extract these attributes anyway, and doing so at the recording step is much easier than at the analysis step, where we would have to parse the query to get them. This, too, holds for global queries. To get an idea of the forwarding process, we also log the queries received from other peers. By logging all of this, we can also see how many messages are lost on the network.

Including results
The user decides what (s)he does with the answers to queries.
When a result is selected for inclusion in the repository, we take this as an indication of the quality of the answer for that user. In this way we can measure how many of the results sent over the network are actually used by the user.

Duplicate detection
To test the performance of the duplicate detection algorithm, we store, for each item that has been recognized as a duplicate, its complete original description and its recognized duplicate.
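
The topology-logging rule described under JXTA Topology (log a peer's entire view whenever a status changes, while treating active and inactive as the same state) can be illustrated with a minimal Python sketch. This is not Bibster's actual code: the class name `TopologyLogger`, the `merge_inactive` switch, the JSON record layout, and the peer identifiers are all assumptions made for illustration, and no real JXTA API is called.

```python
import json
import time

# The three peer states named in the text; the string values are assumptions.
ACTIVE, INACTIVE, DISCONNECTED = "active", "inactive", "disconnected"


def normalize(status, merge_inactive=True):
    """Map 'inactive' to 'active' when the two states are not distinguished."""
    if merge_inactive and status == INACTIVE:
        return ACTIVE
    return status


class TopologyLogger:
    """Hypothetical logger: writes a peer's full topology view on each change."""

    def __init__(self, peer_id, merge_inactive=True, clock=time.time):
        self.peer_id = peer_id
        self.merge_inactive = merge_inactive
        self.clock = clock
        self.view = {}     # remote peer id -> normalized status
        self.records = []  # JSON lines that would be appended to the log file

    def update(self, remote_id, status):
        """Process a status report; return True if a log entry was written."""
        status = normalize(status, self.merge_inactive)
        if self.view.get(remote_id) == status:
            return False  # no effective change, so nothing is logged
        self.view[remote_id] = status
        entry = {
            "time": self.clock(),
            "peer": self.peer_id,
            "topology": dict(self.view),  # the entire view, not only the change
        }
        self.records.append(json.dumps(entry, sort_keys=True))
        return True


# Usage with made-up peer names.
log = TopologyLogger("peerA")
log.update("peerB", ACTIVE)        # logged: first status seen for peerB
log.update("peerB", INACTIVE)      # skipped: inactive is treated as active
log.update("peerB", DISCONNECTED)  # logged: a real status change
```

Setting `merge_inactive=False` would restore the distinction between active and inactive, at the cost of the larger log volume the text describes.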


