CEQuery is a tool for text analysis and exploration of data sources such as the Course Experience Questionnaire (CEQ) and other forms of student evaluation in which students give positive and negative written judgements. CEQuery is designed specifically to describe and compare these positive and negative evaluations. Students' comments are scanned for a set of primary themes (domains) and for sub-themes within each domain (sub-domains). The resulting scores are then combined with each student's existing demographic information. Combining evaluative and demographic information allows students' opinions to be investigated through a variety of charts, extractions of relevant comments, and simple statistics. CEQuery uses a dictionary that defines which sub-domains belong to which domains, and the search terms that define each sub-domain.
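To make the scan-and-combine step concrete, here is a minimal Python sketch, assuming each comment has already been labelled positive or negative. The sub-domain names, the patterns, the `Comment` type and the `score`/`combine` helpers are all hypothetical, and plain regular expressions stand in for the dictionary's boolean search terms; this is an illustration, not CEQuery's implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified dictionary: sub-domain -> (parent domain, patterns).
# Plain regular expressions stand in for CEQuery's boolean search terms.
DICTIONARY = {
    "Staff: accessibility": ("Staff", [r"\bavailab\w*", r"\bapproachab\w*"]),
    "Staff: teaching skills": ("Staff", [r"\blectur\w*", r"\bexplain\w*"]),
    "Assessment: feedback": ("Assessment", [r"\bfeedback\b"]),
}


@dataclass
class Comment:
    student_id: str
    kind: str  # comment type: "positive" or "negative" (assumed labels)
    text: str


def score(comment):
    """Return the (domain, sub-domain) pairs whose patterns match the comment."""
    hits = set()
    for sub, (domain, patterns) in DICTIONARY.items():
        if any(re.search(p, comment.text, re.IGNORECASE) for p in patterns):
            hits.add((domain, sub))
    return hits


def combine(comments, demographics):
    """Merge each comment's hits with its student's demographic record."""
    rows = []
    for c in comments:
        row = dict(demographics.get(c.student_id, {}))  # copy; input is not mutated
        row.update(student_id=c.student_id, kind=c.kind, text=c.text, hits=score(c))
        rows.append(row)
    return rows


comments = [
    Comment("s1", "positive", "Lecturers were always available."),
    Comment("s2", "negative", "No feedback on assignments."),
]
demographics = {
    "s1": {"sex": "F", "attendance": "full-time"},
    "s2": {"sex": "M", "attendance": "part-time"},
}
rows = combine(comments, demographics)
```

Each resulting row holds the student's demographic fields alongside the comment type and the set of sub-domains it hit, which is the shape the charts, extractions and statistics then work from.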
Charts
CEQuery creates several styles of charts. Overall hit rate charts show the percentage of student comments that can be categorized, at the level of comment type (positive or negative evaluation), domain, or sub-domain. These charts are interactive: by clicking chart elements, users can drill down from a comment type to its domains, from a domain to its sub-domains, or from a sub-domain to the comments it contains.
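The hit rates behind these charts could be computed roughly as follows. This Python sketch assumes each comment is a dict with a `kind` (comment type), a `text`, and a `hits` set of (domain, sub-domain) pairs; the function names are hypothetical, and using all comments of a type as the denominator at every level is an assumption. Because one comment can touch several domains, domain-level rates may sum to more than 100%.

```python
from collections import Counter

# Assumed row shape: {"kind": ..., "text": ..., "hits": {(domain, sub-domain), ...}}


def _of_kind(rows, kind):
    return [r for r in rows if r["kind"] == kind]


def type_hit_rate(rows, kind):
    """Percent of comments of this type that matched at least one sub-domain."""
    subset = _of_kind(rows, kind)
    if not subset:
        return 0.0
    return 100.0 * sum(1 for r in subset if r["hits"]) / len(subset)


def domain_hit_rates(rows, kind):
    """Percent of comments of this type that touch each domain."""
    subset = _of_kind(rows, kind)
    counts = Counter()
    for r in subset:
        for domain in {d for d, _ in r["hits"]}:  # each domain once per comment
            counts[domain] += 1
    return {d: 100.0 * n / len(subset) for d, n in counts.items()}


def subdomain_hit_rates(rows, kind, domain):
    """Percent of comments of this type that hit each sub-domain of `domain`."""
    subset = _of_kind(rows, kind)
    counts = Counter(s for r in subset for d, s in r["hits"] if d == domain)
    return {s: 100.0 * n / len(subset) for s, n in counts.items()}


def comments_in(rows, kind, subdomain):
    """The comments behind one sub-domain bar (the last drill-down level)."""
    return [r["text"] for r in _of_kind(rows, kind)
            if any(s == subdomain for _, s in r["hits"])]


rows = [
    {"kind": "positive", "text": "a", "hits": {("Staff", "Staff: accessibility")}},
    {"kind": "positive", "text": "b", "hits": set()},
    {"kind": "positive", "text": "c",
     "hits": {("Staff", "Staff: teaching skills"), ("Assessment", "Assessment: feedback")}},
    {"kind": "negative", "text": "d", "hits": {("Assessment", "Assessment: feedback")}},
]
```

Each function corresponds to one drill-down step: comment type, then domain, then sub-domain, then the comments themselves.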
Other chart types allow users to compare groups defined by one or more demographic variables. For example, the percentage of comments related to staff might be compared across each combination of sex and attendance type, allowing an in-depth look at how these variables interact.
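A group comparison like the staff example amounts to grouping comments by a tuple of demographic values and computing a percentage within each group. In this Python sketch, the row layout (demographic fields plus a `kind` and a `hits` set of (domain, sub-domain) pairs) and the `group_percentages` function are assumptions made for illustration.

```python
from collections import defaultdict


def group_percentages(rows, domain, by, kind=None):
    """Percent of comments touching `domain` for each combination of `by` values.

    rows: dicts holding demographic fields, a "kind" and a "hits" set of
    (domain, sub-domain) pairs (an assumed layout, not CEQuery's own).
    kind: optionally restrict to one comment type, e.g. "negative".
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for r in rows:
        if kind is not None and r["kind"] != kind:
            continue
        key = tuple(r[var] for var in by)  # e.g. ("F", "part-time")
        totals[key] += 1
        if any(d == domain for d, _ in r["hits"]):
            matches[key] += 1
    return {key: 100.0 * matches[key] / totals[key] for key in totals}


rows = [
    {"sex": "F", "attendance": "full-time", "kind": "positive", "hits": {("Staff", "Staff: accessibility")}},
    {"sex": "F", "attendance": "full-time", "kind": "negative", "hits": set()},
    {"sex": "M", "attendance": "part-time", "kind": "positive", "hits": {("Staff", "Staff: teaching skills")}},
    {"sex": "M", "attendance": "part-time", "kind": "negative", "hits": {("Staff", "Staff: accessibility")}},
    {"sex": "F", "attendance": "part-time", "kind": "negative", "hits": {("Assessment", "Assessment: feedback")}},
]
staff_by_group = group_percentages(rows, "Staff", ("sex", "attendance"))
```

Passing more variable names in `by` simply produces finer combinations, so the same function covers one-way and multi-way comparisons.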
CEQuery also saves charts as GIF images and wraps them in HTML pages for printing or presentation.
Comment Extractions
Comments can be extracted through the interactive charts described above, or directly through a menu system. The system for specifying which comments to display is flexible, allowing users to select broad groups or tightly defined segments of both students and comment types to define extractions.
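An extraction can be thought of as a conjunction of optional filters on comment type, domain, sub-domain and student attributes. The hypothetical `extract` function below sketches that idea in Python over rows assumed to carry a `kind`, a `text`, demographic fields and a `hits` set of (domain, sub-domain) pairs; `None` means "no restriction".

```python
def extract(rows, kinds=None, domains=None, subdomains=None, students=None):
    """Return comment texts matching every filter that is not None.

    kinds:      allowed comment types, e.g. {"negative"}
    domains:    keep comments hitting at least one of these domains
    subdomains: keep comments hitting at least one of these sub-domains
    students:   {demographic variable: allowed values}, e.g. {"sex": {"F"}}
    """
    selected = []
    for r in rows:
        if kinds is not None and r["kind"] not in kinds:
            continue
        if students and any(r.get(var) not in allowed for var, allowed in students.items()):
            continue
        hit_domains = {d for d, _ in r["hits"]}
        hit_subs = {s for _, s in r["hits"]}
        if domains is not None and not hit_domains & set(domains):
            continue
        if subdomains is not None and not hit_subs & set(subdomains):
            continue
        selected.append(r["text"])
    return selected


rows = [
    {"kind": "negative", "sex": "F", "attendance": "part-time",
     "hits": {("Staff", "Staff: accessibility")}, "text": "Lecturers were hard to reach."},
    {"kind": "positive", "sex": "F", "attendance": "part-time",
     "hits": {("Staff", "Staff: accessibility")}, "text": "Tutors always replied quickly."},
    {"kind": "negative", "sex": "M", "attendance": "full-time",
     "hits": {("Assessment", "Assessment: feedback")}, "text": "Feedback came too late."},
]
part_time_staff_complaints = extract(
    rows, kinds={"negative"}, domains={"Staff"}, students={"attendance": {"part-time"}}
)
```

Leaving every filter at `None` returns all comments (a broad group), while combining several filters narrows the extraction to a tightly defined segment.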
Statistics
To provide an overview of which issues are important to students, CEQuery creates a number of spreadsheets showing absolute counts and frequency rankings for domains and sub-domains. The likelihood of each domain or sub-domain appearing in a positive or a negative evaluation is also calculated. These statistics can also be produced for each level of any demographic variable, for finer-grained analysis.
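The Python sketch below shows one way such a table could be produced. Counting each domain at most once per comment, breaking rank ties alphabetically, and reading "likelihood" as the share of a domain's mentions that come from positive comments (`p_positive`) are all assumptions; the description above does not give CEQuery's exact formulas. Rows are assumed to carry a `kind` of `"positive"` or `"negative"`, demographic fields, and a `hits` set of (domain, sub-domain) pairs; the function names are hypothetical.

```python
from collections import Counter


def frequency_table(rows, level="domain"):
    """Counts, ranks and positive share per domain (or per sub-domain)."""
    idx = 0 if level == "domain" else 1  # position in the (domain, sub-domain) pair
    # Any comment type other than "positive"/"negative" raises KeyError here.
    counts = {"positive": Counter(), "negative": Counter()}
    for r in rows:
        for key in {pair[idx] for pair in r["hits"]}:  # once per comment
            counts[r["kind"]][key] += 1
    table = []
    for key in set(counts["positive"]) | set(counts["negative"]):
        pos, neg = counts["positive"][key], counts["negative"][key]
        table.append({"name": key, "count": pos + neg, "positive": pos,
                      "negative": neg, "p_positive": pos / (pos + neg)})
    # Rank by total count; ties are broken alphabetically (an arbitrary choice).
    table.sort(key=lambda row: (-row["count"], row["name"]))
    for rank, row in enumerate(table, start=1):
        row["rank"] = rank
    return table


def frequency_by(rows, variable, level="domain"):
    """One frequency table per level of a demographic variable."""
    levels = sorted({r[variable] for r in rows})
    return {lv: frequency_table([r for r in rows if r[variable] == lv], level)
            for lv in levels}


rows = [
    {"kind": "positive", "sex": "F",
     "hits": {("Staff", "Staff: accessibility"), ("Staff", "Staff: teaching skills")}},
    {"kind": "negative", "sex": "F", "hits": {("Staff", "Staff: accessibility")}},
    {"kind": "negative", "sex": "M", "hits": {("Assessment", "Assessment: feedback")}},
    {"kind": "positive", "sex": "M", "hits": {("Staff", "Staff: teaching skills")}},
]
domain_table = frequency_table(rows)
by_sex = frequency_by(rows, "sex")
```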
Dictionary
The dictionary is a structured document (in an XML format) that defines the domain and sub-domain structure, and the search terms used to define each sub-domain. Multiple search terms can be used for each sub-domain. Domains themselves have no search terms, since a domain is defined by the sub-domains it contains. Search terms use boolean operators (AND, OR, NEAR, NOT) to form relationships between individual search words or groups of words, and this logic can be nested to whatever depth is required. A regular expression syntax (a wildcard system) lets a single search word match several words: for example, staff\w* matches staff, staffing, staffed, staffs and staffers. Combining boolean logic with wildcards allows very specific, very general, or complex search terms to be constructed. Users may create their own dictionaries or modify the existing dictionary.
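The Python sketch below shows one way a dictionary of this kind could be parsed and applied, using only the standard library (`xml.etree.ElementTree` and `re`). The XML element names (`dictionary`, `domain`, `subdomain`, `term`, `word`, `and`, `or`, `not`, `near`), the `distance` attribute and its default of 5 are hypothetical, invented for illustration; CEQuery's actual file format may differ. Comments are split into lowercase word tokens, each `<word>` is a regular expression that must match a whole token, and NEAR is read as "within `distance` tokens of each other".

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dictionary layout, invented for this sketch.
DICTIONARY_XML = r"""
<dictionary>
  <domain name="Staff">
    <subdomain name="Staff: accessibility">
      <term>
        <and>
          <word>staff\w*</word>
          <or><word>availab\w*</word><word>approachab\w*</word></or>
        </and>
      </term>
      <term>
        <near distance="3"><word>tutor\w*</word><word>help\w*</word></near>
      </term>
    </subdomain>
  </domain>
  <domain name="Assessment">
    <subdomain name="Assessment: feedback">
      <term>
        <and><word>feedback</word><not><word>survey\w*</word></not></and>
      </term>
    </subdomain>
  </domain>
</dictionary>
"""


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def positions(node, tokens):
    """Indices of tokens matched by a <word>, or by any <word> inside an <or>."""
    if node.tag == "word":
        pattern = node.text.strip()
        return {i for i, tok in enumerate(tokens) if re.fullmatch(pattern, tok)}
    if node.tag == "or":
        return set().union(*(positions(child, tokens) for child in node))
    raise ValueError(f"NEAR operands must be <word> or <or>, got <{node.tag}>")


def evaluate(node, tokens):
    """Recursively evaluate a search-term tree against a tokenized comment."""
    tag = node.tag
    if tag == "term":
        return evaluate(node[0], tokens)
    if tag == "word":
        return bool(positions(node, tokens))
    if tag == "and":
        return all(evaluate(child, tokens) for child in node)
    if tag == "or":
        return any(evaluate(child, tokens) for child in node)
    if tag == "not":
        return not evaluate(node[0], tokens)
    if tag == "near":
        left, right = (positions(child, tokens) for child in node)  # exactly two operands
        distance = int(node.get("distance", "5"))  # default window is an assumption
        return any(abs(i - j) <= distance for i in left for j in right)
    raise ValueError(f"unknown element <{tag}>")


def load_dictionary(xml_text):
    """Map each sub-domain name to (domain name, list of <term> elements)."""
    root = ET.fromstring(xml_text.strip())
    return {
        sub.get("name"): (domain.get("name"), sub.findall("term"))
        for domain in root.findall("domain")
        for sub in domain.findall("subdomain")
    }


def classify(text, dictionary):
    """A sub-domain is hit if any of its search terms is true for the comment."""
    tokens = tokenize(text)
    return {(domain, sub) for sub, (domain, terms) in dictionary.items()
            if any(evaluate(term, tokens) for term in terms)}


dictionary = load_dictionary(DICTIONARY_XML)
```

In this sketch NOT applies to the whole comment, so `feedback AND NOT survey\w*` rejects any comment that mentions surveys anywhere; a narrower scope for NOT would need a different design.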