I have learned a great deal about data mining since coming to USF. My first semester at USF, I took Dr. Norbert Elliot’s class on empirical research. Prior to this class, I had zero knowledge of the topic. I quickly learned the definition of a corpus and how to use tools such as SPSS, an n-gram analyzer, Atlas.ti, Voyant, and RandLex. My project in that class ran a corpus of cover letters (in MyR) through different tools to see whether I could figure out which words or phrases signaled ethical stance.
Background: I was a hiring manager for many years at a publishing company. My best hires were always people who had a good work ethic (typically honest, sincere people), but it was very hard to detect those qualities in application materials or even in interviews. Anyone can practice the “right” answer for an interview, so it’s really just a type of performance.
So, my project aimed at detecting whether you can find ethical qualities in cover letters. While one semester was hardly enough time to perform the full analysis, I wound up focusing on the use of hedges and boosters. Did words such as “have” or “believe” signal ethical stance? While my project was interesting, it needed more time. In our readings this week, Sinclair and Rockwell made good points about what to expect from using these types of tools. The authors remind us that the tools “do not produce meaning – we do” (288). They also suggest that we need to try things out to see how one tool can help us explore one aspect, while another can reveal something different, and a third may not be useful at all. My project did just that: I got interesting results from a combination of an n-gram analyzer and Atlas.ti, but nothing useful from my analysis with RandLex.
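The kind of hedge/booster counting described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual method: the word lists here are small hypothetical samples, and real stance analysis would use established lists (e.g. from Hyland's work on metadiscourse) and a proper tokenizer.

```python
# A minimal hedge/booster counter for a single cover letter.
# The HEDGES and BOOSTERS sets are tiny illustrative samples, not
# the lists used in the actual project.
import re
from collections import Counter

HEDGES = {"may", "might", "perhaps", "possibly", "believe", "seem"}
BOOSTERS = {"definitely", "certainly", "always", "clearly", "prove"}

def stance_counts(text):
    """Count hedge and booster tokens in one text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "hedges": sum(counts[w] for w in HEDGES),
        "boosters": sum(counts[w] for w in BOOSTERS),
        "tokens": len(tokens),
    }

letter = "I believe I may be a good fit, and I will certainly work hard."
print(stance_counts(letter))  # → {'hedges': 2, 'boosters': 1, 'tokens': 14}
```

Run over a whole corpus, per-letter counts like these could then be normalized by token count and compared across documents, which is roughly what an n-gram tool automates.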
And building on Rhody’s piece, I dig because I’m interested in ways to detect ethical or historical trends across a corpus.