Actor Table export file (.csv) for experimentation

By xtian / November 27, 2014 / project

SCP Made Easier!

I am inspired to be in this course: helping others make decisions is relevant in all service domains. I’m excited because our instructors have encouraged us to collaborate. In the Welcome Video the professor describes collaboration in our course this way: “Doing this course by applying the techniques to a case, is a very good way to learn it. Working together makes the course even easier, as you can discuss each other’s work. Whether you work on the same case, or on something different does not matter.”

In addition to the edX course discussion forum and the Interactive Map, there is also the Twitter profile @SCPMOOC and the hashtag #SCPMOOC[1] to help participants connect. Working through the assignments I asked myself, *“How can we cooperate even more to learn the course material?”*

SCPMOOC, the first “O” stands for Open

Right now I’m also taking another edX course called Data, Analytics, and Learning. The knowledge domain of that course is different, but the techniques discussed can be applied to dealing with actors and their goals. I started thinking that the data in our Actor Tables could be analyzed using these techniques to compare all the actors in the SCP course.

For example, in my problem space (feedback in employment) I feature the following actors:

- job applicant
- small business owner
- labor management
- hiring manager
- current employee
- customer
- government organization
- business competitor
- commercial third-party
- job applicant competitor

An analysis of the actors in all participants’ actor tables could reveal areas of overlap. What gaps, dilemmas and problem statements have you not considered only because they belong in another topic domain?

Sharing your data

Here is a link to my Actor Table in a spreadsheet. It’s hosted on Google Docs. Take a look, download it, investigate.
If more course participants made their Actor Tables available in this form, together we could analyze and compare our data as a group using Microsoft Excel, Python’s NLTK, R’s text mining package or even Gephi. If you would like to participate, simply share your table as a .csv file or share it in Google Docs (and one more thing [2]).

“Realistically, what information can I gather from sharing my Actor Table?”

Truthfully, to make such analysis really effective some cleanup of the data is necessary, even though the actor element is a category, which does a lot of this work for us. The problem is that not everyone will use the same actor values. Some data might include punctuation or use different word orders, which will separate values that should be grouped together. This is to be expected, and it is the harder part of this analysis, called Identity Resolution or Identity Analytics. It’s where similar names, such as ‘employee’, ‘worker’ and ‘hired-person’, are grouped together to refer to the same actor.

Let’s say that, out of the course, 50 people were interested in this analysis and posted their actor tables as a spreadsheet (or .csv). Each person’s actor table, within the limiting context of the course assignment, could contain between 3 and 10 actors. This results in 150 to 500 rows in a combined spreadsheet. Taking the maximum, the question is: how many of those values are equivalent, that is, refer to the same actor? In this case, 500 rows is few enough that one person (myself) could manually sort them and join the actors that are the same. (Just thinking about the possibility of 50 people and 500 actors all in fact being different is an exciting prospect. What a diverse ecosystem!)

If nothing else, you could just look at my list! View the Google Docs spreadsheet. Maybe there are some ideas there you didn’t think about yet.

Footnotes:

[1] Another hashtag I’ve seen used on Twitter is #SCP. Unfortunately this tag is also used in the topic spaces for PortugueseSoccer, Sporting_CP, Supercarphotographer and another Twitter user whose posts are displayed in Korean Hangul.

[2] We also need to add a unique ID column to the table to differentiate contributed data. I’m looking into what online tools we can use for free. Right now I’m thinking we can simply hash student user IDs. For example, a hash of my user ID `xtiansimon` using this online site will produce this SHA-1 string: `9f2a27b2eea66551cb3440b0bc4c996d599448ea`. Then just DIY: add the same string to each row of your data. See my spreadsheet linked above. Check back here for more if you’re interested in participating.

Tags: csv, raw-data
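For anyone who would rather not use an online tool for the hashing step in footnote [2], here is a rough sketch of the same idea with Python's standard library. The column name `contributor_id`, the helper names and the sample rows are all assumptions for illustration. Note that a SHA-1 digest depends on the exact bytes hashed (encoding, stray spaces or a trailing newline), so a script like this may not reproduce the string a particular website gives if that site treats its input differently.

```python
import csv
import hashlib
import io

def contributor_id(user_id):
    """Return the SHA-1 hex digest of a user ID encoded as UTF-8."""
    return hashlib.sha1(user_id.encode("utf-8")).hexdigest()

def add_id_column(csv_text, user_id, column="contributor_id"):
    """Return csv_text with the same hashed ID added to every row.

    The column name is a hypothetical choice, not a course convention.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    fields = [column] + list(reader.fieldnames)
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    digest = contributor_id(user_id)
    for row in reader:
        row[column] = digest  # same string on each row of this table
        writer.writerow(row)
    return out.getvalue()

# Hypothetical two-row actor table; "actor" and "goal" are assumed headers.
sample = "actor,goal\njob applicant,get feedback\nhiring manager,fill the role\n"
tagged = add_id_column(sample, "xtiansimon")
print(tagged)
```

Because every row of one table carries the same digest, rows from many contributors can later be stacked into one spreadsheet and still be traced back to the table they came from.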