Home |
Accounts |
Credentials |
Peers |
Projects |
Upload |
De-duplicate |
Cluster |
Tag Clouds |
View |
Browse |
Search |
Buckets |
Datasets |
Assign |
Notifications |
Toolbox |
Code |
Bookmarks |
Validate |
Report |
FAQ |
Service Levels |
Ideas for PCAT Improvements |
PCAT Wiki ToDo List |
Contact
PCAT can cluster similar comments and highlight unique text in your dataset
Once you have identified the exact duplicate comments, the next step is to cluster similar comments and to automatically highlight the unique text, deletions and edits found in the clustered, modified form letters. First, click on the “Archive Menu” arrow at the right end of the original archive name.
Then, select “Generate Near-duplicate Clustering”.
PCAT will estimate the time required to perform the clustering operation. You can set the clustering threshold in the drop-down menu. Changing the threshold does not change the total number of near-duplicate documents to be reviewed; it changes only the number of clusters and the number of items in each cluster.
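The threshold trade-off described above can be illustrated with a small sketch. PCAT's actual near-duplicate algorithm and threshold scale are not documented on this page, so the code below swaps in a common stand-in: word 3-gram shingles compared by Jaccard similarity, grouped by single-link clustering. The function names (`shingles`, `jaccard`, `cluster`) and the sample comments are hypothetical. A higher threshold produces more, smaller clusters and a lower one merges them, while the total number of documents stays the same.

```python
from itertools import combinations

# Hypothetical stand-in for near-duplicate clustering; not PCAT's own algorithm.

def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word sequences) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection / size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster(docs, threshold):
    """Single-link clustering: any two documents whose shingle similarity is
    at least `threshold` end up in the same cluster (union-find over pairs)."""
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [shingles(d) for d in docs]
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect document indices by their root to form the clusters.
    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

docs = [
    "please protect the wolves in the northern rockies",
    "please protect the wolves in the northern rockies now",
    "please protect the gray wolves in the northern rockies",
    "i oppose the proposed rule on mercury emissions",
]

for t in (0.8, 0.3):
    clusters = cluster(docs, t)
    # threshold, number of clusters, total documents, clusters
    print(t, len(clusters), sum(len(c) for c in clusters), clusters)
# 0.8 3 4 [[0, 1], [2], [3]]
# 0.3 2 4 [[0, 1, 2], [3]]
```

Lowering the threshold from 0.8 to 0.3 folds the "gray wolves" variant into the first cluster, yet all four documents are still accounted for.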
During the clustering operation, you can review progress inside the archive menu.
When the operation is complete, you will receive a notification, and a second, expandable sub-archive containing all the clusters will appear. Click the plus sign next to the “Similar Clusters” link to expand the cluster hierarchy.
Once the cluster hierarchy is revealed, you can browse and search each cluster. A cluster can also be assigned to a bucket or converted into a codeable dataset via the menu to the left of each cluster. When you navigate to a cluster, a small icon appears next to the active cluster. Click on this icon to rename the cluster.
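The highlighting of unique text, deletions and edits in modified form letters can be approximated by comparing each comment in a cluster against the base form letter. The sketch below is a hypothetical illustration, not PCAT's own method: it uses Python's standard `difflib.SequenceMatcher` on word lists and assumes the base form letter text is already known. The name `highlight_edits` and the sample text are invented for the example.

```python
import difflib

def highlight_edits(form_letter, comment):
    """Compare a comment against the base form letter word by word and label
    each differing span as added, deleted, or changed (hypothetical helper)."""
    base = form_letter.split()
    other = comment.split()
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            # Words present only in the comment: the commenter's unique text.
            edits.append(("added", " ".join(other[j1:j2])))
        elif tag == "delete":
            # Words from the form letter that the commenter removed.
            edits.append(("deleted", " ".join(base[i1:i2])))
        elif tag == "replace":
            # Form-letter words the commenter rewrote.
            edits.append(("changed", " ".join(base[i1:i2]), " ".join(other[j1:j2])))
    return edits

form = "please protect the wolves in the northern rockies"
comment = "please protect the gray wolves in the southern rockies because they matter"
for edit in highlight_edits(form, comment):
    print(edit)
# ('added', 'gray')
# ('changed', 'northern', 'southern')
# ('added', 'because they matter')
```

Matching on whole words rather than characters keeps the highlighted spans readable for reviewers; an unmodified copy of the form letter yields no edits at all.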
Most Frequently Asked Questions
Why would I use this system? |
Where do I get FDMS bulk downloads? |
Does PCAT identify duplicates? |
What is QDAP?
© 2009 - 2010
Qualitative Data Analysis Program (QDAP), in the
University Center for Social and Urban Research, at the
University of Pittsburgh, and
QDAP-UMass, in the
College of Social and Behavioral Sciences, at the
University of Massachusetts Amherst. As of 2010, PCAT and this PCAT Help Wiki are maintained and improved by personnel from
Texifter, LLC, which is a software start-up located in North Amherst & Springfield, MA and online at
http://texifter.com/.
Content on this website was made possible with the following grants from the National Science Foundation:
III-0705566 "Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis" and
IIS-0429293 "Collaborative Research: Language Processing Technology for Electronic Rulemaking." We are also grateful for financial support from the U.S. Environmental Protection Agency and the U.S. Fish & Wildlife Service. **Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation.**