Home |
Accounts |
Credentials |
Peers |
Projects |
Upload |
De-duplicate |
Cluster |
Tag Clouds |
View |
Browse |
Search |
Buckets |
Datasets |
Assign |
Notifications |
Toolbox |
Code |
Bookmarks |
Validate |
Report |
FAQ |
Service Levels |
Ideas for PCAT Improvements |
PCAT Wiki ToDo List |
Contact
PCAT can de-duplicate your public comment archive
PCAT contains a “one-click” method for removing duplicates from an archive. Once you have uploaded an archive, the system first processes and then indexes it. This can take a few minutes, depending on the size of the archive; smaller archives process and index very quickly. Once the processing is done, you can browse the files in the archive. Once the full-text indexing is done, you can search the archive. The system issues a notification in each case.
To remove duplicates from an archive with suspected duplicates, click on the small arrow in a circle to expose the “Archive Menu”.
Select “Generate Exact Duplicates”.
A prompt will ask you to confirm that you want to run the de-duplication algorithm.
Click OK. PCAT will issue a notification when the duplicate clustering is complete. This is a “before” picture of the small, 997-item sample mercury archive.
Click on "EPA Mercury Small Sample". A new sub-archive will appear with the title “De-duplicate Files”, followed by the document count for the sub-archive in parentheses.
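This page does not describe how PCAT identifies exact duplicates internally. As a rough illustration of the general idea only, the sketch below groups documents whose text is identical after whitespace and case normalization, using a content hash. The function names, the normalization rule, and the sample comments are all hypothetical and are not taken from PCAT.

```python
import hashlib
from collections import defaultdict


def normalize(text):
    # Assumption: collapse runs of whitespace and ignore case, so that
    # trivial formatting differences do not hide an exact duplicate.
    return " ".join(text.split()).lower()


def group_exact_duplicates(documents):
    """Group document ids whose normalized text is identical."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(doc_id)
    return list(groups.values())


def deduplicate(documents):
    """Keep the first document in each group; report the rest as duplicates."""
    kept, removed = [], []
    for ids in group_exact_duplicates(documents):
        kept.append(ids[0])
        removed.extend(ids[1:])
    return kept, removed


# Hypothetical sample comments, not real archive data.
sample = {
    "c1": "Please regulate mercury emissions.",
    "c2": "Please  regulate mercury   emissions.",
    "c3": "I oppose the proposed rule.",
}
kept, removed = deduplicate(sample)
print("kept:", kept)
print("removed:", removed)
```

In this sketch the first document in each group is kept and the others are set aside, which loosely mirrors the way the de-duplicated files appear in their own sub-archive with a document count.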
Most Frequently Asked Questions
Why would I use this system? |
Where do I get FDMS bulk downloads? |
Does PCAT identify duplicates? |
What is QDAP?
© 2009 - 2010
Qualitative Data Analysis Program (QDAP), in the
University Center for Social and Urban Research, at the
University of Pittsburgh, and
QDAP-UMass, in the
College of Social and Behavioral Sciences, at the
University of Massachusetts Amherst. As of 2010, PCAT and this PCAT Help Wiki are maintained and improved by personnel from
Texifter, LLC, which is a software start-up located in North Amherst & Springfield, MA and online at
http://texifter.com/.
Content on this website was made possible with the following grants from the National Science Foundation:
III-0705566 "Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis" and
IIS-0429293 "Collaborative Research: Language Processing Technology for Electronic Rulemaking." We are also grateful for financial support from the U.S. Environmental Protection Agency and the U.S. Fish & Wildlife Service. **Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation.**