In the past I had used NLTK and Python to solve this kind of problem, but neural networks have proven to be more accurate when it comes to NLP. I researched text classification libraries and different approaches to this problem and decided to use a CNN (convolutional neural network). I used Denny Britz's code for implementing the CNN.

Below I describe the files and the procedure I followed to get the data, train the model, test the model, and the results.

First, I went to the leading newspaper The Guardian and looked for the labels, i.e. Finance, Law, Fashion, Lifestyle. Scraping the data from the same source helps keep the articles homogeneous. I used Goose and BeautifulSoup to scrape the articles. The code for this is uploaded on GitHub.

The folder structure and a description of the data files are as follows:

Raw_data/                      Contains files related to train and test
├── collect_url_data.py        Python script that scrapes the articles
├── fashion_7000.txt           7000 training samples for class fashion
├── finance_7000.txt           7000 training samples for class finance
├── law_7000.txt               7000 training samples for class law
├── lifestyle_7000.txt         7000 training samples for class lifestyle
├── fashion                    Original scraped data and the cleaned version
│   ├── fashion_original.txt   Original scraped data
│   └── test_fashion.txt       Test data for fashion, 1001 samples
├── finance                    Original scraped data and the cleaned version
│   ├── finance_urls.txt       URLs scraped for finance
│   ├── original_finance.txt   Original scraped file
│   └── test_finance.txt       Test samples for finance
├── law
│   ├── original_law.txt       Original scraped data for law
│   └── law_7000.txt           7000 training samples for law
└── lifestyle
    ├── lifestyle_urls.txt     URLs collected for scraping
    ├── lifestyle.txt          Cleaned data for lifestyle
    ├── lifestyle_7000.txt     7000 training samples for lifestyle
    └── log_lifestyle          Log output of the script
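To make the scraping step concrete, here is a minimal sketch of how a script like collect_url_data.py could work with BeautifulSoup and Goose. This is not the author's actual code; the function names, the link-filtering rule, and the base URL are my assumptions.

```python
# Hypothetical sketch of a URL-collection + article-extraction script.
# Assumes the beautifulsoup4 and goose3 packages are installed; the
# base URL and link filter are illustrative assumptions, not the
# author's actual logic.
from bs4 import BeautifulSoup


def collect_article_urls(section_html, base="https://www.theguardian.com"):
    """Collect on-site article links from a section page's HTML."""
    soup = BeautifulSoup(section_html, "html.parser")
    urls = set()
    for a in soup.find_all("a", href=True):
        href = a["href"]
        # Keep only absolute links pointing back to the same site,
        # so all articles come from one homogeneous source.
        if href.startswith(base):
            urls.add(href)
    return sorted(urls)


def extract_article_text(url):
    """Fetch one article and return its cleaned body text via Goose."""
    from goose3 import Goose  # imported lazily; requires network to use
    g = Goose()
    article = g.extract(url=url)
    return article.cleaned_text
```

The collected URLs would then be written to a file such as lifestyle_urls.txt, and the extracted texts accumulated into the per-class training files.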