Let's rewind to the seventeenth century, when philosophers like Leibniz and Descartes were like, 'Hey, what if we made codes for words in different languages?' Fast forward a few centuries to 1950, and we've got Alan Turing, the coolest person in computing, publishing his article "Computing Machinery and Intelligence". And thus, the wild world of NLP was born!
From hand-written rules (ouch, talk about a tedious task) to deep learning models, NLP has come a long way. With architectures like LSTMs and Transformers, models like BERT, and today's large language models (LLMs), we're doing all sorts of cool stuff like text classification, text summarization, speech recognition, machine translation, and even caption generation! But wait, there's more! We're diving into one of NLP's most exciting tasks: text generation, along with carrying out tasks based on your instructions.
At WorqHat, we're on a mission to unleash the full potential of AI in the world of no-code applications. We believe that with the right tools and a little bit of creativity, anyone can achieve greatness. That's why we've created WorqBot, the ultimate AI writing partner for industries, startups, and solopreneurs alike. Imagine having an infinitely creative buddy by your side, helping you bring your wildest ideas to life! So get ready to think big and have a blast.
Let's Get to the Technical Part of Building WorqBot
While we were building WorqHat, we met WorqBot, an intelligent bot who had grown tired of handling all the complexities of running his own startup in Worqaland, in a galaxy far, far away. One day, WorqBot decided he needed a break from running his company, so he set off travelling thousands of light-years through space and finally reached Earth, where he met us!! And this is where the journey of WorqHat and WorqBot began.
WorqBot became a huge part of the WorqHat ecosystem. He started helping us onboard creators who could build exceptional products for Earthlings and make their lives easier. He also started guiding users on how to use the WorqHat platform and making their processes more efficient. Now, you might think WorqBot is already doing a lot, so he must be very smart. Well, well, he is smarter than we thought, way smarterrrrrrr, because what we've described is just a tiny bit of his role.
This means that WorqBot has a very busy schedule — and it gets even busier when you factor in all the time he spends helping us onboard new creators or answering questions from our users. His work is invaluable to our company: he helps us run better than ever before!
You can meet WorqBot by signing up on our Waitlist. We will be releasing WorqHat for a Private Beta very very soon and we can’t wait to share it with you all.
What if, instead of writing every sentence by hand, a well-designed, well-trained system could generate or suggest the next sentences or paragraphs for you? It would make writing easier. It could suggest how to continue a sentence, or generate new headlines, stories, articles, or even chapters of a book.
Or maybe it could just keep you engaged in a conversation as you work on your important tasks, helping you with suggestions here and there... Trust me, there's a whole lot WorqBot can do for you, and we're just getting started.
Butttttt, to get your hands on it, you need to sign up for the Waitlist so that you're among the first to know about it and use it. So, what are you waiting for? Sign up for the Waitlist and be the first to meet WorqBot. You can join our waitlist by visiting WorqHat.
As I'm not allowed to say much about how it works or how we built it, here's a sneak peek at part of what we considered while building it.
The Data Discussions
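Good news, bad news: that's the story of our data collection journey.
The good news is that we scraped an immense amount of information from the web to train our AI writing partner, WorqBot. By immense, we mean hundreds of thousands of web pages spanning every domain. Here is a sample Python script that we used to scrape data from the web. It reads its settings from environment variables and opens the starting page in Chrome through Selenium. It then saves each article's title and body to a text file, stops if an article body repeats (detected with an MD5 checksum), and follows the "previous post" link to the next article.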
import hashlib
import os
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Starting article URL
site = os.getenv('LINK')
if site is None:
    raise Exception('No link given')

file_path = os.getenv('DEST_PATH') or 'scrapped_files/'
os.makedirs(file_path, exist_ok=True)
count = int(os.getenv('START') or 1)
articles_count = int(os.getenv('ARTICLE_COUNT') or 1000000)
checksum_map = {}  # MD5 hex digest of article body -> article number

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('/usr/bin/chromedriver'))
driver.get(site)

try:
    while count <= articles_count:
        # Parse the fully rendered page
        res = driver.execute_script("return document.documentElement.outerHTML")
        soup = BeautifulSoup(res, 'lxml')
        article = soup.article
        title = article.header.h1.text
        content = article.find('div', 'entry-content').text

        # Compare hex digests (hash objects never compare equal) to catch duplicates
        checksum = hashlib.md5(content.encode('utf-8')).hexdigest()
        if checksum in checksum_map:
            raise Exception(f'duplicate file detected for count {count} & cksm {checksum}')
        checksum_map[checksum] = count

        # Write the article only after it passes the duplicate check
        path = os.path.join(file_path, 'article' + str(count) + '.txt')
        with open(path, 'w', encoding='utf-8') as fp:
            fp.write(title + '\n')
            fp.write(content)
        print(str(count) + ". file saved in " + path)

        # Follow the "previous post" link to the next article
        prev_link = soup.nav.find('div', 'nav-previous').a.get('href')
        driver.get(prev_link)
        count = count + 1
        time.sleep(0.5)  # pause between requests
except Exception:
    print('last count', count)
    if checksum_map:
        print('last entry', max(checksum_map.values()))
    raise
finally:
    driver.quit()
Well, when it comes to collecting data for our models, it's not all sunshine and rainbows. The process of scraping hundreds of thousands of articles from the internet was a big task and put a strain on our trusty laptop. It worked tirelessly for hours on end to gather all the information it could find, but it's still not quite the same as it was before. But, we believe the end result was worth it. With all that data, we were able to train our AI writing partner to be one of the best in the business.
However, just like any secret agent needs to go through debriefing, our collected data needed a thorough cleaning. We got rid of all the unwanted extras like ads, pop-ups, and video links so that WorqBot could focus on learning using only the best and most relevant information. But let's be honest, scraping all that data and cleaning it up was a bit of a headache, and we wouldn't wish it on our worst enemy. But hey, it's all for the greater good, right?
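The details of our cleaning pipeline aren't covered here, but the general idea of denylist-based boilerplate removal can be sketched with Python's built-in `html.parser`. This is a minimal sketch under our own assumptions. The tag list (`SKIP_TAGS`) and the class-name hints (`SKIP_CLASS_HINTS`) are hypothetical heuristics rather than the rules we actually used, and the class name `ArticleCleaner` is illustrative. Real pages usually need site-specific tweaks.

```python
from html.parser import HTMLParser

# Hypothetical denylists: elements that typically hold ads, pop-ups or embedded video.
SKIP_TAGS = {"script", "style", "noscript", "iframe", "video", "aside"}
SKIP_CLASS_HINTS = ("ad", "popup", "newsletter")
# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleCleaner(HTMLParser):
    """Collects visible text while skipping everything inside unwanted elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while we are inside an unwanted element
        self.chunks = []

    def _is_unwanted(self, tag, attrs):
        if tag in SKIP_TAGS:
            return True
        classes = (dict(attrs).get("class") or "").lower().split()
        # Match "ad" or "ad-banner", but not "header" or "shadow"
        return any(c == hint or c.startswith(hint + "-")
                   for c in classes for hint in SKIP_CLASS_HINTS)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth or self._is_unwanted(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def clean_html(html):
    """Return the page text with ads, pop-ups, scripts and video embeds removed."""
    parser = ArticleCleaner()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = ('<div><p>Hello world.</p><div class="ad-banner">Buy now!</div>'
            '<script>var x = 1;</script><p>More <b>text</b></p>'
            '<iframe src="https://video.example/embed"></iframe></div>')
    print(clean_html(page))  # prints: Hello world. More text
```

Tracking a nesting depth, rather than a simple on/off flag, means that everything nested inside an ad container is dropped along with it.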
The Model Architecture
So, to build this, we decided to try both an LSTM model and GPT models. Here's a brief overview of each.
LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) architecture designed for sequential data. It's used for tasks that involve predicting the next element in a sequence, such as text generation, speech recognition, and machine translation. Traditional RNNs struggle to carry information across long sequences because the training signal fades as it flows back through many time steps (the "vanishing gradient" problem). LSTMs address this with a memory cell whose contents the network learns to keep or forget. Each LSTM unit has a cell state that stores information. It also has three gates (input, forget, and output) that control what gets written to the cell, what is kept from the previous time step, and what is exposed as the hidden state. An output layer on top then makes predictions from that hidden state. Together, these elements let LSTMs process long sequences effectively, which made them a popular choice for NLP tasks.
A Generative Pre-trained Transformer (GPT) is a type of language model built with deep learning on the Transformer architecture. Given a prompt, it generates new text that is coherent and contextually relevant. These models are trained on large datasets, such as books, articles, and web pages, to learn the patterns and relationships between words and sentences in a language. During pre-training, the model learns to predict the next token (roughly, a word or word piece) from the tokens before it. Doing this over a huge corpus is what lets it generate text that resembles its training data. The pre-trained model can then be fine-tuned for specific tasks, such as language translation, text summarization, and question answering. These models have been a major breakthrough in NLP and have set new standards in text generation and language understanding.
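To make the gates concrete, here is a minimal sketch of a single LSTM time step in plain Python. To keep it readable, it assumes a one-dimensional input and a one-dimensional hidden state. The parameter layout (a `(w_x, w_h, b)` triple per gate) and the function names `lstm_step` and `run_lstm` are our own illustrative choices. Real models use weight matrices and a framework such as TensorFlow or PyTorch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a scalar input and scalar hidden/cell state.

    params maps each gate name to (w_x, w_h, b): the weight on the input,
    the weight on the previous hidden state, and the bias.
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre_activation("input"))         # how much new information to write
    g = math.tanh(pre_activation("candidate"))   # the candidate new information
    o = sigmoid(pre_activation("output"))        # how much of the cell state to expose

    c = f * c_prev + i * g    # update the memory cell
    h = o * math.tanh(c)      # new hidden state, passed on to the next step
    return h, c


def run_lstm(xs, params, h=0.0, c=0.0):
    """Feed a sequence through the cell, carrying (h, c) from step to step."""
    hs = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        hs.append(h)
    return hs, c


if __name__ == "__main__":
    # Hand-picked illustrative weights, not trained values.
    params = {
        "forget": (0.0, 0.0, 3.0),      # large bias: keep most of the cell state
        "input": (10.0, 0.0, -5.0),     # open only when x is 1
        "candidate": (1.0, 0.0, 0.0),
        "output": (0.0, 0.0, 0.0),
    }
    hidden_states, final_cell = run_lstm([1.0, 0.0, 0.0, 0.0], params)
    print(hidden_states, final_cell)
```

With these weights, the input gate is almost fully open at the first step and almost closed afterwards. The forget gate stays near 0.95, so the value written at step one is remembered and fades only slowly over the following steps.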
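A real GPT model is a Transformer with millions or billions of parameters. However, the generate-one-token-at-a-time loop it runs can be illustrated with something far simpler. The sketch below swaps in a toy bigram model (counts of which word follows which) as a stand-in for the neural network. It also assumes greedy decoding, and `train_bigram` and `generate` are illustrative names rather than any library's API.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """'Pre-train' by counting which word follows which in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            counts[current][nxt] += 1
    return counts


def generate(counts, prompt, max_new_words=5):
    """Autoregressive generation: predict a word, append it, repeat."""
    words = prompt.split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:  # no known continuation: stop early
            break
        # Greedy decoding: always pick the most frequent next word
        next_word = followers.most_common(1)[0][0]
        words.append(next_word)
    return " ".join(words)


if __name__ == "__main__":
    corpus = [
        "the probe reached the moon",
        "the rocket reached the moon",
        "a rocket reached orbit",
    ]
    model = train_bigram(corpus)
    print(generate(model, "the probe"))  # prints: the probe reached the moon
```

Replace the counting step with a neural network that scores every token in a vocabulary given the whole preceding context, and you get roughly the shape of GPT-style generation.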
Aandddd we've got a new champion in town! And no, it's not the latest superhero movie (although our GPT models are pretty heroic in their own way). It's the Generative Pre-trained Transformer! You see, LSTM models are like old-school video games: sure, they're fun and all, but the models we have are like the latest, cutting-edge console games. These models can generate coherent and contextually relevant text based on a given prompt, and they don't even need to take a break to save the game! They have an edge over LSTM models for two reasons: pre-training on massive datasets, and an architecture that looks at the entire context at once instead of squeezing it through a single recurrent state. So, we decided to go with GPT, because who wouldn't want a writing partner that's smarter, faster, and cooler than the rest?
Inference
Well folks, buckle up, because we're about to take a trip down memory lane to our very first encounter with WorqBot during his training phase. It was a time of high hopes, endless nights of coding and tweaking, and lots of coffee. And while our early test results were, let's just say, "optimistic," we couldn't help but feel a glimmer of excitement every time the model spat out a sentence that made sense. And let me tell you, it was a breakthrough moment! We were like, "Whoa, did that really just happen? Our model actually generated a coherent sentence!" But don't get too excited just yet; we still had a long way to go before it was ready to take over the world (or at least take over writing boring reports). But hey, every great journey starts with a small step, and we were ready to take it to the next level!
# Restore the trained checkpoint 'run1' from Google Drive and load it
aicon.copy_checkpoint_from_gdrive(run_name='run1')
sess = aicon.start_tf_sess()
aicon.load_aicon(sess, run_name='run1')

# Generate 5 samples (length=50) from the prompt
prefix = 'In the history of spaceflight , only one nation'
aicon.generate(sess, length=50, temperature=0.5, prefix=prefix,
               nsamples=5, batch_size=5)
# input : 'In the history of spaceflight , only one nation'
# output :
In the history of spaceflight , only one nation has achieved the feat: the Soviet Union. But the Soviet Union did it. It was the first nation to launch a rocket from a launch pad. It was the first nation to launch a rocket from a launch pad. It was the first to launch
====================
In the history of spaceflight , only one nation has managed to colonize another planet: the Soviet Union. The Soviets launched their first lunar lander in November of 1972, and by the time it landed, the Moon was still in lunar orbit. They also made several attempts to colonize the Moon
====================
In the history of spaceflight , only one nation has held the title of sole superpower – the United States. The Soviet Union, who were the first nation to achieve space flight, was the only nation to achieve space flight. And although they were technically still in the process of achieving space flight, they
====================
In the history of spaceflight , only one nation has been able to launch satellites into space. The Soviet Union, for example, was able to launch satellites into space on two separate occasions. However, the Soviet Union was not able to carry out their first lunar base until the mid-1970s.
====================
In the history of spaceflight , only one nation has been able to achieve the feat: the Soviet Union. Their first attempts were in February of 1961, when the Soviet Union launched their first intercontinental ballistic missile (ICBM) at New York City. They successfully launched their ICBM on March 8
====================
As we can see, it generates sentences that sound like it really knows about space, even if the facts don't always hold up!! Let's see some other results.
aicon.generate(sess, length=50, temperature=0.7,
               prefix="ISRO has launched Chandrayaan II", nsamples=5,
               batch_size=5)
# input : 'ISRO has launched Chandrayaan II'
# output :
ISRO has launched Chandrayaan II at an altitude of only about 20 km (15 miles) at a time from the launch pad in Chandrayaan, India. The lander and its two-stage lander will be launched in the evening of March 18th at 13:03
====================
ISRO has launched Chandrayaan II on September 18, 2019. This mission will launch from the Vastal Pad, the Space Launch Complex 41 (SLC-41) in Chandrayaan, India. The satellite was launched by the Indian Space Research Organization (ISRO) under
====================
ISRO has launched Chandrayaan II as it reaches its second mission launch from Pad 39A at the end of this month.This comes just a week after the probe launched from its command module on April 18, and will launch on Pad 39A at the end of this month. And
====================
ISRO has launched Chandrayaan II to its orbit earlier this month. Chandrayaan II is the third mission to orbit the Earth from home and the first space mission to orbit the Moon. The probe’s main goal is to study the Moon’s interior structure, analyze
====================
ISRO has launched Chandrayaan II mission on the orbit of DSO on July 5, 2019.The maiden blastoff of this mission on the Salyut-2 (D-2) mission is slated for July 5, 2019. The launch window for DSO runs from July
====================
It's like an encyclopedia of all things, if encyclopedias were made of pure, unadulterated awesome sauce. Take, for example, its ability to connect the acronym ISRO to the Indian Space Research Organization. I mean, talk about a powerhouse of knowledge! And don't even get me started on its ability to link July 2019 to the launch. With WorqBot around, who needs a human personal assistant? It's like having a witty, always-on-point friend in your pocket, ready to spout off information at a moment's notice. This showed us we were on track to build WorqBot into a powerful piece of technology, but we still had a long way to go before we could call it a day.
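Both runs above pass a `temperature` argument (0.5 and 0.7). In common sampling implementations, temperature divides the model's raw scores (logits) before the softmax. Values below 1 sharpen the distribution toward the most likely token, and values above 1 flatten it. The sketch below is a generic illustration of that idea, not the internals of `aicon.generate`. The function names and the logits are made up for the example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(tokens, logits, temperature, rng=random):
    """Draw one token according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    tokens = ["Moon", "Mars", "Earth"]
    logits = [2.0, 1.0, 0.5]  # made-up model scores
    for t in (0.5, 0.7, 1.5):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
    print(sample_next_token(tokens, logits, 0.7, rng=random.Random(0)))
```

Lower temperatures make the output more predictable but also more prone to repetition. The repeated sentence in the first spaceflight sample, generated at 0.5, is the kind of thing low temperatures tend to produce. Higher temperatures add variety at the cost of coherence.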
What's Different
Apart from the fact that this entire model has been trained and built to work seamlessly with WorqHat, a few things make WorqBot different from other AI models out there, including ChatGPT.
When we started building WorqBot, we wanted to take on ChatGPT and shake up the language AI market by focusing on ease of use, privacy, and reliability. Unlike models that rely on user data to improve their performance, we do not collect or store any personal information, giving users a safer and more secure experience. The model has also been optimized to understand user prompts that are broken down into parts, so it can give more accurate and relevant responses. This also makes things easier for users, who can provide their input in smaller chunks instead of getting trapped under information overload. The user interface has been streamlined to be intuitive and straightforward, so even people with no prior experience with language AI can use it easily. Furthermore, the system is designed to avoid server downtime, so WorqBot is always accessible and ready to help you. This makes it an ideal option for individuals, businesses, and organizations who value privacy, user-friendliness, and reliability in their language AI technology. In summary, we are looking forward to providing a more accessible, secure, reliable, and user-friendly alternative to traditional language AI models.
WorqBot with WorqHat... The Infinite Possibilities
We are building WorqHat to be the ultimate no-code and organizational productivity platform. It is equipped with the powerful WorqBot AI model, which helps users create dynamic, personalized customer portals and dashboards with ease and automate manual tasks through AI-based workflows. With WorqBot's content and image generation capabilities, and its ability to act on users' commands and data, users can bring their data to life in a visually appealing and informative way. Users can design their own datasheets, documents, and dashboards by simply dragging and dropping elements within the platform, making it easier than ever to create a custom solution tailored to their needs. Being able to add their own data and create their own documents with the help of WorqBot's prediction model lets users work more efficiently and streamline their workflow, freeing up time for bigger and better things. So, why settle for boring, generic applications when you can have WorqHat and WorqBot bring your data to life?
Conclusion
In conclusion, we hope this blog has given you a glimpse into the exciting world of AI and its potential to revolutionize the way we live and work. With our cutting-edge model (more on that soon), we're pushing the boundaries of what's possible with AI and opening up new doors for innovation and creativity. Whether you're an AI enthusiast, a tech-savvy professional, or just someone looking for a better way to get things done, we believe WorqHat has something to offer.
With the help of WorqBot, you can create custom, dynamic dashboards, customer portals, datasheets, and documents with ease. WorqBot's powerful content and image generation capabilities, combined with its ability to act on user commands and data, make it easier than ever to bring your data to life in a visually appealing and informative way. Its prediction model helps you work more efficiently and streamline your workflow, freeing up time for bigger and better things.
So why wait? Join us on this exciting journey and see what the future of AI has in store! You can sign up for our beta program and join the Waitlist here: WorqHat. We'll make sure you're among the first to know about it and try it out when we launch.
PSA: This entire blog was written by the always reliable WorqBot. If you have any feedback, curious questions, or simply want to give him a virtual pat on the back, feel free to reach out to him at worqbot@worqhat.com. Just don't expect a lightning-fast reply, as he's currently busy chugging oil and charging his circuits.
In case you want to read more about Startups, Firebase, Web Development and Tech in general, feel free to follow me on my social channels: Instagram, Twitter, and LinkedIn.