Dataprep Microservice
The Dataprep Microservice preprocesses data from various sources, structured or unstructured, into text, converts the text into embedding vectors, and stores the vectors in a database.
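As a rough illustration of that flow (preprocess to text, embed, store), here is a minimal Python sketch. It is not the microservice's actual code: `extract_text`, `chunk_text`, `toy_embed`, `InMemoryVectorStore`, and `ingest` are hypothetical names, the hash-based embedding is only a stand-in for a real embedding model, and the in-memory list stands in for a vector database such as those listed below.

```python
import hashlib
from typing import List, Tuple


def extract_text(raw: bytes) -> str:
    """Hypothetical preprocessing step: decode bytes and normalize whitespace."""
    return " ".join(raw.decode("utf-8", errors="ignore").split())


def chunk_text(text: str, size: int = 40) -> List[str]:
    """Split text into fixed-size character chunks (size is an arbitrary choice)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(chunk: str, dim: int = 8) -> List[float]:
    """Placeholder embedding derived from a hash; NOT a real embedding model."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


class InMemoryVectorStore:
    """Stand-in for a vector database; keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, List[float]]] = []

    def add(self, chunk: str, vector: List[float]) -> None:
        self.rows.append((chunk, vector))


def ingest(raw: bytes, store: InMemoryVectorStore) -> int:
    """Run the three stages in order and return the number of stored chunks."""
    text = extract_text(raw)
    chunks = chunk_text(text)
    for chunk in chunks:
        store.add(chunk, toy_embed(chunk))
    return len(chunks)
```

Calling `ingest(b"some document bytes", InMemoryVectorStore())` runs the three stages end to end. In a real deployment each stage would be replaced by the corresponding file parser, embedding model, and database client.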
Install Requirements
apt-get update
apt-get install libreoffice
Use LVM (Large Vision Model) for Summarizing Image Data
Unstructured data occasionally contains images. To convert image data to text, an LVM can be used to summarize each image. To use the LVM, first start the LVM microservice by following this readme, then set the environment variable below before starting any dataprep microservice.
export SUMMARIZE_IMAGE_VIA_LVM=1
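For readers wondering how such a flag is typically consumed, the following Python sketch shows one way a service could read it. This is an assumption for illustration only: the helper name `lvm_summary_enabled` is hypothetical, and the dataprep services may parse the variable differently (for example, accepting values other than `1`).

```python
import os
from typing import Mapping, Optional


def lvm_summary_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when SUMMARIZE_IMAGE_VIA_LVM is set to "1".

    Assumption: an unset variable or any other value means the feature is off.
    """
    env = os.environ if env is None else env
    return env.get("SUMMARIZE_IMAGE_VIA_LVM", "0").strip() == "1"
```

Because the variable is read from the process environment, it has to be exported in the same shell (or container environment) that launches the dataprep microservice.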
Dataprep Microservice with Redis
For details, please refer to this readme
Dataprep Microservice with Milvus
For details, please refer to this readme
Dataprep Microservice with Qdrant
For details, please refer to this readme
Dataprep Microservice with Pinecone
For details, please refer to this readme
Dataprep Microservice with PGVector
For details, please refer to this readme
Dataprep Microservice with VDMS
For details, please refer to this readme
Dataprep Microservice with Multimodal
For details, please refer to this readme